PTP/designs/remote/sync

From Eclipsepedia

* [http://wiki.eclipse.org/TM_and_RSE_FAQ#How_can_I_use_a_remote_workspace_over_SSH.3F Use rsync to sync the remote workspace to the local machine] FAQ including disadvantages and advantages of using Rsync for a remote project
=== Photran (Support for Fortran) ===
* [http://wiki.eclipse.org/PTP/photran/rsync_remote_projects PTP/photran/rsync remote projects] wiki page describing the working Photran test implementation
* [https://bugs.eclipse.org/bugs/show_bug.cgi?id=313194 Rsync-style Remote Project Support for Photran]

Latest revision as of 10:25, 7 June 2011


Introduction

This document describes the current state of the ideas and designs for synchronization-based file access for remote projects. It was started by Roland Schulz based on the discussions in bug 316709.

Rationale

Remote project support in Eclipse (i.e. projects where the source code is located remotely, and the executable is built and run remotely) is an essential requirement for science and engineering applications. Currently, options for using remote projects in Eclipse are:

  • Network filesystems
  • Revision control systems
  • Remote services

The problems with these approaches are discussed below. The proposed approach has different disadvantages (also discussed below), so it is not meant to replace the current RDT approach but to offer an alternative. This allows users to choose the approach whose advantages best fit their working environment. The approach will also reuse parts of RDT (e.g. the scanner/indexer).

Network filesystems

Using a network filesystem, such as NFS, to trick Eclipse into thinking the project is local suffers from a number of problems. The indexing and advanced editing features of CDT require access to all source files in the project. These features already suffer from performance issues when accessing local files (particularly for large projects); over a network filesystem they can become unacceptably slow. In addition, network filesystems typically require system administration access to configure, both locally (although systems such as FUSE can be used to circumvent this) and on the remote system. This technique also requires a continuous network connection in order to access the project.

Revision control systems

Eclipse provides exemplary integration with a variety of RCSs, including CVS, SVN, and git. One approach to remote projects is to use the features of an RCS to synchronize with a remote copy of the project. This has the nice feature of already being well integrated with Eclipse, but activities such as building and launching do not support this model, since they need to take place on the remote system. In addition, there are other issues with scanner discovery and the remote environment that are discussed in more detail below.

Remote services

The Remote Development Tools (RDT) takes the approach of identifying the range of services required for C/C++ projects and creating remote implementations of these. These services can be broken down into the following categories:

  • file access (for editing)
  • managed build
  • make build
  • indexing
  • model builder (e.g. outline view)
  • call hierarchy
  • type hierarchy
  • content assist
  • include browser
  • navigation (e.g. open declaration)
  • search

RDT currently supports either Remote Tools or RSE for providing remote services. Both Remote Tools and RSE use a remote DStore server to provide remote implementations of most of these services. File access is provided using the EFS abstraction in Eclipse. EFS services are provided either via the DStore server for RSE, or (for historical reasons) via a separate SFTP service for Remote Tools.

This approach has several disadvantages:

  • Although caching can be used in many cases, the responsiveness of the UI is determined by network speed
  • A continuous network connection is required for project access
  • A large number of users could overload the remote system with indexing
  • Not all CDT functions can be supported using this model (e.g. refactoring)
  • Remote search currently only works with RSE
  • Not all Eclipse functionality fully supports EFS (e.g. Team services)
  • Only CDT (i.e. C and C++) is supported; other languages, such as Fortran, would require a significant engineering effort

Responsiveness

The Eclipse core and CDT perform most file operations in the main thread (based on the assumption that all file operations have low latency). This causes responsiveness problems with a remote file system.

Because the file operations run in the main thread, they block the GUI until the IO operation finishes, preventing the user from continuing to work while it runs. They also often prevent IO operations that could run in parallel from doing so. See bugs 160353, 177994, 195997, 218387, 219169 and wiki.eclipse.org/TM_and_RSE_FAQ from the RSE team regarding the same problem in RSE. There seems to be no workaround for this problem. In theory it could be improved somewhat by using Display.readAndDispatch, but this is not advised and has been removed from RSE (bug 160353). Many users consider a responsive UI extremely important, so this is a significant issue.

It is very unlikely, at least in the medium term (i.e. the next Eclipse release in 2011), that Eclipse Core and CDT will both move all file operations into background threads and hide latency by performing IO operations in parallel. A different approach is therefore needed to achieve acceptable remote IO performance.
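To make the cost model above concrete, here is a small Java sketch. It is not code from Eclipse, CDT, or RSE; `simulatedRemoteRead` is a hypothetical stand-in for a remote file access with a fixed latency. Sequential reads on the calling thread cost roughly one round trip per file, and if that thread is the UI thread the GUI is frozen for the whole time. Submitting the same reads to a background pool lets them overlap, so the wall time approaches a single round trip and the caller only waits when it actually needs the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRemoteReads {
    // Hypothetical stand-in for one remote file read with a fixed network latency.
    static String simulatedRemoteRead(String path, long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return "contents of " + path;
    }

    // Sequential reads on the calling thread: total time is roughly n * latency,
    // and if the caller is the UI thread the GUI is blocked for all of it.
    static List<String> readSequentially(List<String> paths, long latencyMillis) throws InterruptedException {
        List<String> results = new ArrayList<>();
        for (String p : paths) {
            results.add(simulatedRemoteRead(p, latencyMillis));
        }
        return results;
    }

    // Reads submitted to a background pool: they overlap, so total time is
    // roughly one latency, and the caller blocks only when collecting results.
    static List<String> readInParallel(List<String> paths, long latencyMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, paths.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(pool.submit(() -> simulatedRemoteRead(p, latencyMillis)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = List.of("a.c", "b.c", "c.h", "d.h");
        long t0 = System.nanoTime();
        readSequentially(paths, 100);
        long t1 = System.nanoTime();
        readInParallel(paths, 100);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d ms, parallel: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Inside Eclipse, background work would normally be scheduled through the Jobs API (`org.eclipse.core.runtime.jobs.Job`) rather than a raw executor; the sketch only illustrates why moving and overlapping IO matters, which is exactly the refactoring considered unlikely above.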

Remote Synchronization

As an alternative to the current techniques, we propose a new approach for remote projects. It relies on maintaining synchronization between two copies of the project: a local copy in the user's workspace, and a remote copy that is used for building and launching the application.

Advantages

  • A local copy of the project exists in the local filesystem, so network latency is minimized
  • Offline operation is possible
  • All CDT functions are supported
  • Other languages, such as Fortran, require minimal effort to support
  • Eclipse features, such as Team support, can also be used

Disadvantages/Issues

  • The entire project must be copied to the local machine. This only happens once, but could take a very long time for large projects or slow connections.
  • Local indexing is problematic because the local environment differs from the remote environment, so macros and includes will be incorrect. Running scanner discovery remotely seems the obvious way to solve the macro problem, but scanner discovery is badly broken and poorly understood, even by the CDT developers. In addition, the indexer would need to be modified to copy system and library includes from the remote machine as part of indexing.
  • Some activities, such as building, will always need to be done remotely, so the performance problems will still be evident to some degree.

Similar/Prior efforts

Within Eclipse

RSE

Photran (Support for Fortran)

The current implementation is a strawman prototype of an rsync-based remotely synchronized project. It adds a new project wizard which creates a C/Fortran project but replaces the standard CDT build command (make) with a call to a custom shell script, which uses rsync to copy the project to a remote server and then runs make remotely. This is definitely a prototype; I'm sure the final version won't look anything like it (e.g., our build script makes two or three separate connections to the remote machine). But it is *simple* and it works, more or less, which gave us something real to try out.
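The prototype's actual shell script is not reproduced here. As a hedged illustration of the same idea, this Java sketch builds the two command lines such a build step might run: an `rsync` over SSH that mirrors the local project into the remote directory, then an `ssh` invocation that runs `make` there. The host, user, directories, and the `.git` exclusion (in the spirit of use case 4, "RCS control files are not synchronized") are hypothetical; the rsync options used (`-a`, `-z`, `--exclude`, `-e`) are standard rsync flags. Like the prototype, it opens two separate SSH connections.

```java
import java.util.ArrayList;
import java.util.List;

class RsyncRemoteBuild {
    // Builds an rsync command that mirrors localDir into remoteDir on host.
    // The trailing "/" on the source makes rsync copy the directory's contents
    // rather than creating a nested directory inside remoteDir.
    static List<String> rsyncCommand(String localDir, String user, String host, String remoteDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("rsync");
        cmd.add("-az");              // archive mode (recursive, keep times/permissions), compress in transit
        cmd.add("--exclude=.git");   // hypothetical: keep RCS metadata out of the sync
        cmd.add("-e");
        cmd.add("ssh");              // transport over SSH
        cmd.add(withSlash(localDir));
        cmd.add(user + "@" + host + ":" + withSlash(remoteDir));
        return cmd;
    }

    // Builds an ssh command that runs make inside remoteDir on host.
    // remoteDir is pasted into a remote shell command unquoted, so this sketch
    // assumes it contains no spaces or shell metacharacters.
    static List<String> remoteMakeCommand(String user, String host, String remoteDir, String target) {
        return List.of("ssh", user + "@" + host, "cd " + remoteDir + " && make " + target);
    }

    private static String withSlash(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    public static void main(String[] args) {
        // Print the commands; a builder could run each one with ProcessBuilder.
        System.out.println(String.join(" ", rsyncCommand("/home/me/ws/proj", "me", "cluster", "/scratch/me/proj")));
        System.out.println(String.join(" ", remoteMakeCommand("me", "cluster", "/scratch/me/proj", "all")));
    }
}
```

`--delete` is deliberately left out: it would also remove object files and binaries that exist only on the remote side, forcing full rebuilds. The price is that files deleted locally linger on the remote machine.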

Remote Include Path Support adds remote (Fortran) INCLUDE paths to Photran. Photran's include paths are configured in the project properties. Traditionally, they are paths on the local machine (e.g., /usr/include:/usr/local/include). This feature replaces them with URIs, so they can be on either the local machine or a remote one (e.g., rse://remotehost/usr/include:file:///usr/include). It also changes the properties page to use a remote file selection dialog box.
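The URI-valued example above cannot simply be split on `:`, because the separator also appears in every URI scheme. A minimal Java sketch, assuming the colon-separated format shown (not necessarily what Photran actually stores): split only at a colon that is not immediately followed by `//`, and treat entries without a scheme as local absolute paths. Windows drive letters (`C:\...`) and paths containing spaces are not handled; `IncludePathList` is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

class IncludePathList {
    // Splits a colon-separated include path whose entries may be plain local
    // paths or URIs. A ':' immediately followed by "//" is treated as part of a
    // URI scheme ("rse://", "file:///") rather than as a separator.
    static List<String> splitEntries(String includePath) {
        List<String> entries = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < includePath.length(); i++) {
            if (includePath.charAt(i) == ':' && !includePath.startsWith("//", i + 1)) {
                entries.add(includePath.substring(start, i));
                start = i + 1;
            }
        }
        entries.add(includePath.substring(start));
        entries.removeIf(String::isEmpty);
        return entries;
    }

    // Converts each entry to a URI; entries without a scheme are assumed to be
    // absolute paths on the local machine and become file:// URIs.
    static List<URI> toUris(String includePath) throws URISyntaxException {
        List<URI> uris = new ArrayList<>();
        for (String entry : splitEntries(includePath)) {
            URI uri = new URI(entry);
            uris.add(uri.getScheme() == null ? new URI("file://" + entry) : uri);
        }
        return uris;
    }

    public static void main(String[] args) throws URISyntaxException {
        for (URI u : toUris("rse://remotehost/usr/include:file:///usr/include")) {
            String where = "file".equals(u.getScheme()) ? "local" : "remote host " + u.getHost();
            System.out.println(u.getPath() + " -> " + where);
        }
    }
}
```

Legacy values such as `/usr/include:/usr/local/include` keep working, since a colon followed by a single `/` is still treated as a separator.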

Outside of Eclipse

Synchronization using GIT

Remote file systems

Use Cases


local project 
project in Eclipse workspace, not synchronized
remote project 
project on remote system, not synchronized
synchronized project 
project that exists in workspace and on remote machine, and that has been set up for synchronization

There is an implicit assumption in each of these use cases that they will be able to operate in "offline mode", i.e. where the client is disconnected from the remote system for some period of time. At some point after the client is reconnected (yet to be determined), there will be a resynchronization.

  1. User is developing a new project (managed or make based) for a single remote machine
    • User launches "New remote sync project" wizard
    • User supplies the project name, remote host, username, password, and remote path
    • Empty project (or initial template) created on local and remote machines
    • Wizard sets up builder/scanner discovery/remote paths to work on remote system
  2. User has an existing local project (managed or make based) and wishes to develop on a single remote machine
    • User launches "Convert to remote sync project" wizard
    • User supplies the remote host, username, password, and remote path
    • Wizard sets up builder/scanner discovery/remote paths to work on remote system
    • The project is synchronized with the remote machine
  3. User has an existing remote project (make based) and wishes to develop with Eclipse
    • User launches "Import remote project" wizard
    • User supplies the project name, remote host, username, password, and remote path
    • Wizard sets up builder/scanner discovery/remote paths to work on remote system
    • The project is synchronized with the remote machine
  4. User has an existing local project that has been checked out of a revision control system
    • User launches "Convert to remote sync project" wizard
    • User supplies the remote host, username, password, and remote path
    • Wizard sets up builder/scanner discovery/remote paths to work on remote system
    • The project is synchronized with the remote machine
    • RCS control files are not synchronized
  5. User has an existing synchronized project, and wishes to develop for multiple remote systems
    • User launches "Add remote target" wizard (or possibly through project preferences)
    • User supplies the remote host, username, password, and remote path
    • Wizard sets up builder/scanner discovery/remote paths to work on remote system
    • The project is synchronized with the new remote machine

Design Elements

  1. How will synchronized projects be implemented?
    • Synchronized projects should not interfere with existing project types, i.e. any type of project should be synchronizable
    • A synchronized project will be indicated by a remoteSyncNature
    • When a synchronized project is created (or an existing project converted), a remoteSyncNature will be added to the project configuration
    • A synchronized project will be able to synchronize with multiple synchronization points (target system + project location)
    • There may be more than one synchronized project for each synchronization point
    • A decorator will be used to indicate that the project is a synchronized project (e.g. in the Package Explorer)
    • The nature will add a new property page to the project. This property page will list all the systems that the project will synchronize with (and allow remote systems to be added/removed) and how they will sync
  2. When will synchronization occur?
    • Synchronization will be customizable
    • Prior to a build: the builder will call an ensureSync() method to guarantee that all outstanding synchronization calls have finished
    • After a resource change (save, delete, rename, etc.). Synchronizing after each save should be asynchronous; otherwise the responsiveness problem wouldn't be addressed
  3. Should synchronization be manual or automatic?
    • Build automatically requires automatic synchronization. If the user tries to build manually and automatic sync is deactivated Eclipse should ask the user whether he/she wants to sync (similar to the current question whether he/she wants to save unsaved files)
  4. How does synchronization play with Team support?
    • Synchronization will be independent since it's unlikely that the user will want to have each synchronization event show up in his Team Revision history.
    • Synchronization should work well with any potential Team support (e.g. not synchronizing RCS files, or supporting GIT both for sync and Team while isolating the two from each other, e.g. by using different branches)
    • For sync methods that use git or other RCS's, we should still be able to use the Repository Exploring perspective to set up the repository and access this information when performing the sync
  5. Where will synchronization information be stored?
    • Service configurations will be used to store connection information
    • Remote system information will be saved in the user's workspace using the service configuration mechanism (so that it ties into the other parts of PTP)
    • Service configurations will also specify the different sync methods (and sync method-specific information)
  6. How will synchronization be controlled?
    • The current plan is to use build configurations to control synchronization
    • Each build configuration will include the target system name (e.g. target1_release, target1_debug, target2_debug, etc.)
    • The "active" configuration will specify the synchronization point (target system + project location) for synchronization
    • Building a non-active configuration will automatically switch the "active" configuration
    • If the user has selected Indexer->"Use active build configuration", changing the "active" configuration will result in a re-synchronization and index rebuild
    • If the user has selected Indexer->"Use fixed build configuration", this "fixed" configuration and its associated target machine are used for the index. With this option selected, the index is not rebuilt when switching the active build configuration, but only when changing the "fixed" build configuration.
  7. How will off-line development be supported?
    • TBD
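Design elements 2 and 3 (asynchronous synchronization after resource changes, plus an ensureSync() call before building) fit a simple pattern. The Java sketch below is one possible shape under stated assumptions: `SyncManager` and its `syncAction` callback are hypothetical, syncs run one at a time on a single background thread, and a request that arrives while a sync is still queued is merged into that queued sync.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SyncManager {
    // One worker thread, so sync operations run one at a time and in order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Runnable syncAction;   // hypothetical: e.g. an rsync run or a git push
    private Future<?> lastSubmitted = null;
    private boolean pending = false;     // a sync is queued but has not started yet

    SyncManager(Runnable syncAction) {
        this.syncAction = syncAction;
    }

    // Called from a resource-change listener (save, delete, rename, ...).
    // Returns immediately. If a sync is already queued, it will see this change
    // when it runs, so repeated saves are coalesced into one sync.
    synchronized void scheduleSync() {
        if (pending) {
            return;
        }
        pending = true;
        lastSubmitted = worker.submit(() -> {
            synchronized (SyncManager.this) {
                pending = false;         // changes from now on need a new sync
            }
            syncAction.run();
        });
    }

    // Called by the builder before a remote build: blocks until every sync
    // requested so far has finished (the executor runs tasks in FIFO order).
    void ensureSync() throws Exception {
        Future<?> f;
        synchronized (this) {
            f = lastSubmitted;
        }
        if (f != null) {
            f.get();
        }
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SyncManager sync = new SyncManager(() -> System.out.println("pushing changes to remote"));
        sync.scheduleSync();   // e.g. after a save
        sync.scheduleSync();   // another save, possibly coalesced with the first
        sync.ensureSync();     // the builder waits here before starting the remote build
        System.out.println("remote build can start");
        sync.shutdown();
    }
}
```

Coalescing matters because otherwise every save would trigger its own transfer; a queued sync that has not started yet picks up every change made before it runs, while a change made during a running sync schedules a new one.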

Possible Back-ends

Rsync and GIT both have advantages and disadvantages. Thus it sounds reasonable to support at least those two and make it easy to add others later on.

Advantages of Rsync

  • widely available, and thus easy to set up without installing software on the remote machine
  • others:TBD

Disadvantages of Rsync

The status of a Java implementation is uncertain:

  • Using the command line tool instead has several disadvantages:
    • Support for Windows
    • Passing the password; it is not possible to share the same connection
    • Reliance on an external tool
  • A C library is available (librsync) but would require e.g. JNI or NestedVM
  • There are Java libraries that support file synchronization without being compatible with Rsync (e.g. JFileSync). The incompatibility would be acceptable because the server side could be uploaded automatically, but the protocol would need to be fast and reliable.
  • The synchronization is only one-way
  • Problems if clocks are not synchronized between systems

Two-way synchronization is important if remote files are changed, either automatically or by the user. The one-way synchronization of rsync would usually not propagate such changes back to the client, and would not reliably detect conflicts caused by changes on both sides.
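Detecting conflicts requires remembering what both sides looked like at the last successful synchronization, which a plain one-way copy does not do. A minimal Java sketch of the per-file three-way decision a two-way synchronizer has to make, assuming each side can report a content hash (null for an absent file) and that the hash from the last sync is stored. Comparing hashes instead of timestamps also sidesteps the clock-skew problem listed above. The `FileAction` names are hypothetical.

```java
import java.util.Objects;

class ThreeWayCompare {
    enum FileAction { NONE, PUSH_TO_REMOTE, PULL_TO_LOCAL, CONFLICT }

    // base:   content hash recorded at the last successful sync (null = did not exist)
    // local:  current hash on the client (null = absent)
    // remote: current hash on the remote machine (null = absent)
    static FileAction classify(String base, String local, String remote) {
        boolean localChanged = !Objects.equals(base, local);
        boolean remoteChanged = !Objects.equals(base, remote);
        if (!localChanged && !remoteChanged) {
            return FileAction.NONE;
        }
        if (localChanged && !remoteChanged) {
            return FileAction.PUSH_TO_REMOTE;   // includes propagating a local deletion
        }
        if (!localChanged) {
            return FileAction.PULL_TO_LOCAL;    // only the remote copy changed
        }
        // Both sides changed: harmless only if they ended up identical.
        return Objects.equals(local, remote) ? FileAction.NONE : FileAction.CONFLICT;
    }

    public static void main(String[] args) {
        System.out.println(classify("h1", "h2", "h1")); // PUSH_TO_REMOTE (edited locally)
        System.out.println(classify("h1", "h1", "h3")); // PULL_TO_LOCAL (changed remotely)
        System.out.println(classify("h1", "h2", "h3")); // CONFLICT (edited on both sides)
        System.out.println(classify(null, "h4", null)); // PUSH_TO_REMOTE (new local file)
    }
}
```

A one-way tool effectively treats every difference as a push, so a remote-only change is silently overwritten rather than pulled or flagged as a conflict.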

Advantages of GIT

  • It has a Java implementation (JGit, shipping with Helios)
  • Java can also be used on the server (if native git is not available)
  • It is known to be extremely fast (including the Java implementation)
  • It supports two-way synchronization

GIT is of course not meant as a synchronization tool (it is a DVCS), but it works extremely well as one. Using git for synchronization would work both for users who also use it for version control and for users who use some other tool for version control. As an example, synchronizing a folder containing ~4000 files (~100 MB) with one changed file (GIT does not know in advance which one), where GIT detects file changes on both sides, takes less than one second over a remote connection (cable). The performance is mainly limited by the file system for the tree traversal.
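The timing above depends on not reading every file. Git keeps stat information for each tracked file in its index and only rehashes files whose stat data has changed, which is why the tree traversal dominates. This Java sketch shows a simplified version of that idea; it is not git's or JGit's code, and it compares only size and modification time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatScan {
    // Stat "fingerprint" of one file: size plus last-modified time in milliseconds.
    static String statKey(Path p) throws IOException {
        return Files.size(p) + "/" + Files.getLastModifiedTime(p).toMillis();
    }

    // Walks the tree once and records the fingerprint of every regular file,
    // keyed by its path relative to root. No file contents are read.
    static Map<Path, String> snapshot(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<Path, String> result = new HashMap<>();
        for (Path p : files) {
            result.put(root.relativize(p), statKey(p));
        }
        return result;
    }

    // Returns relative paths that were added, removed, or whose fingerprint
    // changed; only these would need to be hashed and transferred.
    static Set<Path> changedSince(Map<Path, String> before, Map<Path, String> after) {
        Set<Path> changed = new HashSet<>();
        for (Map.Entry<Path, String> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                changed.add(e.getKey());   // new or modified
            }
        }
        for (Path p : before.keySet()) {
            if (!after.containsKey(p)) {
                changed.add(p);            // deleted
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Demo in a temporary directory (left in place afterwards).
        Path root = Files.createTempDirectory("statscan");
        Files.writeString(root.resolve("a.c"), "int a;");
        Files.writeString(root.resolve("b.c"), "int b;");
        Map<Path, String> before = snapshot(root);
        Files.writeString(root.resolve("b.c"), "int b = 42;");   // size changes
        System.out.println(changedSince(before, snapshot(root)));
    }
}
```

Real git additionally handles "racily clean" entries, where a file is modified within the timestamp granularity without changing size; this sketch would miss such an edit.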

Implementation Issues with GIT

Pushing to a non-bare repository is discouraged

There are several options:

  • Fetch (not a good option, because it requires an SSH server on the client)
  • Push into the working branch with a post-update hook. Disadvantages: requires a stat of each file on the server (slow over NFS) and doesn't allow merging on the client side
  • Push to a separate bare repository. Disadvantage: requires 2 repositories
  • Push to a remote branch. This seems to be the best option
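A hedged sketch of the "push to a remote branch" option, expressed as the git command lines one direction of a sync step could run. It assumes the remote directory is already a git repository whose checked-out branch differs from the sync branch, so the push is not refused (by default git rejects pushes that update the checked-out branch of a non-bare repository, controlled by `receive.denyCurrentBranch`), and the merge into the remote working copy happens in a separate step. The branch name and helper names are hypothetical.

```java
import java.util.List;

class GitSyncCommands {
    // Commands run in the local project directory: record the working tree as
    // a commit and push it to a branch of the remote repository that is not
    // checked out there.
    static List<List<String>> localCommands(String user, String host, String remoteDir, String syncBranch) {
        String remoteRepo = user + "@" + host + ":" + remoteDir;   // scp-like SSH URL
        return List.of(
                List.of("git", "add", "-A"),
                // exits non-zero when there is nothing to commit; a caller would treat that as "no changes"
                List.of("git", "commit", "-m", "automatic sync"),
                List.of("git", "push", remoteRepo, "HEAD:refs/heads/" + syncBranch));
    }

    // Command that merges the pushed branch into the remote working copy.
    // Assumes remoteDir contains no spaces or shell metacharacters.
    static List<String> remoteMergeCommand(String user, String host, String remoteDir, String syncBranch) {
        return List.of("ssh", user + "@" + host, "cd " + remoteDir + " && git merge " + syncBranch);
    }

    public static void main(String[] args) {
        // Printed for illustration only; arguments are not shell-quoted here.
        for (List<String> cmd : localCommands("me", "cluster", "/scratch/me/proj", "eclipse_sync")) {
            System.out.println(String.join(" ", cmd));
        }
        System.out.println(String.join(" ", remoteMergeCommand("me", "cluster", "/scratch/me/proj", "eclipse_sync")));
    }
}
```

If the project already uses git for version control, commits made this way would end up in the user's own history; design element 4 asks for sync and Team to be kept separate (e.g. through different branches), which this sketch does not address.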

Preprocessing & Indexing

Many of the advanced features of CDT and Photran require indexing/parsing the source code. When a project is being built remotely, the source code presented to the user should reflect the environment on the remote system rather than the local system. The main mechanisms that distinguish a remote environment from the local environment are the macros that are predefined by the compiler and the system include files that are included in the source code.

Note that this support is independent of remote synchronization (although it may use some remote synchronization functionality to achieve it). A remotely synchronized project can still use a purely local environment without requiring any additional functionality.

System Include Files

To present an accurate reflection of the remote environment, system include files need to be fetched from the remote system for the preprocessor and indexer. This requires changes to the preprocessor and other parts of CDT/Photran in order to obtain the header file from the correct remote system rather than from the local system. To avoid performance issues, header files should also be cached locally where possible, but this must be an option as licensing issues may prevent it on some systems.
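The caching requirement above (cache where possible, but allow it to be switched off for licensing reasons) can be sketched as follows. This is a hypothetical, in-memory Java sketch: `RemoteHeaderCache` and its fetcher callback (standing in for whatever actually reads a file from the remote system) are assumptions, and a real cache would likely persist to disk and need invalidation when remote headers change. Keying by host keeps headers from different target systems apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

class RemoteHeaderCache {
    private final BiFunction<String, String, String> fetcher; // hypothetical: (host, path) -> header contents
    private final boolean cachingAllowed;                     // false where licensing forbids local copies
    private final Map<String, String> cache = new HashMap<>();
    private int remoteFetches = 0;

    RemoteHeaderCache(BiFunction<String, String, String> fetcher, boolean cachingAllowed) {
        this.fetcher = fetcher;
        this.cachingAllowed = cachingAllowed;
    }

    // Returns a system header as it exists on the given remote host, fetching
    // it remotely unless caching is allowed and a copy is already held.
    synchronized String getHeader(String host, String path) {
        String key = host + ":" + path;   // same path on different hosts = different headers
        if (cachingAllowed) {
            String cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        }
        String contents = fetcher.apply(host, path);
        remoteFetches++;
        if (cachingAllowed) {
            cache.put(key, contents);
        }
        return contents;
    }

    synchronized int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        RemoteHeaderCache headers = new RemoteHeaderCache(
                (host, path) -> "/* " + path + " as seen on " + host + " */", true);
        headers.getHeader("cluster", "/usr/include/stdio.h");
        headers.getHeader("cluster", "/usr/include/stdio.h");   // served from the cache
        System.out.println("remote fetches: " + headers.remoteFetches());
    }
}
```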

Scanner Discovery

Compiler (and makefile) defined macros play an important part in determining which header files will be included as well as which parts of the code will be enabled or disabled. CDT attempts to determine these macros using a process known as scanner discovery. This involves running commands on the target system to determine the macros that are defined. This process is inherently complex because every system has different compilers with different options for determining this information. RDT has already provided some remote scanner discovery functionality, so we plan to reuse this for synchronized projects.
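For GCC-compatible compilers, one well-known way to list the predefined macros is `gcc -E -dM -x c /dev/null`, which prints one `#define NAME VALUE` line per macro; run on the remote machine (e.g. over SSH), it reflects that machine's compiler. The Java sketch below parses that output format into a map. It does not reproduce RDT's scanner discovery, and other compilers need other commands; `MacroDiscovery` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MacroDiscovery {
    // Command to run on the target system (e.g. via ssh) for GCC-compatible compilers.
    static final List<String> GCC_MACRO_COMMAND = List.of("gcc", "-E", "-dM", "-x", "c", "/dev/null");

    // Parses "#define NAME VALUE" lines into NAME -> VALUE (empty string if no value).
    // For function-like macros the parameter list is kept as part of the key.
    static Map<String, String> parseDefines(String output) {
        Map<String, String> macros = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            if (!line.startsWith("#define ")) {
                continue;
            }
            String rest = line.substring("#define ".length());
            int end = 0;
            while (end < rest.length()
                    && (Character.isLetterOrDigit(rest.charAt(end)) || rest.charAt(end) == '_')) {
                end++;
            }
            if (end < rest.length() && rest.charAt(end) == '(') {
                int close = rest.indexOf(')', end);
                end = close < 0 ? rest.length() : close + 1;
            }
            macros.put(rest.substring(0, end), rest.substring(end).trim());
        }
        return macros;
    }

    public static void main(String[] args) {
        // Illustrative sample in the format printed by GCC_MACRO_COMMAND; real
        // output is much longer and differs per compiler version and target.
        String sample = "#define __STDC__ 1\n#define __x86_64__ 1\n#define __VERSION__ \"4.4.7\"\n";
        parseDefines(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The resulting map could then be supplied to the indexer in place of locally discovered values, so that `#ifdef` blocks are evaluated as they would be on the remote system.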

Support for other Remote Tools besides Build


Milestones

  • Check feasibility of remote include path support in CDT (see Preprocessing & Indexing)
  • Define a new Synchronization service type (which adds synchronization/replication to the running EFS). It would have a public method guaranteeSynchronized. The default implementation (for a purely local project or for Remote Tools/RSE) would do nothing.
  • Add a call to guaranteeSynchronized to all remote operations (compile, remote index, ...)
  • Implement a GIT-based synchronization service (doing the GIT push in the guaranteeSynchronized call)
  • Add the GUI to configure the synchronization service (including a new project wizard)

Later, an EFS could do the asynchronous GIT push after a file modification (e.g. save), and guaranteeSynchronized would just wait for the push to finish.
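The service described in the milestones above could take roughly the following shape. This is a Java sketch under stated assumptions: the interface and class names are hypothetical, the method is spelled `guaranteeSynchronized` here, and the `push` callback stands in for the actual GIT push. It shows the no-op default for local and Remote Tools/RSE projects, a push-on-demand implementation matching the first GIT milestone, and the single place where every remote operation calls the service first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical service type: remote operations call guaranteeSynchronized()
// before they touch the remote copy of the project.
interface SynchronizationService {
    void guaranteeSynchronized() throws Exception;
}

// Default for purely local projects and for Remote Tools/RSE projects, whose
// files already live where they are used: nothing to do.
class NoSynchronization implements SynchronizationService {
    @Override
    public void guaranteeSynchronized() {
    }
}

// First GIT milestone: perform the push inside the call. The later variant
// would start pushes asynchronously after each save and only wait here.
class PushOnDemandSynchronization implements SynchronizationService {
    private final Runnable push;   // hypothetical: e.g. runs the git push

    PushOnDemandSynchronization(Runnable push) {
        this.push = push;
    }

    @Override
    public void guaranteeSynchronized() {
        push.run();
    }
}

class RemoteOperations {
    // Every remote operation (compile, remote index, ...) goes through here.
    static <T> T runRemote(SynchronizationService sync, Callable<T> op) throws Exception {
        sync.guaranteeSynchronized();
        return op.call();
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        SynchronizationService git = new PushOnDemandSynchronization(() -> log.add("push"));
        runRemote(git, () -> log.add("make"));
        runRemote(new NoSynchronization(), () -> log.add("make"));
        System.out.println(log);   // [push, make, make]
    }
}
```

Because callers depend only on the interface, switching from push-on-demand to the asynchronous EFS-driven variant would not require changes to the remote operations themselves.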

Timeline: TBD

Additional Features

Remote file view

To run the binary, the user needs to select it on the remote machine. For that it would be nice to be able to browse the remote machine using Remote Tools/RSE. Preferably, the path shown is the path to the remote copy of the local project. It would also be nice to have a "Project Explorer" view for the files on the remote machines. This would allow viewing binaries, object files and other remote files that are not part of the synchronization. This view should also use Remote Tools/RSE.

Build local and remote from same project

If the remote build is implemented as a builder which can be added to a standard CDT/Photran project, then it is possible to have both a local and a remote builder for the same project. CDT already supports several builder configurations, so this can be used to switch between the local builder and (potentially several) remote builders. It has to be checked that the indexer (including the remote include files) is updated correctly when the builder configuration switches.