Introduction

This document attempts to describe the current state of the ideas and designs for synchronization-based file access for remote projects. It was started by Roland Schulz based on discussions on bug 316709.

Rationale

Remote project support in Eclipse (i.e. projects where the source code is located remotely, and the executable is built and run remotely) is an essential requirement for science and engineering applications. Currently, options for using remote projects in Eclipse are:

Network filesystems
Revision control systems
Remote services

The problems with these approaches are discussed below. The approach being proposed has different disadvantages (see below), so it is not meant to replace the current RDT approach but to offer an alternative. This will allow users to choose the approach whose advantages best match their working environment. The new approach will also reuse parts of RDT (e.g. scanner/indexer).

Network filesystems

Using a network filesystem, such as NFS, to trick Eclipse into thinking the project is local suffers from a number of problems. The indexing and advanced editing features of CDT require accessing all source files in the project. These features already suffer from performance-related issues when accessing local files (particularly for large projects). When using a network filesystem, they can become unacceptably slow. In addition, network filesystems typically require system administration access to configure both locally (although systems such as FUSE can be used to circumvent this) and on the remote system. This technique also requires a continuous network connection in order to access the project.

Revision control systems

Eclipse provides exemplary integration with a variety of revision control systems (RCSs), including CVS, SVN, and git. One approach to remote projects is to use the features of an RCS to synchronize with a remote copy of the project. This has the nice feature of already being well integrated with Eclipse, but activities such as building and launching do not support this model since they need to take place on the remote system. In addition, there are other issues with scanner discovery and the remote environment that will be discussed in more detail below.

Remote services

The Remote Development Tools (RDT) takes the approach of identifying the range of services required for C/C++ projects and creating remote implementations of these. These services can be broken down into the following categories:

file access (for editing)
managed build
make build
indexing
model builder (e.g. outline view)
call hierarchy
type hierarchy
content assist
include browser
navigation (e.g. open declaration)
search

RDT currently supports either Remote Tools or RSE for providing remote services. Both Remote Tools and RSE use a remote DStore server to provide remote implementations of most of these services. File access is provided using the EFS abstraction in Eclipse. EFS services are provided either via the DStore server for RSE, or a separate SFTP service for Remote Tools (for historical reasons).

This approach has several disadvantages:

Although caching can be used in many cases, responsiveness of UI is determined by network speed.
A continuous network connection is required for project access
A large number of users could overload the remote system with indexing
Not all CDT functions can be supported using this model (e.g. refactoring)
Remote search currently only works with RSE
Not all Eclipse functionality fully supports EFS (e.g. Team services)
Only CDT (i.e. C and C++) is supported; other languages, such as Fortran, would require a significant engineering effort

Responsiveness

The Eclipse core and CDT perform most file operations in the main thread (based on the assumption that all file operations are low latency). This causes responsiveness problems with a remote file system.

Because the file operations run in the main thread, they block the GUI until the IO operation finishes, preventing the user from continuing to work while the IO operation is running. This also often prevents IO operations that could run in parallel from doing so. See Bugs 160353, 177994, 195997, 218387, 219169 and wiki.eclipse.org/TM_and_RSE_FAQ from the RSE team regarding the same problem for RSE. There seems to be no workaround for this problem. While it seems possible in theory to improve it somewhat by using Display.readAndDispatch, this is not advised and has been removed from RSE (160353). Many consider a responsive UI extremely important, so this is a significant point.

It is very unlikely, at least in the medium term (meaning the next Eclipse release in 2011), that both Eclipse Core and CDT will move all file operations into threads and hide latency by doing IO operations in parallel. Therefore a different approach is needed to provide a remote IO method with acceptable performance.

Remote Synchronization

As an alternative to the current techniques, we are proposing a new approach for remote projects. This approach relies on maintaining synchronization between two copies of the project: a local copy that exists in the user's workspace; and a remote copy that is used for building and launching the application.

Advantages

A local copy of the project exists in the filesystem, so network latencies are minimized
Offline operation is possible
All CDT functions are supported
Other languages, such as Fortran, require minimal effort to support
Eclipse features, such as Team support, can also be used

Disadvantages/Issues

The entire project must be copied to the local machine. This only happens once, but could take a very long time for large projects/slow connections.
Local indexing is problematic as the local environment will be different from the remote environment, so macros and includes will be incorrect. Running scanner discovery remotely seems to be the obvious way to solve the macro problem, but scanner discovery is hopelessly broken and not even the CDT people seem to know how it works. In addition, the indexer would need to be modified to copy system and library includes from the remote machine as part of the indexing.
Some activities, such as building, will always need to be done remotely, so the performance problems will always be evident to some degree.

Similar/Prior efforts

Within Eclipse

RSE

rsync file subsystem Bug describing ideas to implement rsync based back-end for RSE
Use rsync to sync the remote workspace to the local machine FAQ including disadvantages and advantages of using Rsync for a remote project

Photran

PTP/photran/rsync remote projects wiki page describing the working Photran test implementation
Rsync-style Remote Project Support for Photran
Remote Include Path Support

The current implementation is a strawman prototype of an rsync-based remotely-synchronized project. It adds a new project wizard which creates a C/Fortran project but replaces the standard CDT build command (make) with a call to a custom shell script which uses rsync to copy the project to a remote server and run make remotely. This was definitely a prototype -- I'm sure the final version won't look anything like it (e.g., our build script makes two or three separate connections to the remote machine) -- but this is *simple* and it works, more or less, which gave us something real to try out.

Remote Include Path Support adds remote (Fortran) INCLUDE paths to Photran. Photran's include paths are configured in the project properties. Traditionally, they'd be paths on the local machine (e.g., /usr/include:/usr/local/include). This replaces them with URIs, so they can be on either the local machine or a remote one (e.g., rse://remotehost/usr/include:file:///usr/include). It also changes the properties page to use a remote file selection dialog box.
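As a rough illustration of how URI-based include entries could be told apart, the following sketch classifies entries by their URI scheme. The class and method names are hypothetical and are not taken from Photran. The sketch also assumes that the entries have already been split into a list, since a ':' separator is ambiguous once the entries themselves contain colons.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: separates local from remote include path entries by URI scheme. */
class IncludePathClassifier {

    /** Returns true when the entry refers to the local filesystem (scheme "file"). */
    static boolean isLocal(URI entry) {
        return "file".equalsIgnoreCase(entry.getScheme());
    }

    /** Collects the entries that live on a remote host (any scheme other than "file"). */
    static List<URI> remoteEntries(List<String> entries) {
        List<URI> remote = new ArrayList<>();
        for (String entry : entries) {
            URI uri = URI.create(entry);
            if (!isLocal(uri)) {
                remote.add(uri);
            }
        }
        return remote;
    }
}
```

For the two entries from the example above, only rse://remotehost/usr/include would be reported as remote, with getHost() giving the host name and getPath() giving the directory on that host.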

Outside from Eclipse

Synchronization using GIT

Remote file systems

Use Cases

User creates a new project (managed or make based) using "New remote sync project" wizard. User supplies the project name, remote host, username, password, and remote path. Empty project (or initial template) created on local and remote machines. Wizard sets up builder/scanner discovery/remote paths to work on remote system.
Existing local project (managed or make based). Use "Convert to remote project" wizard to create a copy of the project on a remote machine and maintain synchronization between the local and remote copy. User supplies the remote host, username, password, and remote path. Wizard sets up builder/scanner discovery/remote paths to work on remote system.
Existing remote project (make based). Use "Import remote project" wizard to create a local copy of the project and maintain synchronization between local and remote copy. User supplies the project name, remote host, username, password, and remote path. Wizard sets up builder/scanner discovery/remote paths to work on remote system.
Existing local project that has been checked out of revision control system. Use "Convert to remote project" wizard to create a copy of the project on a remote machine and maintain synchronization between the local and remote copy. User supplies the remote host, username, password, and remote path. Wizard sets up builder/scanner discovery/remote paths to work on remote system. RCS control files are not synchronized.
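The last use case requires some way to exclude RCS control files from synchronization. A minimal sketch of such a filter is shown below. The class name and the list of control directory names are assumptions made for illustration; a real filter would probably need to be configurable.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical filter that keeps RCS control files out of the synchronization. */
class SyncFilter {

    // Assumed set of control directory names, chosen for illustration only.
    private static final Set<String> CONTROL_DIRS =
            new HashSet<>(Arrays.asList(".git", ".svn", "CVS"));

    /** Returns true if the project-relative path should be copied to the remote side. */
    static boolean shouldSync(Path relativePath) {
        // A path is excluded as soon as any of its segments is a control directory.
        for (Path segment : relativePath) {
            if (CONTROL_DIRS.contains(segment.toString())) {
                return false;
            }
        }
        return true;
    }
}
```

With this filter, a path such as src/main.c would be synchronized, while .svn/entries or lib/CVS/Root would be skipped.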

Design Options

There are a number of design options that need to be addressed. These include:

Should the synchronization be a project type? A project nature?
When should synchronization occur?
- Synchronizing before any remote operation (build, remote index, ...)
- Synchronizing after each save
Should synchronization be manual or automatic?
How does synchronization play with Team support?

With the second option (synchronizing after each save), the save should not wait for the sync; the sync should run asynchronously. Otherwise the responsiveness problem (see above) would not be addressed. Each remote operation would need to call a function that guarantees that all outstanding synchronization calls have finished. The same function would initiate the synchronization for the first option. Which option is better depends on the synchronization back-end and the user's preferences and should thus be configurable.
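The interplay between the two trigger options and the shared guarantee function might look roughly like the sketch below. All names are hypothetical, the back-end is reduced to a Runnable that stands in for an rsync or GIT transfer, and error handling is omitted.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical coordinator for the two trigger options described above. */
class SyncCoordinator {
    private final boolean syncOnSave;   // true: sync after each save, false: sync before remote operations
    private final Runnable backend;     // stands in for an rsync or GIT transfer
    private boolean dirty = false;
    private CompletableFuture<Void> pending = CompletableFuture.completedFuture(null);

    SyncCoordinator(boolean syncOnSave, Runnable backend) {
        this.syncOnSave = syncOnSave;
        this.backend = backend;
    }

    /** Called after each save; returns immediately so the UI thread is not blocked. */
    synchronized void fileSaved() {
        if (syncOnSave) {
            // Chain the transfers so that they run one after another in the background.
            pending = pending.thenRunAsync(backend);
        } else {
            dirty = true;
        }
    }

    /** Called by every remote operation (build, remote index, ...) before it starts. */
    void guaranteeSynchronized() {
        CompletableFuture<Void> toWaitFor;
        synchronized (this) {
            if (!syncOnSave && dirty) {
                // Sync-before-operation: start the single outstanding transfer now.
                dirty = false;
                pending = pending.thenRunAsync(backend);
            }
            toWaitFor = pending;
        }
        // Block the calling (non-UI) thread until all outstanding transfers have finished.
        toWaitFor.join();
    }
}
```

In this sketch, the only difference between the two options is when the transfer starts. The remote operation calls guaranteeSynchronized in both cases, which is what would make the choice configurable.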

Advantages of Sync after each save:

Required for Auto-Build and Indexing on Server
Reduces time to build (because the project is already synced)

Disadvantages

Causes a larger repository (~2k per commit) and more traffic

Possible Back-ends

Rsync and GIT both have advantages and disadvantages. It therefore seems reasonable to support at least these two and to make it easy to add others later on.

Advantages of Rsync

widely available, and thus easy to set up without installing software on the remote machine
others: TBD

Disadvantages of Rsync

no Java implementation is available
the synchronization is only one-way
problems if clocks are not synchronized between systems

The second point is important if remote files get changed, either automatically or by the user. The one-way synchronization of rsync would usually not synchronize such changes back to the client and would not reliably detect conflicts caused by changes on both sides.
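To make the conflict problem concrete: detecting changes on both sides requires knowing the last synchronized state of each file in addition to the current local and remote state. The sketch below illustrates that decision using content hashes. It is not how rsync or GIT is implemented, and all names are hypothetical.

```java
import java.util.Objects;

/** Possible outcomes when comparing one file on both sides. */
enum SyncAction { NONE, PUSH_TO_REMOTE, PULL_TO_LOCAL, CONFLICT }

/** Hypothetical three-way comparison for a single file. */
class TwoWayCompare {

    /**
     * Decides what to do with one file, given content hashes of the last
     * synchronized version (base), the local copy, and the remote copy.
     * A null hash means the file does not exist in that version.
     */
    static SyncAction decide(String base, String local, String remote) {
        if (Objects.equals(local, remote)) {
            return SyncAction.NONE;              // both sides already agree
        }
        boolean localChanged = !Objects.equals(base, local);
        boolean remoteChanged = !Objects.equals(base, remote);
        if (localChanged && remoteChanged) {
            return SyncAction.CONFLICT;          // both sides diverged from the base
        }
        return localChanged ? SyncAction.PUSH_TO_REMOTE : SyncAction.PULL_TO_LOCAL;
    }
}
```

A one-way tool has no record of the base version, so it cannot distinguish a remote change that should be pulled from a stale remote file that should be overwritten.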

Advantages of GIT

It has a Java implementation (shipping with Helios)
It is known to be extremely fast (including the Java implementation)
It supports two-way synchronization

Of course, GIT is not meant to be a synchronization tool (it is a DVCS), but it works extremely well as one. Using GIT for synchronization would work both for users who also use it for version control and for users who use some other tool for version control. As an example, a remote synchronization of a folder containing ~4000 files (~100MB, with 1 changed file whose identity is not known to GIT in advance), where GIT detects file changes on both sides, takes less than one second over a remote connection (cable). The performance is mainly limited by the file system during the tree traversal.

Implementation Issues with GIT

To push to a non-bare repository is discouraged

There are different options:

Fetch (not good option because it requires SSHD on the client)
Push into working branch with post-update hook. Disadvantages: Requires stat of each file on server (slow over NFS) and doesn't allow merge on client side

Push to separate bare repository. Disadvantage: Requires 2 repositories
Push to remote branch. Seems best option
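One possible reading of the last option is that the local commits are pushed to a branch on the remote repository that is not checked out there, and the remote side then merges that branch into its working branch. The sketch below only builds the argument list for the standard command-line form `git push <remote> <src>:<dst>`. It does not use the Java implementation mentioned above, and the class name and the branch name in the usage note are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds a push to a dedicated branch on the remote repository. */
class GitPushCommand {

    /** Builds the arguments for: git push <remote> <localBranch>:refs/heads/<syncBranch> */
    static List<String> pushToRemoteBranch(String remote, String localBranch, String syncBranch) {
        // The refspec names the local source branch and the remote destination branch.
        String refspec = localBranch + ":refs/heads/" + syncBranch;
        return Arrays.asList("git", "push", remote, refspec);
    }
}
```

For example, pushToRemoteBranch("origin", "master", "eclipse_sync") yields the arguments for `git push origin master:refs/heads/eclipse_sync`, which could be passed to a ProcessBuilder. Because the destination branch is not the one checked out on the remote side, the push does not touch the remote working tree.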

Indexing

Local indexing should be supported. Should we also support remote indexing? Local indexing requires remote include files. See above for the support in Photran. For CDT, it has to be checked how feasible it is to support remote include file paths.

Support for other Remote Tools besides Build

TBD

Milestones

Check feasibility of remote include path support in CDT (see PTP/designs/remote/sync#Indexing)
Define a new Synchronization service type (which adds synchronization/replication to the running EFS). It would have a public method guaranteeSynchronized. The default server (for a purely local project or for Remote Tools/RSE) would do nothing.
Add a call to guaranteeSynchronized to all remote operations (compile, remote index, ...)
Implement a GIT-based synchronization service (doing the GIT push in the guaranteeSynchronized call)
Add the GUI to configure the synchronization service (including new project wizard)

Later, an EFS implementation could perform the asynchronous GIT push after a file modification (e.g. save), and guaranteeSynchronized would just wait for the push to finish.
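The service type described in these milestones could take roughly the following shape. Only the method name (spelled guaranteeSynchronized here) comes from the text above. The interface, class names, and the wrapper for remote operations are hypothetical.

```java
/** Hypothetical service type with the single public method named in the milestones. */
interface SynchronizationService {
    /** Returns once the remote copy reflects all local changes. */
    void guaranteeSynchronized();
}

/** Default for purely local projects or Remote Tools/RSE projects: nothing to synchronize. */
class NoOpSynchronizationService implements SynchronizationService {
    @Override
    public void guaranteeSynchronized() {
        // Intentionally empty.
    }
}

/** Runs a remote operation (compile, remote index, ...) only after synchronization. */
class RemoteOperationRunner {
    private final SynchronizationService sync;

    RemoteOperationRunner(SynchronizationService sync) {
        this.sync = sync;
    }

    void run(Runnable remoteOperation) {
        sync.guaranteeSynchronized();   // first make sure the remote files are current
        remoteOperation.run();          // then start the remote operation
    }
}
```

A GIT-based service would implement the same interface and perform the push inside guaranteeSynchronized, so the remote operations would not need to know which back-end is configured.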

Timeline: TBD

Additional Features

Remote file view

To run the binary, it must be selected on the remote machine. For that, it would be nice to be able to browse the remote machine using RemoteTools/RSE. Preferably the path shown is the path to the remote copy of the local project. It would also be nice to have a "Project Explorer" view for the files on the remote machine. This would make it possible to view binaries, object files and other remote files that are not part of the synchronization. This view should also use RemoteTools/RSE.

Build local and remote from same project

If the remote build is implemented as a builder which can be added to a standard CDT/Photran project, then it is possible to have both a local and a remote builder for the same project. CDT already supports several builder configurations, so these can be used to switch between a local and (potentially several) remote builders. It has to be checked that the indexer (including the remote include files) is updated correctly when the builder configuration switches.

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

PTP/designs/remote/sync

Contents

Introduction

Rationale

Network filesystems

Revision control systems

Remote services

Responsiveness

Remote Synchronization

Advantages

Disadvantages/Issues

Similar/Prior efforts

Within Eclipse

RSE

Photran

Outside from Eclipse

Synchronization using GIT

Remote file systems

Use Cases

Design Options

Possible Back-ends

Advantages of Rsync

Disadvantages of Rsync

Advantages of GIT

Implementation Issues with GIT

To push to a non-bare repository is discouraged

Indexing

Support for other Remote Tools besides Build

Milestones

Additional Features

Remote file view

Build local and remote from same project
