Revision as of 11:11, 9 October 2012

Introduction

This document describes the design of the Remote Project Synchronization framework. It is based on the original proposal by Roland Schulz and others.

Eclipse projects (including C/C++ projects) are traditionally stored on the local filesystem of the machine on which Eclipse is running. This is adequate for Java, since the bytecode is inherently portable, and for embedded development, since a cross-compiler environment is typically employed. However, it does not suit the development of HPC applications, because it is generally very difficult to replicate the environment of the target system on the local machine. To overcome this limitation, the Remote Development Tools (RDT) component of PTP added support for remote projects. However, the approach taken has a number of serious limitations that preclude its use for developing many applications. Although the functionality of RDT has improved, the main issue that remains outstanding is that it only supports C/C++ projects and is not a general solution that can be used for other languages and project types.

Remote Project Synchronization is an alternative to RDT that overcomes many of RDT's inherent limitations. Synchronized projects work by maintaining both a local and a remote copy of the source code, and Eclipse keeps these two copies synchronized. The advantage of this approach is that Eclipse is able to operate on the local copy as normal, and does not need to be concerned with network delays or other issues. No special changes to the project infrastructure are required, since the project just looks like a normal local project. The remote copy of the code is maintained so that the application can be built in the environment present on the target system without copying the entire source tree each time. This allows any native compilers or libraries that are present to be used without replicating that environment on the local machine.

The three different project types are shown below.

In the local project, the primary Eclipse services, such as editing, indexing, searching, navigating, and building, operate directly on the local project, which is located on the local filesystem of the machine running Eclipse. For remote projects, these services are proxied by an agent running on the remote machine, which must provide remote equivalents of the services Eclipse uses locally. Because the indexing, search, and navigation services tend to be very language-specific, a different agent is required for each language that is to be supported. Currently an agent is only available for C and C++. In the case of synchronized projects, editing, indexing, etc. operate on the local copy of the project. Only the project build needs to be run on the target machine. Because this service is more generic than the language services, it is available for more languages (currently C, C++, and Fortran). To support other languages, such as Python or Java, a remote version of their builders would also need to be provided. This is a much simpler task than providing a complete set of remote language services.

Design

A number of principles underpin the design of the synchronization framework. These provide goals that, although they may not be met initially, the implementation can work towards in future revisions. These principles are as follows:

Synchronized projects should not interfere with existing project natures; that is, any kind of project should be synchronizable. While this goal is desirable, the current implementation only supports synchronization of CDT projects (C, C++, UPC, and Fortran) for a number of reasons that will be discussed in more detail below. However, our plan is to eventually support any project nature.
The use of synchronized projects should be transparent to the user. This is largely the case for most user interface operations from within Eclipse. However, the project is ultimately located on a remote machine, so some configuration is required to set this up. Also, the project is built remotely, so the user generally needs to be aware that this is occurring.
Synchronized projects should be independent of Team support. Although it is tempting to use synchronization for sharing project code, synchronization support is really orthogonal to Team support, since it does not provide the rich feature set of revision control systems. Synchronization is primarily for keeping two copies of code in sync. In the current implementation, synchronized projects can also be shared with any of the Team providers that are available in Eclipse.

The synchronization framework is designed to be extensible, so that different synchronization providers can be provided to suit different environments. This is achieved using the services framework provided by PTP. To add a new synchronization provider, a plugin must supply a class that implements the ISyncServiceProvider interface, then register this class using the org.eclipse.ptp.services.core.providers extension point. The services framework will then manage loading the class at the correct time, persisting data, etc.
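To make the provider mechanism concrete, here is a minimal Python sketch of the pattern (the real framework is Java, and none of these names are PTP API). A hypothetical `SyncServiceProvider` base class stands in for `ISyncServiceProvider`, and a hypothetical `ServiceRegistry` plays the role of the services framework plus the extension point: providers are registered under an id and only instantiated the first time they are requested. Persistence of provider data is not modelled.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class SyncServiceProvider(ABC):
    """Hypothetical stand-in for ISyncServiceProvider."""

    @abstractmethod
    def synchronize(self, project: str, changed: List[str]) -> None:
        """Bring the remote copy of `project` up to date for the `changed` paths."""


class ServiceRegistry:
    """Hypothetical stand-in for the services framework.

    Providers are registered under an id (the role played by the
    org.eclipse.ptp.services.core.providers extension point) and are only
    instantiated the first time they are requested.
    """

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], SyncServiceProvider]] = {}
        self._instances: Dict[str, SyncServiceProvider] = {}

    def register(self, provider_id: str,
                 factory: Callable[[], SyncServiceProvider]) -> None:
        self._factories[provider_id] = factory

    def get(self, provider_id: str) -> SyncServiceProvider:
        # Lazy loading: create the provider on first use, then reuse it.
        if provider_id not in self._instances:
            self._instances[provider_id] = self._factories[provider_id]()
        return self._instances[provider_id]


class LoggingSyncProvider(SyncServiceProvider):
    """Toy provider that only records what it was asked to synchronize."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def synchronize(self, project: str, changed: List[str]) -> None:
        self.log.extend(f"{project}/{path}" for path in changed)


registry = ServiceRegistry()
registry.register("example.logging", LoggingSyncProvider)
provider = registry.get("example.logging")
provider.synchronize("hello", ["src/main.c"])
```

Registering a factory rather than an instance is what lets the framework defer loading a provider's plugin until a project actually needs it.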

Project Nature

When a synchronized project is created, or an existing project is converted to synchronized, a remoteSyncNature is added to the project configuration. Only projects with this nature will be recognized by the synchronization framework. Projects that have this nature are decorated in the Project Explorer view to indicate that they are synchronized.

Synchronization Initiation

The synchronization framework registers a resource change listener for all resources in the workspace. Synchronization is initiated for any resource delta of type IResourceDelta.CHANGED, and after an event of type IResourceChangeEvent.POST_BUILD. The first synchronization ensures that any resources that have been added, modified, or deleted in the local workspace are reflected on the target system. The second ensures that any artifacts generated by the build are copied to the local workspace. However, only builds that were started manually result in a post-build synchronization, since automatic builds occur after any resource change.
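The trigger rules above can be summarized as a small decision function. This Python sketch is illustrative only: `EventKind`, `DeltaKind`, `ResourceEvent`, and `should_synchronize` are hypothetical names that model the event and delta types described in the text rather than the Eclipse resources API, and `has_sync_nature` stands for whether the project carries the remoteSyncNature.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class EventKind(Enum):
    """Models the two event types named in the text (not Eclipse's constants)."""
    POST_CHANGE = auto()   # resources were changed in the workspace
    POST_BUILD = auto()    # a build has just finished


class DeltaKind(Enum):
    ADDED = auto()
    REMOVED = auto()
    CHANGED = auto()


@dataclass
class ResourceEvent:
    kind: EventKind
    delta_kind: Optional[DeltaKind] = None  # used for POST_CHANGE events
    manual_build: bool = False              # used for POST_BUILD events


def should_synchronize(event: ResourceEvent, has_sync_nature: bool) -> bool:
    """Decide whether an event should start a synchronization."""
    if not has_sync_nature:
        # Only projects carrying the remoteSyncNature are handled at all.
        return False
    if event.kind is EventKind.POST_CHANGE:
        # Push local additions, modifications and deletions to the target.
        return event.delta_kind is DeltaKind.CHANGED
    if event.kind is EventKind.POST_BUILD:
        # Pull build artifacts back, but only after manually started builds,
        # because automatic builds already follow every resource change.
        return event.manual_build
    return False
```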

In addition to synchronizing on a resource change, projects will also be synchronized prior to a build. For CDT projects, the framework registers a remote builder using the org.eclipse.cdt.managedbuilder.core.buildDefinitions extension point, and adds this builder to the properties for the project. When the project is built, the ICommandLauncher#execute() method is invoked by the CDT managed build framework. PTP provides an implementation of ICommandLauncher called SyncCommandLauncher that will first initiate a synchronization, then issue the appropriate remote commands to build the project.
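The ordering that SyncCommandLauncher guarantees (synchronize first, then build remotely) can be sketched as follows. This is a hypothetical Python analogue, not the ICommandLauncher interface: the synchronization step and the remote command runner are injected as plain callables so the sketch stays self-contained, and `remote_dir` is an assumed parameter naming the project location on the target system.

```python
from typing import Callable, List, Sequence


class SyncCommandLauncherSketch:
    """Hypothetical sketch of the sync-then-build behaviour of SyncCommandLauncher."""

    def __init__(self, synchronize: Callable[[], None],
                 run_remote: Callable[[List[str], str], int],
                 remote_dir: str) -> None:
        self._synchronize = synchronize  # brings the remote copy up to date
        self._run_remote = run_remote    # runs argv in a directory on the target
        self._remote_dir = remote_dir    # project location on the target system

    def execute(self, command: str, args: Sequence[str]) -> int:
        # 1. Make sure the remote copy matches the local workspace.
        self._synchronize()
        # 2. Run the build command in the remote copy and return its exit status.
        return self._run_remote([command, *args], self._remote_dir)


calls = []
launcher = SyncCommandLauncherSketch(
    synchronize=lambda: calls.append("sync"),
    run_remote=lambda argv, cwd: calls.append(("run", argv, cwd)) or 0,
    remote_dir="/home/user/hello",
)
status = launcher.execute("make", ["all"])
```

Because the synchronization happens inside `execute`, every build started through the managed build framework sees the latest local edits, whichever way it was triggered.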

Multiple Hosts

- A synchronized project will be able to synchronize with multiple synchronization points (target system + project location)
- There may be more than one synchronized project for each synchronization point
- The nature will add a new property page to the project. This property page will list all the systems that the project will be synchronized with (and allow remote systems to be added/removed) and how they will be synchronized
When will synchronization occur?
- Synchronization will be customizable

How does synchronization play with Team support?

Where will synchronization information be stored?
- Service configurations will be used to store connection information
- Remote system information will be saved in the user's workspace using the service configuration mechanism (so that it ties into the other parts of PTP)
- Service configurations will also specify the different sync methods (and sync method-specific information)
How will synchronization be controlled?
- The current plan is to use build configurations to control synchronization
- Each build configuration will include the target system name (e.g. target1_release, target1_debug, target2_debug, etc.)
- The "active" configuration will specify the synchronization point (target system + project location) for synchronization
- Building a non-active configuration will automatically switch the "active" configuration
- If the user has selected Indexer->"Use active build configuration", changing the "active" configuration will result in a re-synchronization and index rebuild
- If the user has selected Indexer->"Use fixed build configuration", this "fixed" configuration and its associated target machine are used for the index. With this option selected, the index is not rebuilt when switching the active build configuration, but only when changing the "fixed" build configuration.
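The control rules above can be sketched in Python as two small functions: one that maps a build configuration name to its synchronization point, and one that decides whether a configuration change requires an index rebuild. The `<target>_<variant>` naming convention is taken from the examples above; `SyncPoint`, the `locations` table, and the `"active"`/`"fixed"` mode strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SyncPoint:
    target: str    # remote system name
    location: str  # project directory on that system


def sync_point_for(config_name: str, locations: Dict[str, str]) -> SyncPoint:
    """Map a build configuration such as 'target1_release' to its sync point.

    Assumes the illustrative '<target>_<variant>' naming and a hypothetical
    table of project locations per target system.
    """
    target, sep, _variant = config_name.rpartition("_")
    if not sep:
        raise ValueError(f"no target system in configuration name {config_name!r}")
    return SyncPoint(target, locations[target])


def needs_reindex(indexer_mode: str, old_active: str, new_active: str,
                  old_fixed: str, new_fixed: str) -> bool:
    """Whether the index must be rebuilt after a configuration change."""
    if indexer_mode == "active":   # Indexer->"Use active build configuration"
        return old_active != new_active
    if indexer_mode == "fixed":    # Indexer->"Use fixed build configuration"
        return old_fixed != new_fixed
    raise ValueError(f"unknown indexer mode {indexer_mode!r}")
```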
How will off-line development be supported?
- TBD

Possible Back-ends

Rsync and GIT both have advantages and disadvantages. It therefore seems reasonable to support at least these two, and to make it easy to add others later on.

Advantages of Rsync

Widely available, and thus easy to set up without installing software on the remote machine
Others: TBD

Disadvantages of Rsync

Status of Java implementation is uncertain

http://sourceforge.net/projects/jarsync

Alternatives

Using the command line tool instead has several disadvantages:
- Limited support on Windows
- Passing passwords; it is not possible to share the same connection
- Reliance on an external tool
A C library is available (librsync), but using it would require e.g. JNI or NestedVM.
There are also Java libraries that support file synchronization without being compatible with Rsync (e.g. JFileSync). This would be acceptable, because the server side could be uploaded automatically, but the protocol should be fast and reliable.

The synchronization is only one-way
Problems if clocks are not synchronized between systems

Two-way synchronization is important if remote files get changed, either automatically or by the user. The one-way synchronization of rsync would usually not propagate such changes back to the client, and would not reliably detect conflicts caused by changes on both sides.
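Both problems (one-way transfer and dependence on clocks) can be illustrated with a toy model. This Python sketch is not rsync's delta-transfer algorithm: `one_way_sync` is a hypothetical function that pushes every local file whose timestamp or content differs from the remote copy, and `update_only` loosely mimics rsync's `--update` option, which skips files that are newer on the receiver.

```python
from typing import Dict, Tuple

# path -> (modification time, content): a toy model of one side's files.
Tree = Dict[str, Tuple[float, str]]


def one_way_sync(local: Tree, remote: Tree, update_only: bool = False) -> Tree:
    """Return the new remote tree after a one-way push from `local`.

    Local files are never modified, and remote-only files are never copied
    back. Same (mtime, content) is a toy stand-in for a size/mtime quick check.
    """
    result = dict(remote)
    for path, (mtime, content) in local.items():
        if path in remote:
            r_mtime, r_content = remote[path]
            if (r_mtime, r_content) == (mtime, content):
                continue  # unchanged, nothing to send
            if update_only and r_mtime > mtime:
                continue  # receiver looks newer: skipped (trusts both clocks!)
        result[path] = (mtime, content)  # remote edits are silently overwritten
    return result


# Both sides edited a.c; the remote edit is lost and no conflict is reported.
synced = one_way_sync({"a.c": (150.0, "local edit")},
                      {"a.c": (200.0, "remote edit"), "out.o": (210.0, "obj")})

# The remote clock runs ~1000 s fast, so a newer local edit looks older.
skewed = one_way_sync({"a.c": (300.0, "newer")}, {"a.c": (1100.0, "older")},
                      update_only=True)
```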

Advantages of GIT

It has a Java implementation (shipping with Helios)
Java can also be used on the server (if native GIT is not available)
It is known to be extremely fast (including the Java implementation)
It supports two-way synchronization

GIT is not meant as a synchronization tool (it is a DVCS), but it works extremely well as one. Using GIT for synchronization would work both for users who also use it for version control and for users who use some other tool for version control. As an example, synchronizing a folder containing ~4000 files (~100MB, with one file changed, without GIT knowing in advance which one) over a remote (cable) connection, with GIT detecting file changes on both sides, takes less than one second. The performance is mainly limited by the file system for the tree traversal.
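What makes two-way synchronization possible is a common base: GIT can compare each side against the last state both sides agreed on. The following Python sketch shows only that decision step with a hypothetical `classify` function over path-to-hash snapshots; it is a simplification of the idea, not GIT's merge algorithm (which also merges non-conflicting changes within a file).

```python
from typing import Dict

Snapshot = Dict[str, str]  # path -> content hash (absent = file does not exist)


def classify(base: Snapshot, local: Snapshot, remote: Snapshot) -> Dict[str, str]:
    """Per path, decide what a two-way sync must do, relative to `base`.

    Returns 'unchanged', 'push', 'pull', or 'conflict' for every path that
    exists in at least one of the three snapshots.
    """
    actions: Dict[str, str] = {}
    for path in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(path), local.get(path), remote.get(path)
        if l == r:
            actions[path] = "unchanged"  # identical on both sides (or deleted on both)
        elif l == b:
            actions[path] = "pull"       # only the remote side changed
        elif r == b:
            actions[path] = "push"       # only the local side changed
        else:
            actions[path] = "conflict"   # both sides changed, differently
    return actions


plan = classify(
    base={"a.c": "1", "b.c": "1", "c.c": "1"},
    local={"a.c": "2", "b.c": "1", "c.c": "2"},
    remote={"a.c": "1", "b.c": "3", "c.c": "3", "d.o": "9"},
)
```

Note that deletions fall out naturally: a file deleted locally but untouched remotely is classified as `push`.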

Implementation Issues with GIT

To push to a non-bare repository is discouraged

There are several options:

Fetch (not a good option, because it requires SSHD on the client)
Push into the working branch with a post-update hook. Disadvantages: requires a stat of each file on the server (slow over NFS) and doesn't allow merging on the client side
Push to a separate bare repository. Disadvantage: requires two repositories
Push to a remote branch. This seems to be the best option
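The preferred option can be sketched as a plan of GIT commands. In this hypothetical Python helper, the local side pushes into a remote-tracking ref of the non-bare remote repository (so the checked-out branch is never updated directly), and the remote side then merges that ref into its working tree. The ref name `refs/remotes/eclipse/<branch>`, the commit message, and the helper itself are assumptions for illustration; the GIT commands and options used are standard.

```python
from typing import Dict, List


def push_to_remote_branch_plan(remote: str, branch: str = "master",
                               tracking_name: str = "eclipse") -> Dict[str, List[List[str]]]:
    """Hypothetical command plan for the 'push to a remote branch' option.

    Returns argv lists; running them (locally, and over e.g. SSH on the
    target) is left to the caller.
    """
    target_ref = f"refs/remotes/{tracking_name}/{branch}"
    local = [
        ["git", "add", "--all"],
        # Exits non-zero when there is nothing to commit; a caller must tolerate that.
        ["git", "commit", "--quiet", "-m", "Eclipse sync"],
        # Update a ref that is not checked out on the remote side.
        ["git", "push", remote, f"HEAD:{target_ref}"],
    ]
    on_remote = [
        # Bring the pushed commits into the remote working tree.
        ["git", "merge", f"{tracking_name}/{branch}"],
    ]
    return {"local": local, "remote": on_remote}


plan = push_to_remote_branch_plan("ssh://host/~/proj")
```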

Preprocessing & Indexing

Many of the advanced features of CDT and Photran require indexing/parsing the source code. When a project is being built remotely, the source code presented to the user should reflect the environment on the remote system rather than the local system. The main mechanisms that distinguish a remote environment from the local environment are the macros that are predefined by the compiler and the system include files that are included in the source code.

Note that this support is independent of remote synchronization (although it may use some remote synchronization functionality to achieve it). A remotely synchronized project can still use a purely local environment without requiring any additional functionality.

System Include Files

To present an accurate reflection of the remote environment, system include files need to be fetched from the remote system for the preprocessor and indexer. This requires changes to the preprocessor and other parts of CDT/Photran so that header files are obtained from the correct remote system rather than from the local system. To avoid performance issues, header files should also be cached locally where possible, but caching must be optional, as licensing issues may prevent it on some systems.
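The caching requirement can be sketched in Python as a header provider with a cache that can be switched off. `RemoteHeaderProvider` and its `fetch` callable (standing in for whatever actually reads a file from the remote system, e.g. over SSH) are hypothetical. The cache is keyed by host as well as path, because the same `/usr/include` path holds different files on different systems.

```python
from typing import Callable, Dict, Tuple


class RemoteHeaderProvider:
    """Hypothetical sketch: serve system headers of a remote host, optionally cached."""

    def __init__(self, fetch: Callable[[str, str], str], cache_enabled: bool = True) -> None:
        self._fetch = fetch                  # (host, path) -> file contents
        self._cache_enabled = cache_enabled  # False where licensing forbids local copies
        self._cache: Dict[Tuple[str, str], str] = {}

    def get(self, host: str, path: str) -> str:
        key = (host, path)
        if self._cache_enabled and key in self._cache:
            return self._cache[key]
        contents = self._fetch(host, path)
        if self._cache_enabled:
            self._cache[key] = contents
        return contents
```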

Scanner Discovery

Compiler (and makefile) defined macros play an important part in determining which header files will be included as well as which parts of the code will be enabled or disabled. CDT attempts to determine these macros using a process known as scanner discovery. This involves running commands on the target system to determine the macros that are defined. This process is inherently complex because every system has different compilers with different options for determining this information. RDT has already provided some remote scanner discovery functionality, so we plan to reuse this for synchronized projects.
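As an example of what scanner discovery involves, GCC-compatible compilers print their predefined macros when run as `gcc -dM -E - < /dev/null`. The Python sketch below parses that output into a macro table. It assumes a GCC-compatible compiler and leaves out how the command is run on the target system; it is not RDT's implementation, and the `parse_predefined_macros` name is hypothetical.

```python
import re
from typing import Dict

# Command whose output is parsed below; it would be run on the target system.
DISCOVERY_COMMAND = "gcc -dM -E - < /dev/null"

# '#define NAME value' or '#define NAME(params) value'; the value may be empty.
_DEFINE = re.compile(r"^#define\s+(\w+)(\([^)]*\))?(?:\s+(.*))?$")


def parse_predefined_macros(output: str) -> Dict[str, str]:
    """Map each macro (function-like ones keep their parameter list) to its value."""
    macros: Dict[str, str] = {}
    for line in output.splitlines():
        match = _DEFINE.match(line.strip())
        if match:
            name, params, value = match.groups()
            macros[name + (params or "")] = value or ""
    return macros
```

A function-like macro is only recognized when the parenthesis immediately follows the name, which matches the C preprocessor's own rule, so `#define X (1)` is correctly treated as an object-like macro.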

Support for other Remote Tools besides Build

TBD

Milestones

Check feasibility of remote include path support in CDT (see Preprocessing & Indexing above)
Define a new synchronization service type (which adds synchronization/replication to the running EFS). Its public method would be guaranteeSynchronized. The default service (for a purely local project or for RemoteTools/RSE) would do nothing.
Add a call to guaranteeSynchronized to all remote operations (compile, remote index, ...)
Implement a GIT-based synchronization service (doing the GIT push in the guaranteeSynchronized call)
Add the GUI to configure the synchronization service (including the new project wizard)

Later, an EFS could perform the GIT push asynchronously after each file modification (e.g. save), and guaranteeSynchronized would just wait for the push to finish.
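The proposed service contract can be sketched in Python, with `guarantee_synchronized` mirroring the proposed guaranteeSynchronized method. `NullSyncService` models the do-nothing default, and the hypothetical `AsyncSyncService` models the later variant: each save schedules a background push (several quick saves are coalesced into one push), and `guarantee_synchronized` blocks until no push is pending or running. Error handling is omitted: if `push` raised, waiters would block until their timeout.

```python
import threading
from typing import Callable, Optional


class NullSyncService:
    """Default for purely local (or RemoteTools/RSE) projects: always in sync."""

    def guarantee_synchronized(self, timeout: Optional[float] = None) -> bool:
        return True


class AsyncSyncService:
    """Hypothetical sketch: background pushes after saves, waited on before remote operations."""

    def __init__(self, push: Callable[[], None]) -> None:
        self._push = push  # e.g. performs the GIT push
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._pending = 0
        self._running = False

    def file_saved(self) -> None:
        with self._lock:
            self._pending += 1
            if not self._running:
                self._running = True
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        while True:
            with self._lock:
                if self._pending == 0:
                    self._running = False
                    self._idle.notify_all()  # wake anyone in guarantee_synchronized
                    return
                self._pending = 0  # one push covers every save made so far
            self._push()  # run outside the lock so saves are never blocked

    def guarantee_synchronized(self, timeout: Optional[float] = None) -> bool:
        with self._idle:
            return self._idle.wait_for(
                lambda: not self._running and self._pending == 0, timeout)


pushes = []
service = AsyncSyncService(push=lambda: pushes.append("push"))
for _ in range(3):
    service.file_saved()  # e.g. three quick saves in the editor
ok = service.guarantee_synchronized(timeout=5.0)  # e.g. just before a remote build
```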

Timeline: TBD

Additional Features

Remote file view

To run the binary, it has to be selected on the remote machine. For that, it would be nice to be able to browse the remote machine using RemoteTools/RSE. Preferably, the path shown would be the path to the remote copy of the local project. It would also be nice to have a "Project Explorer" view for the files on the remote machines. This would allow viewing binaries, object files, and other remote files that are not part of the synchronization. This view should also use RemoteTools/RSE.
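Opening the remote browser at the remote copy of the project amounts to translating a local path into the corresponding remote path. This Python sketch assumes POSIX-style paths on both sides (a Windows client would need `PureWindowsPath` for the local side), and `remote_path_for` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def remote_path_for(local_project_root: str, local_file: str,
                    remote_project_root: str) -> str:
    """Translate a path inside the local project into the remote copy's path.

    Raises ValueError if `local_file` is not inside `local_project_root`.
    """
    relative = PurePosixPath(local_file).relative_to(local_project_root)
    return str(PurePosixPath(remote_project_root) / relative)
```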

Build local and remote from same project

If the remote build is implemented as a builder that can be added to a standard CDT/Photran project, then it is possible to have both a local and a remote builder for the same project. CDT already supports multiple build configurations, so these can be used to switch between the local builder and (potentially several) remote builders. It has to be checked that the indexer (including the remote include files) is updated correctly when the build configuration switches.


Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

Difference between revisions of "PTP/designs/remote/sync final"


Contents

Introduction

Design

Project Nature

Synchronization Initiation

Multiple Hosts

Possible Back-ends

Advantages of Rsync

Disadvantages of Rsync

Advantages of GIT

Implementation Issues with GIT

To push to a non-bare repository is discouraged

Preprocessing & Indexing

System Include Files

Scanner Discovery

Support for other Remote Tools besides Build

Milestones

Additional Features

Remote file view

Build local and remote from same project
