PTP/designs/remote/sync final

Revision as of 18:23, 9 October 2012 by G.watson.computer.org

Introduction

This document describes the design of the Remote Project Synchronization framework. It is based on the original proposal by Roland Schulz and others.

Eclipse projects (including C/C++ projects) are traditionally stored on the local filesystem of the machine on which Eclipse is running. This is adequate for Java, since the bytecode is inherently portable, and for embedded development, since a cross-compiler environment is typically employed. However, it does not suit the development of HPC applications, because it is generally very difficult to replicate the environment of the target system on the local machine. To overcome this limitation, the Remote Development Tools (RDT) component of PTP added support for remote projects. However, the approach taken has a number of serious limitations that preclude its use for developing many applications. Although the functionality of RDT has improved, the main issue that remains outstanding is that it only supports C/C++ projects and is not a general solution that can be used for other languages and project types.

Remote Project Synchronization is an alternative to RDT that overcomes many of RDT's inherent limitations. Synchronized projects work by maintaining both a local and a remote copy of the source code, and Eclipse keeps these two copies in synchronization. The advantage of this approach is that Eclipse is able to operate on the local copy as normal, and does not need to be concerned with network delays or other issues. No special changes to the project infrastructure are required, since the project just looks like a normal local project. The remote copy of the code is maintained so that the application can be built in the environment present on the target system without copying the entire source tree each time. This allows any native compilers or libraries that are present to be used without replicating this environment on the local machine.

The three different project types are shown below.

Project types.png

In the local project, the primary Eclipse services, such as editing, indexing, searching, navigating, and building, operate directly on the local project, which is located on the local filesystem of the machine running Eclipse. For remote projects, these services are proxied by an agent running on the remote machine, which must provide remote equivalents of the services Eclipse uses locally. Because the index, search, and navigation services tend to be very language specific, a different agent is required for each language that is to be supported. Currently an agent is only available for C and C++. In the case of synchronized projects, editing, indexing, and the other language services operate on the local copy of the project. Only the project build needs to be run on the target machine. Because the build service is more generic than the language services, it is available for more languages (currently C, C++, and Fortran). To support other languages, such as Python or Java, a remote version of their builders would also need to be provided. This is a much simpler task than adding language services for a different language.

Design

A number of principles underpin the design of the synchronization framework. These provide goals that, although they may not be met initially, the implementation can work towards in future revisions. These principles are as follows:

  • Synchronized projects should not interfere with existing project natures; that is, any kind of project should be synchronizable. While this goal is desirable, the current implementation only supports synchronization of CDT projects (C, C++, UPC, and Fortran) for a number of reasons that are discussed in more detail below. The plan is to eventually support any project nature, however.
  • The use of synchronized projects should be transparent to the user. This is largely the case for most user interface operations from within Eclipse. However, the project is ultimately located on a remote machine, so there is some degree of configuration that is required to set this up. Also, building the project occurs remotely, so the user generally needs to be aware that this is occurring.
  • Synchronized projects should be independent of Team support. Although there is a temptation to use synchronization for sharing project code, synchronized support is really orthogonal to Team support since it does not provide the same richness of features of revision control systems. Synchronization is primarily for keeping two copies of code in sync. In the current implementation, synchronized projects can also be shared with any of the Team providers that are available in Eclipse.

Synchronization Service

The synchronization framework is designed to be extensible, so that different synchronization providers can be supplied to suit different environments. This is achieved by adding a new service called org.eclipse.ptp.rdt.sync.core.SyncService to the services framework provided by PTP. To add a new synchronization provider, a plugin must supply a class that implements the ISyncServiceProvider interface, and then register this class using the org.eclipse.ptp.services.core.providers extension point. The services framework will then manage loading the class at the correct time, persisting data, etc.

APIs

The ISyncServiceProvider interface provides the following methods:

public String getLocation() 
Get the build location specified by this sync service provider.
public IRemoteConnection getRemoteConnection() 
Get the remote connection used by this sync service provider.
public void synchronize(IProject, BuildScenario, IResourceDelta, SyncFileFilter, IProgressMonitor, EnumSet<SyncFlag>) 
Synchronize a project (represented by the IProject object) using the BuildScenario and filters specified by SyncFileFilter. If an IResourceDelta is supplied, then only resources in the delta will be synchronized. The SyncFlags control the behavior of the synchronization.
public Set<IPath> getMergeConflictFiles(IProject project, BuildScenario buildScenario) 
Get the current list of merge-conflicted files for the project and build scenario.
public String[] getMergeConflictParts(IProject project, BuildScenario buildScenario, IFile file) 
Get the three parts of the merge-conflicted file (left, right, and ancestor, respectively).
public void setMergeAsResolved(IProject project, BuildScenario buildScenario, IPath[] paths) 
Set the given file paths as resolved (merge conflict does not exist).
public void checkout(IProject project, BuildScenario buildScenario, IPath[] paths) 
Replace the current contents of the given paths with the previous versions in the repository.
public void checkoutRemoteCopy(IProject project, BuildScenario buildScenario, IPath[] paths) 
Replace the current contents of the given paths with the current local copies of the remote (not necessarily the same as what is on the remote site). This is useful in merge-conflict resolution.
public void close(IProject project) 
Close any resources that were opened by the sync provider for the given project. Resources not opened by the provider should not be touched. This is called, for example, when a project is about to be deleted.

User Interface

The synchronization user interface is provided using the context menu on a project, as shown below.

Context menu.png

The "Synchronization" menu provides the following options and commands:

Auto-Sync 
Enable/disable automatic synchronization for all projects (global option). If enabled, synchronization will take place when a resource is modified, prior to a build, and immediately after a build. If disabled, synchronization will only happen if the user selects one of the synchronization actions.
Project Auto-Sync Settings 
Sets the automatic synchronization settings for the current project. If set to Sync-Active, only the active configuration will be synchronized. If set to Sync-All, all configurations will be synchronized in sequence. If set to Sync-None, no configurations will be synchronized. This is the same as disabling Auto-Sync, but for a single project only.
Sync Active Now 
This command will immediately synchronize using the active configuration of the current project.
Sync All Now 
This command will immediately synchronize all configurations of the current project.
Filter... 
Open the synchronization filter dialog to configure filters for this project. Filtering is described in more detail in Filtering.
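The interaction between the global Auto-Sync option and the per-project setting can be sketched as follows. This is a minimal illustration only: AutoSyncSketch, ProjectSyncMode, and configurationsToSync are hypothetical names that do not come from the PTP code base, and configurations are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the auto-sync settings; not the PTP API. */
final class AutoSyncSketch {
    /** Per-project setting, mirroring Sync-Active, Sync-All, and Sync-None. */
    enum ProjectSyncMode { ACTIVE, ALL, NONE }

    /**
     * Returns the configurations that an automatic synchronization would cover.
     * When the global Auto-Sync option is off, nothing is synchronized automatically.
     */
    static List<String> configurationsToSync(boolean globalAutoSync,
                                             ProjectSyncMode mode,
                                             String activeConfig,
                                             List<String> allConfigs) {
        if (!globalAutoSync) {
            return List.of();
        }
        switch (mode) {
            case ACTIVE:
                return List.of(activeConfig);
            case ALL:
                // Copy so callers cannot modify the project's own list.
                return new ArrayList<>(allConfigs);
            default:
                return List.of();
        }
    }

    public static void main(String[] args) {
        List<String> configs = List.of("hostA", "hostB", "hostC");
        System.out.println(configurationsToSync(true, ProjectSyncMode.ACTIVE, "hostA", configs));
        System.out.println(configurationsToSync(true, ProjectSyncMode.ALL, "hostA", configs));
        System.out.println(configurationsToSync(false, ProjectSyncMode.ALL, "hostA", configs));
    }
}
```

The manual commands (Sync Active Now and Sync All Now) bypass this decision entirely, which is why they are not modeled here.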

Project Nature

When a synchronized project is created, or an existing project is converted to synchronized, a remoteSyncNature is added to the project configuration. Only projects with this nature will be recognized by the synchronization framework. Projects that have this nature are decorated in the Project Explorer view to indicate that they are synchronized.

Synchronization Initiation

The synchronization framework registers a resource change listener for all resources in the workspace. Synchronization will be initiated for any resource delta that is of type IResourceDelta.CHANGED and after an event of type IResourceChangeEvent.POST_BUILD. The first synchronization will ensure that any resources that have been added, modified, or deleted in the local workspace will be reflected on the target system. The second synchronization will ensure that any artifacts that are generated as a result of the build will be copied to the local workspace. Only builds that were started manually will result in a synchronization, however, since automatic builds will occur after any resource change.

In addition to synchronizing on a resource change, projects will also be synchronized prior to a build. For CDT projects, the framework registers a remote builder using the org.eclipse.cdt.managedbuilder.core.buildDefinitions extension point, and adds this builder to the properties for the project. When the project is built, the ICommandLauncher#execute() method is invoked by the CDT managed build framework. PTP provides an implementation of ICommandLauncher called SyncCommandLauncher that will first initiate a synchronization, then issue the appropriate remote commands to build the project.
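The "synchronize first, then build" behavior described for SyncCommandLauncher is essentially a decorator around a command launcher. The sketch below shows that pattern under simplifying assumptions: Launcher and Synchronizer are hypothetical stand-ins (the real ICommandLauncher#execute() has a richer signature), and the choice to abort the build when synchronization fails is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a launcher that synchronizes before building. */
final class SyncLauncherSketch {
    /** Simplified stand-in for a command launcher; not CDT's ICommandLauncher. */
    interface Launcher {
        int execute(String command);
    }

    /** Stand-in for whatever performs the synchronization. */
    interface Synchronizer {
        void synchronize() throws Exception;
    }

    /** Decorator: synchronize first, then delegate the build command. */
    static final class SyncFirstLauncher implements Launcher {
        private final Synchronizer synchronizer;
        private final Launcher remoteLauncher;

        SyncFirstLauncher(Synchronizer synchronizer, Launcher remoteLauncher) {
            this.synchronizer = synchronizer;
            this.remoteLauncher = remoteLauncher;
        }

        @Override
        public int execute(String command) {
            try {
                synchronizer.synchronize();
            } catch (Exception e) {
                // Assumption: a failed synchronization aborts the build.
                return -1;
            }
            return remoteLauncher.execute(command);
        }
    }

    /** Runs one build through the decorator and returns the order of events. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Launcher remote = command -> {
            log.add("run: " + command);
            return 0;
        };
        Launcher launcher = new SyncFirstLauncher(() -> log.add("sync"), remote);
        launcher.execute("make all");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```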

Multiple Hosts

Synchronization to multiple hosts is supported using the BuildScenario class. An object of this class is passed to the ISyncServiceProvider#synchronize() method along with the project reference. A BuildScenario encapsulates the connection and location information for the project on the remote target system. By maintaining multiple BuildScenarios, it is possible to perform synchronization to multiple systems.

For CDT projects, the BuildScenario information is stored in the project build configuration (IConfiguration interface). The framework provides the BuildConfigurationManager class for managing the interface between the synchronization framework and CDT's build configurations.

CDT provides the notion of an "active" build configuration, so the synchronization framework uses this as the default configuration when synchronizing the project. The user interface also provides a means of synchronizing all build configurations, which will cause the project to be synchronized with each target system that has been configured.

Multi sync.png

In the above diagram, a project is being synchronized to three remote hosts, A, B, and C. The "active" configuration is currently set to host A, so when a build or synchronize is performed, it will automatically use that host. The user can override this by using the "Build All" or "Synchronize All Now" menus.
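The relationship between build configurations and remote targets can be pictured as a map from configuration name to scenario, with one entry marked active. The following sketch is illustrative only: MultiHostSketch and Scenario are hypothetical stand-ins for BuildConfigurationManager and BuildScenario, and the rule that the first configuration added becomes active is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of per-configuration remote targets; not the PTP API. */
final class MultiHostSketch {
    /** Stand-in for BuildScenario: a connection plus a remote location. */
    static final class Scenario {
        final String connectionName;
        final String remoteLocation;

        Scenario(String connectionName, String remoteLocation) {
            this.connectionName = connectionName;
            this.remoteLocation = remoteLocation;
        }

        @Override
        public String toString() {
            return connectionName + ":" + remoteLocation;
        }
    }

    private final Map<String, Scenario> byConfiguration = new LinkedHashMap<>();
    private String activeConfiguration;

    void addConfiguration(String name, Scenario scenario) {
        byConfiguration.put(name, scenario);
        if (activeConfiguration == null) {
            // Assumption: the first configuration added becomes the active one.
            activeConfiguration = name;
        }
    }

    void setActive(String name) {
        if (!byConfiguration.containsKey(name)) {
            throw new IllegalArgumentException("Unknown configuration: " + name);
        }
        activeConfiguration = name;
    }

    /** Target used by a default build or synchronize. */
    Scenario activeScenario() {
        return byConfiguration.get(activeConfiguration);
    }

    /** Targets used when every configuration is synchronized, in insertion order. */
    List<Scenario> allScenarios() {
        return new ArrayList<>(byConfiguration.values());
    }

    public static void main(String[] args) {
        MultiHostSketch project = new MultiHostSketch();
        project.addConfiguration("A", new Scenario("hostA", "/home/user/proj"));
        project.addConfiguration("B", new Scenario("hostB", "/scratch/proj"));
        project.addConfiguration("C", new Scenario("hostC", "/work/proj"));
        System.out.println("active: " + project.activeScenario());
        System.out.println("all: " + project.allScenarios());
    }
}
```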

Wizards

Synchronized projects are created using one of two available wizards. A new synchronized project can be created from scratch, or from an existing remote project, using the Synchronized Project wizard. An existing local project can also be converted to a synchronized project (and hence synchronized to a remote system) using the Convert To Synchronized Project wizard.

Synchronized Project Wizard

This wizard is used to create a new synchronized project from scratch, or to synchronize with an existing remote project. It comes in two flavors, one for creating C/C++ projects, and one for creating Fortran projects. The only difference between these is that the latter wizard adds a Fortran nature to the project, so should be used if the project is likely to contain any Fortran code. The user interface is shown below.

New wizard.png

The wizard is similar to the normal C Project or C++ Project wizards in that it allows the user to specify a project name and location, select the project type, and choose the toolchain. In addition to these, however, the wizard allows the user to select a remote location for the project, which is the location that will be used to synchronize the remote files, and a remote toolchain, which is the toolchain that will be used on the remote machine. If the remote project location already contains resources, then these will be synchronized with the local project immediately upon completion of the wizard.

Convert To Synchronized Project Wizard

As its name implies, this wizard is used to convert an existing C, C++, or Fortran project to be synchronized, which requires adding a remote location and remote toolchain to the project. When the wizard is first invoked, it will scan the workspace for any non-synchronized CDT projects. The user is then able to select the projects for conversion from a list, and once a remote location is chosen, the project will be converted. This process duplicates the selected configurations and adds the remote location to them. If a different toolchain needs to be selected for the remote configurations, the user must manually edit the build configuration for each remote configuration. The user interface for this wizard is shown below.

Convert wizard.png

Filtering

Being able to specify which files are synchronized is essential, especially for projects that contain large numbers of object files or large data files that are not necessary for application development. Filtering is provided on a per-project basis, and can be used to specify exactly which resources are included or excluded from the synchronization. Default filters can also be set in the Eclipse settings, and these are inherited by a synchronized project when it is created. By default, the following resources are excluded: .ptp-sync, .settings, .cproject, and .project.

APIs

Resource filtering is provided by the SyncFileFilter class, which is passed to the ISyncServiceProvider#synchronize() method. A SyncFileFilter maintains a list of patterns (which extend the abstract base class ResourcePattern) that are to be included in the synchronization and a list of patterns that are to be excluded from it.

In the current implementation, two types of filter patterns are available:

PathResourceMatcher 
This filter tests resources against a specific directory path.
RegexPatternMatcher 
This filter tests resources against a regular expression.

User Interface

Filtering can be configured when creating a synchronized project or converting a project to use synchronization. Filters can also be configured at any time using the synchronization context menu on the project. The following user interface is used to configure filtering on a project.

Filter config.png

The File View area is used to include or exclude resources directly (using the checkbox next to the resource), and will also show which files are being included or excluded based on the patterns. The Pattern View shows the currently defined patterns and allows the pattern ordering to be altered. Patterns are applied in order from top to bottom. A new path or regex pattern can be added by entering the information in the appropriate text field and selecting either the Exclude or Include buttons as required.
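Ordered include and exclude patterns of the two kinds described above can be sketched as follows. This is not the SyncFileFilter implementation: SyncFilterSketch and ResourceMatcher are hypothetical names, paths are plain project-relative strings, and the rule that the first matching pattern decides (with unmatched resources being synchronized) is an assumption about what "applied in order from top to bottom" means.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical model of ordered include/exclude filtering; not the PTP classes. */
final class SyncFilterSketch {
    /** Stand-in for a filter pattern. */
    interface ResourceMatcher {
        boolean matches(String projectRelativePath);
    }

    /** Matches a directory (or file) path and everything beneath it. */
    static ResourceMatcher path(String directory) {
        String dir = directory.endsWith("/")
                ? directory.substring(0, directory.length() - 1)
                : directory;
        return p -> p.equals(dir) || p.startsWith(dir + "/");
    }

    /** Matches any path in which the regular expression is found. */
    static ResourceMatcher regex(String expression) {
        Pattern pattern = Pattern.compile(expression);
        return p -> pattern.matcher(p).find();
    }

    private static final class Rule {
        final ResourceMatcher matcher;
        final boolean exclude;

        Rule(ResourceMatcher matcher, boolean exclude) {
            this.matcher = matcher;
            this.exclude = exclude;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    SyncFilterSketch include(ResourceMatcher matcher) {
        rules.add(new Rule(matcher, false));
        return this;
    }

    SyncFilterSketch exclude(ResourceMatcher matcher) {
        rules.add(new Rule(matcher, true));
        return this;
    }

    /** Assumption: the first matching rule decides; unmatched paths are synchronized. */
    boolean shouldSync(String projectRelativePath) {
        for (Rule rule : rules) {
            if (rule.matcher.matches(projectRelativePath)) {
                return !rule.exclude;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SyncFilterSketch filter = new SyncFilterSketch()
                .include(path("build/config"))
                .exclude(path("build"))
                .exclude(regex("\\.o$"));
        String[] samples = {"build/config/site.mk", "build/main.o", "src/main.o", "src/main.c"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + filter.shouldSync(sample));
        }
    }
}
```

Under the first-match assumption, placing the narrower include pattern above the broader exclude pattern is what keeps build/config synchronized while the rest of build is skipped; reversing the order would exclude it.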

Off-line Operation

One of the advantages of synchronized projects over remote projects is the ability to operate without a network connection. Off-line operation is automatically managed by the use of asynchronous synchronize operations. If a connection to the target system is unavailable, the operation will be queued in order until network connectivity is restored, at which time the operations will be performed. The user does not need to take any action for this automatic behavior to be enabled.
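The queueing behavior can be illustrated with a small in-memory model. This is a single-threaded sketch with hypothetical names (OfflineQueueSketch, requestSync, setConnected); the real framework runs synchronization asynchronously, and operations are represented here by plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of queued synchronization while off-line; not the PTP API. */
final class OfflineQueueSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> performed = new ArrayList<>();
    private boolean connected;

    /** Performs the operation immediately if connected, otherwise queues it. */
    void requestSync(String operation) {
        if (connected) {
            performed.add(operation);
        } else {
            pending.addLast(operation);
        }
    }

    /** Records a connectivity change and, on reconnect, drains the queue in order. */
    void setConnected(boolean nowConnected) {
        connected = nowConnected;
        while (connected && !pending.isEmpty()) {
            performed.add(pending.removeFirst());
        }
    }

    int pendingCount() {
        return pending.size();
    }

    List<String> performed() {
        return new ArrayList<>(performed);
    }

    public static void main(String[] args) {
        OfflineQueueSketch queue = new OfflineQueueSketch();
        queue.requestSync("sync after edit 1");
        queue.requestSync("sync after edit 2");
        System.out.println("pending while off-line: " + queue.pendingCount());
        queue.setConnected(true);
        System.out.println("performed after reconnect: " + queue.performed());
    }
}
```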

A caveat to off-line operation is that building remotely while off-line is not possible. If the user attempts to build while off-line, then the build will fail and any subsequent synchronization operations will be discarded. [verify this behavior]

Scanner Discovery & Include Files

Many of the advanced features of CDT and Photran require indexing/parsing the source code. When a project is being built remotely, the source code presented to the user should reflect the environment on the remote system rather than the local system. The main mechanisms that distinguish a remote environment from the local environment are the macros that are predefined by the compiler and the system include files that are included in the source code.

Note that this support is independent of remote synchronization (although it may use some remote synchronization functionality to achieve it). A remotely synchronized project can still use a purely local environment without requiring any additional functionality.

To present an accurate reflection of the remote environment, system include files need to be fetched from the remote system for the preprocessor and indexer. This requires changes to the preprocessor and other parts of CDT/Photran in order to obtain the header file from the correct remote system rather than from the local system. To avoid performance issues, header files should also be cached locally where possible, but this must be an option as licensing issues may prevent it on some systems.

Compiler (and makefile) defined macros play an important part in determining which header files will be included as well as which parts of the code will be enabled or disabled. CDT attempts to determine these macros using a process known as scanner discovery. This involves running commands on the target system to determine the macros that are defined. This process is inherently complex because every system has different compilers with different options for determining this information. RDT has already provided some remote scanner discovery functionality, so we plan to reuse this for synchronized projects.
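As an illustration of the kind of processing scanner discovery involves, the sketch below parses preprocessor output of the form "#define NAME VALUE" into a map. GCC-compatible compilers print their predefined macros in this form when run as gcc -dM -E - with empty input; other compilers use different options, and this is not a description of how CDT or RDT implement discovery. MacroDiscoverySketch and parseDefines are hypothetical names, and the sample input in main is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for turning "#define" lines into a macro table. */
final class MacroDiscoverySketch {
    /**
     * Parses lines of the form "#define NAME VALUE" into a name-to-value map.
     * Lines that are not "#define" directives are ignored. A function-like macro
     * keeps its parameter list as part of the name (for example "F(x)").
     */
    static Map<String, String> parseDefines(String compilerOutput) {
        Map<String, String> macros = new LinkedHashMap<>();
        for (String line : compilerOutput.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("#define ")) {
                continue;
            }
            String rest = trimmed.substring("#define ".length()).trim();
            if (rest.isEmpty()) {
                continue;
            }
            int space = rest.indexOf(' ');
            if (space < 0) {
                // Macro defined without a value.
                macros.put(rest, "");
            } else {
                macros.put(rest.substring(0, space), rest.substring(space + 1).trim());
            }
        }
        return macros;
    }

    public static void main(String[] args) {
        // Made-up sample resembling output captured from a remote compiler.
        String sample = "#define __GNUC__ 4\n"
                + "#define __linux__ 1\n"
                + "# 1 \"<stdin>\"\n"
                + "#define _LP64 1\n";
        System.out.println(parseDefines(sample));
    }
}
```

In a synchronized project, the resulting table would be obtained from the remote system and supplied to the local indexer, so that conditional code is evaluated as it would be on the target.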

Synchronization Provider Implementations

Currently the only mature synchronization provider uses Git to manage synchronization. Some work has been undertaken to develop a provider based on Rsync, but to date this has not been completed. The Git provider uses the JGit feature, which provides a Java implementation of Git for Eclipse. Git has the advantage of being very fast, even for a large number of small commits.

It is outside the scope of this document to describe the details of the Git implementation.

Support for Non-CDT Projects

The current implementation of synchronized projects relies on the build configuration properties of CDT projects for a number of reasons. Project building and synchronization are closely related, since the purpose of synchronizing a project is generally to support remote building. CDT's build configurations provide a simple mechanism for supporting synchronization to multiple hosts, since each host can be associated with a build configuration, and the configuration augmented to include the location of the remote project. The "active" synchronization host then automatically corresponds to the "active" build configuration. Another advantage of using CDT's build configurations is that the configuration information is stored with the project, so moving the project to a different workspace automatically preserves the synchronization information.

In order to support the synchronization of other projects, such as Java or Python, the synchronization configuration information needs to be maintained in a different manner. This could be either using a project-specific location (such as CDT's build configurations) or as generic information in the project properties. In addition, building these projects remotely requires a different remote builder implementation, since they are unlikely to use Makefiles as their primary build mechanism.