EMF Compare/Specifications/LogicalMergeCommandLine

Part 1: analysis of CGit and JGit

In this part, we describe the current state of CGit and JGit with regard to command line merging and their customization possibilities, then outline possible evolutions that would provide logical model support for these merge operations.

This part focuses on two different command line git providers: CGit, the most widely used front end to git, written in C, and JGit, the Java alternative developed within the Eclipse Foundation. We will not consider other potential command line providers. This document calls a "logical model" a set of inter-related physical files that must always be considered together to read a unit of information.

Problem statement

We need to be able to launch merge operations from the command line while having support for logical model merging. When launching operations that involve merges from either of the two front ends, the standard behavior is to rely on textual, file-by-file merging. This may cause issues with logical models: text merges might fail with conflicts even though there is no logical conflict, or they might end successfully even though there were logical conflicts (in which case the resulting model will end up corrupted and unloadable).

Git operations that will involve file merging are the following:

  • merge
  • cherry-pick
  • pull
  • rebase
  • revert
  • stash apply
  • submodule update

The merge operation is in charge of modifying the files and adding the necessary information in the index when conflicts are detected so that the merge tool can then be called on these files to solve said conflicts. Currently, git merge operations allow for the customization of the merge drivers.

A merge driver's responsibility is to handle the merge of a single file when it is not considered to be a trivial operation by the merge strategy. As such, drivers will be called on a file basis and only during non-trivial merges. With regard to logical model merging, this is far from sufficient since:

  1. it works on a file-by-file basis (a merge driver can only modify the single file it has been called on, and thus cannot account for larger logical models), and
  2. this mechanism is only called for non-trivial merges... in the textual sense. For example, deleting a file is considered a trivial merge by the default merging strategies available from git, whereas deleting part of a logical model mandates changes in other parts of said model, not to mention a lot of potential logical conflicts that are not reflected as textual conflicts.

The merge tool will also be called file-by-file, and only on the files that have been marked as conflicting by the merge operations. It is thus too late to handle the logical models as a whole, since trivial merges may already have corrupted the logical models, and the conflicts detected were textual conflicts that did not take models into account at all.

However, even if we were able to merge multiple conflicting files at once through an individual merge tool, the “git mergetool” command would still fail since it could then iterate on files that have already been merged through the logical model of another. We'll come back to this issue in the specification section. We thus need to plug ourselves in before the merge tool can kick in, and at a higher level than what the merge drivers allow. The only potential candidates for such a pluggable behavior are the merge strategies themselves, since they are in charge of deciding what a trivial merge is, and what is not.

Customizing merge strategies

Current Possibilities

CGit

CGit offers five distinct merge strategies available by default:

  • recursive
  • octopus
  • ours
  • resolve
  • subtree

Though all five strategies have their own specific uses, all are textual and none handles logical models. The recursive strategy is the default when merging two distinct commits, octopus being the default when merging more than two. The other three are only available, by name, through specific options of the commands. For example:

 git merge -s ours <commit1> <commit2>

or

 git cherry-pick --strategy ours <commit>

Trying to use any other strategy than the default five will end in a failure from the command line:

 $ git merge -s unknown <commit1> <commit2>
 Could not find merge strategy 'unknown'.
 Available strategies are: octopus ours recursive resolve subtree.

However, even though that is undocumented, CGit allows users to add new, customized merge strategies to the list of available ones under the following conditions:

  • the merge strategy is implemented in shell,
  • the shell script implementing the strategy is named with the convention of using the git-merge- prefix followed by the name to use for this strategy, and
  • that shell script is available either in the same folder as the git command itself, or within the current user's PATH.
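
As a rough illustration of these conditions, here is a minimal sketch of the argument handling such a script could start with. The strategy name `logical` (hence the file name `git-merge-logical`) and the function names are hypothetical. The sketch relies on the way git calls a strategy script: the merge bases first, then a `--` separator, then the current head followed by the other heads. It only parses the arguments and detects octopus merges; the merge itself is left out.

```shell
#!/bin/sh
# Hypothetical git-merge-logical: argument handling for a custom strategy.
# Git calls a strategy as: git-merge-<name> <base>... -- <head> <remote>...

# Fills the variables bases, head and remotes from the strategy arguments.
parse_strategy_args() {
    bases="" head="" remotes="" sep_seen=""
    for arg in "$@"; do
        if [ -z "$sep_seen" ]; then
            if [ "$arg" = "--" ]; then sep_seen=yes; else bases="$bases$arg "; fi
        elif [ -z "$head" ]; then
            head=$arg
        else
            remotes="$remotes$arg "
        fi
    done
}

# Returns 0 when exactly one remote was given, i.e. this is not an octopus merge.
is_two_way_merge() {
    set -- $remotes
    [ "$#" -eq 1 ]
}
```

A real script would call `parse_strategy_args "$@"` first and give up when `is_two_way_merge` fails, in the same way git's own resolve strategy refuses to handle more than one remote.
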
JGit

JGit provides four merge strategies available by default:

  • recursive
  • resolve
  • ours
  • theirs

Once again, none of these four strategies can handle logical models since they all operate on a textual level. recursive is always the default regardless of the merge operation that is to be performed. The other three are only available through specific options of the merge-involving commands. For example:

 jgit.sh merge -s ours <commit1> <commit2>

JGit does not allow for customized merge strategies to be used by the commands. Furthermore, it does not provide most of the commands involving merge operations. Of the seven such operations we previously listed, only “merge” is provided by JGit.

Specifications

First and foremost, take note that none of this can be contributed back to either CGit or JGit since it involves changes that are too deeply rooted and Eclipse-specific.

CGit
Implement a custom merge strategy

Since providing customized merge strategies is already possible, what we need to do here boils down to implementing a custom merge strategy in Shell.

In order to determine whether files are part of a larger logical model while remaining compatible with previous developments and existing tools, we need these files to be part of an Eclipse workspace. As such, this strategy must be able to check whether there are Eclipse projects in the repository on which it has been called, then launch a headless Eclipse with a temporary workspace containing these projects.

Any file for which there is an existing logical merger will then be handled from within this Eclipse container, while the merging strategy should be able to fall back to the default (recursive) strategy for all other files.

This cannot and will not handle octopus merges.

Implement a custom merge tool

The merge tool will be called on each file whose merge failed with conflicts. Since we've used our own custom merge strategy, these conflicts will have been properly detected as either logical conflicts (merge handled by the logical model merger) or textual conflicts (for files which didn't have a model merger). The standard git mergetool command will execute the individual merge tools (specified through git attributes) sequentially on each file in conflict in the repository. When we detect conflicts on a logical model, we mark all files constituting that logical model as being in conflict; for a model made of three files, for example, this would mean that the merge tool would be launched three times in a row on that same logical model.

We thus cannot rely on the standard git mergetool command. What we propose here is to implement a new git command that will provide the same functionality as the standard merge tool while being capable of handling logical models. This new command could be called logicalmergetool and it would thus be callable through the command git logicalmergetool. This must be implemented in Shell.

Once again, the custom merge tool will need to check if the underlying repository contains Eclipse projects, but this time it will need to launch a full-fledged Eclipse (not a simple headless application) with a temporary workspace containing these files. The user will have to manually launch EGit's merge tool (Right-click > Team > Merge Tool) on the files in conflict to solve the issues and Add (Right-click > Team > Add) the resolved file to the index from there.

Any file that is in conflict but that is not contained in an Eclipse project will be handled by the custom merge tool command through a fall back to the standard individual merge tool defined for this file by the gitattributes.
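
The split between files handled in Eclipse and files handed to the fallback tool could be sketched as follows. This is only an illustration: it assumes that "contained in an Eclipse project" means "has an ancestor directory holding a .project file", that paths are relative to the repository root, and the function names and the tab-separated output format are made up for the example.

```shell
# Prints the directory of the enclosing Eclipse project (the closest ancestor
# holding a .project file) for a path relative to the repository root.
# Returns 1 when the file is not inside any project.
enclosing_project() {
    repo_root=$1
    dir=$(dirname "$2")
    while :; do
        if [ -f "$repo_root/$dir/.project" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        [ "$dir" = "." ] && return 1
        dir=$(dirname "$dir")
    done
}

# Reads conflicted paths on stdin and tags each one as either "eclipse"
# (to be resolved from within Eclipse) or "fallback" (standard merge tool).
classify_conflicts() {
    repo_root=$1
    while IFS= read -r path; do
        if project=$(enclosing_project "$repo_root" "$path"); then
            printf 'eclipse\t%s\t%s\n' "$project" "$path"
        else
            printf 'fallback\t-\t%s\n' "$path"
        fi
    done
}
```

The list of conflicted paths could for instance come from `git diff --name-only --diff-filter=U`, piped into `classify_conflicts "$(git rev-parse --show-toplevel)"`.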

JGit
Implement the missing merge-involving commands

There are already implementations of most of these commands in JGit, though the wrappers that allow these commands to be called from the command line JGit front end are missing. We need to implement wrappers for:

  • cherry-pick
  • pull
  • rebase
  • revert
  • stash apply
  • submodule update
Implement a means for command line users to register custom merge strategies

JGit only looks up its own registry for the merge strategies, without allowing a user to register new ones in there.

We need either to implement a new lookup so that JGit searches for strategies within the user's PATH, or to let the user register new strategies, through the repository's configuration or from the command line itself.

Implement a custom merge strategy

Once we have a means to register it, we'll need to implement a new custom strategy for JGit. The requirements for this will be very similar to what we previously outlined for the CGit variant.

This strategy needs to check whether the target repository contains Eclipse projects, then launch a headless Eclipse with a temporary workspace containing these projects. From there, it will look up each file's specific model merger and use it, or fall back to standard git merging for any file which is not part of a logical model, or whose model does not provide a custom logical merger.

This cannot and will not handle octopus merges, especially since JGit does not support them natively.

Implement a custom merge tool

JGit does not provide a mergetool command yet. We still propose to use a name distinct from mergetool, since this will not be contributed back to the project and JGit will most likely implement its own mergetool in the future to reflect what exists in other git front ends.

This task will be very similar to the same one that could be undertaken with CGit, with the same constraints to uphold apart from the coding language, since this one can be implemented directly in Java.

Part 2: general workflow and initial prototype

When users want to compare or merge EMF models from the command line, they need to do so in an Eclipse environment similar to the one used to create these models. As such, the environment requires some plugins to be installed, but it may also require some preferences to be set, some perspective to be activated, etc. Among these plugins are the mandatory ones that will be used to perform the compare/merge operation: EMF Compare and EGit. Several options are possible to provision such an environment.

The first one is the manual way. It is necessary to download the Eclipse environment and install all the required plugins. Then, the git repositories that contain the models have to be cloned and bound to the Eclipse environment. All these tasks have to be done manually, on each computer on which a comparison or a merge is to be executed. Finally, it is necessary to write a program that launches and manages the comparison/merge from the command line interface.

The second one is the programmatic way. All the tasks done manually in the first method have to be done programmatically in this one. That means we need to find a way to let users specify what they want to provision in an Eclipse environment. This can be a very long and tedious development that involves a lot of different APIs. The advantage of this method is that the final program only has to be executed on each computer on which a comparison or a merge is to be run; there are no further manual tasks.

Eclipse Oomph is a technology that can provision a set of plugins in an Eclipse IDE, clone Git repositories, bind Git repositories to this IDE, check out projects, set workspace preferences, and so on. The configuration is model driven, with files called Oomph setup model files. As such, Oomph seems to be a good framework on top of which we could implement the compare and merge command line. We only have to call the Oomph APIs instead of calling many different APIs from many technologies. We think the Eclipse Oomph technology is the most appropriate for this need in terms of cost, time, maintainability, reliability and performance.

New shell commands

We will initially develop new shell scripts that will add new commands to git:

  • git logicalmerge
  • git logicaldiff
  • git logicalmergetool

These scripts must be added on each computer that needs to perform logical git operations from the command line interface.

On Linux systems, to create a new git command named logicalmerge, the script must be named git-logicalmerge: git resolves git <name> to an executable called git-<name>, so the file name carries no .sh extension. The script then has to be reachable from your PATH and have execution permissions.
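
A minimal sketch of such an installation is given below. It assumes git's usual lookup of an executable named git-<name> on the PATH, which is why the script is copied under that exact name; the function name and the choice of target directory are hypothetical.

```shell
# Installs a script as the git sub-command <name> into <bin_dir>.
# <bin_dir> is assumed to be listed in PATH; all arguments are up to the caller.
install_git_command() {
    script=$1 name=$2 bin_dir=$3
    mkdir -p "$bin_dir" &&
    cp "$script" "$bin_dir/git-$name" &&
    chmod +x "$bin_dir/git-$name"
}
```

For example, `install_git_command ./logicalmerge.sh logicalmerge "$HOME/bin"` would make `git logicalmerge` available, provided `$HOME/bin` is on the PATH.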

Basically, each command will mimic its non-logical counterpart. Each takes an additional mandatory parameter: an Eclipse Oomph setup model file describing the environment in which the compare/merge operation should be handled. Initially, we will handle only a subset of the standard parameters of the counterpart commands.

REQ

  • REQ_GEN_000: If any logical command is run outside of a git repository (neither at the root nor in a subfolder), then the software should display:
    fatal: Can't find git repository
  • REQ_GEN_010: If any logical command is run with a non-existing setup file, then the software should display:
    fatal: {$PATH_TO_CMD_LINE} setup file does not exist
  • REQ_GEN_020: The software should provide a way to specify the path to a git repository (if the command is not run inside a repository) using the --git-dir option.
  • REQ_GEN_030: If the logical command is called with too many arguments, then the software should display:
    fatal: Too many arguments: {$EXTRA_ARG} in: + display the usage of the command
  • REQ_GEN_040: If the logical command is called with a corrupted setup file (unable to load setup resource), the software should display:
    fatal: Corrupted set up file.
  • REQ_GEN_050: If the logical command is called with an incorrect number of arguments, the software should display a message explaining the problem and the usage of the command.
  • REQ_GEN_060: All logical commands should provide an implementation for the --help option.
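
As an illustration, the first two requirements could be checked by the wrapper scripts along these lines. This is a simplified sketch, not the project's implementation: the function names are hypothetical, the messages are written to standard error and paired with exit code 128 by assumption, and the repository check merely walks up the directory tree looking for a .git entry (it ignores bare repositories and the GIT_DIR environment variable).

```shell
# REQ_GEN_010: the setup file must exist.
check_setup_file() {
    if [ ! -f "$1" ]; then
        echo "fatal: $1 setup file does not exist" >&2
        return 128
    fi
}

# REQ_GEN_000: the command must be run from inside a git repository,
# either at its root or in one of its subfolders.
check_git_repository() {
    dir=$(cd "$1" 2>/dev/null && pwd) || dir=""
    while [ -n "$dir" ]; do
        [ -e "$dir/.git" ] && return 0
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
    echo "fatal: Can't find git repository" >&2
    return 128
}
```

A wrapper could then start with `check_git_repository "$PWD" && check_setup_file "$1" || exit $?`.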

git logicalmerge

The logicalmerge command is the logical version of the git merge command. To see a full description of the git merge command, please visit http://git-scm.com/docs/git-merge.

The command is specified as below:

git logicalmerge <setup> <commit>

Assume the following history exists and the current branch is master:
	  A---B---C topic
	 /
    D---E---F---G master
Then git logicalmerge mySetupModel.setup topic will replay the changes made on the topic branch since it diverged from master (i.e., E) until its current commit (C) on top of master, and record the result in a new commit along with the names of the two parent commits and a log message from the user describing the changes.
	  A---B---C topic
	 /         \
    D---E---F---G---H master

You can also replace the topic branch name with its commit id: git logicalmerge mySetupModel.setup 87ad5ff

REQ

Logical merge command line signature is 'git logicalmerge <setup> <commitID> (-m "Merge message")?'

REQ_CMD_LM_010: The usage of logical merge command is:
 logicalmerge <setup> <commit> [--help (-h)] [-m message]

   <setup>     : Path to the setup file. The setup file is an Oomph model
   <commit>    : Commit ID or branch name to merge
   --help (-h) : Displays help for this command
   -m message  : Set the commit message to be used for the merge commit (in case
                 one is created).

REQ_CMD_LM_020: If a merge message has been provided then it should be used for the merge commit.

REQ_CMD_LM_030: If no message has been provided to the command line then an automatic message should be generated.

REQ_CMD_LM_040: If the logical merge command is called with an incorrect commit ID, the software should display:
	fatal: {$COMMIT_ID} - not valid reference.
REQ_CMD_LM_050: If no merge has been done because it's already up to date then software should display:
	Already up-to-date.
REQ_CMD_LM_060: If the merge has no conflicts and terminated correctly then software should display:
	Merge made by the '{$UsedStrategy}' strategy.
		{$ListOfMergedFile}
REQ_CMD_LM_070: Here is the description of the resulting message depending on the merge result:
	case NORMAL_MERGE:
		Merge made by the '{$STRATEGY}' strategy.
	case ALREADY_UP_TO_DATE:
		Already up-to-date.
	case CONFLICTING FILES:
		IF AUTOMERGING FAILS:
			FOR ALL CONFLICTING FILES:
				Auto-merging failed in {$CONFLICTING_FILE}
			Automatic merge failed; fix conflicts and then commit the result.
		IF AUTOMERGING SUCCEEDS:
			Auto-merging {$CONFLICTING_FILE}
			Merge made by the '{$STRATEGY}' strategy.
	case FAST_FORWARD:
		Updating {$OLD_HEAD}..{$NEW_HEAD}
		Fast-forward
	case DIRTY_WORK_TREE:
		error: Your local changes to the following files would be overwritten by merge:
			{$LIST_OF_DIRTY_FILE}
		Please, commit your changes or stash them before you can merge.
		Aborting
REQ_CMD_LM_080: Here is the list of return values that the software should return:
	IF "Merge succeeded and completed":
		return 0
	IF "Merge did not succeed. An action is required from the user, for example a manual merge. This return value does not denote an error":
		return 1
	IF "Error":
		return 128
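
The mapping between merge outcome, console message and return value could be sketched as follows. The status names passed as first argument are illustrative, only three outcomes are covered, and treating every unknown status as an error (128) is an assumption of this sketch.

```shell
# Prints the message for a merge outcome and returns the matching exit code.
# $1 is an illustrative status name, $2 the name of the strategy that was used.
report_merge_result() {
    case $1 in
        NORMAL_MERGE)
            echo "Merge made by the '$2' strategy."
            return 0 ;;
        ALREADY_UP_TO_DATE)
            echo "Already up-to-date."
            return 0 ;;
        CONFLICTING)
            echo "Automatic merge failed; fix conflicts and then commit the result."
            return 1 ;;
        *)
            # Anything else is reported as an error.
            return 128 ;;
    esac
}
```

Returning 1 for conflicts lets calling scripts distinguish "user action required" from a genuine failure.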

git logicaldiff

The logicaldiff command is the logical version of the git diff command. To see a full description of the git diff command, please visit http://git-scm.com/docs/git-diff.

The command is specified as below: git logicaldiff <setup> <commit> [<commit>] [-- <path>]

To see the changes between a revision and the HEAD revision, you should omit the second commit.

git logicaldiff <setup> <commit> [--] [<path>...]

In all cases, the [-- <path>] option restricts the diff command to the files that match <path>.

In all cases, <commit> can refer to a branch name or a commit id.
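
A minimal sketch of how a wrapper script might split these arguments is shown below. The function and variable names are hypothetical, and returning 128 for a wrong number of arguments is an assumption of this sketch. Paths are joined with spaces, so paths that themselves contain spaces are not supported here.

```shell
# Parses: <setup> <commit> [<compareWithCommit>] [-- <path>...]
# Sets the variables setup, commit, compare_with (HEAD by default) and paths.
parse_logicaldiff_args() {
    setup="" commit="" compare_with="HEAD" paths=""
    [ "$#" -ge 2 ] || return 128      # setup file and commit are mandatory
    setup=$1 commit=$2
    shift 2
    if [ "$#" -gt 0 ] && [ "$1" != "--" ]; then
        compare_with=$1               # optional second commit
        shift
    fi
    if [ "$#" -gt 0 ]; then
        [ "$1" = "--" ] || return 128 # too many arguments
        shift
        paths=$*
    fi
    return 0
}
```

With `parse_logicaldiff_args my.setup topic -- model/a.uml`, the comparison would run between topic and HEAD, limited to model/a.uml.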

REQ
REQ_CMD_LD_000 : The usage of the logical diff is:
 logicaldiff <setup> <commit> [<compareWithCommit>] [-- <path...>] [--help (-h)]

   <setup>             : Path to the setup file. The setup file is an Oomph model
   <commit>            : Commit ID or branch name
   <compareWithCommit> : Commit ID or branch name. This is to view the changes
                         between <commit> and <compareWithCommit> or HEAD if not specified.
   -- <path...>        : This is used to limit the diff to the named paths (you
                         can give directory names and get diff for all files
                         under them).
   --help (-h)         : Displays help for this command

REQ_CMD_LD_010: If there are no differences between the compared commits, the console should not display anything.

REQ_CMD_LD_020: If the logical diff command is called with an incorrect commit ID, the software should display:
	fatal: {$COMMIT_ID} - not valid reference.

git logicalmergetool

The logicalmergetool command is the logical version of the git mergetool command. To see a full description of the git mergetool command, please visit http://git-scm.com/docs/git-mergetool. The command is specified as below:

git logicalmergetool <setup>

Runs logical merge conflict resolution tools to resolve logical merge conflicts. In our case, this means running Eclipse and calling the EGit merge tool on the file(s) in conflict.

REQ

REQ_CMD_LMT_000: The usage of the logical merge tool command is:

 logicalmergetool <setup> [--help (-h)]

   <setup>     : Path to the setup file. The setup file is an Oomph model
   --help (-h) : Displays help for this command

git logicalcherry-pick

The logicalcherry-pick command is the logical version of the git cherry-pick command. To see a full description of the git cherry-pick command, please visit [1].

The command is specified as below:

git logicalcherry-pick <setup> <commit>...

Assume the following history exists and the current branch is master:
	  A---B---C topic
	 /
    D---E---F---G master,HEAD
Then git logicalcherry-pick mySetupModel.setup A B C will pick the A, B and C commits on top of the current HEAD (in the order given on the command line).
	  A---B---C topic
	 /
    D---E---F---G---A---B---C master,HEAD

In all cases, <commit> can refer to a branch name or a commit id.

REQ

Logical cherry-pick command line signature is git logicalcherry-pick <setup> <commit>...

REQ_CMD_LCP_010: The usage of logical cherry-pick command is
 logicalcherry-pick <setup> [<commit> ...] [--abort] [--continue] [--debug (-d)] [--git-dir gitFolderPath] [--help (-h)] [--quit] [--show-stack-trace]

 <setup>                 : Path to the setup file. The setup file is an Oomph
                           model.
 <commit>                : Commit IDs to cherry-pick.
 --abort                 : Use this option to abort an ongoing cherry-pick
 --continue              : Use this option to continue an ongoing cherry-pick
 --debug (-d)            : Launches the provisioned Eclipse in debug mode.
 --git-dir gitFolderPath : Path to the .git folder of your repository.
 --help (-h)             : Displays help for this command.
 --quit                  : Use this option to quit an ongoing cherry-pick
 --show-stack-trace      : Use this option to display the Java stack trace in the
                           console on error.

REQ_CMD_LCP_020: If no commit ids have been provided, the software should display the usage of the command.

REQ_CMD_LCP_030: If the logical cherry-pick command is called with incorrect commit ids, the software should display:
	fatal: bad revision '{$COMMIT_ID}'
REQ_CMD_LCP_040: If the cherry pick has no conflicts and terminated correctly then software should display:
	FOR ALL SUCCESSFUL CHERRY PICKED REVISIONS:
		Applied: [{$NewRevisionID}] {$ShortRevisionMessage}
REQ_CMD_LCP_050: If the cherry pick ends up on a conflict state, the software should display:
	error: could not apply [{$ShortRevisionId}]... {$ShortRevisionMessage}
	hint: to resolve the conflict use git logicalmergetool command
	hint: after resolving the conflicts, mark the corrected paths
	hint: by adding them to the index (Team > Add to index) or
	hint: by removing them from the index (Team > Remove from index).
	hint: Then commit the result and close the mergetool.
REQ_CMD_LCP_060: Here is the list of return values that the software should return:
	IF "Cherry-pick succeeded and completed":
		return 0
	IF "Cherry-pick is not complete. An action is required from user. For example the resolution of a conflict":
		return 1
	IF "Error":
		return 128

Workflow

Each shell script will be a wrapper around an Eclipse standalone application (provided by the EMF Compare project). This standalone application will itself call some Oomph APIs.

First, Oomph will provision an Eclipse with all appropriate plugins to launch the logical git operation. These plugins are EGit, EMF Compare and their dependencies. If the Oomph setup model provided as parameter contains other plugins (represented by the name of the repository and the name of the plugin/feature), they will be provisioned too.

For a given Oomph setup model file, the provisioning operation is executed only once. If you launch a git logical operation again with the same Oomph setup model file, the already provisioned Eclipse IDE corresponding to the setup model will be reused. This avoids executing this potentially costly task each time.

In order to retrieve the Eclipse associated to a given Oomph setup model file, we will store all provisioned Eclipses in the temporary folder of the system. We will use a hash function on the Oomph setup model file to generate/retrieve a unique id. This unique id will be the name of the folder containing the provisioned Eclipse.
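
The lookup described above could be sketched as follows. Several points are assumptions made for illustration: the id is computed from the content of the setup file (the text does not say whether the content or the path is hashed), the POSIX `cksum` utility stands in for the hash function, and the `emfcompare-git-` folder prefix is made up. `cksum` yields a 32-bit checksum, so a real implementation would probably prefer a stronger hash to limit collisions.

```shell
# Prints the folder in which the Eclipse provisioned for a given setup file
# would be stored: <system temp folder>/emfcompare-git-<id of the setup file>.
provisioned_eclipse_dir() {
    setup_file=$1
    [ -f "$setup_file" ] || return 128
    base_dir=${TMPDIR:-/tmp}
    # cksum prints "<checksum> <size>" when reading stdin; keep the checksum.
    id=$(cksum < "$setup_file" | cut -d ' ' -f 1)
    printf '%s/emfcompare-git-%s\n' "$base_dir" "$id"
}
```

Two setup files with identical content map to the same folder, so the provisioned Eclipse is reused, while any change to the setup file leads to a new folder.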

Then, in this provisioned Eclipse, the list of tasks contained in the Oomph setup model will be executed.

This Oomph setup model will contain, at least:

  • The path where the workspace will be created.
  • The git repository(-ies) to clone/bind with the Eclipse IDE.
  • The project(s) (represented by their paths on the computer) to import in the workspace associated with the Eclipse IDE.

Once all Oomph tasks have been executed, EMF Compare will call the logical git operation with the other parameters provided on the command line. Once the git logical operation has been executed, users can see the results in their command line tool.

If the result shows conflict(s) on the involved model(s), the user will call the git logicalmergetool command. This command will launch a full-fledged Eclipse IDE (not a simple headless application) with a workspace containing these files. This full-fledged Eclipse IDE is the same as the one provisioned previously by Oomph. The user will have to manually launch EGit's merge tool on the files in conflict to solve the issues, and then manually close Eclipse to properly finish the process.

As a possible evolution, when conflicts occur and the full-fledged Eclipse IDE has been launched, EGit's merge tool could be launched automatically on the file(s) in conflict.

Here is a diagram representing the workflow of the process for the logical merge command (the workflow is nearly the same for the logical diff):

(Gray steps done by the user, blue steps done automatically)

EMFCompare GitLogicalMerge Workflow.png

An initial prototype of such a workflow is available on the EMF Compare's gerrit: https://git.eclipse.org/r/#/c/29889/

Detailed steps

Step 1: git logicalmerge

see git logicalmerge for more details

Step 2: provision an Eclipse IDE

After step 1, the arguments passed with the command have been validated. Step 2 then has to provision an Eclipse environment. The variable elements in this step are the installation path, the workspace path, and the additional plugins to install in the environment. All these variable elements can be found in the setup model file.

EMFCompare GitLogicalMerge Workflow Step02.png
  • If the installation path is not defined in the user setup model, then a temporary folder (located in the temp folder of the system) will be used. If the temporary folder already contains an Eclipse environment, this environment must have been provisioned by the logical merge command line tool. This ensures that the Eclipse environment contains the required plugins (EGit, EMF, EMF Compare, Oomph).
  • If the workspace path is not defined in the user setup model, then a temporary folder will be used.
  • If the installation path found in the user setup model already contains an Eclipse environment, then no further plugin installation will be done, except for Oomph, because the tasks executed in step 4 require Oomph.
  • If the installation path found in the user setup model doesn't contain an Eclipse environment, then an environment will be provisioned with:
    • the Eclipse Luna release
    • EMF 2.10 release
    • EMF Compare 3.0 release
    • EGit 3.4.0 release
    • Oomph x.y release (to be determined)

As a possible evolution, profiles could be added as command line arguments. A profile would be a specific release of Eclipse (Luna, Kepler, Mars, ...) with the appropriate plugin versions. As another possible evolution, users could provide their own environment (orange boxes in the picture above). In that case, they would be able to continue with this environment, and Oomph would be installed in it if needed.

Step 3: launch Eclipse as a headless application

After step 2, we have a valid Eclipse environment and a clean workspace. Step 3 then has to launch this Eclipse environment as a headless application.

The headless application launched will be one of the three existing ones: logicalmerge, logicaldiff or logicalmergetool, according to the command typed by the user.

Step 4: execute tasks from setup model

After step 3, Eclipse is running as a headless application. Step 4 then has to execute some Oomph tasks. The variable element in this step is the list of projects to import in the workspace, which can be found in the setup model file.

All the projects (represented by their path) found in the user setup model file will be imported in the workspace.

If there is no project in the user setup model file, then all projects found in the git repository will be imported in the workspace.
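
The selection rule of this step could be sketched as follows, assuming that an Eclipse project is any directory holding a .project file and that the .git folder itself is skipped. The function name is hypothetical, and the actual import would be performed by Oomph tasks, not by this script.

```shell
# Prints one project directory per line: the projects listed explicitly
# (arguments after the repository root) when there are any, otherwise every
# directory of the repository that holds a .project file.
projects_to_import() {
    repo_root=$1
    shift
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"
    else
        find "$repo_root" -name .git -prune -o -type f -name .project -print |
            sed 's|/\.project$||' | sort
    fi
}
```

Calling `projects_to_import /path/to/repo` lists every project of the repository, whereas `projects_to_import /path/to/repo /path/to/repo/model` keeps only the explicitly listed one.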

Step 5: call the logical merge operation

After step 4, the projects have been imported in the workspace. The EGit merge operation can then be called with the arguments passed from the command line. These arguments are the <commit> (id or branch) and, optionally, a message (with the -m option).

Step 6: git logicalmergetool

see git logicalmergetool for more details

Step 7: launch Eclipse (with GUI)

After step 6, the merge command has returned conflict(s). Step 7 then has to launch the same Eclipse environment as in step 3, but this time with the GUI.

Step 8: call the EGit merge tool on file(s) in conflict(s)

After step 7, Eclipse is running with a GUI. EGit is installed in this environment, so the user can call the Merge Tool on the file(s) in conflict.

Step 9: resolve conflict(s) manually

In case of a conflict on a model, the Merge Tool will launch EMF Compare. In other cases, the standard Merge Tool will be launched.

Step 10: close Eclipse manually

Once all conflicts have been resolved, the user has to close Eclipse to end the process.

Step 11: end of process