Difference between revisions of "EclipseLink/Development/Process/Git"
Revision as of 13:37, 22 March 2012
EclipseLink Development in Git
This page covers the Git usage portion of the development process. It does not discuss issues with the build in Git; for more information on that, please see wiki.eclipse.org/EclipseLink/Build/Git.
This page is a work in progress, posing the questions that need to be answered. If you have more questions, please post them; if you can answer a question, please do.
EclipseLink Git FAQ
Brief Overview and History of Git
Git is a Distributed Version Control System (DVCS), which means there is no single point-of-failure and one can do useful work without a server. The ability to work while disconnected is very useful if a server is down or if the network connection to a server is unreliable, slow or firewalled.
The distributed nature of Git means that the source code is inherently backed-up across all the various 'clones' that may exist 'out there.' In addition, Git supports types of work-flows that are different from those supported by Subversion; these work-flows, while unfamiliar, are quite powerful and can 'overlap' - one developer may prefer a 'golden repository' work-flow while another likes 'trusted-lieutenants'; both can be supported simultaneously by the same repository.
Git was created by Linus Torvalds in 2005 to handle the source control requirements of the Linux kernel project. Linus previously used a for-pay DVCS called BitKeeper and grew to like the 'trusted-lieutenants' work-flow; however, the special license grant that let him use BitKeeper for free for kernel development was withdrawn. Shortly thereafter, he created Git. By 2008, other major open-source projects (Ruby-on-Rails, Android, etc.) had moved to it as well.
How do I get started?
As shown in the picture above, graphical clients are typically not shipped with the 'core' Git distribution. There are a number of use-cases that only make sense (or only work!) from the command-line.
The 'core' Git distribution can be downloaded from the Git download site (http://git-scm.com/download), which has links for a variety of operating systems (Linux, Mac OS X, Windows). Once you have the Git tools, you can always get the latest and greatest version of Git directly from its own repository:
prompt > git clone git://github.com/gitster/git.git
If you have problems connecting (the git:// protocol uses port 9418), you can try to access the repository over HTTP instead (most Git server administrators set up HTTP access as read-only):
prompt > git clone http://github.com/gitster/git.git
The central Git web-site holds the documentation for the 'core' Git distribution, as well as links to documentation written by others. I would like to highlight one particular resource as very useful:
"Pro Git - professional version control" by Scott Chacon, CIO of GitHub.
The unfortunate truth is that Git - both its 'core' footprint and Windows-specific GUI clients like TortoiseGit - is a second-class citizen on Windows. Much of the 'plumbing' (see picture above) was originally written as shell scripts. Even though most of Git is now written in C, its basic 'world-view' is that directories use the forward-slash '/' separator, file names are case-sensitive, and symbolic links are used to implement a number of useful Git features (sub-modules, multiple-branch view working directories, etc.). Because of this, Windows versions of Git tools are marked as 'beta/preview' (and probably always will be).
For those using Windows (XP or 7), I recommend using TortoiseGit:
- Prerequisite: first install the msysgit command-line Git tools for Windows ("Full installer for official Git for Windows"). In its install wizard, accept most defaults, but choose:
  - Command Line: Use Git Bash Only
  - Choosing the SSH executable: Use TortoisePLink (comes from PuTTY and integrates well with Windows)
- Then run the TortoiseGit install wizard; when it asks about SSH, give the same answer as above.
Getting started - first step: Git Committer Identity
When a commit is made in Git, the commit has metadata identifying two things:
- the author (name and email): who created the change, and
- the committer (name and email): who committed the change to the repository
(For many commits the author is also the committer. Git still records both fields, but by default git log shows only the author.)
The Eclipse Foundation uses these fields as part of its IP process: only committers to a project can change the source stored on the Foundation's servers. However, a committer may make changes on behalf of others, which enables collaboration with parties that have not gone through the Eclipse IP due-diligence process. This is especially useful if, say, a third party just wants to contribute a few one-off patches: the administrative overhead of the Eclipse IP due-diligence process would likely 'scare off' most such contributions (for more information, please see Handling Git Contributions).
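The split between author and committer can be seen in a throwaway repository. This is a minimal sketch, not the Foundation's documented procedure. The names, e-mail addresses, and file name are hypothetical placeholders. git commit --author sets the author field, and the committer field comes from the committer's own configured identity.

```shell
set -e
# Throwaway repository for the demo (location and names are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q

# The committer's own identity (normally taken from ~/.gitconfig)
git config user.name "Eclipse Committer"
git config user.email "committer@example.org"

# A change contributed by someone outside the project
echo "patch from a contributor" > CONTRIBUTED.txt
git add CONTRIBUTED.txt

# Record the contributor as author; Git fills in the committer separately
git commit -q --author="Outside Contributor <contributor@example.org>" -m "Apply contributed patch"

# The 'fuller' format prints both the Author and the Commit (committer) lines
git log -1 --format=fuller
```

The default git log output would show only the "Outside Contributor" author line. Use --format=fuller (or %an / %cn in a custom format) to see both identities.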
First, set up your ~/.gitconfig file:
[user]
    # email address linked to my EclipseLink committer id minorman
    email = firstname.lastname@example.org
    name = Mike Norman
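The same [user] entries can also be written from the command line with git config --global instead of editing the file by hand. This is a minimal sketch. So that the demo does not touch a real configuration, it temporarily points HOME at a scratch directory, which you would leave out on your own machine.

```shell
set -e
# Demo only: use a scratch HOME so the real ~/.gitconfig is left alone
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME

# Equivalent to adding the [user] section to ~/.gitconfig by hand
git config --global user.name "Mike Norman"
git config --global user.email "firstname.lastname@example.org"

# Show the file that git wrote
cat "$HOME/.gitconfig"
```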
NB. Windows often has difficulty with 'dot-files' in its 'home' directory - you may have to create this file from the command-line. In addition, if your 'home' directory is on a UNC fileshare directory, the msysgit Windows-version of git tools may not be able to read or write it. I solved this issue by redefining two Windows environment variables (HOME and HOMEDRIVE):
Getting started - second step: SSH identity
As mentioned above, most Git servers are set up so that HTTP access is read-only - in order to be able to commit, one must connect to the Eclipse Foundation Git servers over SSH:
prompt > git clone ssh://email@example.com/gitroot/eclipselink/oracleddlparser.git
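If you already have a clone made over read-only HTTP, you do not have to re-clone to start committing; you can repoint its remote at the SSH URL instead. This is a minimal sketch. Both URLs are assumptions built from the host and path used elsewhere on this page, and 'committerid' is a hypothetical placeholder for your committer id. No network access is needed to change the URL.

```shell
set -e
# Stand-in for an existing clone that was made over HTTP (URL is hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
git remote add origin http://git.eclipse.org/gitroot/eclipselink/oracleddlparser.git

# Repoint 'origin' so that fetches and pushes go over SSH.
# 'committerid' is a placeholder for your Eclipse committer id.
git remote set-url origin ssh://committerid@git.eclipse.org/gitroot/eclipselink/oracleddlparser.git

# List the fetch and push URLs now configured for the remote
git remote -v
```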
Creating an SSH identity
You need to generate an SSH public/private key-pair from the command prompt:
prompt > ssh-keygen -t rsa -C "firstname.lastname@example.org" -f committerid
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in committerid
Your public key has been saved in committerid.pub.
prompt > ls committerid*
committerid  committerid.pub
The file ending in .pub is the public portion of the key-pair. Send an email message asking 'email@example.com' to place it in the appropriate place in your home directory on the Eclipse Foundation's server. Move the private portion into your local ~/.ssh directory as ~/.ssh/committer.ppk. Then restrict the permissions on the ~/.ssh directory and on your private key:
prompt > chmod 700 ~/.ssh
prompt > chmod 600 ~/.ssh/committer.ppk
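The key generation and permission steps above can be run unattended in a scratch directory to see what they produce. This is a minimal sketch, assuming an OpenSSH ssh-keygen on the PATH. The -N "" flag gives an empty passphrase only so that the demo runs without prompts. For a real committer key, leave it out and choose a passphrase when asked.

```shell
set -e
# Scratch directory standing in for ~/.ssh (demo only)
sshdir=$(mktemp -d)
chmod 700 "$sshdir"

# Empty passphrase (-N "") is for this unattended demo only
ssh-keygen -q -t rsa -C "firstname.lastname@example.org" -N "" -f "$sshdir/committerid"

# Private key readable and writable only by you; the .pub file is the one you send off
chmod 600 "$sshdir/committerid"
ls -l "$sshdir"
```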
Creating an SSH identity on Windows
- run PUTTYGEN.EXE - make sure that SSH-2 RSA/1024 is selected
- In the 'Key comment' field, replace the entry starting with 'rsa-key ...' with firstname.lastname@example.org
- Press 'Generate'
- Save the private key to %HOMEDIR%\.ssh\committer.ppk
- Save the public key to %HOMEDIR%\.ssh\committer_puttygen.pub
- Save the text from the box 'Public key for pasting into OpenSSH authorized_keys file' into committer.pub
The last step is necessary because PUTTYGEN saves the public portion of the key-pair in its own custom format, which will not work when uploaded to the Eclipse Foundation's server. The text in the 'for pasting' box is in the same format as the keys generated by ssh-keygen, so that is the text you must save.
Getting through a firewall
In your home ~/.ssh/ directory, you must create a config file:
prompt > cd ~/.ssh
prompt > touch config
prompt > chmod 600 config
The documentation for the ~/.ssh/config file can be found in the ssh_config(5) manual page. The particular features to focus upon are:
- the ability to specify a particular identity file, and
- the ability to specify a command to be run whenever we attempt to connect to a specific host:
Host git.eclipse.org
    Hostname git.eclipse.org
    User committer
    IdentityFile ~/.ssh/committer.ppk
    ProxyCommand /c/windows/connect.exe -H firewall_host:firewall_port %h %p
    # or, using corkscrew instead of connect.exe:
    # ProxyCommand /usr/local/bin/corkscrew firewall_host firewall_port %h %p
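Creating the file, writing the host entry, and tightening its permissions can be done in one go. This is a minimal sketch that writes into a scratch directory rather than your real ~/.ssh. It uses the corkscrew variant of ProxyCommand; on Windows you would use the connect.exe line instead. 'committer', firewall_host and firewall_port are placeholders to replace with your own values.

```shell
set -e
# Demo only: a scratch directory instead of the real ~/.ssh
sshdir=$(mktemp -d)
chmod 700 "$sshdir"

# Quoted 'EOF' keeps %h and %p literal; ssh substitutes host and port at connect time
cat > "$sshdir/config" <<'EOF'
Host git.eclipse.org
    Hostname git.eclipse.org
    User committer
    IdentityFile ~/.ssh/committer.ppk
    ProxyCommand /usr/local/bin/corkscrew firewall_host firewall_port %h %p
EOF

# ssh rejects a user config file that others can write to
chmod 600 "$sshdir/config"
cat "$sshdir/config"
```

With OpenSSH 6.8 or later, ssh -G git.eclipse.org prints the options that would apply to that host without connecting. This is a quick way to check the real entry once it is in place.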
How does Git differ from SVN?
Beyond the obvious distributed vs. central repository concept, one of the great strengths of Git is its on-disk representation of a repository. When you 'checkout' from SVN, you get a working-directory tree of files and folders that represents the 'tip' of that particular SVN branch. If you need to do some operation, say examine the commit history for some file, you must go back to the server to get the metadata for those commits. When you 'clone' a Git repository, you get not only a working-directory tree of files and folders but also the complete commit history for the entire repository. Operations to compare commits or to revert the working directory to a previous state all take place at the speed of the local filesystem, without any network round-trips. Further, the history for the repository is stored in an efficient 'packed' representation. For example, the current EclipseLink SVN 'trunk' tree takes ~350 MB of disk space, while an experimental Git repository of 'trunk' takes 3 GB for all ~10K commits, small enough to fit on a USB key.
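The history operations mentioned above (file history, comparing commits, reverting to an earlier state) can be tried in a throwaway repository. Each one works only against the local .git directory. This is a minimal sketch; the file name Foo.java and the identity are hypothetical.

```shell
set -e
# Throwaway repository with two commits (names are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "version 1" > Foo.java
git add Foo.java
git commit -q -m "Add Foo"

echo "version 2" > Foo.java
git commit -q -a -m "Change Foo"

# None of these contact a server; they only use the local .git directory
git log --oneline -- Foo.java        # commit history of one file
git diff HEAD~1 HEAD -- Foo.java     # compare two commits
git checkout -q HEAD~1 -- Foo.java   # restore the older version (working directory and staging area)
cat Foo.java
```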
A Tale of Three Trees
- working directory: regular directory where work is done
- staging area (also called the 'index'): a 'special' cache holding the changes that will go into the next commit
- repository (the .git directory): committed history
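One change can be walked through all three trees in a throwaway repository. This is a minimal sketch with a hypothetical file name. git status and git diff report which tree a change currently lives in.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.org"

# 1. Working directory: create or edit files
echo "hello" > notes.txt
git status --short          # '??' = untracked, exists only in the working directory

# 2. Staging area: choose what goes into the next commit
git add notes.txt
git diff --cached --stat    # what is staged for the next commit

# 3. Repository: record the staged snapshot
git commit -q -m "Add notes"

# A further edit lives only in the working directory until it is staged
echo "more" >> notes.txt
git diff                    # unstaged changes (working directory vs. staging area)
```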
How is merging different between Git and SVN?
One fundamental difference between Git and SVN is branching - a branch in SVN is a very expensive artifact while in Git it is extremely lightweight - the additional cost to a local repository clone or a remote server is negligible. This leads to workflows where one creates a new branch for each feature or each refactoring or even each bug. When you are done, it is easy to merge changes from a branch into 'master' (like SVN 'trunk'). Merging can be done either manually or through tools (e.g. Beyond Compare) that can be integrated into the Git merge process.
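The branch-per-bug workflow described above looks roughly like the following. This is a minimal sketch in a throwaway repository. The branch name bug-12345 and the file name are hypothetical. The first branch is forced to be called 'master' so that the sketch matches this page regardless of local Git defaults.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch 'master' regardless of init.defaultBranch
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "base" > Foo.java
git add Foo.java
git commit -q -m "Initial commit"

# A cheap, local branch for a single bug (name is hypothetical)
git checkout -q -b bug-12345
echo "fix" >> Foo.java
git commit -q -a -m "Fix bug 12345"

# Back on master: merge the finished work, then delete the branch
git checkout -q master
git merge -q --no-ff -m "Merge bug-12345" bug-12345
git branch -d bug-12345
git log --oneline --graph
```

Here --no-ff keeps an explicit merge commit so that the bug's history stays visible. A plain git merge would simply fast-forward master in this case, because master has not moved since the branch was created.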
prompt > git clone git://somewhere.com/proj.git
prompt > svn checkout http://somewhere.com/repository/proj/trunk