EGit/User Guide/2. Concepts
Git is built on a few simple but very powerful ideas. Knowing them makes it easier to understand how Git works.
The Repository or Object Database stores all objects which make up the history of the project. Every object in this database is identified by a secure 20-byte SHA-1 hash of the object content. This has several advantages:
- comparing two objects boils down to comparing two SHA-1 hashes
- since object names are computed from the object content in the same way in every Git repository, the same object is stored under the same name in every repository that contains it
- repository corruption can easily be detected by checking whether the SHA-1 object name is still the secure hash of the object's content
Git has four object types:
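The way an object name follows from the object content can be sketched in a few lines of Python. The helper name `git_object_id` is hypothetical and not part of Git or EGit. The sketch assumes the usual object encoding, in which a short header of the form `<type> <size>` plus a NUL byte is hashed together with the raw content.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object name for an object of the given type (illustrative helper)."""
    # The hash covers a header "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The same content yields the same name in every repository.
empty_id = git_object_id("blob", b"")
hello_id = git_object_id("blob", b"hello world\n")
print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty blob)
print(hello_id)
```

Because the name depends only on type and content, comparing two objects reduces to comparing two 40-character hex strings (20 bytes each).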
- A Blob object stores file content
- A Tree object stores the directory structure and contains Blob objects and other Tree objects together with their file system names and modes
- A Commit object represents a snapshot of the directory structure at the time of the commit and links to its predecessor (parent) Commit objects. A commit stores no links to its successors. The parent links form an acyclic graph of the repository revisions, which is the repository history.
- A Tag object is a symbolic named link to another repository object. It contains the object name and type of the referenced object and, optionally, information about who created the tag and signing information.
The object database is stored in the .git/objects directory. Objects are stored either as loose objects or in a pack format that combines many objects into a single file for efficient storage and transport.
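As a rough illustration of loose object storage, the following sketch writes and reads an object in a temporary directory that stands in for .git/objects. The function names are hypothetical. The sketch assumes the common loose layout (zlib-compressed data, with the first two hex digits of the name used as a subdirectory) and does not cover pack files.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store an object in the loose layout and return its name (illustrative helper)."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    name = hashlib.sha1(store).hexdigest()
    # The first two hex digits form the subdirectory, the remaining 38 the file name.
    path = objects_dir / name[:2] / name[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return name


def read_loose_object(objects_dir: Path, name: str):
    """Read an object back and check that its content still matches its name."""
    store = zlib.decompress((objects_dir / name[:2] / name[2:]).read_bytes())
    if hashlib.sha1(store).hexdigest() != name:
        raise ValueError(f"object {name} is corrupt")
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError(f"object {name} has a wrong size header")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"  # stand-in for .git/objects
    name = write_loose_object(objects_dir, "blob", b"hello world\n")
    loaded = read_loose_object(objects_dir, name)
    print(name, loaded)
```

The read path repeats the hash check, which is the corruption test described earlier: if the stored bytes no longer hash to the object name, the object is rejected.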
Git provides a built-in trust chain through secure SHA-1 hashes, which make it possible to verify that objects obtained from a (potentially untrusted) source are correct and have not been modified since they were created.
If you get a signed tag, e.g. for a project release, and verify it with the public signing key of the tagger (e.g. the project lead), Git ensures that the chain of trust covers the following:
- the signed tag identifies a commit object
- the commit object represents exactly one project revision including its content and history
- the commit object contains the tree of blob objects and other tree objects representing the directory structure of the project revision
- the blob objects contain the file contents for this project revision
All the involved object names can be checked for consistency using the SHA-1 algorithm. This ensures the correctness of the project revision and, in turn, that the entire history can be trusted.
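The chain from commit to tree to blob can be sketched as follows. This is a simplified illustration, not Git's implementation: it handles only a flat directory of regular files, uses a hypothetical fixed author identity and timestamp so that results are reproducible, and leaves out tag signatures entirely. All function names are assumptions made for this example.

```python
import hashlib
from typing import Dict, Optional


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: Dict[str, str]) -> bytes:
    """Serialize {file name: blob name} as a flat tree of regular files."""
    body = b""
    for name in sorted(entries, key=lambda n: n.encode("utf-8")):
        # Each entry is "<mode> <name>\0" followed by the 20 raw bytes of the blob name.
        body += b"100644 " + name.encode("utf-8") + b"\x00" + bytes.fromhex(entries[name])
    return body


def commit_id(files: Dict[str, bytes], parent: Optional[str], message: str) -> str:
    blobs = {name: object_id("blob", data) for name, data in files.items()}
    tree = object_id("tree", tree_body(blobs))
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    # Hypothetical fixed identity and timestamp, chosen only for reproducibility.
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_id("commit", body.encode("utf-8"))


first = commit_id({"README": b"v1\n"}, None, "initial")
second = commit_id({"README": b"v2\n"}, first, "update")

# Tampering with the content of the first revision changes every later commit name.
tampered_first = commit_id({"README": b"v1 tampered\n"}, None, "initial")
tampered_second = commit_id({"README": b"v2\n"}, tampered_first, "update")
print(second != tampered_second)
```

Since each commit name covers the tree name and the parent commit names, a verified commit name transitively fixes all file contents and the whole history behind it.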
The Git Index is a binary file stored at .git/index. It contains a sorted list of file names, each with its file mode, file meta data used to efficiently detect file modifications, and the SHA-1 object name of the corresponding blob object.
It has the following important properties:
- The index contains all information necessary to generate a single uniquely defined tree object. E.g. a commit operation generates this tree, stores it in the object database and associates it with the commit.
- The index enables fast comparison of the tree it defines with the current working directory. This is achieved by storing additional meta data about the involved files in the index data.
- The index can efficiently store information about merge conflicts between the trees involved in the merge so that for each pathname there is enough information about the involved trees to enable a three-way merge.
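The fast comparison with the working directory can be illustrated with a simplified sketch. `IndexEntry` here is a hypothetical structure that keeps only a few of the fields a real index entry holds, and the logic ignores refinements such as timestamp resolution issues. The idea it shows is that the file content is only read and hashed when the cached meta data no longer matches.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical, simplified index entry; the real format stores more fields."""
    path: str
    mode: int
    size: int
    mtime_ns: int
    blob_id: str


def blob_id(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def make_entry(root: Path, path: str) -> IndexEntry:
    st = os.stat(root / path)
    return IndexEntry(path, st.st_mode, st.st_size, st.st_mtime_ns,
                      blob_id((root / path).read_bytes()))


def is_modified(root: Path, entry: IndexEntry) -> bool:
    st = os.stat(root / entry.path)
    if (st.st_size, st.st_mtime_ns, st.st_mode) == (entry.size, entry.mtime_ns, entry.mode):
        # Cached meta data still matches: assume unchanged without reading the file.
        return False
    # Meta data differs: fall back to comparing the content hash.
    return blob_id((root / entry.path).read_bytes()) != entry.blob_id


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("one\n")
    entry = make_entry(root, "a.txt")
    unchanged = is_modified(root, entry)   # meta data matches, file is not read
    (root / "a.txt").write_text("one, two\n")
    changed = is_modified(root, entry)     # size differs, content is rehashed
    print(unchanged, changed)
```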
The working directory is the directory used to modify files for the next commit. By default it is located one level above the .git directory. Making a new commit typically involves the following steps:
- Check out the branch the new commit shall be based on. This changes the working directory so that it reflects the HEAD revision of the branch.
- Modify files in the working directory.
- Tell git about these modifications (add modified files). This transfers the modified file contents into the object database and prepares the tree to be committed in the index.
- Commit the tree prepared in the index into the object database.
- The result is a new commit object and the HEAD of the current branch moves to the new commit.
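The interplay of object database, index and branch head in these steps can be modelled with a small in-memory toy. `ToyRepo` is purely hypothetical: it serializes trees and commits as JSON rather than in Git's object format, and it exists only to show which data moves where when files are added and committed.

```python
import hashlib
import json
from typing import Dict


class ToyRepo:
    """Hypothetical in-memory model of object database, index and branch head."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}   # object database: name -> stored data
        self.index: Dict[str, str] = {}       # index: path -> blob name
        self.branches: Dict[str, str] = {}    # branch name -> commit name
        self.current_branch = "master"

    def _store(self, kind: str, body: bytes) -> str:
        data = kind.encode("ascii") + b"\x00" + body
        name = hashlib.sha1(data).hexdigest()
        self.objects[name] = data
        return name

    def add(self, path: str, content: bytes) -> None:
        # Adding a file copies its content into the object database and stages it.
        self.index[path] = self._store("blob", content)

    def commit(self, message: str) -> str:
        # Committing turns the index into a tree and records a commit pointing to it.
        tree = self._store("tree", json.dumps(self.index, sort_keys=True).encode())
        parent = self.branches.get(self.current_branch)
        commit = self._store("commit", json.dumps(
            {"tree": tree, "parent": parent, "message": message}).encode())
        # The head of the current branch moves to the new commit.
        self.branches[self.current_branch] = commit
        return commit


repo = ToyRepo()
repo.add("README", b"v1\n")
c1 = repo.commit("initial")
repo.add("README", b"v2\n")
c2 = repo.commit("update")
parent_of_c2 = json.loads(repo.objects[c2].split(b"\x00", 1)[1])["parent"]
print(parent_of_c2 == c1, repo.branches["master"] == c2)
```

Note that the new commit records only its parent. The branch head is the one thing that moves forward.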
Recording Changes in the Repository
You start from a fresh checkout of a branch of a local repository. You want to make some changes and record a snapshot in the repository whenever your changes reach a state worth recording.
Each file in the working directory can either be tracked or untracked.
- Tracked files are those which were in the last snapshot or files which have been newly staged into the index. They can be unmodified, modified, or staged.
- Untracked files are all other files which were not in the last snapshot and have not been added to the index.
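The distinction above can be expressed as a small classification function. This is a sketch under simplifying assumptions: the last snapshot, the index and the working directory are each modelled as a dictionary from path to a content identifier, deletions are not handled, and the function name `file_states` is hypothetical.

```python
from typing import Dict, Set


def file_states(path: str, head: Dict[str, str], index: Dict[str, str],
                workdir: Dict[str, str]) -> Set[str]:
    """Classify one working directory file (simplified, deletions not handled)."""
    if path not in head and path not in index:
        return {"untracked"}
    states = set()
    if index.get(path) != head.get(path):
        states.add("staged")      # the index differs from the last snapshot
    if workdir.get(path) != index.get(path):
        states.add("modified")    # the working directory differs from the index
    return states or {"unmodified"}


# Hypothetical example data: path -> content identifier.
head = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
index = {"a.txt": "v1", "b.txt": "v2", "c.txt": "v1", "d.txt": "v1"}
workdir = {"a.txt": "v1", "b.txt": "v2", "c.txt": "v2", "d.txt": "v1", "e.txt": "v1"}

for path in sorted(workdir):
    print(path, sorted(file_states(path, head, index, workdir)))
```

A file can be staged and modified at the same time, namely when it was edited again after being added to the index, which is why the function returns a set.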
When you first clone a repository, all files in the working directory are tracked and unmodified, since they have just been checked out and you have not started editing them yet.
As you edit files, Git recognizes them as modified because they have changed since the last commit. You stage the modified files into the index, then commit the staged changes, and the cycle repeats.
This lifecycle is illustrated here