Difference between revisions of "JGit/User Guide"


Revision as of 11:34, 13 September 2010

Porcelain API

While JGit contains a lot of low-level code to work with Git repositories, it also contains a higher-level API in the org.eclipse.jgit.api package that mimics some of the Git porcelain commands.

Most users of JGit should start here.

AddCommand (git-add)

AddCommand allows you to add files to the index and has options available via its setter methods.

  • addFilepattern()

Here's a quick example of how to add a set of files to the index using the porcelain API.

Git git = new Git(db);
AddCommand add = git.add();
add.addFilepattern("someDirectory").call();

CommitCommand (git-commit)

CommitCommand allows you to perform commits and has options available via its setter methods.

  • setAuthor()
  • setCommitter()
  • setAll()

Here's a quick example of how to commit using the porcelain API.

Git git = new Git(db);
CommitCommand commit = git.commit();
commit.setMessage("initial commit").call();

TagCommand (git-tag)

TagCommand supports a variety of tagging options through its setter methods.

  • setName()
  • setMessage()
  • setTagger()
  • setObjectId()
  • setForceUpdate()
  • setSigned() - not supported yet, will throw exception

Here's a quick example of how to tag a commit using the porcelain API.

Git git = new Git(db);
RevCommit commit = git.commit().setMessage("initial commit").call();
RevTag tag = git.tag().setName("tag").call();

LogCommand (git-log)

LogCommand allows you to easily walk a commit graph.

  • add(AnyObjectId start)
  • addRange(AnyObjectId since, AnyObjectId until)

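Here's a quick example of how to get some log messages. Note that call() returns the commits as an Iterable<RevCommit>, not a LogCommand.

Git git = new Git(db);
Iterable<RevCommit> log = git.log().call();
for (RevCommit commit : log)
	System.out.println(commit.getShortMessage());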

MergeCommand (git-merge)

TODO

API

Repository

A Repository holds all objects and refs used for managing source code.

To build a repository, use one of the RepositoryBuilder flavors, such as FileRepositoryBuilder.

FileRepositoryBuilder builder = new FileRepositoryBuilder();
Repository repository = builder.setGitDir(new File("/my/git/directory"))
.readEnvironment() // scan environment GIT_* variables
.findGitDir() // scan up the file system tree
.build();

Git Objects

All objects are represented by a SHA-1 id in the Git object model. In JGit, this is represented by the AnyObjectId and ObjectId classes.

There are four types of objects in the Git object model:

  • blob
    • is used to store file data
  • tree
    • can be thought of as a directory; it references other trees and blobs
  • commit
    • a commit points to a single tree
  • tag
    • marks a commit as special; generally used to mark specific releases

To resolve an object from a repository, simply pass in the right revision string.

ObjectId head = repository.resolve("HEAD");

Ref

A ref is a variable that holds a single object identifier. The object identifier can be any valid Git object (blob, tree, commit, tag).

For example, to query for the ref that points to the head of the master branch, you can simply call

Ref master = repository.getRef("refs/heads/master");

RevWalk

A RevWalk walks a commit graph and produces the matching commits in order.

RevWalk walk = new RevWalk(repository);

TODO talk about filters

RevCommit

A RevCommit represents a commit in the Git object model.

To parse a commit, simply use a RevWalk instance:

RevWalk walk = new RevWalk(repository);
RevCommit commit = walk.parseCommit(objectIdOfCommit);

RevTag

A RevTag represents a tag in the Git object model.

To parse a tag, simply use a RevWalk instance:

RevWalk walk = new RevWalk(repository);
RevTag tag = walk.parseTag(objectIdOfTag);

RevTree

A RevTree represents a tree in the Git object model.

To parse a tree, simply use a RevWalk instance:

RevWalk walk = new RevWalk(repository);
RevTree tree = walk.parseTree(objectIdOfTree);

API Changes

JGit is still in incubation, so we sometimes make incompatible API changes in order to reach a better, stable API.


These are listed here as a reference for other projects depending on the JGit API.

Changes in JGit 0.9

RepositoryConfig removed: Use Config or FileBasedConfig instead. To replace getCore(), getTransfer() and getUserConfig() methods, use config.get(CoreConfig.KEY), config.get(TransferConfig.KEY), and config.get(UserConfig.KEY). To replace getAuthorName() and friends, use the UserConfig object returned by config.get(UserConfig.KEY).

Repository class is abstract: Use a RepositoryBuilder.

Repository constructors removed: To create a Repository instance, use a RepositoryBuilder. If you know it must be a classical local file system based Repository (as opposed to other types that JGit will support in the future), you can use the FileRepositoryBuilder instead to ensure it's a FileRepository that is returned.

Repository.getDirectory() can return null: It is no longer a requirement that every Repository instance has a java.io.File associated with it. In the future some types of Git repositories that are not on the local filesystem will be supported, and those types will return null.

Repository.getWorkDir() renamed: The method is now called getWorkTree().

Repository.openObject(), openBlob(), renamed: To read an object, use Repository.open() or repository.newObjectReader() to get a reader and use the reader's open() method. An ObjectReader is preferred if the application will access several objects in a short time span (e.g. in response to the current UI event, or the current network connection).

Repository.hasObject() renamed: To check if an object exists, use has(AnyObjectId) instead.

Repository.open() checks existence, type: The open object methods on Repository, ObjectDatabase and ObjectReader now check that an object exists, and if it does not, throw ObjectNotFoundException. If the type hint is supplied, they also validate that the object is actually of the hinted type, or throw IncorrectObjectTypeException. This simplifies most application code, as null is no longer a valid return value.

Repository event listener changes: Repository events are now delivered through a completely different API. Each event type has a corresponding Listener interface to receive that event, and listeners must be registered through the Repository.getListenerList().add*Listener().

RepositoryState value BARE added: To correctly denote a bare repository whose work tree state is undefined, the enum RepositoryState returned by repository.getRepositoryState() returns BARE when isBare() is true or getDirectory() returns null.

WindowCursor removed: Instead use repository.newObjectReader(), and examine objects through the methods on the returned ObjectReader. Please note that an ObjectReader must be released with its release() method after it is no longer useful to the application.

RevWalk requires release: RevWalk now embeds an ObjectReader, and therefore must be released through its release() method when it is no longer required by the application that created it. Optionally the caller can now specify the ObjectReader the walker should use, allowing the caller to more explicitly manage the release.

TreeWalk requires release: TreeWalk now embeds an ObjectReader, and therefore must be released through its release() method when it is no longer required by the application that created it. Optionally the caller can now specify the ObjectReader the walker should use, allowing the caller to more explicitly manage the release.

ObjectWriter deprecated: ObjectWriter will be removed in a future version of JGit. Applications are strongly encouraged to switch to the ObjectInserter API, which can be obtained from repository.newObjectInserter(). Like the ObjectReader, an ObjectInserter must be released through its release() method after use.

NoWorkTreeException thrown: A bare repository (one without a working directory) will throw NoWorkTreeException if its getIndexFile(), getIndex(), getWorkTree(), readCommitMessage(), or readMergeMessage() is called on it, or if its corresponding DirCache is read or locked. This is a RuntimeException so applications need to be careful about knowing what the return value of repository.isBare() is for any given repository they operate on.

DirCache read(), lock() moved: The methods were moved to Repository, to better permit a specific repository implementation to manage how its DirCache should be accessed.

ObjectLoader getCachedBytes(), getBytes() can throw LargeObjectException: If an object is bigger than core.streamFileThreshold, it cannot be accessed as a contiguous array of bytes. LargeObjectException is thrown from the accessors, and applications must use openStream() or copyTo() on the ObjectLoader to obtain the object's contents. By default core.streamFileThreshold is 1 MiB, but is capped at no more than 1/4 of the JVM maximum heap size.
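The threshold rule above can be sketched in plain Java. This is a minimal illustration of the rule as described in this entry, not JGit's code: the class StreamThresholdSketch and the methods effectiveThreshold() and mustStream() are hypothetical names, and it is an assumption that the cap is a simple minimum of the configured value and a quarter of the maximum heap.

```java
// Hypothetical sketch of the rule described above; not part of JGit.
class StreamThresholdSketch {

    static final long MIB = 1024L * 1024L;

    // Assumption: the cap is min(configured value, maxHeap / 4).
    static long effectiveThreshold(long configured, long maxHeap) {
        return Math.min(configured, maxHeap / 4);
    }

    // An object larger than the threshold cannot be read as one byte[]
    // and would have to be streamed instead.
    static boolean mustStream(long objectSize, long threshold) {
        return objectSize > threshold;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long threshold = effectiveThreshold(1 * MIB, maxHeap);
        System.out.println("effective threshold: " + threshold + " bytes");
        System.out.println("stream a 5 MiB object? " + mustStream(5 * MIB, threshold));
    }
}
```

In application code the same decision is made by catching LargeObjectException and falling back to openStream() or copyTo() on the ObjectLoader.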

RevObject.equals() by identity check replaced by AnyObjectId.equals() by value comparison: equals() was not consistent across the AnyObjectId class hierarchy. AnyObjectId defines value object semantics by using an instanceof type check and a value comparison, while RevObject, which is a subclass of AnyObjectId, overrode equals() with identity check semantics. This broke the symmetry and transitivity properties of the equals contract defined in the Javadoc for Object.equals(). To fix that, the RevObject.equals() override was backed out and AnyObjectId.equals() was made final. Applications that were depending on reference equality of RevObjects should now use == and not .equals().
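To see why mixing the two equals() styles breaks the contract, here is a small self-contained sketch. ValueId and IdentityId are hypothetical stand-ins for the old AnyObjectId and RevObject behaviour; they are not JGit classes and only model the equality semantics described above.

```java
import java.util.Arrays;

// Hypothetical stand-in for a value type: equality by content.
class ValueId {
    final byte[] raw;

    ValueId(byte[] raw) {
        this.raw = raw.clone();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ValueId && Arrays.equals(raw, ((ValueId) o).raw);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(raw);
    }
}

// Hypothetical stand-in for a subclass that overrides equals() with an identity check.
class IdentityId extends ValueId {
    IdentityId(byte[] raw) {
        super(raw);
    }

    @Override
    public boolean equals(Object o) {
        return this == o;
    }
}

class EqualsSymmetryDemo {
    public static void main(String[] args) {
        byte[] raw = {1, 2, 3};
        ValueId plain = new ValueId(raw);
        ValueId parsed = new IdentityId(raw);

        System.out.println(plain.equals(parsed)); // prints true (value comparison)
        System.out.println(parsed.equals(plain)); // prints false (identity comparison)
    }
}
```

Because a.equals(b) and b.equals(a) disagree, collections such as HashSet or HashMap can behave differently depending on which object happens to be the receiver.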

MyersDiff(SequenceComparator<S> cmp, S a, S b) constructor changed: the comparison function is now external to MyersDiff.

Advanced Topics

Reducing memory usage with RevWalk

The revision walk interface and the RevWalk and RevCommit classes are designed to be light-weight. However, when used with any repository of considerable size they may still require a lot of memory. This section provides hints on what you can do to reduce memory when walking the revision graph.

Restrict the walked revision graph

Try to walk only the amount of the graph you actually need to walk. That is, if you are looking for the commits in refs/heads/master not yet in refs/remotes/origin/master, make sure you markStart() for refs/heads/master and markUninteresting() refs/remotes/origin/master. The RevWalk traversal will only parse the commits necessary for it to answer you, and will try to avoid looking back further in history. That reduces the size of the internal object map, and thus reduces overall memory usage.

RevWalk walk = new RevWalk(repository);
ObjectId from = repository.resolve("refs/heads/master");
ObjectId to = repository.resolve("refs/remotes/origin/master");
 
walk.markStart(walk.parseCommit(from));
walk.markUninteresting(walk.parseCommit(to));
 
// ...
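What such a restricted walk returns can be illustrated with a toy model in plain Java. This is only a sketch of the result (commits reachable from the start point but not from the uninteresting one): commits are plain strings, RangeWalkSketch and its methods are hypothetical names, and the sketch computes the full hidden set up front for simplicity, which the real RevWalk tries to avoid.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: "parents" maps a commit name to the names of its parents.
class RangeWalkSketch {

    // Every commit reachable from tip by following parent links (tip included).
    static Set<String> reachable(Map<String, List<String>> parents, String tip) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(tip);
        while (!todo.isEmpty()) {
            String commit = todo.pop();
            if (seen.add(commit)) {
                for (String parent : parents.getOrDefault(commit, Collections.<String>emptyList())) {
                    todo.push(parent);
                }
            }
        }
        return seen;
    }

    // Commits reachable from start but not from uninteresting, in visit order.
    static List<String> walk(Map<String, List<String>> parents, String start, String uninteresting) {
        Set<String> hidden = reachable(parents, uninteresting);
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String commit = todo.pop();
            if (hidden.contains(commit) || !seen.add(commit)) {
                continue; // stop at hidden history and skip repeats
            }
            result.add(commit);
            for (String parent : parents.getOrDefault(commit, Collections.<String>emptyList())) {
                todo.push(parent);
            }
        }
        return result;
    }

    // Linear history a <- b <- c <- d, used by main() below.
    static Map<String, List<String>> sampleHistory() {
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("a", Collections.<String>emptyList());
        parents.put("b", Arrays.asList("a"));
        parents.put("c", Arrays.asList("b"));
        parents.put("d", Arrays.asList("c"));
        return parents;
    }

    public static void main(String[] args) {
        // Local master is at d, the remote-tracking branch is still at b.
        System.out.println(walk(sampleHistory(), "d", "b")); // prints [d, c]
    }
}
```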

Discard the body of a commit

There is a setRetainBody(false) method you can use to discard the body of a commit if you don't need the author, committer or message information during the traversal. Examples of when you don't need this data are when you are only using the RevWalk to compute the merge base between branches, or to perform a task for which you would have used `git rev-list` with its default formatting.

RevWalk walk = new RevWalk(repository);
walk.setRetainBody(false);
// ...

If you do need the body, consider extracting the data you need and then calling dispose() on the RevCommit, assuming you only need the data once and can then discard it. If you need to hang onto the data, you may find that JGit's internal representation uses less overall memory than if you held onto it yourself, especially if you want the full message. This is because JGit uses a byte[] internally to store the message in UTF-8. Java String storage would be bigger using UTF-16, assuming the message is mostly US-ASCII data.

RevWalk walk = new RevWalk(repository);
// more setup
Set<String> authorEmails = new HashSet<String>();
 
for (RevCommit commit : walk) {
	// extract the commit fields you need, for example:
	authorEmails.add(commit.getAuthorIdent().getEmailAddress());
 
	commit.dispose();
}
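The size argument about UTF-8 byte[] versus UTF-16 String storage can be checked with a few lines of plain Java. This sketch only counts payload bytes for a sample message; MessageSizeSketch and its methods are hypothetical names, and object headers are ignored. Note that JVMs from Java 9 onwards can store Latin-1-only strings with one byte per character (compact strings), so the gap applies mainly to older JVMs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing the payload size of the two encodings.
class MessageSizeSketch {

    // Bytes needed to hold the text as UTF-8, as in a raw byte[] buffer.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed for UTF-16 storage: two bytes per char (UTF-16 code unit).
    static int utf16Bytes(String text) {
        return text.length() * 2;
    }

    public static void main(String[] args) {
        String message = "Fix typo in the user guide";
        System.out.println("UTF-8 bytes:  " + utf8Bytes(message));
        System.out.println("UTF-16 bytes: " + utf16Bytes(message));
    }
}
```

For a message that is pure US-ASCII, the UTF-16 figure is exactly twice the UTF-8 figure.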

Subclassing RevWalk and RevCommit

If you need to attach additional data to a commit, consider subclassing both RevWalk and RevCommit, and using the createCommit() method in RevWalk to construct an instance of your RevCommit subclass. Put the additional data as fields in your RevCommit subclass, so that you don't need to use an auxiliary HashMap to translate from RevCommit or ObjectId to your additional data fields.

public class ReviewedRevision extends RevCommit {
 
	private final Date reviewDate;
 
	private ReviewedRevision(AnyObjectId id, Date reviewDate) {
		super(id);
		this.reviewDate = reviewDate;
	}
 
	public List<String> getReviewedBy() {
		return getFooterLines("Reviewed-by");
	}
 
	public Date getReviewDate() {
		return reviewDate;
	}
 
	public static class Walk extends RevWalk {
 
		public Walk(Repository repo) {
			super(repo);
		}
 
		@Override
		protected RevCommit createCommit(AnyObjectId id) {
			return new ReviewedRevision(id, getReviewDate(id));
		}
 
		private Date getReviewDate(AnyObjectId id) {
			// ...
		}
 
	}
}

Cleaning up after a revision walk

A RevWalk cannot shrink its internal object map. If you have just done a huge traversal of, say, the entire history of the repository, everything will have been loaded into the object map, and it cannot be released. If you don't need this data in the near future, it may be a good idea to throw away the RevWalk and allocate a new one for your next traversal. That will let the GC reclaim everything and make it available for another use. On the other hand, reusing an existing object map is much faster than building a new one from scratch. So you need to balance the reclaiming of memory against the user's desire to perform fast updates of an existing repository view.

RevWalk walk = new RevWalk(repository);
// ...
for (RevCommit commit : walk) {
	// ...
}
walk.dispose();
