MemoryAnalyzer/Adding a new heap dump format to the Memory Analyzer

Extension points

The Memory Analyzer provides several extension points that you can use in your Eclipse plug-ins. If you have a binary distribution of the Memory Analyzer, you can find information about the extension points in the help (Help -> Help Contents). The extension points are described in the API Reference (Memory Analyzer -> Reference -> API Reference).

You can also find general information in the 4th post of the thread [1] in the old forum. An overview picture of the APIs can be found on page 11 of the slides for the graduation review [2].

Adding a new heap dump format to the Memory Analyzer

I added a new heap dump format to the Memory Analyzer for my plugin, so I will try to explain how this can be done. The information here is from April 2010 and may be outdated. I hope it will be of use nonetheless.

To add support for a new heap dump format to the Memory Analyzer, you will need to create an Eclipse plug-in and use the provided extension points.

The HPROF and DTFJ plugins can be used as a reference if you do not know how something can be done. Personally, I found HPROF the most helpful.

Relevant Extension Points

The relevant extension points for a new heap dump format include:

the parser extension point (for parsing the new format)
the query extension point (to add new queries)
the trigger heap dump extension point (to enable the user to trigger a heap dump from the VM with MAT)

I mainly used the parser extension point. I also used the query extension point, to add a very simple query. I did not use the trigger heap dump extension point so I cannot provide hints for it.

When MAT reads a new heap dump, the parse method in the class SnapshotFactoryImpl is called. It handles the reading of a new heap dump (new means that no indexes exist for this heap dump yet). This method calls the index builder (provided by the parser extension point), a SnapshotImplBuilder and the GarbageCleaner. The GarbageCleaner purges unreachable objects from the heap dump. The array returned by its clean method can be used to remove unreachable objects from the indexes. After the parse method is done, MAT has a SnapshotImpl for the heap dump, which contains the most important information.
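To make the purging step more concrete, here is a minimal stand-alone sketch of the idea: mark everything reachable from the roots, then renumber the survivors. It is not MAT's GarbageCleaner. The class ReachabilitySketch, the method remapReachable and the convention of using -1 for a purged object are all assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Stand-alone illustration only; this is not MAT's GarbageCleaner. */
final class ReachabilitySketch {

    /**
     * Maps each old object id to its new id after unreachable objects are
     * dropped. Unreachable objects get -1 (a convention chosen for this sketch).
     */
    static int[] remapReachable(int[][] outbound, int[] gcRoots) {
        boolean[] reachable = new boolean[outbound.length];
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : gcRoots) {
            if (!reachable[root]) {
                reachable[root] = true;
                pending.push(root);
            }
        }
        // Follow outbound references until no new object is discovered.
        while (!pending.isEmpty()) {
            int current = pending.pop();
            for (int target : outbound[current]) {
                if (!reachable[target]) {
                    reachable[target] = true;
                    pending.push(target);
                }
            }
        }
        // Survivors are renumbered densely in their original order.
        int[] oldToNew = new int[outbound.length];
        int next = 0;
        for (int id = 0; id < oldToNew.length; id++) {
            oldToNew[id] = reachable[id] ? next++ : -1;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        // Object 3 is the only root; object 2 is referenced by nobody.
        int[][] outbound = { { 1 }, {}, { 1 }, { 0 } };
        int[] roots = { 3 };
        System.out.println(Arrays.toString(remapReachable(outbound, roots)));
    }
}
```

In this toy heap, object 2 has no inbound path from the root, so it is the one that gets dropped. A missing root or a missing reference in a real parser has the same effect on real objects.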

The parser extension point

Using the parser extension point requires you to provide implementations of two interfaces:

IIndexBuilder
IObjectReader

IIndexBuilder

As the API reference says, the index builder is responsible for reading the structural information of the heap and building indexes out of it. This information is required to be able to use MAT, so the IndexBuilder is the first thing you will need to get working.

The main work that has to be done in the index builder consists of parsing your new heap dump format and filling MAT's data structures. Your implementation of IIndexBuilder fills the data into an IPreliminaryIndex. Implementations of this interface provide methods to fill the respective data structures. The data structures are:

Identifiers - This data structure holds the long addresses of all objects present in the heap dump. ALL addresses must be contained and there must not be duplicates. After collecting all addresses, sort() needs to be called on the identifiers data structure. This enables getting an integer id for each address by calling reverse(address) on the identifiers. The id is necessary for the other data structures. Negative numbers are not valid ids. If a negative number is returned, a call to sort() may be missing or the address is not present in the identifiers data structure.
ClassesById - Maps an id to a ClassImpl containing information about this class. The comments in ClassImpl should prove sufficient to understand what's going on. If you have questions about UsedHeapSize, the 4th and 5th posts in the thread [3] in the old forum may help you.
ObjectToId - Maps the id of an object to the id corresponding to the ClassImpl of the object's class.
gcRoots - Maps the id of a garbage collection root to information about the garbage collection root (e.g. what type of root it is). It is very important that you do not miss any roots because the GarbageCleaner will purge unreachable objects from your dump and discard the information.
array2size - Maps the id of an object (not necessarily an array) to the size of that object, in bytes. This data structure must contain an entry for every array in your dump. It may contain an entry for a non-array object if that object's size differs from the instance size set in the corresponding ClassImpl (this can be the case if address-based hashing is used).
outbound - Maps the id of an object to its outbound references. Similarly to gcRoots, missing references may cause objects in your dump to appear as unreachable.
thread2objects2roots - My format did not contain explicit thread information, so I did not fill this structure myself. It is used to show the garbage collection roots associated with a thread. It is a hash map from a thread id to a second hash map. The second hash map maps the id of every object referenced by the thread to a list of GC root information for that object, holding the reason why the object is referenced, such as a Java local variable, a JNI local, or a reference from a native stack. The thread itself is the main GC root, and these maps are used to annotate the references from the thread. Objects referenced via a thread do not need to be included in the gcRoots map unless they are also global GC roots.
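The add / sort() / reverse(address) contract described for the identifiers can be illustrated with a small plain-Java stand-in. This is only a sketch of the behaviour described above, not the MAT class: IdentifierSketch is a made-up name, and backing it with a sorted long array and a binary search is an assumption made for illustration.

```java
import java.util.Arrays;

/** Plain-Java stand-in for the identifiers index; not the MAT class itself. */
final class IdentifierSketch {
    private long[] addresses = new long[4];
    private int size;
    private boolean sorted;

    /** Collects one object address; call sort() once all addresses are added. */
    void add(long address) {
        if (size == addresses.length) {
            addresses = Arrays.copyOf(addresses, size * 2);
        }
        addresses[size++] = address;
        sorted = false;
    }

    /** Sorts the addresses so that the position of an address becomes its id. */
    void sort() {
        Arrays.sort(addresses, 0, size);
        sorted = true;
    }

    /** Returns the id of an address, or a negative number if it cannot be resolved. */
    int reverse(long address) {
        if (!sorted) {
            return -1; // sort() has not been called yet
        }
        // binarySearch returns a negative value when the key is absent.
        return Arrays.binarySearch(addresses, 0, size, address);
    }

    public static void main(String[] args) {
        IdentifierSketch ids = new IdentifierSketch();
        ids.add(0x3000L);
        ids.add(0x1000L);
        ids.add(0x2000L);
        System.out.println(ids.reverse(0x1000L)); // negative: not sorted yet
        ids.sort();
        System.out.println(ids.reverse(0x2000L)); // 1
        System.out.println(ids.reverse(0x4000L)); // negative: unknown address
    }
}
```

The sketch does not check for duplicate addresses. In a real parser that check is your responsibility, since duplicates are not allowed.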

There are some constraints on the indexes that must be met. For example, the first outbound reference logged for each object must be to the object's class. More information on these constraints can be found in the thread [4] in the old forum. Take care that the references for the objects in the dump are correct because the GarbageCleaner will remove unreachable objects. If unreachable objects should be kept, the "keep_unreachable_objects" option can be set (see HPROF or DTFJ for how this can be done).
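The rule that the class reference comes first can be sketched as a small helper that assembles the reference list for one object. The names OutboundSketch and outboundFor are hypothetical. Treating 0 as a null reference is an assumption about the dump format. The sketch works on addresses, so converting them to ids is left out.

```java
import java.util.Arrays;

/** Illustration of the "class reference first" constraint; names are made up. */
final class OutboundSketch {

    /**
     * Builds the outbound reference list for one object: the address of the
     * object's class first, followed by the non-null references in its fields.
     */
    static long[] outboundFor(long classAddress, long[] fieldReferences) {
        long[] result = new long[fieldReferences.length + 1];
        int count = 0;
        result[count++] = classAddress; // the class reference always comes first
        for (long reference : fieldReferences) {
            if (reference != 0L) { // assumption: 0 encodes a null reference
                result[count++] = reference;
            }
        }
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        long[] refs = outboundFor(0x100L, new long[] { 0x200L, 0L, 0x300L });
        System.out.println(Arrays.toString(refs));
    }
}
```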

IObjectReader

As the API reference says, the object reader provides detailed information about objects, e.g. values of instance fields. To do so, random access to the heap dump is needed. Luckily, the developers of MAT provide the classes BufferedRandomAccessInputStream and PositionInputStream. They can be used like this: new PositionInputStream(new BufferedRandomAccessInputStream(new RandomAccessFile(fileName, "r")))
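If random access is unfamiliar, the following sketch shows the underlying idea with the JDK alone: jump to a known byte position and read a value there. It deliberately does not use the MAT stream classes, so that it runs without MAT on the classpath. RandomAccessSketch, writeInts and readIntAt are hypothetical names, and the file of three ints is a stand-in for a real dump, not a real format.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** JDK-only illustration of random access; the "dump" layout is made up. */
final class RandomAccessSketch {

    /** Writes the given ints to a temporary file that stands in for a heap dump. */
    static File writeInts(int... values) {
        try {
            File dump = File.createTempFile("sketch", ".dump");
            dump.deleteOnExit();
            try (RandomAccessFile out = new RandomAccessFile(dump, "rw")) {
                for (int value : values) {
                    out.writeInt(value);
                }
            }
            return dump;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Jumps to a byte position in the dump and reads one big-endian int there. */
    static int readIntAt(File dump, long position) {
        try (RandomAccessFile in = new RandomAccessFile(dump, "r")) {
            in.seek(position);
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dump = writeInts(11, 22, 33);
        System.out.println(readIntAt(dump, 4)); // the second int starts at byte 4
    }
}
```

An object reader typically needs the file position of each object to do this, so it helps to record those positions while the index builder parses the dump.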

There are several kinds of objects that the read method can return:

InstanceImpl for normal objects
ClassLoaderImpl for class loaders
ObjectArrayImpl for non-primitive arrays
PrimitiveArrayImpl for primitive arrays

The query extension point

Implementing a query is pretty simple: create a class for the query, implement IQuery, and register it with the extension point. You will need to use annotations in your query to inject data, e.g. an ISnapshot. The execute method of a query returns an IResult. I can only give two pointers:

Do not iterate over the objects with something like for (int i = 0; i < identifiers.size; i++). This can be extremely slow. Use the methods provided by ISnapshot instead. For example, to iterate over all objects of a class, you can get the class and call getObjectIds() on it.

A very simple result is the TextResult, which just displays a String as the result.
