MemoryAnalyzer/Adding a new heap dump format to the Memory Analyzer
To add support for a new heap dump format to the Memory Analyzer, you will need to create an Eclipse plug-in and use the provided extension points.
The HPROF and DTFJ plugins can be used as a reference if you do not know how something can be done. You can also find general information in the forum thread in the 4th post in the thread  in the old forum. An overview picture for the APIs can be found in the slides for the graduation review  on page 11.
Relevant Extension Points
The relevant extensions points for a new heap dump format include:
- the parser extension point (for parsing the new format)
- the trigger heap dump extension point (to enable the user to trigger a heap dump from the VM with MAT)
When MAT reads a new heap dump, the parse method in the class SnapshotFactoryImpl will be called. It handles the reading of a new heap dump (new means that no indexes for this heap dump exist). This method calls the index builder (provided by the parser extension point), a SnapshotImplBuilder and the GarbageCleaner. The GarbageCleaner is used to purge unreachable objects from the heap dump. The array returned by its clean methods can be used to remove unreachable objects from the indexes. After the parse method is done, MAT will have a SnapshotImpl for the heap dump, which contains the most important information.
The parser extension point
Using the parser extension points requires you to provide implementations for 2 interfaces:
As the API reference says, the index builder is responsible for reading the structural information of the heap and building indexes out of it. This information is required to be able to use MAT, so the IndexBuilder is the first thing you will need to get working.
The main work that has to be done in the index builder consists of parsing your new heap dump format and filling MAT's data structures. Your implementation of the IndexBuilder will fill in the data into an IIPreliminaryIndex. Implementations of this interface provide methods to fill the respective data structures. The data structures are:
- Identifiers - This data structures holds the long addresses for all objects present in the heap dump. ALL addresses must be contained and there must not be duplicates. After collecting all addresses, sort() needs to be called on the identifiers data structure. This will enable getting an integer id for each address by calling reverse(address) on the identifier. The id is necessary for the other data structures. Negative numbers are not valid ids. If a negative number is returned, a call to sort() may be missing or the address is not present in the identifiers data structure.
- ClassesById - Maps an id to a ClassImpl containing information about this class. The comments in ClassImpl should prove sufficient to understand what's going on. If you have questions about UsedHeapSize the 4th and 5th post in the thread  in the old forum may help you.
- ObjectToId - Maps the id of an object to the id corresponding to the ClassImpl of the object's class.
- gcRoots - Maps the id of a garbage collection root to information about the garbage collection root (e.g. what type of root it is). It is very important that you do not miss any roots because the GarbageCleaner will purge unreachable objects from your dump and discard the information.
- array2size - maps an id of an object (not necessarily an array) to the size of that object, in bytes. This data structure must contain an entry for every array in your dump. It may contain an entry for a non-array object, if that object's size differs from the instance size set in the corresponding ClassImpl (this can be the case if Adress-bashed hashing is used).
- outbound - maps an id of an object to its outbound references. Similarly to gcRoots, missing references may cause objects in your dump to appear as unreachable.
- thread2objects2roots - This is used to show garbage collection roots associated with a thread. It is a hash map going from thread id to another hash map. The second hash map maps all the object ids referenced by the thread to a list of GC Root information for each object, holding the reason why the object is referenced, such as a Java local variable, JNI Local, reference from a native stack. The thread itself is the main GC root, and these maps are used to annotate references from the thread. Objects referenced via a thread do not need to be included in the gcRoots map unless they are also global GC roots.
There are some constraints on the indexes that must be met. For example, the first outbound reference logged for each object must be to the object's class. More information on these constraints can be found in the thread  in the old forum. Take care that the references for the objects in the dump are correct because the GarbageCleaner will remove unreachable objects. If unreachable objects should be kept, the "keep_unreachable_objects" can be set (see HPROF or DTFJ for how this can be done).
As the API reference says, the object reader provides detailed information about objects, e.g. values of instance fields. To do so, random access of the heap dump is needed. Luckily, the developers of MAT provide the classes BufferedRandomAccessInputStream and PositionInputStream. They can be used like this: new PositionInputStream(new BufferedRandomAccessInputStream(new RandomAccessFile(fileName)))
There are several kinds of Objects that the read method can return:
- InstanceImpl for normal objects
- ClassloaderImpl for classloaders
- ObjectArrayImpl for non-primitive arrays
- PrimitiveArrayImpl for primitive arrays
The trigger heap dump extension point