
Difference between revisions of "MemoryAnalyzer/Adding a new heap dump format to the Memory Analyzer"

Latest revision as of 06:19, 7 June 2013

Introduction

To add support for a new heap dump format to the Memory Analyzer, you will need to create an Eclipse plug-in and use the provided extension points.

The HPROF and DTFJ plug-ins can serve as a reference when it is unclear how something should be done. General information is also available in the 4th post of a thread in the old forum [http://www.eclipse.org/forums/index.php/mv/msg/153571/486076/#msg_486076]. An overview picture of the APIs can be found on page 11 of the slides for the graduation review [http://www.eclipse.org/project-slides/Helios/MAT_Helios_Release.pdf].

Relevant Extension Points

The relevant extension points for a new heap dump format include:

  • the parser extension point (for parsing the new format)
  • the trigger heap dump extension point (to enable the user to trigger a heap dump from the VM with MAT)

When MAT reads a new heap dump, the parse method in the class SnapshotFactoryImpl will be called. It handles the reading of a new heap dump (new means that no indexes for this heap dump exist). This method calls the index builder (provided by the parser extension point), a SnapshotImplBuilder and the GarbageCleaner. The GarbageCleaner is used to purge unreachable objects from the heap dump. The array returned by its clean methods can be used to remove unreachable objects from the indexes. After the parse method is done, MAT will have a SnapshotImpl for the heap dump, which contains the most important information.
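
The purge step can be pictured with a small, self-contained sketch. It is a toy model written against plain Java arrays, not MAT's GarbageCleaner: the class name ReachabilitySketch, the signature of clean and the use of -1 for purged objects are all assumptions made for illustration. It marks everything reachable from the roots and returns an array that maps each old id to its new id.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy model of a reachability sweep; this is NOT MAT's GarbageCleaner. */
public class ReachabilitySketch {

    /**
     * Returns a map from old object id to new object id. Objects that cannot
     * be reached from any root get -1 (a convention assumed by this sketch).
     *
     * @param outbound outbound[i] lists the ids referenced by object i
     * @param gcRoots  ids of the garbage collection roots
     */
    public static int[] clean(int[][] outbound, int[] gcRoots) {
        boolean[] reachable = new boolean[outbound.length];
        Deque<Integer> pending = new ArrayDeque<>();

        // Every root is reachable by definition.
        for (int root : gcRoots) {
            if (!reachable[root]) {
                reachable[root] = true;
                pending.push(root);
            }
        }

        // Follow outbound references until no new objects turn up.
        while (!pending.isEmpty()) {
            int id = pending.pop();
            for (int target : outbound[id]) {
                if (!reachable[target]) {
                    reachable[target] = true;
                    pending.push(target);
                }
            }
        }

        // Surviving objects are renumbered densely, in their old order.
        int[] oldToNew = new int[outbound.length];
        int next = 0;
        for (int id = 0; id < oldToNew.length; id++) {
            oldToNew[id] = reachable[id] ? next++ : -1;
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        // 0 -> 1, 2 -> 1, 3 -> 0; only object 0 is a root.
        int[][] outbound = { {1}, {}, {1}, {0} };
        System.out.println(Arrays.toString(clean(outbound, new int[] {0})));
    }
}
```

With object 0 as the only root, objects 2 and 3 are never reached, so main prints [0, 1, -1, -1]. An index builder that misses a root or a reference would lose live objects in the same way.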

The parser extension point

Using the parser extension point requires you to implement two interfaces:

  • IIndexBuilder
  • IObjectReader

IIndexBuilder

As the API reference says, the index builder is responsible for reading the structural information of the heap and building indexes out of it. This information is required to be able to use MAT, so the index builder is the first thing you will need to get working.

The main work in the index builder consists of parsing your new heap dump format and filling MAT's data structures. Your IIndexBuilder implementation fills the data into an IPreliminaryIndex. Implementations of this interface provide methods to fill the respective data structures. The data structures are:

  • Identifiers - This data structure holds the long addresses for all objects present in the heap dump. ALL addresses must be contained and there must not be duplicates. After collecting all addresses, sort() needs to be called on the identifiers data structure. This enables getting an integer id for each address by calling reverse(address) on the identifiers. The id is necessary for the other data structures. Negative numbers are not valid ids. If a negative number is returned, a call to sort() may be missing or the address is not present in the identifiers data structure.
  • ClassesById - Maps an id to a ClassImpl containing information about this class. The comments in ClassImpl should prove sufficient to understand what's going on. If you have questions about UsedHeapSize, the 4th [http://www.eclipse.org/forums/index.php/mv/msg/163200/517929/#msg_517929] and 5th [http://www.eclipse.org/forums/index.php/mv/msg/163200/518191/#msg_518191] posts of a thread in the old forum [http://www.eclipse.org/forums/index.php/mv/msg/163200/518191/] may help you.
  • ObjectToId - Maps the id of an object to the id corresponding to the ClassImpl of the object's class.
  • gcRoots - Maps the id of a garbage collection root to information about the garbage collection root (e.g. what type of root it is). It is very important that you do not miss any roots because the GarbageCleaner will purge unreachable objects from your dump and discard the information.
  • array2size - maps an id of an object (not necessarily an array) to the size of that object, in bytes. This data structure must contain an entry for every array in your dump. It may contain an entry for a non-array object if that object's size differs from the instance size set in the corresponding ClassImpl (this can be the case if address-based hashing is used).
  • outbound - maps an id of an object to its outbound references. Similarly to gcRoots, missing references may cause objects in your dump to appear as unreachable.
  • thread2objects2roots - This is used to show garbage collection roots associated with a thread. It is a hash map going from thread id to another hash map. The second hash map maps all the object ids referenced by the thread to a list of GC Root information for each object, holding the reason why the object is referenced, such as a Java local variable, JNI Local, reference from a native stack. The thread itself is the main GC root, and these maps are used to annotate references from the thread. Objects referenced via a thread do not need to be included in the gcRoots map unless they are also global GC roots.
  • threads index file - This is used to show where the thread locals are in the stack frames.

This is just a text file named "{prefix}.threads". The file consists of one section per thread, in the following format:

Thread 0x7ffe04c1890

 at java.lang.Object.wait(JI)V (Native Method)
 at java.lang.Object.wait()V (Object.java:167)
 at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.getNextEvent()Lorg/eclipse/osgi/framework/eventmgr/EventManager$EventThread$Queued; (EventManager.java:397)
 at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run()V (EventManager.java:333)

locals:
objecId=0x7ffe04c1890, line=0
objecId=0x7ffe04c1890, line=2
objecId=0x7ffe04c1890, line=2
objecId=0x7ffe04c1890, line=3

    • "Thread" is matched to find the start of a section for a thread.
    • The thread address is optional – but if omitted then none of the information is stored for that thread.
    • The stack frame data is just text, but should be in the same format as a Java stack trace.
    • A blank line ends the stack trace.
    • "locals" starts the local variable information.
    • The line number in the stack trace (0-based) is matched by the decimal number following the "line=".
    • If the line number is found then the object id is matched using the "0x" and the "," comma to delimit the hex address of the object on the stack frame.
    • A blank line ends the local variable section.
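
The matching rules above can be sketched as a small reader. This is an illustration in plain Java, not MAT's own parser: the names ThreadsFileSketch, Section, parse and addLocal are hypothetical, and the handling of blank lines directly after the "Thread" header (they are skipped) is an assumption made so that the sample section above is accepted as shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative reader for "{prefix}.threads" content; NOT MAT's own parser. */
public class ThreadsFileSketch {

    /** Parsed contents of one thread section. */
    public static final class Section {
        public final long threadAddress;
        public final List<String> frames = new ArrayList<>();
        /** Each entry is {object address, 0-based stack frame index}. */
        public final List<long[]> locals = new ArrayList<>();

        Section(long threadAddress) {
            this.threadAddress = threadAddress;
        }
    }

    public static List<Section> parse(List<String> lines) {
        List<Section> sections = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            String header = lines.get(i++).trim();
            int hex = header.indexOf("0x");
            if (!header.startsWith("Thread") || hex < 0) {
                continue; // not a section start, or no address: nothing is stored
            }
            Section section = new Section(Long.parseLong(header.substring(hex + 2), 16));
            // Assumption of this sketch: blank lines right after the header are skipped.
            while (i < lines.size() && isBlank(lines.get(i))) i++;
            // Stack frames run up to the next blank line.
            while (i < lines.size() && isFrame(lines.get(i))) {
                section.frames.add(lines.get(i++).trim());
            }
            while (i < lines.size() && isBlank(lines.get(i))) i++;
            // The optional locals block also ends at a blank line.
            if (i < lines.size() && lines.get(i).trim().startsWith("locals")) {
                i++;
                while (i < lines.size() && !isBlank(lines.get(i))) {
                    addLocal(lines.get(i++), section);
                }
            }
            sections.add(section);
        }
        return sections;
    }

    private static boolean isBlank(String line) {
        return line.trim().isEmpty();
    }

    private static boolean isFrame(String line) {
        String text = line.trim();
        return !text.isEmpty() && !text.startsWith("Thread") && !text.startsWith("locals");
    }

    /** Extracts the hex address between "0x" and "," and the number after "line=". */
    private static void addLocal(String line, Section section) {
        int hex = line.indexOf("0x");
        int comma = line.indexOf(',', Math.max(hex, 0));
        int lineKey = line.indexOf("line=");
        if (hex < 0 || comma < 0 || lineKey < 0) {
            return; // malformed entry: ignored in this sketch
        }
        long address = Long.parseLong(line.substring(hex + 2, comma).trim(), 16);
        int frame = Integer.parseInt(line.substring(lineKey + 5).trim());
        section.locals.add(new long[] { address, frame });
    }

    public static void main(String[] args) {
        List<Section> sections = parse(Arrays.asList(
                "Thread 0x7ffe04c1890",
                "",
                " at java.lang.Object.wait(JI)V (Native Method)",
                " at java.lang.Object.wait()V (Object.java:167)",
                "",
                "locals:",
                "objecId=0x7ffe04c1890, line=0",
                "objecId=0x7ffe04c1890, line=1",
                ""));
        Section first = sections.get(0);
        System.out.println(first.frames.size() + " frames, " + first.locals.size() + " locals");
    }
}
```

The sketch takes the file content as a list of lines so that it stays free of I/O handling; reading the actual file is left to the caller.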

There are some constraints on the indexes that must be met. For example, the first outbound reference logged for each object must be to the object's class. More information on these constraints can be found in a thread in the old forum [http://www.eclipse.org/forums/index.php?t=msg&th=163200&start=0&S=86b5235a33dd47bfed74cb351e531fbf]. Take care that the references for the objects in the dump are correct because the GarbageCleaner will remove unreachable objects. If unreachable objects should be kept, the "keep_unreachable_objects" option can be set (see HPROF or DTFJ for how this can be done).
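
The identifier lookup and the class-first constraint can be illustrated with a stand-in written in plain Java. It does not use MAT's index classes: IdentifierSketch and its outbound helper are hypothetical names, the methods add, sort and reverse only mirror the behaviour described in the list above, and silently dropping targets that have no id is a choice of this sketch. Addresses are compared as signed long values here for simplicity.

```java
import java.util.Arrays;

/** Stand-in for the address-to-id lookup; it does NOT use MAT's index classes. */
public class IdentifierSketch {
    private long[] addresses = new long[16];
    private int size;
    private boolean sorted;

    /** Records one object address; call sort() once all addresses are collected. */
    public void add(long address) {
        if (size == addresses.length) {
            addresses = Arrays.copyOf(addresses, size * 2);
        }
        addresses[size++] = address;
        sorted = false;
    }

    /** Sorts the collected addresses so that ids become positions in the array. */
    public void sort() {
        Arrays.sort(addresses, 0, size);
        sorted = true;
    }

    /** Returns the id for an address, or a negative number if there is none. */
    public int reverse(long address) {
        return sorted ? Arrays.binarySearch(addresses, 0, size, address) : -1;
    }

    /** Builds the outbound id list for one object, with the class id first. */
    public int[] outbound(long classAddress, long... targets) {
        int classId = reverse(classAddress);
        if (classId < 0) {
            throw new IllegalArgumentException("unknown class address");
        }
        int[] refs = new int[targets.length + 1];
        int count = 0;
        refs[count++] = classId; // the first reference is the object's class
        for (long target : targets) {
            int id = reverse(target);
            if (id >= 0) { // targets without an id are dropped in this sketch
                refs[count++] = id;
            }
        }
        return Arrays.copyOf(refs, count);
    }

    public static void main(String[] args) {
        IdentifierSketch ids = new IdentifierSketch();
        ids.add(0x3000L);
        ids.add(0x1000L);
        ids.add(0x2000L);
        ids.sort();
        System.out.println(ids.reverse(0x2000L));
        System.out.println(Arrays.toString(ids.outbound(0x1000L, 0x3000L, 0x9999L)));
    }
}
```

After sorting, 0x2000 sits at position 1, so main prints 1 and then [0, 2]: the class at 0x1000 comes first and the unknown address 0x9999 is left out.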

Memory Analyzer 1.2 will be able to check indices for any parser. Either start Memory Analyzer from inside Eclipse using the run configuration trace option:

 org.eclipse.mat.parser debug enabled

or create a file .options containing

 org.eclipse.mat.parser/debug=true

and start Memory Analyzer with the -debug option. See FAQ_How_do_I_use_the_platform_debug_tracing_facility?.

IObjectReader

As the API reference says, the object reader provides detailed information about objects, e.g. values of instance fields. This requires random access to the heap dump. For this purpose, MAT provides the classes BufferedRandomAccessInputStream and PositionInputStream. They can be used like this: new PositionInputStream(new BufferedRandomAccessInputStream(new RandomAccessFile(fileName, "r"))). Note that RandomAccessFile requires a mode argument; "r" opens the file read-only.
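
The idea of random access can be shown with the JDK alone. The following sketch does not use the MAT stream classes; it works directly on java.io.RandomAccessFile. The file layout (a 4-byte header followed by two 8-byte fields) and the names RandomAccessSketch, readLongAt and writeDemoDump are made up for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Random-access read with plain JDK classes; the MAT stream wrappers are not used. */
public class RandomAccessSketch {

    /** Reads one big-endian 8-byte value at the given file offset. */
    public static long readLongAt(Path dump, long offset) {
        // RandomAccessFile needs an explicit mode; "r" opens the dump read-only.
        try (RandomAccessFile file = new RandomAccessFile(dump.toFile(), "r")) {
            file.seek(offset); // jump straight to the position recorded while indexing
            return file.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a tiny made-up "dump": a 4-byte header followed by two 8-byte fields. */
    public static Path writeDemoDump() {
        try {
            Path dump = Files.createTempFile("demo", ".dump");
            dump.toFile().deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(dump.toFile(), "rw")) {
                file.writeInt(0xCAFE);
                file.writeLong(42L);
                file.writeLong(7L);
            }
            return dump;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dump = writeDemoDump();
        // The second field starts after the 4-byte header and the first 8-byte field.
        System.out.println(readLongAt(dump, 12));
    }
}
```

Running main prints 7. A real object reader would keep the file open between reads rather than reopening it for every value; the sketch reopens it only to stay short.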

There are several kinds of Objects that the read method can return:

  • InstanceImpl for normal objects
  • ClassLoaderImpl for class loaders
  • ObjectArrayImpl for non-primitive arrays
  • PrimitiveArrayImpl for primitive arrays

The trigger heap dump extension point

TODO