Performance Bloopers

From Eclipsepedia


Revision as of 14:52, 13 June 2006

Below is a collection of goofs, mistakes, and bad decisions made in developing plugins for the Eclipse platform. Many are standard Java programming problems; some are specific to Eclipse. The intent here is not to pick on the perpetrators (in most cases they in fact eagerly contributed the blooper information!) but rather to help other developers avoid similar pitfalls. The bloopers have been contributed by the community and may have been discovered in Eclipse SDK code or in third-party plugins. Since all of Eclipse is implemented as plugins, the issues are usually relevant to plugin developers in general.

Each blooper is structured as a statement of the scenario followed by suggested techniques for avoiding the problem(s). In some cases there are clear steps; in others there really is no solution except to follow the advice of a wise doctor: "don't do that".

The set of bloopers is (sadly) always growing. This site is intended as a resource for developers to consult to build their general knowledge of problems, techniques, etc. Check back often and contribute your own bloopers.


String.substring()

The java.lang.String.substring(...) method is usually implemented by creating a new String object that points back to the same underlying char[] as the receiver, but with different offset and length values. Therefore, if you take a very large string, create a substring of length 1, then discard the large string, the little substring may still hold onto the very large char[].

A nasty variant of this blooper is when the substring is later interned by calling String.intern(). On some VMs, this means the large char[] object is now held onto forever by the VM's intern pool. Kiss that memory good-bye, because there's no way to free it again.

Avoidance techniques:

In situations where you know you are creating a small substring and then throwing the large string away, force a copy of the string to be created by calling new String(substring). This seems counter-intuitive from a performance perspective because it creates extra objects, but it can be worthwhile if the substring is being retained for a long period. In one particular case in the Eclipse JDT plugins, copying the substring yielded a 550KB space savings. Not bad for a one line fix!
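As a minimal sketch of the fix described above (the class and method names are made up for illustration), the helper below returns a small token from a potentially huge line and detaches it with new String(...). On VMs where substring() shares the receiver's char[], the copy is what lets the large array be collected. Newer VMs (OpenJDK since 7u6, for example) already copy inside substring(), so there the extra copy is harmless but no longer needed.

```java
// Sketch: detach a small, long-lived substring from a large source string.
// SubstringCopy and firstToken are hypothetical names, not Eclipse code.
class SubstringCopy {

    // Returns the first space-delimited token of 'line' as an independent
    // String. On VMs where substring() shares the receiver's backing array,
    // new String(...) produces a String whose array holds only the token's
    // characters, so the result cannot keep the large array alive.
    static String firstToken(String line) {
        int end = line.indexOf(' ');
        String token = (end < 0) ? line : line.substring(0, end);
        return new String(token);
    }

    public static void main(String[] args) {
        // Build a large string whose only interesting part is the first word.
        StringBuilder big = new StringBuilder("key");
        for (int i = 0; i < 100_000; i++) {
            big.append(" filler");
        }
        String retained = firstToken(big.toString());
        System.out.println(retained + " (" + retained.length() + " chars)");
    }
}
```

Only do this for substrings that outlive their source string; for short-lived values the extra copy is pure overhead.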

Unbuffered I/O

With most flavours of java.io.InputStream and java.io.OutputStream, buffering doesn't come for free. This means that every single read and write call may result in disk or network I/O. Similarly in Eclipse, the streams returned by methods such as org.eclipse.core.resources.IFile#getContents, or created by opening an InputStream on an Eclipse URL, are not buffered.

Avoidance techniques:

The solution in this case is simple. Just wrap the stream in a java.io.BufferedInputStream or BufferedOutputStream. If you have a good idea of the number of bytes that need reading or writing, you can even set the stream's buffer size appropriately.
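The wrapping step might look like the sketch below. The class, the method, and the 8192-byte buffer size are illustrative assumptions; in a plugin the raw stream would come from something like IFile#getContents rather than the in-memory stream used here.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch: read byte-by-byte through a buffer instead of hitting the
// underlying stream on every call. BufferedRead is a hypothetical name.
class BufferedRead {

    // Counts '\n' bytes. Each in.read() is served from the in-memory buffer;
    // the raw stream is only touched when the buffer needs refilling.
    static int countLines(InputStream raw, int bufferSize) {
        try (InputStream in = new BufferedInputStream(raw, bufferSize)) {
            int lines = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines++;
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "one\ntwo\nthree\n".getBytes();
        // A ByteArrayInputStream stands in for a file or network stream.
        System.out.println(countLines(new ByteArrayInputStream(data), 8192));
    }
}
```

If the code already reads into a large byte[] of its own, an additional buffered wrapper adds little; the win comes when many small reads or writes are issued.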

Strings in the plug-in registry

In Eclipse 2.0.* and before, it was generally assumed that there would be hundreds of plugins and that the plugin registry, while sizeable, could reasonably be held in memory. As Eclipse-based products came to market we discovered that developers were taking the plugin model to heart and were creating hundreds of plugins for one product. Our assumptions were being tested...

One of the key failings in this area was the use of Strings. (This is actually a more general problem, but it reared its ugly head in a very tangible way here.) All aspects of plugins (extensions, extension points, plugins/fragments, ...) are defined in terms of String identifiers. When the platform starts, it parses the plugin.xml/fragment.xml files and builds a registry. This registry is essentially a complete parse tree of all parsed files (i.e., a mess of Strings). In general, the String identifiers are not needed for human readability but rather for code-based access and matching. Unfortunately, Strings are one of the least space-efficient data forms in Java (e.g., a 25 character string requires approximately 90 bytes of storage).

Further, typical code patterns for registry access involve the declaration of some constant, for example:

   public static final String ID = "org.eclipse.core.resources";

And then the use of this constant to access the registry:

   Platform.getRegistry().getPlugin(ID);

In this case, the character sequence "org.eclipse.core.resources" (26 characters) is stored as UTF8 in the constant pool of each class using the constant and, in typical JVMs, on first use, the UTF8 encoding is used to create and intern a real String object. Note that this String object is equal but not identical to the one created during registry parsing. The net result is that the total space required for this identifier use case is:

   (space for "org.eclipse.core.resources" * 2) + (space for UTF8 * number, N, of loaded referencing classes)
   ((44 + 2 * 26) * 2) + (26 * N) = 192 + 26 * N >= 218 bytes (where N >= 1)
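To make the arithmetic concrete, here is a small sketch that evaluates the estimate for different numbers of referencing classes. The 44-byte per-String overhead and 2 bytes per character are the figures used in the text, not measurements of any particular JVM, and the UTF8 size assumes an all-ASCII identifier (one byte per character). The class and method names are made up for this example.

```java
// Sketch: evaluate the space estimate from the text for one identifier.
// IdentifierSpace and its methods are hypothetical names.
class IdentifierSpace {

    // Per-String overhead assumed by the text (object + char[] headers etc.).
    static final int STRING_OVERHEAD_BYTES = 44;

    // Estimated heap size of one String with the given number of characters.
    static int stringBytes(int chars) {
        return STRING_OVERHEAD_BYTES + 2 * chars;
    }

    // Two String copies (interned constant + registry copy) plus one UTF8
    // constant-pool entry per loaded class that references the constant.
    static int totalBytes(int chars, int referencingClasses) {
        return 2 * stringBytes(chars) + chars * referencingClasses;
    }

    public static void main(String[] args) {
        int chars = "org.eclipse.core.resources".length();
        for (int n : new int[] {1, 10, 100}) {
            System.out.println("N=" + n + ": " + totalBytes(chars, n) + " bytes");
        }
    }
}
```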

Obviously as platform installs move from hundreds to thousands of plugins this approach does not scale.

Avoidance techniques:

The first thing to observe is that this was a design flaw. The initial design should not have relied on the registry being omnipresent. The second observation is that Strings as identifiers are easy to read but terribly inefficient. Third, changing the behaviour in a fundamental way is difficult, as much of the implementation is dictated by API (which cannot be changed).

With those points in mind, there are several possible approaches for better performance.

  1. Intern the registry strings: This is perhaps the easiest to implement. Since the string constants used in code are intern()'d in the system's symbol table, the registry can share the strings by intern()'ing its strings there as well. This costs a little more at parse time but saves one copy of each string, or (44 + 2 * M) bytes for a string of M characters. The side effect is slower intern() calls: on some JVM implementations the performance of intern() degrades dramatically as its table fills, and interning the registry strings eagerly and early seeds the table, increasing the collision rate.
  2. Use a private intern table: Within the registry there are many duplicate strings. These can be eliminated without overloading the system's intern() table by using a secondary table. The duplication between the strings in the code and those in the registry would not be eliminated.
  3. Avoid strings: In general the ids are used for matching/looking up elements in the registry. The only compelling reason to use Strings is so they are humanly readable in the plugin.xml files. Some sort of mechanism which retains the needed information but uses primitive types (e.g., int) as keys would address the issue without losing the usability. This approach is very attractive but, unfortunately, difficult after the fact, as most of the platform runtime's API is specified in terms of string ids.
  4. Swap out the registry: The registry is typically used only when plugins are activated. As such, most or all of it could be written to disk and reread on demand.
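The second approach in the list above, a private intern table, can be sketched in a few lines. This is an illustration under assumed names (PrivateInternTable is not an Eclipse class): equal strings are collapsed to one canonical instance without touching the JVM's own intern() table.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a private canonicalizing table for duplicate strings.
// PrivateInternTable is a hypothetical name, not part of Eclipse.
class PrivateInternTable {

    // Maps each distinct string to its canonical instance.
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance equal to 's', registering 's' itself
    // as the canonical instance if no equal string has been seen before.
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return (existing != null) ? existing : s;
    }

    // Number of distinct strings currently held.
    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        PrivateInternTable ids = new PrivateInternTable();
        // new String(...) forces two distinct but equal objects, as a parser
        // would produce when the same id appears in several plugin.xml files.
        String first = ids.intern(new String("org.eclipse.core.resources"));
        String second = ids.intern(new String("org.eclipse.core.resources"));
        System.out.println("same instance: " + (first == second));
        System.out.println("distinct strings held: " + ids.size());
    }
}
```

As the text notes, this removes duplicates inside the registry only; the copy interned from class constant pools is still a separate object. The table can also be dropped once parsing is finished, which the system intern table does not allow.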

In Eclipse 3.1, the fourth approach was taken. All registry data structures are now loaded from disk on demand, and flushed from memory when not in use by employing soft references (java.lang.ref.SoftReference). For an application that is not consulting the registry, memory usage for the extension registry has effectively been reduced to zero.
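The load-on-demand, flush-via-soft-reference idea can be illustrated with a generic holder like the one below. This is a sketch of the general technique only; SoftCache and its loader are assumptions made for the example and say nothing about how the Eclipse registry is actually implemented.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Sketch: a value that is loaded on demand and may be reclaimed by the
// garbage collector under memory pressure. SoftCache is a hypothetical name.
class SoftCache<T> {

    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);
    private int loads;

    // 'loader' re-creates the value, e.g. by reading it back from disk.
    SoftCache(Supplier<T> loader) {
        this.loader = loader;
    }

    // Returns the cached value, reloading it if it was never loaded or if
    // the collector has cleared the soft reference since the last call.
    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            value = loader.get();
            ref = new SoftReference<>(value);
            loads++;
        }
        return value;
    }

    // How many times the loader has run so far.
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        SoftCache<String[]> registry = new SoftCache<>(() -> {
            System.out.println("loading registry data (simulated)");
            return new String[] {"org.eclipse.core.resources"};
        });
        registry.get();
        registry.get();
        System.out.println("loads: " + registry.loadCount());
    }
}
```

The trade-off is the one the next blooper describes: callers can no longer assume that a second access is cheap, because the data may have been flushed in between.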

Excessive crawling of the extension registry

As described in the previous blooper, the Eclipse extension registry is now loaded from disk on demand, and discarded when no longer referenced. This speed/space trade-off has created the possibility of a whole new category of performance blooper for clients of the registry. For example, here is a block of code that was actually discovered in a third-party plugin:

   IExtensionRegistry registry = Platform.getExtensionRegistry();
   IExtensionPoint[] points = registry.getExtensionPoints();
   for (int i = 0; i < points.length; i++) {
      IExtension[] extensions = points[i].getExtensions();
      for (int j = 0; j < extensions.length; j++) {
         IConfigurationElement[] configs = extensions[j].getConfigurationElements();
         for (int k = 0; k < configs.length; k++) {
            if (configs[k].getName().equals("some.name")) {
               // do something with this config
            }
         }
      }
   }

Prior to Eclipse 3.1, the above code was actually not that terrible. Although the extension registry has been loaded lazily since Eclipse 2.1, it always stayed in memory once loaded. If the above code ran after the registry was in memory, most of the registry API calls were quite fast. This is no longer true. In Eclipse 3.1, the above code will now cause the entire extension registry, several megabytes for a large Eclipse-based product, to be loaded into memory. While this is an extreme case, there are plenty of examples of code that performs more registry access than necessary. These inefficiencies were not apparent with a memory-resident extension registry.

Avoidance techniques:

Avoid calling extension registry API when not needed. Use shortcuts as much as possible. For example, directly call IExtensionRegistry.getExtension(...) rather than IExtensionRegistry.getExtensionPoint(...).getExtension(...).

Some extra shortcut methods were added in Eclipse 3.1 to help clients avoid unnecessary registry access. For example, to find the plugin ID (namespace) for a configuration element, clients would previously call IConfigurationElement.getDeclaringExtension().getNamespace(). It is much more efficient to call the new IConfigurationElement.getNamespace() method directly, saving the IExtension object from potentially being loaded from disk.
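To show why crawling hurts once data is loaded on demand, here is a toy model. It is deliberately NOT the Eclipse registry API: ToyRegistry and all of its methods are invented for this sketch, and "disk" is just a second map. In real plugin code the direct route would go through calls such as IExtensionRegistry.getConfigurationElementsFor(...) with the known extension point id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical types): extension points live "on disk" and
// are pulled into memory the first time they are touched.
class ToyRegistry {

    private final Map<String, List<String>> onDisk = new LinkedHashMap<>();
    private final Map<String, List<String>> inMemory = new LinkedHashMap<>();

    // Registers an extension point and the names of its config elements.
    void store(String pointId, List<String> elementNames) {
        onDisk.put(pointId, new ArrayList<>(elementNames));
    }

    // Simulates loading one extension point from disk on first access.
    private List<String> load(String pointId) {
        return inMemory.computeIfAbsent(pointId,
                id -> onDisk.getOrDefault(id, new ArrayList<>()));
    }

    // Direct lookup: touches only the extension point the caller names.
    boolean containsDirect(String pointId, String elementName) {
        return load(pointId).contains(elementName);
    }

    // Crawl: visits every extension point, forcing all of them into memory.
    boolean containsByCrawling(String elementName) {
        boolean found = false;
        for (String pointId : onDisk.keySet()) {
            if (load(pointId).contains(elementName)) {
                found = true;
            }
        }
        return found;
    }

    // Number of extension points currently held in memory.
    int loadedPoints() {
        return inMemory.size();
    }

    public static void main(String[] args) {
        ToyRegistry direct = new ToyRegistry();
        ToyRegistry crawled = new ToyRegistry();
        for (ToyRegistry r : new ToyRegistry[] {direct, crawled}) {
            r.store("a.views", List.of("view"));
            r.store("b.editors", List.of("editor"));
            r.store("c.builders", List.of("some.name"));
        }
        direct.containsDirect("c.builders", "some.name");
        crawled.containsByCrawling("some.name");
        System.out.println("direct lookup loaded " + direct.loadedPoints() + " point(s)");
        System.out.println("crawl loaded " + crawled.loadedPoints() + " point(s)");
    }
}
```

Both calls find the element; the difference is how much of the registry ends up resident afterwards.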

Message catalog keys

The text messages required for a particular plugin are typically contained in one or more Java properties files. These message bundles have key-value pairs where the key is some useful token that humans can read and the value is the translated text of the message. Plugins are responsible for loading and maintaining the messages. Typically this is done on demand, either when the plugin is started or when the first message from a particular bundle is needed. Loading one message typically loads all messages in the same bundle.

There are several problems with this situation:

  1. Again we have the inefficient use of Strings as identifiers. Other than readability in the properties file, having human-readable keys is not particularly compelling. Assuming the use of constants, int values would be just as functional.
  2. Similarly, the use of String keys requires the use of Hashtables to store the loaded message bundles. Some array-based structure would be more efficient.
  3. The Eclipse SDK contains tooling which helps users "externalize" their Strings. That is, it replaces embedded Strings with message references and builds the entries in the message bundles. This tool can generate the keys for the messages as they are discovered. Unfortunately, the generated keys are based on the fully qualified class/method name where the string was discovered. This makes for quite long keys (e.g., keys greater than 90 characters long were discovered in some of the Debug plugins).

Avoidance techniques:

There are several facets to this problem, but the basic lesson is to understand the space you are using. Long keys are not particularly useful and just waste space. String keys are good for developers, but end-users pay the space cost. Mechanisms like bundle loading/management, which are going to be used throughout the entire system, should be well thought out and supplied to developers rather than leaving each developer to write their own (inefficient) implementation.

With that in mind, below are some of the many possible alternatives:

  1. Shorter keys: Clearly the message keys should be useful but not excessively long.
  2. Use the Eclipse 3.1 message bundle facility, org.eclipse.osgi.util.NLS (http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/documents/3.1/message_bundles.html). This API binds each message in your catalog to a Java field, eliminating the notion of keys entirely and yielding a huge memory improvement over the basic Java PropertyResourceBundle.
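The field-binding idea behind the second alternative can be sketched in plain Java. The class below is a simplified stand-in written for this page, not the NLS implementation: it loads a properties text once, copies each value into the public static field of the same name, and then lets the Properties table (and its key strings) become garbage. With the real facility, a message class typically extends NLS and calls NLS.initializeMessages(bundleName, Messages.class) from a static initializer instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

// Sketch: bind messages to static fields so no key table stays in memory.
// FieldMessages and bind(...) are hypothetical, for illustration only.
class FieldMessages {

    // One public static field per message; the field name doubles as the
    // key in the properties text.
    public static String greeting;
    public static String farewell;

    // Parses 'propertiesText' and assigns each public static String field
    // the value stored under the field's name.
    static void bind(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Field field : FieldMessages.class.getDeclaredFields()) {
            int mods = field.getModifiers();
            boolean bindable = field.getType() == String.class
                    && Modifier.isStatic(mods) && Modifier.isPublic(mods);
            if (!bindable) {
                continue;
            }
            String fallback = "Missing message: " + field.getName();
            try {
                field.set(null, props.getProperty(field.getName(), fallback));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        // 'props' is unreachable after this method returns, so the hash
        // table and its key strings can be garbage collected.
    }

    public static void main(String[] args) {
        bind("greeting=Hello\nfarewell=Goodbye\n");
        // Callers read fields directly; there is no lookup by key at runtime.
        System.out.println(greeting + " / " + farewell);
    }
}
```

A side benefit of field access is that a misspelled message name becomes a compile error rather than a missing-resource failure at runtime.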


