Performance Bloopers

From Eclipsepedia

Jump to: navigation, search

Below is a collection of goofs, mistakes, and bad decisions made in developing plugins for the Eclipse platform. Many are standard Java programming problems, some are specific to Eclipse. The intent here is not to pick on the perpetrators (in most cases they in fact eagerly contributed the blooper information!) but rather to help other developers avoid similar pitfalls. The bloopers have been contributed from the community and may have been discovered in Eclipse SDK code or in third party plugins. Since all of Eclipse is implemented as plugins, the issues are usually generally relevant.

Each blooper is structured as a statement of the scenario followed by suggested techniques for avoiding the problem(s). In some cases there are clear steps, in others there really is no solution except to follow the advice of a wise doctor and "don't do that".

The set of bloopers is (sadly) always growing. This site is intended as a resource for developers to consult to build their general knowledge of problems, techniques, etc. Check back often and contribute your own bloopers.

Blooper: String.substring()

The java.lang.String.substring(...) method is usually implemented by creating a new String object that points back to the same underlying char[] as the receiver, but with different offset and length values. Therefore, if you take a very large string, create a substring of length 1, then discard the large string, the little substring may still hold onto the very large char[].

A nasty variant of this blooper is when the substring is later interned by calling String.intern(). On some VMs, this means the large char[] object is now held onto forever by the VM's intern pool. Kiss that memory good-bye, because there's no way to free it again.

Avoidance techniques:

In situations where you know you are creating a small substring and then throwing the large string away, force a copy of the string to be created by calling new String(substring). This seems counter-intuitive from a performance perspective because it creates extra objects, but it can be worthwhile if the substring is being retained for a long period. In one particular case in the Eclipse JDT plugins, copying the substring yielded a 550KB space savings. Not bad for a one line fix!

Blooper: Unbufferred I/O

With most flavours of java.io.InputStream and java.io.OutputStream, buffering doesn't come for free. This means that every single read and write call may result in disk or network I/O. Similarly in Eclipse, the streams returned by methods such as org.eclipse.core.resources.IFile#getContents, or created by opening an InputStream on an Eclipse URL are not buffered.

Avoidance techniques:

The solution in this case is simple. Just wrap the stream in a java.io.BufferedInputStream or BufferedOutputStream. If you have a good idea of the amount of bytes that need reading or writing, you can even set the stream's buffer size appropriately.

Blooper: Strings in the plug-in registry

In Eclipse 2.0.* and before, it was generally assumed that there would be hundreds of plugins and that the plugin registry, while sizeable, could reasonably be held in memory. As Eclipse-based products came to market we discovered that developers were taking the plugin model to heart and were creating hundreds of plugins for one produce. Our assumptions were being tested...

One of the key failings in this area was the use of Strings. (Note this is actually a more general problem but reared its ugly head in a very tangible way here) All aspects of plugins (extensions, extension points, plugins/fragments, ...) are defined in terms of String identifiers. When the platform starts it parses the plugin.xml/fragment.xml files and builds a regstry. This registry is essentially a complete parse tree of all parsed files (i.e., a mess of Strings). In general the String identifiers are not needed for human readability but rather code based access and matching. Unfortunately, Strings are one of the least space efficient data forms in Java (e.g., a 25 character string requires approximately 90 bytes of storage).

Further, typical code patterns for registry access involve the declaration of some constant, for example:

   public static final String ID = "org.eclipse.core.resources";

And then the use of this constant to access the registry:

   Platform.getRegistry().getPlugin(ID);

In this case, the character sequence "org.eclipse.core.resources" (26 characters) is stored as UTF8 in constant pool of each class using the constant and, in typical JVMs, on first use, the UTF8 encoding is used to create and intern a real String object. Note that this String object is equal but not identical to the one created during registry parsing. The net result is that the total space required for this identifier usecase is:

   (space for "org.eclipse.core.resources" * 2) + (space for UTF8 * number, N, of loaded referencing classes)
   ((44 + 2 * 26) * 2) + (26 * N) = 192 + 26 = 218bytes (where N > 1)

Obviously as platform installs move from hundreds to thousands of plugins this approach does not scale.

Avoidance Techniques:

The first thing to observe is that this was a design flaw. The initial design should not have relied on the registry being omni-present. The second observation is that Strings as identifiers are easy to read but terribly inefficient. Third, changing the behaviour in a fundamental way is difficult as much of the implementation is dictated by API (which cannot be changed).

With those points in mind, there are several possible approaches for better performance.

  1. Intern the registry strings: This is perhaps the easiest to implement. Since the strings used in the methods are intern()'d in the system's symbol table, the registry can share the strings by intern()'ing its strings there as well. This costs a little more at parse time but saves one copy of the string or (44 + 2 * M) bytes. One side effect of this is the performance degradation of intern(). On some JVM implementations the performance of intern() degrades dramatically. Interning the registry strings eagerly and early seeds the intern() table increasing the collision rate.
  2. Use a private intern table: Within the registry there are many duplicate strings. These can be eliminated without overloading the system's intern() table by using a secondary table. The duplication between the strings in the code and those in the registry would not be eliminated.
  3. Avoid strings: In general the ids are used for matching/looking elements in the registry. The only compelling reason to use Strings is so they are humanly readable in the plugin.xml files. Some sort of mechanism which retains the needed information but uses primitive types (e.g., int) as keys would address the issue without losing the useability. Unfortunately, this approach is very attractive but difficult after the fact as most of the platform runtime's API is specified in terms of string ids.
  4. Swap out the registry: The registry is typically used only when plugins are activated. As such, most or all of it could be written to disk and reread on demand.

In Eclipse 3.1, the fourth approach was taken. All registry data structures are now loaded from disk on demand, and flushed from memory when not in use by employing soft references (java.lang.ref.SoftReference). For an application that is not consulting the registry, memory usage for the extension registry has effectively been reduced to zero.


Back to Performance

Back to Eclipse Project