


Performance Bloopers

Below is a collection of goofs, mistakes, and bad decisions made in developing plugins for the Eclipse platform. Many are standard Java programming problems; some are specific to Eclipse. The intent here is not to pick on the perpetrators (in most cases they in fact eagerly contributed the blooper information!) but rather to help other developers avoid similar pitfalls. The bloopers have been contributed by the community and may have been discovered in Eclipse SDK code or in third party plugins. Since all of Eclipse is implemented as plugins, the issues are generally relevant.

Each blooper is structured as a statement of the scenario followed by suggested techniques for avoiding the problem(s). In some cases there are clear steps, in others there really is no solution except to follow the advice of a wise doctor and "don't do that".

The set of bloopers is (sadly) always growing. This site is intended as a resource for developers to consult to build their general knowledge of problems, techniques, etc. Check back often and contribute your own bloopers.

String.substring()

The java.lang.String.substring(...) method is usually implemented by creating a new String object that points back to the same underlying char[] as the receiver, but with different offset and length values. Therefore, if you take a very large string, create a substring of length 1, then discard the large string, the little substring may still hold onto the very large char[].

A nasty variant of this blooper is when the substring is later interned by calling String.intern(). On some VMs, this means the large char[] object is now held onto forever by the VM's intern pool. Kiss that memory good-bye, because there's no way to free it again.

Avoidance techniques:

In situations where you know you are creating a small substring and then throwing the large string away, force a copy of the string to be created by calling new String(substring). This seems counter-intuitive from a performance perspective because it creates extra objects, but it can be worthwhile if the substring is being retained for a long period. In one particular case in the Eclipse JDT plugins, copying the substring yielded a 550KB space savings. Not bad for a one line fix!
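The copy can be sketched as follows. This is a minimal illustration, assuming a VM whose substring() shares the receiver's char[] as described above; on VMs where substring() already copies, the extra copy is harmless but unnecessary. The class and method names are hypothetical.

```java
// Hypothetical helper illustrating the defensive copy described above.
final class SubstringCopy {

    // Returns the first space-delimited token without pinning the large input.
    static String firstToken(String large) {
        int end = large.indexOf(' ');
        String token = (end < 0) ? large : large.substring(0, end);
        // new String(String) yields a String whose storage is no larger than
        // the token itself, so the large backing array can be collected once
        // 'large' is no longer referenced.
        return new String(token);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("key");
        for (int i = 0; i < 100_000; i++) {
            sb.append(" filler");
        }
        // Only the three-character token is retained after this call.
        String retained = firstToken(sb.toString());
        System.out.println(retained);
    }
}
```

The copy is only worth its cost when the small string outlives the large one by a wide margin, as in the JDT case mentioned above.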

Unbuffered I/O

With most flavours of java.io.InputStream and java.io.OutputStream, buffering doesn't come for free. This means that every single read and write call may result in disk or network I/O. Similarly in Eclipse, the streams returned by methods such as org.eclipse.core.resources.IFile#getContents, or created by opening an InputStream on an Eclipse URL are not buffered.

Avoidance techniques:

The solution in this case is simple. Just wrap the stream in a java.io.BufferedInputStream or BufferedOutputStream. If you have a good idea of the number of bytes that need reading or writing, you can even set the stream's buffer size appropriately.
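A minimal sketch of the wrapping, using a ByteArrayInputStream as a stand-in for a file or network stream. The class and method names are hypothetical, and the buffer size is an arbitrary choice for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: counts bytes by reading one at a time through a buffer.
final class BufferedRead {

    static long countBytes(InputStream raw, int bufferSize) {
        // Each read() below is served from an in-memory buffer; the underlying
        // stream is touched roughly once per bufferSize bytes.
        try (InputStream in = new BufferedInputStream(raw, bufferSize)) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ByteArrayInputStream stands in for an unbuffered file or network stream.
        InputStream raw = new ByteArrayInputStream(new byte[10_000]);
        System.out.println(countBytes(raw, 8192));
    }
}
```

The same pattern applies to a stream obtained from IFile#getContents: wrap it once at the point where it is opened.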

Strings in the plug-in registry

In Eclipse 2.0.* and before, it was generally assumed that there would be hundreds of plugins and that the plugin registry, while sizeable, could reasonably be held in memory. As Eclipse-based products came to market we discovered that developers were taking the plugin model to heart and were creating hundreds of plugins for one product. Our assumptions were being tested...

One of the key failings in this area was the use of Strings. (Note this is actually a more general problem but reared its ugly head in a very tangible way here.) All aspects of plugins (extensions, extension points, plugins/fragments, ...) are defined in terms of String identifiers. When the platform starts it parses the plugin.xml/fragment.xml files and builds a registry. This registry is essentially a complete parse tree of all parsed files (i.e., a mess of Strings). In general the String identifiers are not needed for human readability but rather for code-based access and matching. Unfortunately, Strings are one of the least space efficient data forms in Java (e.g., a 25 character string requires approximately 90 bytes of storage).

Further, typical code patterns for registry access involve the declaration of some constant, for example:

   public static final String ID = "org.eclipse.core.resources";

And then the use of this constant to access the registry:

   Platform.getRegistry().getPlugin(ID);

In this case, the character sequence "org.eclipse.core.resources" (26 characters) is stored as UTF8 in the constant pool of each class using the constant and, in typical JVMs, on first use, the UTF8 encoding is used to create and intern a real String object. Note that this String object is equal but not identical to the one created during registry parsing. The net result is that the total space required for this identifier use case is:

   (space for "org.eclipse.core.resources" * 2) + (space for UTF8 * number, N, of loaded referencing classes)
   ((44 + 2 * 26) * 2) + (26 * N) = 192 + 26 * N bytes (218 bytes for N = 1, and more as N grows)

Obviously as platform installs move from hundreds to thousands of plugins this approach does not scale.
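The estimate above can be worked through as a small calculation. The 44-byte String overhead and 2 bytes per character are the figures used in the formula; the class and method names are hypothetical.

```java
// Hypothetical calculator for the identifier space estimate given above.
final class IdSpaceEstimate {
    // Per-String overhead and per-character cost taken from the formula.
    static final int STRING_OVERHEAD = 44;
    static final int BYTES_PER_CHAR = 2;

    // Two String objects (the registry copy and the interned constant) plus
    // one UTF8 constant pool entry per loaded referencing class.
    static int bytesFor(int idLength, int referencingClasses) {
        int oneString = STRING_OVERHEAD + BYTES_PER_CHAR * idLength;
        return oneString * 2 + idLength * referencingClasses;
    }

    public static void main(String[] args) {
        int length = "org.eclipse.core.resources".length(); // 26 characters
        System.out.println(bytesFor(length, 1));  // 218
        System.out.println(bytesFor(length, 10)); // 452
    }
}
```

Multiplying a per-identifier cost of this size by every identifier in every plugin shows why the registry footprint grows so quickly.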

Avoidance Techniques:

The first thing to observe is that this was a design flaw. The initial design should not have relied on the registry being omni-present. The second observation is that Strings as identifiers are easy to read but terribly inefficient. Third, changing the behaviour in a fundamental way is difficult as much of the implementation is dictated by API (which cannot be changed).

With those points in mind, there are several possible approaches for better performance.

  1. Intern the registry strings: This is perhaps the easiest to implement. Since the strings used in the methods are intern()'d in the system's symbol table, the registry can share the strings by intern()'ing its strings there as well. This costs a little more at parse time but saves one copy of the string, or (44 + 2 * M) bytes, where M is the length of the string in characters. One side effect of this is the performance degradation of intern(). On some JVM implementations the performance of intern() degrades dramatically as the table fills. Interning the registry strings eagerly and early seeds the intern() table, increasing the collision rate.
  2. Use a private intern table: Within the registry there are many duplicate strings. These can be eliminated without overloading the system's intern() table by using a secondary table. The duplication between the strings in the code and those in the registry would not be eliminated.
  3. Avoid strings: In general the ids are used for matching/looking up elements in the registry. The only compelling reason to use Strings is so they are humanly readable in the plugin.xml files. Some sort of mechanism which retains the needed information but uses primitive types (e.g., int) as keys would address the issue without losing the usability. This approach is very attractive but, unfortunately, difficult to adopt after the fact as most of the platform runtime's API is specified in terms of string ids.
  4. Swap out the registry: The registry is typically used only when plugins are activated. As such, most or all of it could be written to disk and reread on demand.
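As an illustration of the second approach, a private intern table can be as small as a map from each string to its canonical instance. This is a sketch only; the class name is hypothetical and it is not the registry's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical private intern table: deduplicates equal strings without
// adding entries to the VM-wide String.intern() pool.
final class PrivateInternTable {
    private final Map<String, String> table = new HashMap<>();

    // Returns the canonical instance for s, registering s if it is new.
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        PrivateInternTable strings = new PrivateInternTable();
        String a = strings.intern(new String("org.eclipse.core.resources"));
        String b = strings.intern(new String("org.eclipse.core.resources"));
        // Both references now point at the first instance; the second copy
        // becomes garbage.
        System.out.println(a == b);
    }
}
```

Unlike the VM's intern pool, the whole table can be dropped once parsing is finished, which releases the map itself while the shared strings stay in use.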

In Eclipse 3.1, the fourth approach was taken. All registry data structures are now loaded from disk on demand, and flushed from memory when not in use by employing soft references (java.lang.ref.SoftReference). For an application that is not consulting the registry, memory usage for the extension registry has effectively been reduced to zero.
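The load-on-demand, flush-under-pressure idea can be sketched with a small holder built on java.lang.ref.SoftReference. The class name, the loader callback, and the load counter are assumptions made for the example; the real registry's data structures are more involved.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical holder: loads a value on first use and lets the garbage
// collector reclaim it under memory pressure; it is reloaded on next access.
final class SoftlyCached<T> {
    private final Supplier<T> loader; // assumed to return a non-null value
    private SoftReference<T> ref;     // null until the first load
    private int loads;                // instrumentation for the example only

    SoftlyCached(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.get(); // e.g. re-read the data from disk
            ref = new SoftReference<>(value);
            loads++;
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

A caller that never invokes get() pays only for the holder itself, which mirrors the "effectively zero" memory usage described above. The trade-off is that every access may trigger a reload.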

Excessive crawling of the extension registry

As described in the previous blooper, the Eclipse extension registry is now loaded from disk on demand, and discarded when no longer referenced. This speed/space trade-off has created the possibility of a whole new category of performance blooper for clients of the registry. For example, here is a block of code that was actually discovered in a third-party plugin:

   IExtensionRegistry registry = Platform.getExtensionRegistry();
   IExtensionPoint[] points = registry.getExtensionPoints();
   for (int i = 0; i < points.length; i++) {
      IExtension[] extensions = points[i].getExtensions();
      for (int j = 0; j < extensions.length; j++) {
         IConfigurationElement[] configs = extensions[j].getConfigurationElements();
         for (int k = 0; k < configs.length; k++) {
            if (configs[k].getName().equals("some.name")) {
               // do something with this config
            }
         }
      }
   }

Prior to Eclipse 3.1, the above code was actually not that terrible. Although the extension registry has been loaded lazily since Eclipse 2.1, it always stayed in memory once loaded. If the above code ran after the registry was in memory, most of the registry API calls were quite fast. This is no longer true. In Eclipse 3.1, the above code will now cause the entire extension registry, several megabytes for a large Eclipse-based product, to be loaded into memory. While this is an extreme case, there are plenty of examples of code that performs more registry access than necessary. These inefficiencies were not apparent with a memory-resident extension registry.

Avoidance techniques:

Avoid calling extension registry API when not needed. Use shortcuts as much as possible. For example, directly call IExtensionRegistry.getExtension(...) rather than IExtensionRegistry.getExtensionPoint(...).getExtension(...).

Some extra shortcut methods were added in Eclipse 3.1 to help clients avoid unnecessary registry access. For example, to find the plugin ID (namespace) for a configuration element, clients would previously call IConfigurationElement.getDeclaringExtension().getNamespace(). It is much more efficient to call the new IConfigurationElement.getNamespace() method directly, saving the IExtension object from potentially being loaded from disk.

Message catalog keys

The text messages required for a particular plugin are typically contained in one or more Java properties files. These message bundles have key-value pairs where the key is some useful token that humans can read and the value is the translated text of the message. Plugins are responsible for loading and maintaining the messages. Typically this is done on demand either when the plugin is started or when the first message from a particular bundle is needed. Loading one message typically loads all messages in the same bundle.

There are several problems with this situation:

  1. Again we have the inefficient use of Strings as identifiers. Other than readability in the properties file, having human readable keys is not particularly compelling. Assuming the use of constants, int values would be just as functional.
  2. Similarly, the use of String keys requires the use of Hashtables to store the loaded message bundles. Some array based structure would be more efficient.
  3. The Eclipse SDK contains tooling which helps users "externalize" their Strings. That is, it replaces embedded Strings with message references and builds the entries in the message bundles. This tool can generate the keys for the messages as they are discovered. Unfortunately, the generated keys are based on the fully qualified class/method name where the string was discovered. This makes for quite long keys (e.g., keys greater than 90 characters long were discovered in some of the Debug plugins).

Avoidance Techniques: There are several facets to this problem but the basic lesson here is to understand the space you are using. Long keys are not particularly useful and just waste space. String keys are good for developers but end-users pay the space cost. Mechanisms like bundle loading/management which are going to be used throughout the entire system should be well thought out and supplied to developers rather than leaving each developer to write their own (inefficient) implementation.

With that in mind, below are some of the many possible alternatives:

  1. Shorter keys: Clearly the message keys should be useful but not excessively long.
  2. Use the Eclipse 3.1 message bundle facility, org.eclipse.osgi.util.NLS (http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/documents/3.1/message_bundles.html). This API binds each message in your catalog to a Java field, eliminating the notion of keys entirely, yielding a huge memory improvement over the basic Java PropertyResourceBundle.
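The field-binding idea can be illustrated without the Eclipse runtime. The sketch below is a hypothetical stand-in written with plain reflection, not the NLS implementation; with the real facility a message class extends org.eclipse.osgi.util.NLS, declares public static String fields, and calls NLS.initializeMessages(bundleName, clazz) from a static initializer. The class, field, and method names here are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

// Hypothetical illustration of field-bound messages; not the NLS implementation.
final class FieldBoundMessages {
    // One field per message. Callers reference the field directly, so no key
    // strings or lookup table need to stay in memory after loading.
    public static String fileNotFound;
    public static String buildFailed;

    // Copies each property whose name matches a static String field into it.
    static void load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Field field : FieldBoundMessages.class.getDeclaredFields()) {
            boolean bindable = Modifier.isStatic(field.getModifiers())
                    && field.getType() == String.class;
            if (!bindable) {
                continue;
            }
            String text = props.getProperty(field.getName());
            try {
                field.set(null, text != null ? text : "Missing message: " + field.getName());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        // props is unreachable after this point and can be collected.
    }
}
```

After loading, client code reads FieldBoundMessages.fileNotFound like any other static field, and a misspelled message name becomes a compile error instead of a missing-key failure at run time.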

Eager preference pages

The JDT UI plugin has a number of preference pages each represented by a class. Each set of preferences has a set of default values. The preference pages have methods which set the preferences to their default value. In Eclipse 2.1, when the JDT UI plugin started, it called the preference initialization method on the various preference page classes. As a result, the preference page classes were loaded. It turns out that a) there are many preference pages and b) the classes sometimes contain extensive UI code. The net result is that some 250KB of code is loaded and typically never used, since users rarely consult preference pages once acceptable values are set.

Avoidance Techniques:

Refactor the code to move the preference initialization code onto dedicated or pre-existing classes. Preference page classes can then be loaded on demand by the workbench's lazy loading mechanism.

Note: This problem has been seen in other plugins, likely as a result of cut-and-paste coding with JDT as a base.

Too much work on activation

Plugins are activated as needed. Typically this means that a plugin is activated the first time one of its classes is loaded. On activation, the plugin's runtime class (aka plugin class) is loaded and instantiated and the startup() lifecycle method called. This gives the plugin a chance to do rudimentary initialization and hook itself into the platform more tightly than is allowed by the extension mechanisms in the plugin.xmls.

Unfortunately, developers seize the opportunity and do all manner of work. Also unfortunate is the fact that activation is done in a context free manner. For example, at activation time the JDT Core plugin does not know why it is being activated. It might be because someone is trying to compile/build some Java, or it might be because class C in some other plugin subclasses a JDT class and C is being loaded. In the former case it would be reasonable for JDT Core to load/initialize required state, create new structures etc. In the latter this would be completely unreasonable.

We have seen cases where literally hundreds of classes and megabytes of code have been loaded (not to mention all the objects created) just to check and see that there was nothing to do.

This behavior impacts platform startup time if the plugins in question contribute to the current UI/project structure. Otherwise it imposes lengthy delays on the user's workflow when they suddenly (often unknowingly) invoke some new function requiring the errant plugin to be activated.

Avoidance Techniques:

The platform provides lazy activation of plugins. Plugins are responsible for efficiently creating their internal structures according to the function required. The startup() method is not the time or place to be doing large scale initialization.
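A minimal sketch of keeping activation cheap and deferring heavy state until it is needed. The class below is a plain Java stand-in, not a real plugin class; LazyPlugin, Model, and buildCount are hypothetical names, and the build counter exists only to make the behavior observable.

```java
// Hypothetical plugin-like class: startup() stays trivial and the expensive
// model is built only when a caller actually asks for it.
final class LazyPlugin {

    // Stand-in for costly domain state (indexes, caches, parsed files).
    static final class Model {
        final int[] index = new int[1024];
    }

    private Model model;
    private int builds; // instrumentation for the example only

    // Activation hook: register cheap listeners here, nothing more.
    void startup() {
    }

    // Builds the model on first request and reuses it afterwards.
    synchronized Model getModel() {
        if (model == null) {
            model = new Model();
            builds++;
        }
        return model;
    }

    synchronized int buildCount() {
        return builds;
    }
}
```

With this shape, a plugin that is activated only because another plugin subclasses one of its classes never pays for the model at all.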

Decorators

The UI plugin provides a mechanism for decorating resources with icons and text (e.g., adding the little 'J' on Java projects or the CVS version number to the resource label). Plugins contribute decorators by extending a UI extension point specifying the kind of element they would like to decorate. When a resource of the identified kind is displayed, all installed decorators are given a chance to add their bit to the visual presentation. This model/mechanism is simple and clean.

There are performance consequences however:

  • Early plugin activation: In many scenarios, plugins get activated well before their function is actually needed. Further, because of the "Too much work on activation" blooper, the activated plugins often did way more work than was required. In many cases whether or not a resource should be decorated is predicated on a simple test (e.g., does it have a particular persistent property). These require almost no code and certainly no complicated domain/model structures.
  • Resource leaks: The mechanism can leak images even if individual decorators are careful. decorateImage() wants to return an image. If a decorator simply creates a new image and returns it (i.e., without remembering it) then there is no way of disposing it. To counter this, decorators typically maintain a list of the images they have provided. Unfortunately, this list is monotonically increasing if they still create (but remember) a new image for every decoration request. To counter this, well-behaved decorators cache the images they supply based on some key. The key is typically a combination of the base image provided and the decoration they add. This key then allows decorators to return an already allocated image if the net result of the requested decoration is the same as some previous result. Since decorators are chained, all decorators must have this good behaviour. If just one decorator in a chain returns a new image, then the caching strategies of all following decorators are foiled and once again resources are leaked.
  • Threading: Decorators run in the foreground, which causes problems for some people (e.g., CVS). To work around this, heavy-weight decorators have a background thread which computes the decorations and then issues a label change event to update the UI. This does not scale. When a label change event is posted, all decorators are run again. This allows the decorators following the heavy-weight contributor to add their decoration. The net result is a flurry of label change events, decoration requests and UI updates, most of which do little or nothing. Further, the problem gets worse quickly as heavy-weight decorators are added.
  • Code complexity: While this is not directly a performance problem, it does lead to performance issues as the code here is complex and hard to test. To do decorators correctly, plugin writers have to write their own caching code as well as their own threading code (assuming they have heavy decorator logic). Both chunks of code are complicated, error prone and likely very much the same from plugin to plugin. Both are prime candidates for inclusion in the base mechanism.
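The keyed image cache that well-behaved decorators had to write can be sketched as follows. Image here is a hypothetical stand-in for a toolkit image that holds an OS resource, and the String decoration is a simplification; none of these names come from the UI plugin's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical decorator-side cache keyed on (base image, decoration).
final class DecorationCache {

    // Stand-in for a toolkit image that must be disposed explicitly.
    static final class Image {
        final String description;
        boolean disposed;
        Image(String description) { this.description = description; }
        void dispose() { disposed = true; }
    }

    // Cache key: identity of the base image plus the decoration applied.
    private static final class Key {
        final Image base;
        final String decoration;
        Key(Image base, String decoration) {
            this.base = base;
            this.decoration = decoration;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return base == other.base && decoration.equals(other.decoration);
        }
        @Override public int hashCode() {
            return 31 * System.identityHashCode(base) + decoration.hashCode();
        }
    }

    private final Map<Key, Image> cache = new HashMap<>();

    // Returns the previously created image when the same decoration is
    // requested for the same base image; allocates only on a cache miss.
    Image decorate(Image base, String decoration) {
        return cache.computeIfAbsent(new Key(base, decoration),
                k -> new Image(base.description + "+" + decoration));
    }

    int size() { return cache.size(); }

    // Disposes every image this decorator handed out.
    void dispose() {
        for (Image image : cache.values()) {
            image.dispose();
        }
        cache.clear();
    }
}
```

Because the key includes the identity of the base image, an upstream decorator that returns a fresh image on every request causes a miss every time. That is exactly the chaining failure described in the resource leak item above.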

Avoidance techniques: The UI team tackled this problem by providing more decorator infrastructure.

  • The semantic level of the decorator API was raised so that decorators described their decorations rather than directly acting. This allows the UI mechanisms to manage a central image cache and create fewer intermediate image results by applying all decorations at once.
  • The Workbench also manages a background decoration thread. All heavy-weight decorators are run together in the background and their results combined and presented in one label changed event.
  • Static decoration information can now be declared in the plugin.xml. This allows plugins to contribute decorators without loading/running any of their code (a big win!!). The plugin describes the conditions for decoration (based on the existence of properties, resource types, etc) and the decoration image and position. The Workbench does the rest.

PDE cycle detection

PDE Core used to have a linear list of plug-in models generated by parsing manifest files. Meanwhile, the manifest editor has a small 'Issues and Action Items' area in the Overview page. Among other things, this area shows problems related to the plug-in to which the manifest file belongs. One of the problems that can be detected is cyclical plug-in dependencies. When opened, this section will initiate a cycle detection computation.

Cycle detection computation follows the dependency graphs trying to find closures. It follows the graph by looping through the plug-in IDs, looking up plug-in models that match the IDs, and then recursively following their dependencies. In the original implementation, each ID->model lookup was done linearly (by iterating over the flat list of models).

Avoidance techniques: In a large product with 600 plug-ins and a convoluted dependency tree, we got complaints that the manifest editor took 3 minutes to open in some cases!! After performance analysis, we replaced the linear lookup with a hash table (using plug-in ID as the lookup key). The opening time was reduced to 3 seconds (worst case scenario)! And we already had this table in place for other purposes. The actual fix took 2 minutes to do.
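The difference between the two lookups can be sketched in a few lines. PluginModel and the method names are hypothetical stand-ins for the PDE model classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ID -> model lookup before and after the fix.
final class ModelLookup {

    static final class PluginModel {
        final String id;
        PluginModel(String id) { this.id = id; }
    }

    // Original approach: every lookup scans the flat list, O(n) per call.
    // Repeated for each dependency edge during cycle detection, the total
    // work grows with edges * plug-ins.
    static PluginModel findLinear(List<PluginModel> models, String id) {
        for (PluginModel model : models) {
            if (model.id.equals(id)) {
                return model;
            }
        }
        return null;
    }

    // Fixed approach: build the table once, then each lookup is a single
    // hash probe (expected constant time).
    static Map<String, PluginModel> indexById(List<PluginModel> models) {
        Map<String, PluginModel> byId = new HashMap<>();
        for (PluginModel model : models) {
            byId.put(model.id, model);
        }
        return byId;
    }
}
```

Both lookups return the same model; only the cost per call changes, which is why the fix was small and the effect large.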

Too many resource change listeners

PRE_AUTO_BUILD and POST_AUTO_BUILD resource change listeners have a non-trivial cost associated with them. This is because a new tree layer must be created to allow the listener to make changes to the tree. It was discovered that of the five BUILD listeners that were typically running, four of them were from the org.eclipse.team.cvs plug-ins. See the bug report for more details.

Avoidance techniques: Minimize use of these listeners. Some ideas:

  • POST_CHANGE listeners have trivial cost... switch to POST_CHANGE where possible
  • Two listeners cost more than one. Try to create just one and delegate the work from there.
  • Consider removing listeners when they are not applicable. For example, if you are listening for changes on a particular file or directory, you may be able to remove that listener when the applicable resource is not present.

