
Update Site Optimization

Revision as of 16:06, 7 May 2009 by Jpljpl.gmx.de (Talk | contribs) (Be more specific about what the created "digest" is)

The Problem

The Eclipse Install/Update design concept includes grouping artifacts called features which are published on an Update Site located on a remote server. A feature consists of the feature manifest file and other resources placed in a single JAR archive. When directed at the update site, Eclipse Update Manager must download each of these JARs and parse the manifest in order to perform activities such as site browsing, searching, dependency checking etc.

This approach works reasonably well for moderately sized update sites, but does not scale to large sites like Callisto. Each feature JAR is small, but opening a connection to download each one is costly, and the cost adds up. Even worse, users pay this price BEFORE they even decide whether they want to install anything from the site. A solution is needed to reduce the number of connections required simply to browse or search the update site.

Once the features to install have been selected, Update needs to physically download the plug-in JARs onto the user's machine. At this point, payload size ceases to be trivial: a full Callisto download is several hundred megabytes. A technique to reduce the payload size would benefit users who are downloading the full Callisto set.

The Solution

The solution comes in two parts: the site digest and the use of Pack200. The site digest is produced by merging all the information needed for browsing and searching a site into one file, which is compressed to reduce its size and can be downloaded over a single connection instead of the many separate connections needed to download the features. Pack200 is a JAR compression utility, part of J2SE 5.0, that significantly reduces the size of jars.

Both parts require enhancements to the Install/Update code so that Update can consume these artifacts. However, these performance enhancements are optional, and Install/Update should continue to work as normal in their absence.

Builds, Update Sites and the Site Optimizer

There are two sides to this solution: steps that must be taken during a component's build, and steps that are taken on the update site itself.

To ensure that the jars downloaded from an update site are the same as jars downloaded in a zip distribution, the jars need to be normalized (or repacked) during the build process (see Conditioning build results below). This is especially true if the jars will be signed. If the jars are being sent to the Eclipse Foundation to be signed, then this repacking will be done at that time. The actual build of the digest and packing of the jars can be considered a separate step and can be done on the update site itself.

The Site Optimizer

The org.eclipse.update.core bundle provides an application extension named org.eclipse.update.core.siteOptimizer which can be invoked from the command line.

java -jar /eclipse/startup.jar -application org.eclipse.update.core.siteOptimizer [options]

If your Eclipse installation does not contain startup.jar, use org.eclipse.equinox.launcher_<version>.jar from the plugins directory instead. The site optimizer application exposes the digest builder and the jar processor. The digest builder is the tool that creates the actual site digest; the jar processor is a tool that can repack, sign, pack or unpack a jar and all its nested jars recursively.

The Update Site

If the update site is going to contain packed jars, then the site.xml file should specify that it supports pack200 by setting the pack200 attribute: <site pack200="true">. This lets the Update Manager know that the site contains packed jars, and it will look for a .jar.pack.gz file beside the .jar file that it would normally download. If the .jar.pack.gz file is found, it will be downloaded and unpacked, otherwise the .jar file is downloaded as normal.
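The lookup described above can be sketched as a small function. This is only an illustration of the fallback rule, not the Update Manager's actual code: the class name PackedJarLookup, the method chooseDownload and the exists predicate (a stand-in for "is this file present on the site?") are all hypothetical. The canUnpack flag reflects the requirement that the client is able to run unpack200.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of the packed-jar fallback; not the actual Update Manager code. */
final class PackedJarLookup {

    /**
     * Returns the name of the file that should be downloaded for a given jar.
     *
     * @param jarName     name of the plain jar, e.g. "plugins/foo_1.0.0.jar"
     * @param sitePack200 value of the pack200 attribute in site.xml
     * @param canUnpack   whether the client is able to unpack Pack200 files
     * @param exists      stand-in for a check that a file exists on the site
     */
    static String chooseDownload(String jarName, boolean sitePack200,
                                 boolean canUnpack, Predicate<String> exists) {
        String packed = jarName + ".pack.gz";
        if (sitePack200 && canUnpack && exists.test(packed)) {
            return packed; // packed file sits beside the jar: prefer it
        }
        return jarName;    // otherwise download the jar as normal
    }

    public static void main(String[] args) {
        // Pretend the site holds a packed copy of foo only.
        Predicate<String> site = name -> name.equals("plugins/foo_1.0.0.jar.pack.gz");
        System.out.println(chooseDownload("plugins/foo_1.0.0.jar", true, true, site));
        System.out.println(chooseDownload("plugins/bar_1.0.0.jar", true, true, site));
    }
}
```

With the sample predicate, the first call selects the .jar.pack.gz file and the second falls back to the plain .jar.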

The site optimizer is used on the update site to build the digest.zip file and do the actual packing of the jars:

java -jar /eclipse/startup.jar -application org.eclipse.update.core.siteOptimizer -digestBuilder \
  -digestOutputDir=/eclipse/digest -siteXML=/eclipse/site/site.xml -jarProcessor -pack -outputDir /eclipse/site /eclipse/site

This command builds the digest and traverses the /eclipse/site directory structure to pack all the conditioned jars it finds. By default the jarProcessor only processes jar files that have previously been marked as conditioned. The output of a pack is a .pack.gz file, so beside each conditioned jar there will be a .jar.pack.gz file.
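One way to check the result of a packing run is to list the jars that have no packed file beside them. The following is a minimal sketch under that assumption; it is not part of the site optimizer, and the class and method names are made up for illustration. Jars that were never conditioned are expected to appear in the report, since they are skipped by default.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: reports jars that have no .jar.pack.gz file beside them. */
final class PackedSiblingReport {

    static List<Path> jarsWithoutPackedSibling(Path siteRoot) throws IOException {
        try (Stream<Path> files = Files.walk(siteRoot)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                // foo.jar is considered packed when foo.jar.pack.gz sits next to it
                .filter(p -> !Files.exists(p.resolveSibling(p.getFileName() + ".pack.gz")))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The site root is passed on the command line, e.g. /eclipse/site
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : jarsWithoutPackedSibling(root)) {
            System.out.println("no packed file beside: " + jar);
        }
    }
}
```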

Conditioning Build Results with the JarProcessor

By default the JarProcessor will only process jar files that have been marked as conditioned. This means that the siteOptimizer can be run to pack an entire update site and only those jars that have been previously conditioned during their build will get packed. The JarProcessor marks a given jar as conditioned by placing a META-INF/eclipse.inf file in the jar.
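To see whether a jar carries the marker, it is enough to look for the META-INF/eclipse.inf entry. The sketch below does only that. It is an illustration, not the JarProcessor's own logic, which may also inspect the contents of the file; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical check for the conditioning marker; only tests that the entry exists. */
final class ConditionedCheck {

    /** Entry that the JarProcessor places in a jar it has conditioned. */
    static final String MARKER = "META-INF/eclipse.inf";

    static boolean isMarkedConditioned(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(MARKER) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a jar to inspect.
        for (String path : args) {
            String state = isMarkedConditioned(path) ? "marked" : "not marked";
            System.out.println(path + ": " + state);
        }
    }
}
```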

Because the JarProcessor skips unmarked jars by default, the -processAll argument must be provided when jars are conditioned and marked in the first place, so that the unmarked jars are processed.

Exactly when the conditioning should be performed depends on how the build is organized. If the build first builds update jars that are repackaged into the download zips, then the optimizer should be run on those update jars before they are repackaged. If the build produces the download zips first, then the optimizer should be run on the download zips. In both cases, we have either a zip full of jars, or a zip full of directories that contain jars. The site optimizer can take this zip as input and output a similarly shaped zip containing the repacked (and optionally signed) jars.

Below is an example of an Ant call that performs the conditioning during a build. It invokes Java 1.5 on the startup.jar located in ${baseLocation}. The org.eclipse.update.core.siteOptimizer application is run with the -jarProcessor option. The input is the ${buildDirectory}/${buildLabel}/${archiveName} file and the output is an archive with the same name located in ${outSite}.

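        <!-- fork="true" is required by Ant when the jar attribute is used; the jvm attribute can select a specific Java executable -->
        <java jar="${baseLocation}/startup.jar" fork="true">
            <arg line="-application org.eclipse.update.core.siteOptimizer" />
            <arg line="-jarProcessor -verbose -processAll -repack -outputDir ${outSite}" />
            <arg line="${buildDirectory}/${buildLabel}/${archiveName}" />
        </java>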

See the jar processor page for details on the options available for the jar processor.

What if I don't have Java 5 (or my Java implementation doesn't have unpack200)?

If the client being updated is not running Java 5.0, or the unpack200 executable cannot be found by other means, then the Update client will not attempt to retrieve the *.pack.gz files and will download the plain .jar files as normal.
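Whether an unpack200 executable is available can be checked with a short search. The sketch below looks in the running JVM's bin directory and then in the directories on PATH. These search locations are an assumption made for illustration; the text above does not say where the Update client actually looks. The class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical search for an unpack200 executable; the search locations are assumptions. */
final class Unpack200Locator {

    /** Returns the first executable named unpack200 (or unpack200.exe) found, or null. */
    static File find(List<File> dirs) {
        for (File dir : dirs) {
            for (String name : new String[] {"unpack200", "unpack200.exe"}) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    /** Assumed search order: the running JVM's bin directory, then each PATH entry. */
    static List<File> defaultSearchDirs() {
        List<File> dirs = new ArrayList<>();
        dirs.add(new File(System.getProperty("java.home"), "bin"));
        String path = System.getenv("PATH");
        if (path != null) {
            for (String entry : path.split(File.pathSeparator)) {
                if (!entry.isEmpty()) {
                    dirs.add(new File(entry));
                }
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        File tool = find(defaultSearchDirs());
        System.out.println(tool != null ? "found " + tool : "unpack200 not found");
    }
}
```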
