Difference between revisions of "Update Site Optimization"

(Update to include marking of conditioned jars)

Revision as of 15:22, 24 May 2006

The Problem

The Eclipse Install/Update design groups artifacts into features, which are published on an Update Site located on a remote server. A feature consists of the feature manifest file and other resources placed in a single JAR archive. When directed at the update site, the Eclipse Update Manager must download each of these JARs and parse the manifest in order to perform activities such as site browsing, searching, and dependency checking.

This approach works reasonably well for moderately sized update sites, but does not scale well for large sites like Callisto. Each of the feature JARs is small, but opening a connection to download each small JAR is costly, and the cost adds up. Even worse, users pay this price before they have even decided whether they want to install anything from the site. A solution is needed to reduce the number of connections required simply to browse or search the update site.

Once the features to install have been selected, Update needs to physically download the plug-in JARs onto the user's machine. At this point, payload size ceases to be trivial: a full Callisto download is several hundred megabytes. A technique to reduce the payload size would benefit users who are downloading the full Callisto set.

The Solution

The solution comes in two parts: the site digest, and the use of Pack200. The site digest is produced by merging all the information needed for browsing and searching a site into one file, which is archived to reduce its size and can be downloaded over a single connection instead of the many separate connections needed to download the individual features. Pack200 is a jar compression utility, part of J2SE 5.0, that reduces the size of the jars significantly.

Both of these solutions require enhancements to the Install/Update code so that Update can consume these artifacts. However, these performance enhancements are optional, and Install/Update should continue to work as normal in their absence.


Builds, Update Sites and the Site Optimizer

There are two sides to this solution: steps that must be taken during a component's build, and steps that are taken on the update site itself.

To ensure that the jars downloaded from an update site are the same as jars downloaded in a zip distribution, the jars need to be normalized (or repacked) during the build process (see Conditioning build results below). This is especially true if the jars will be signed. If the jars are being sent to the Eclipse Foundation to be signed, then this repacking will be done at that time. The actual build of the digest and packing of the jars can be considered a separate step and can be done on the update site itself.


The Site Optimizer

The org.eclipse.update.core bundle provides an application extension named org.eclipse.update.core.siteOptimizer which can be invoked from the command line.

java -jar /eclipse/startup.jar -application org.eclipse.update.core.siteOptimizer [options]

The site optimizer application exposes the digest builder and the jar processor. The digest builder is the tool that creates the actual site digest; the jar processor is a tool that can repack, sign, pack or unpack a jar and all its nested jars recursively.

The Update Site

If the update site is going to contain packed jars, then the site.xml file should specify that it supports pack200 by setting the pack200 attribute: <site pack200="true">. This lets the Update Manager know that the site contains packed jars, and it will look for a .jar.pack.gz file beside the .jar file that it would normally download. If the .jar.pack.gz file is found, it is downloaded and unpacked; otherwise the .jar file is downloaded as normal.
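The lookup rule described above can be sketched in a few lines of Java. This is only an illustration against a local directory, not the Update Manager's actual code: the class name PackedJarChooser, the method choose, and its two boolean parameters are hypothetical names introduced for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch only; not the Update Manager's implementation. */
final class PackedJarChooser {

    /**
     * Returns the sibling "name.jar.pack.gz" when the site declares
     * pack200 support, the client is able to unpack, and the packed
     * file exists. Otherwise returns the plain jar.
     */
    static Path choose(Path jar, boolean sitePack200, boolean clientCanUnpack) {
        if (sitePack200 && clientCanUnpack) {
            // Assumes jar names a file, so getFileName() is not null.
            Path packed = jar.resolveSibling(jar.getFileName() + ".pack.gz");
            if (Files.isRegularFile(packed)) {
                return packed;
            }
        }
        return jar;
    }

    public static void main(String[] args) {
        // Usage: java PackedJarChooser <path-to-jar>...
        for (String arg : args) {
            System.out.println(choose(Paths.get(arg), true, true));
        }
    }
}
```

The fallback to the plain jar is what keeps a pack200-enabled site usable when only some jars have been packed.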

The site optimizer is used on the update site to build the digest and do the actual packing of the jars:

java -jar /eclipse/startup.jar -application org.eclipse.update.core.siteOptimizer -digestBuilder
  -digestOutputDir=/eclipse/digest -siteXML=/eclipse/site/site.xml  -jarProcessor -pack -outputDir /eclipse/site /eclipse/site

This command builds the digest and traverses the /eclipse/site directory structure, packing all the conditioned jars it finds. By default the jarProcessor only processes jar files that have previously been marked as conditioned. The output of a pack is a .pack.gz file, so the result is that beside each conditioned jar there will be a .jar.pack.gz file.
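After running the command, it can be useful to see which jars did not receive a packed sibling. The following is a minimal sketch, assuming the site is available as a local directory; PackedSiteAudit and jarsWithoutPack are hypothetical names, not part of the site optimizer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative audit helper; not part of the site optimizer. */
final class PackedSiteAudit {

    /** Lists every *.jar under siteRoot that has no *.jar.pack.gz beside it. */
    static List<Path> jarsWithoutPack(Path siteRoot) throws IOException {
        try (Stream<Path> files = Files.walk(siteRoot)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .filter(p -> !Files.exists(
                            p.resolveSibling(p.getFileName() + ".pack.gz")))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PackedSiteAudit /eclipse/site
        for (Path jar : jarsWithoutPack(Paths.get(args[0]))) {
            System.out.println(jar);
        }
    }
}
```

A jar appearing in this list is not necessarily an error: jars that were never conditioned are skipped by default, so the list is a starting point for review.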

Conditioning Build Results with the JarProcessor

By default the JarProcessor will only process jar files that have been marked as conditioned. This means that the siteOptimizer can be run to pack an entire update site and only those jars that have been previously conditioned during their build will get packed. The JarProcessor marks a given jar as conditioned by placing a META-INF/eclipse.inf file in the jar.
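To see whether a given jar carries the marker, one can test for the META-INF/eclipse.inf entry directly. The sketch below uses only the standard java.util.jar API; ConditionedJarCheck and isConditioned are hypothetical names. It checks only that the entry is present, which is an assumption: the JarProcessor may also inspect the contents of the file.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

/** Illustrative helper; not part of the JarProcessor itself. */
final class ConditionedJarCheck {

    /** Marker entry named in the text above. */
    static final String MARKER = "META-INF/eclipse.inf";

    /** Returns true if the jar contains the marker entry. */
    static boolean isConditioned(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            // getEntry returns null when no entry has that name.
            return jarFile.getEntry(MARKER) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConditionedJarCheck <path-to-jar>...
        for (String arg : args) {
            Path jar = Paths.get(arg);
            System.out.println(jar + ": "
                    + (isConditioned(jar) ? "marker present" : "marker absent"));
        }
    }
}
```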

Because the JarProcessor skips unmarked jars by default, the -processAll argument must be supplied when jars are conditioned for the first time, so that the JarProcessor processes (and marks) the unmarked jars.

Exactly when the conditioning should be performed depends on how the build is organized. If the build first builds update jars that are repackaged into the download zips, then the optimizer should be run on those update jars before they are repackaged. If the build produces the download zips first, then the optimizer should be run on the download zips. In both cases, we have either a zip full of jars, or a zip full of directories that contain jars. The site optimizer can take this zip as input and output a similarly shaped zip containing the repacked (and optionally signed) jars.

Below is an example of an Ant call that performs the conditioning during a build. It invokes Java 1.5 on the startup.jar located in ${baseLocation}. The org.eclipse.update.core.siteOptimizer application is run with the -jarProcessor option. The input is the ${buildDirectory}/${buildLabel}/${archiveName} file, and the output is an archive with the same name located in ${outSite}.

        <java jar="${baseLocation}/startup.jar" 
              fork="true" 
              jvm="${java15-home}/bin/java" 
              failonerror="true" 
              maxmemory="256m" 
              dir="${buildDirectory}"> 
            <arg line="-application org.eclipse.update.core.siteOptimizer" /> 
            <arg line="-jarProcessor -verbose -processAll -repack -outputDir ${outSite}" />
            <arg line="${buildDirectory}/${buildLabel}/${archiveName}" /> 
        </java> 

See the Jar Processor section of the Pack200 wiki page for details on the options available for the jar processor.


What if I don't have Java 5?

If the client being updated is not running Java 5.0 and the unpack200 executable cannot be found by other means, then the Update client will not attempt to retrieve the *.pack.gz files.

Related Pages