
Difference between revisions of "Project Download Stats"

(HOWTO get download stats (file, domain, country) and trend plots - the easy way!)
(p2 notes)
(9 intermediate revisions by 2 users not shown)
= Querying & Collating Eclipse.org Project Download Stats =

With a few simple steps, you can get [http://www.eclipse.org/emf/downloads/downloads.php download statistics] for your project like this:

<table>
<tr valign="top">
<td>[[Image:Download_stats_some_countries.gif]]</td>
<td>[[Image:Download_stats_2week_trend.gif]]</td>
<td>[[Image:Download_stats_some_files.gif]]</td>
</tr>
</table>

This engine consists of 4 pieces:

* ''<tt style="color:DarkGreen">stats.php</tt>''
** script to query sql tables and return html/xml
* ''<tt style="color:DarkGreen">genXML.sh(.txt)</tt>''
** shell script to generate data from above php script
* ''<tt style="color:DarkGreen">downloads.php</tt>''
** php script to collate/sort/filter stored data
* ''<tt style="color:DarkGreen">xml/nightly/*.xml</tt>'', ''<tt style="color:DarkGreen">xml/weekly/*.xml</tt>'', ''<tt style="color:DarkGreen">xml/monthly/*.xml</tt>''
** sample (real) data for EMF, Jan 01 - Mar 02, 2006.

The above code is in CVS here:

''<tt style="color:DarkGreen">anonymous@dev.eclipse.org:/cvsroot/org.eclipse/www/emf/downloads/</tt>''

Below are the three steps required to set up some or all of these tools. Note that for graphics (TLD icons and bars) you will also need to copy image files from here:

''<tt style="color:DarkGreen">anonymous@dev.eclipse.org:/cvsroot/org.eclipse/www/emf/images/</tt>''

== Step 1: Query On Demand ==

1. Install ''<tt style="color:DarkGreen">stats.php</tt>'' on your website, e.g., commit it to
''<tt style="color:DarkGreen">/cvsroot/org.eclipse/www/emf/downloads/stats.php</tt>'' so that
it's accessible via http as http://www.eclipse.org/emf/downloads/stats.php

2. Register it with the webmaster so that you will have access to the
SQL class, '''dbconnection_downloads_ro.class.php'''. Without this, the queries
will fail.

3. Tweak the script using your own username/password restrictions, and set
your own filenames for which to query.

4. When you are satisfied with the queries, displayed as either HTML or XML,
you can automate the nightly/weekly/monthly collection of data snapshots.

== Step 2: Query On Schedule / Archived Snapshots ==

1. Install ''<tt style="color:DarkGreen">genXML.sh.txt</tt>'' on some Linux-capable machine
(native or Cygwin). Rename it to ''<tt style="color:DarkGreen">genXML.sh</tt>'' and set it
executable ('''<tt style="color:DarkRed">chmod 700 genXML.sh</tt>''').

2. Check that the script works - you will need ''bash'' and ''wget'' installed on your Linux
system. Run the script without options ('''<tt style="color:DarkRed">./genXML.sh</tt>''') to see
its usage instructions. You can also read the script to see additional crontab examples.

3. To install the script to your crontab, edit your crontab and add entries for what you'd
like to do - set a nightly, weekly, monthly schedule for when you want to collect new data.
('''<tt style="color:DarkRed">crontab -e</tt>''')

4. For example, you can copy the following 6 lines into your crontab:

==== crontab entries ====

<pre>
  # nightly stats (previous day) @ 6am, do yesterday's data
  00 6 * * * ~/crontab/genXML.N.sh > /dev/null

  # weekly (previous week, starting on Sunday) @ 6:20am on Sunday, do previous week's data
  20 6 * * 0 ~/crontab/genXML.W.sh > /dev/null

  # monthly (previous full month) @ 6:40am on 1st of the month, do prev month's data
  40 6 1 * * ~/crontab/genXML.M.sh > /dev/null
</pre>

5. The above-referenced scripts are wrappers for ''<tt style="color:DarkGreen">genXML.sh</tt>''. Create them thus:

==== ~/crontab/genXML.N.sh ====

<pre>
  # EMF: collect yesterday's stats (date passed as YYYYMMDD)
  ~/crontab/genXML.sh -user emf-dev -pass trilobyt3 -F -D -C -dates \
    $(date --date="$(date +%Y-%m-%d) -1 day" +%Y%m%d) -l \
    /var/www/emf/downloads/xml/nightly 2>&1 | tee ~/crontab/logs/genXML.N.log.txt

  # UML2: same query; tee -a appends so the EMF log output above is not overwritten
  ~/crontab/genXML.sh -uml2 -user emf-dev -pass trilobyt3 -F -D -C -dates \
    $(date --date="$(date +%Y-%m-%d) -1 day" +%Y%m%d) -l \
    /var/www/uml2/downloads/xml/nightly 2>&1 | tee -a ~/crontab/logs/genXML.N.log.txt
</pre>

==== ~/crontab/genXML.W.sh ====

<pre>
  # EMF: collect last week's stats (%U = week number of the year, weeks starting on Sunday)
  ~/crontab/genXML.sh -user emf-dev -pass trilobyt3 -F -D -C -weeks \
    $(date --date="$(date +%Y-%m-%d) -1 week" +%U) -l \
    /var/www/emf/downloads/xml/weekly 2>&1 | tee ~/crontab/logs/genXML.W.log.txt

  # UML2: same query; tee -a appends so the EMF log output above is not overwritten
  ~/crontab/genXML.sh -uml2 -user emf-dev -pass trilobyt3 -F -D -C -weeks \
    $(date --date="$(date +%Y-%m-%d) -1 week" +%U) -l \
    /var/www/uml2/downloads/xml/weekly 2>&1 | tee -a ~/crontab/logs/genXML.W.log.txt
</pre>

==== ~/crontab/genXML.M.sh ====

<pre>
  # EMF: collect last month's stats; anchoring on the 15th avoids month-end rollover
  ~/crontab/genXML.sh -user emf-dev -pass trilobyt3 -F -D -C -months \
    $(date --date="$(date +%Y-%m-15) -1 month" +%m) -l \
    /var/www/emf/downloads/xml/monthly 2>&1 | tee ~/crontab/logs/genXML.M.log.txt

  # UML2: same query; tee -a appends so the EMF log output above is not overwritten
  ~/crontab/genXML.sh -uml2 -user emf-dev -pass trilobyt3 -F -D -C -months \
    $(date --date="$(date +%Y-%m-15) -1 month" +%m) -l \
    /var/www/uml2/downloads/xml/monthly 2>&1 | tee -a ~/crontab/logs/genXML.M.log.txt
</pre>

== Step 3: Displaying, Comparing, Plotting & Collating Archived Snapshot Data ==

1. To view your stored data in different ways, you can use
''<tt style="color:DarkGreen">downloads.php</tt>''. This should be installed next to wherever your data is collected.
For example, if you have a webserver with a ''<tt style="color:DarkGreen">/var/www/</tt>'' root, and you place your data into
''<tt style="color:DarkGreen">/var/www/emf/downloads/xml/</tt>'', this file should be
''<tt style="color:DarkGreen">/var/www/emf/downloads/downloads.php</tt>''.

For a real-world example, go here: http://www.eclipse.org/emf/downloads/downloads.php

2. You can customize the way the Files By Type grouping works to suit your specific file names by editing the function getFileType($url).
For the EMF case, this is:

==== downloads.php#getFileType($url) ====

<pre>
  function getFileType($url) {
    // label => filename fragment; the first match wins, so "emf-sdo-xsd-SDK-"
    // must be listed before the shorter "xsd-SDK-"
    $matches = array(
      "Standalone Zip"  => "emf-sdo-xsd-Standalone-",
      "Full SDK Zip" => "emf-sdo-xsd-SDK-",
      "EMF SDK Zip" => "emf-sdo-SDK-",
      "EMF RT Zip" => "emf-sdo-runtime-",
      "EMF Update Manager Jar" => "org.eclipse.emf.ecore",
      "XSD SDK Zip" => "xsd-SDK-",
      "XSD RT Zip" => "xsd-runtime-",
      "XSD Update Manager Jar" => "org.eclipse.xsd");
    foreach ($matches as $label => $match) {
      // strpos() returns false only when the fragment is absent (0 is a valid position)
      if (false !== strpos($url, $match)) return $label;
    }
    return "Other Files";
  }
</pre>

--[[User:Nickb|Nickb]] 17:45, 3 March 2006 (EST)

Revision as of 16:06, 24 September 2012

Projects hosted at Eclipse.org can track download requests and issue queries for statistical and trending purposes. There are two mechanisms to record a download request:

  • For ZIP file downloads, using the mirrors
  • For p2 repositories, using Equinox p2 download stats


ZIP files

When linking to a ZIP file on download.eclipse.org, projects must use mirrors. The link would look like this:

   http://eclipse.org/downloads/download.php?file=/path/to/file.zip

where /path/to/file.zip is the path relative to download.eclipse.org. When a user picks a mirror site, the download request is automatically logged.
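
As a rough illustration of the link format above, the following sketch builds a mirror-aware link from a path relative to download.eclipse.org. The function name `mirror_link` and the constant `MIRROR_SCRIPT` are made up for this example; only the URL layout comes from the text above.

```python
from urllib.parse import urlencode

# Mirror selection script, as shown in the link format above.
MIRROR_SCRIPT = "http://eclipse.org/downloads/download.php"


def mirror_link(path):
    """Return a mirror-aware link for a path relative to download.eclipse.org.

    Hypothetical helper: not part of any Eclipse tooling.
    """
    if not path.startswith("/"):
        path = "/" + path
    # safe="/" leaves slashes unescaped so the result matches the documented form.
    return MIRROR_SCRIPT + "?" + urlencode({"file": path}, safe="/")


if __name__ == "__main__":
    print(mirror_link("/path/to/file.zip"))
```

Linking through this URL, rather than directly to download.eclipse.org, is what lets the download request be logged when the user picks a mirror.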


Equinox p2 download stats

To enable download stats on your p2 repository, please see Equinox p2 download stats. Use the following URL for your stats server: http://download.eclipse.org/stats/

Please see the p2 notes below.

Querying the database

Log into the MyFoundation portal (http://portal.eclipse.org/), and use the [tools] for all Committers option of the Eclipse Projects portlet.

Enter a partial filename to search for, relative to the downloads area. The broader the search, the longer it takes to return results, so try to identify the core files that make up one user download. Use % (percent sign) as a wildcard for any number of characters, or _ (underscore) as a wildcard for exactly one character. You don't need to use % at the beginning or the end.

Sample filenames:

   /tools/emf/downloads/drops/2.0.1/R : All 2.0.1 release build files (ending wildcard assumed)
   webtools%1.0.1%zip                 : All Webtools 1.0.1 zip files
   eclipse-SDK%3.2.2                  : All the eclipse SDK ZIP files for release 3.2.2
   /birt%1.0%zip                      : All birt 1.0 zip files
   /stats/%org.eclipse.wst.xml        : All the org.eclipse.wst.xml downloads in all p2 repos
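
To preview which files a search string would catch before running a slow query, the wildcard rules can be approximated locally. This is only a sketch: it assumes the search behaves like an SQL LIKE pattern with implied leading and trailing wildcards, it is case-sensitive (the real query may not be), and the file paths in the usage example are invented.

```python
import re


def like_to_regex(pattern):
    """Translate a %/_ wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % stands for any number of characters
        elif ch == "_":
            parts.append(".")    # _ stands for exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, filename):
    """Return True if the pattern occurs anywhere in filename.

    search() rather than fullmatch() models the implied wildcards
    at the beginning and the end of the pattern.
    """
    return like_to_regex(pattern).search(filename) is not None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(matches("eclipse-SDK%3.2.2", "/eclipse/downloads/eclipse-SDK-3.2.2-win32.zip"))
    print(matches("/birt%1.0%zip", "/birt/downloads/birt-runtime-1.0.zip"))
```

Note that a literal underscore in a filename is still matched by the _ wildcard, so patterns containing underscores may match slightly more than intended.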

If you need suggestions for your download stats, please contact the webmaster.

[Image: download_stats.png]

p2 notes

(These notes were compiled from the cross-project list.)

  1. It is better to tie stats to a feature rather than a bundle. Bundles come in two variants (.pack.gz and .jar), so with a bundle you have to add the stats tracker twice (and the app from bug 310132, which auto-generates stats properties, doesn't support it).
  2. Note, though, that a commercial product that uses a different feature structure than Eclipse Open Source (and so gets only your bundle) won't be counted when you count feature access. That's likely not relevant.
  3. If you have a feature and a bundle with the same ID, don't add the p2.downloadStats property to both, since that would count each download twice.
  4. It may be advisable to version your stats ID somehow. For instance, use "org.eclipse.rse.core_tm320" or "org.eclipse.wst.ui_helios_sr0" or "org.eclipse.ptp_4.0". If you don't version the stats ID you are tracking, you may run into trouble when we do Helios SR1, which will reuse the same repo location (/releases/helios).
  5. Maybe obvious: the stats tracker will not catch people updating from Galileo to Helios, because the Galileo p2 implementation did not have the stats code enabled. You'll only count p2 downloads that happen with Eclipse 3.6.
  6. I'm wondering whether the Helios aggregator's access to my project repository already counts as a download. Probably not too relevant after all.
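
The versioning advice in note 4 can be sketched as a small helper that appends a release tag to a stats ID and derives the matching search string for the query form. Both function names are hypothetical, and the "/stats/%" prefix is simply taken from the sample filenames earlier on this page.

```python
def versioned_stats_id(base_id, version_tag):
    """Join a feature ID and a release tag, e.g. org.eclipse.ptp + 4.0."""
    return f"{base_id}_{version_tag}"


def stats_query(stats_id):
    """Build a search string for that stats ID across all p2 repos."""
    return "/stats/%" + stats_id


if __name__ == "__main__":
    sid = versioned_stats_id("org.eclipse.ptp", "4.0")
    print(sid)
    print(stats_query(sid))
```

Because _ is itself a single-character wildcard in the query form, a search for a versioned ID still matches the literal underscore, though it could also match other characters in that position.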