Jump to: navigation, search

Difference between revisions of "UDC/How is Usage Data Stored"

< UDC
(Record)
(Table Description)
Line 167: Line 167:
  
 
===Table Description===
 
===Table Description===
 +
New strings are added to this table as they are encountered. All entries in this table should be unique.
 +
 
<pre>+--------+------------------+------+-----+---------+----------------+
 
<pre>+--------+------------------+------+-----+---------+----------------+
 
| Field  | Type            | Null | Key | Default | Extra          |
 
| Field  | Type            | Null | Key | Default | Extra          |

Revision as of 13:45, 22 September 2010

Overview

Usage data is aggressively normalized on the server. Ultimately, the data is stored in multiple tables.

Tables

Profile

A record is created in the profile table, usagedata_profile for each workspace we encounter. When the UDC uploads data, it provides us with two identifiers (both are UUIDs generated by the UDC) representing the user and the workspace. The user identifier effectively represents an individual computer; the workspace identifier represents an individual workspace. This distinction was deemed necessary in order to account for users who run multiple Eclipse workspaces.

A row is created in the profile table for each distinct userId/workspaceId pairing.

Table Description

The table contains an id field which is the primary key. This field is used as the target of the foreign-key reference from the Upload table.

+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| id          | int(10) unsigned | NO   | PRI | NULL    | auto_increment | 
| userId      | char(40)         | NO   | MUL |         |                | 
| workspaceId | char(40)         | NO   | MUL |         |                | 
+-------------+------------------+------+-----+---------+----------------+

Example

+----+--------------------------------------+--------------------------------------+
| id | userId                               | workspaceId                          |
+----+--------------------------------------+--------------------------------------+
|  1 | bbb6e4b1-391f-4da5-8ce1-a89a19cdb5ff | 7b38a3b8-a40f-4250-8be2-aa78c1e3ba9e | 
|  2 | bbb6e4b1-391f-4da5-8ce1-a89a19cdb5ff | a680e38b-aa01-4ac7-a11c-8bf457099570 | 
|  3 | bbb6e4b1-391f-4da5-8ce1-a89a19cdb5ff | 8f093d31-ba89-4e54-be98-8c7bc936587f | 
|  4 | 9d4a44d0-302b-4a00-aede-719e975ecfc8 | d4b5add9-cbc8-43cb-a6f8-696bac623429 | 
|  5 | 518e2328-faea-4f07-a46e-c0c97e050171 | 033ae773-8c3e-4240-8423-253d48be6780 | 
|  6 | 02a0ab5b-d51d-4905-ad8d-a66626da2774 | 386208d2-28a5-4ffc-a290-fdd476748857 | 
|  7 | 43d7c638-6f5a-463c-91ca-f0f7da905d4c | 7ea57f59-7f48-42c3-b563-7a1a17f22ab4 | 
|  8 | 77882409-c2c3-4d1a-9117-308fc123af70 | f369e621-2c0e-4dc9-b9c7-44208d6fc6b9 | 
|  9 | e3fe3f74-9510-41f7-a437-2997dc607b0c | 47a7e9f7-dc94-49dd-a344-92c4b321492b | 
| 10 | 04617774-515b-4c72-8327-9733bd52ebce | 9c1f5d31-cd1c-4e48-bd47-0689b58b6467 | 
+----+--------------------------------------+--------------------------------------+

Upload

Each entry in the upload table, usagedata_upload, represents an upload event. That is, every time the UDC "calls home" with an upload, we add a row to this table. Each row records the profileId of the user (from the Profile table using the userId/workspaceId combination provided with the upload event), the country code (ccode), and the time (on the server) of the upload.

Table Description

The table contains an id field which is the primary key. This field is used as the target of the foreign-key reference from a Record table.

+-----------+------------------+------+-----+-------------------+----------------+
| Field     | Type             | Null | Key | Default           | Extra          |
+-----------+------------------+------+-----+-------------------+----------------+
| id        | int(10) unsigned | NO   | PRI | NULL              | auto_increment | 
| profileId | int(10) unsigned | NO   |     |                   |                | 
| ccode     | char(2)          | YES  |     | NULL              |                | 
| time      | timestamp        | NO   | MUL | CURRENT_TIMESTAMP |                | 
+-----------+------------------+------+-----+-------------------+----------------+

Example

+----+-----------+-------+---------------------+
| id | profileId | ccode | time                |
+----+-----------+-------+---------------------+
|  1 |         2 | ca    | 2008-04-04 13:17:07 | 
|  2 |         2 | ca    | 2008-04-04 15:24:18 | 
|  3 |         3 | ca    | 2008-04-04 15:26:45 | 
|  4 |         3 | ca    | 2008-04-04 15:29:37 | 
|  5 |         3 | ca    | 2008-04-04 15:30:20 | 
|  6 |         4 | us    | 2008-04-04 16:04:19 | 
|  7 |         5 | be    | 2008-04-04 16:08:36 | 
|  8 |         6 | us    | 2008-04-04 16:30:33 | 
|  9 |         7 | ca    | 2008-04-04 16:35:54 | 
| 10 |         8 | it    | 2008-04-04 16:41:02 | 
+----+-----------+-------+---------------------+

Record

Multiple "record" tables are created and maintained, one for each month of gathered usage data. The table usagedata_record_monthly_200901, for example, contains usage data uploaded in January 2009. We decided to separate this data into multiple tables in anticipation of having to deal with extremely large amounts of data (there are 127,680,562 records in the January 2009 table).

This table is highly normalized. Most of the content of a usage data event record is textual. All content is stored in the "String" table (see below); only the id of the corresponding entry is stored in the record table itself.

The record table records the uploadId of the upload from which the a row originates, along with the what, kind, bundleId, bundleVersion, description, and time fields (described in What Gets Captured).

In order to determine the user from which an individual record was obtained, a Record table must be joined to the Upload table and then to the Profile table.

For example,

select p.userId, u.ccode, r. bundleId 
  from usagedata_record_monthly_200901 as r 
    join usagedata_upload as u on (r.uploadId=u.id) 
    join usagedata_profile as p on (u.profileId=p.id) 
  limit 1;
+--------------------------------------+-------+----------+
| userId                               | ccode | bundleId |
+--------------------------------------+-------+----------+
| 0ac00d5e-7bf4-41e1-9d20-c6800f452fa9 | us    |      727 | 
+--------------------------------------+-------+----------+
mysql>


Table Description

The what, kind, bundleId, bundleVersion, and description fields are foreign key references into the String table (see below).

+---------------+---------------------+------+-----+---------+----------------+
| Field         | Type                | Null | Key | Default | Extra          |
+---------------+---------------------+------+-----+---------+----------------+
| id            | int(10) unsigned    | NO   | PRI | NULL    | auto_increment | 
| uploadId      | int(10) unsigned    | NO   | MUL |         |                | 
| what          | int(10) unsigned    | NO   |     |         |                | 
| kind          | int(10) unsigned    | NO   | MUL |         |                | 
| bundleId      | int(10) unsigned    | YES  | MUL | NULL    |                | 
| bundleVersion | int(10) unsigned    | YES  |     | NULL    |                | 
| description   | int(10) unsigned    | YES  |     | NULL    |                | 
| time          | bigint(20) unsigned | NO   |     |         |                | 
+---------------+---------------------+------+-----+---------+----------------+

Example

+----+----------+------+------+----------+---------------+-------------+---------------+
| id | uploadId | what | kind | bundleId | bundleVersion | description | time          |
+----+----------+------+------+----------+---------------+-------------+---------------+
|  1 |  1083789 |    6 |   72 |      727 |             1 |         761 | 1224176378264 | 
|  2 |  1083789 |    8 |    9 |       12 |         32232 |          12 | 1224176378265 | 
|  3 |  1083789 |    8 |    9 |     1681 |         10763 |        1681 | 1224176378265 | 
|  4 |  1083789 |    8 |    9 |       21 |         10765 |          21 | 1224176378265 | 
|  5 |  1083789 |    8 |    9 |       23 |         32233 |          23 | 1224176378265 | 
|  6 |  1083789 |    8 |    9 |       25 |         10766 |          25 | 1224176378265 | 
|  7 |  1083789 |    8 |    9 |       27 |         10766 |          27 | 1224176378266 | 
|  8 |  1083789 |    8 |    9 |       10 |         10767 |          10 | 1224176378266 | 
|  9 |  1083789 |    8 |    9 |       29 |          8821 |          29 | 1224176378266 | 
| 10 |  1083789 |    8 |    9 |       31 |         10768 |          31 | 1224176378266 | 
+----+----------+------+------+----------+---------------+-------------+---------------+

String

The String table, usagedata_string, contains a single entry for each string representing in the what, kind, bundleId, bundleVersion, and description fields of a Record table.

You can use a join to denormalize the data. For example:

select 
    r.id, w.string as what, k.string as kind, 
    bi.string as bundleId, bv.string as bundleVersion, 
    d.string as description, from_unixtime(r.time/1000) as time 
  from 
    usagedata_record_monthly_200901 as r 
    join usagedata_string as w on (r.what=w.id) 
    join usagedata_string as k on (r.kind=k.id) 
    join usagedata_string as bi on (r.bundleId=bi.id) 
    join usagedata_string as bv on (r.bundleVersion=bv.id) join usagedata_string as d on (r.description=d.id) 
  limit 10;
+----+-----------+-------------+----------------------------------------+---------------------------+----------------------------------------+---------------------+
| id | what      | kind        | bundleId                               | bundleVersion             | description                            | time                |
+----+-----------+-------------+----------------------------------------+---------------------------+----------------------------------------+---------------------+
|  1 | activated | perspective | org.eclipse.jst.j2ee.ui                |                           | org.eclipse.jst.j2ee.J2EEPerspective   | 2008-10-16 12:59:38 | 
|  2 | started   | bundle      | org.eclipse.osgi                       | 3.4.2.R34x_v20080826-1230 | org.eclipse.osgi                       | 2008-10-16 12:59:38 | 
|  3 | started   | bundle      | org.eclipse.equinox.simpleconfigurator | 1.0.0.v20080604           | org.eclipse.equinox.simpleconfigurator | 2008-10-16 12:59:38 | 
|  4 | started   | bundle      | org.eclipse.core.contenttype           | 3.3.0.v20080604-1400      | org.eclipse.core.contenttype           | 2008-10-16 12:59:38 | 
|  5 | started   | bundle      | org.eclipse.core.databinding           | 1.1.1.M20080827-0800b     | org.eclipse.core.databinding           | 2008-10-16 12:59:38 | 
|  6 | started   | bundle      | org.eclipse.core.expressions           | 3.4.0.v20080603-2000      | org.eclipse.core.expressions           | 2008-10-16 12:59:38 | 
|  7 | started   | bundle      | org.eclipse.core.filebuffers           | 3.4.0.v20080603-2000      | org.eclipse.core.filebuffers           | 2008-10-16 12:59:38 | 
|  8 | started   | bundle      | org.eclipse.core.filesystem            | 1.2.0.v20080604-1400      | org.eclipse.core.filesystem            | 2008-10-16 12:59:38 | 
|  9 | started   | bundle      | org.eclipse.core.jobs                  | 3.4.0.v20080512           | org.eclipse.core.jobs                  | 2008-10-16 12:59:38 | 
| 10 | started   | bundle      | org.eclipse.core.net                   | 1.1.0.I20080604           | org.eclipse.core.net                   | 2008-10-16 12:59:38 | 
+----+-----------+-------------+----------------------------------------+---------------------------+----------------------------------------+---------------------+


Table Description

New strings are added to this table as they are encountered. All entries in this table should be unique.

+--------+------------------+------+-----+---------+----------------+
| Field  | Type             | Null | Key | Default | Extra          |
+--------+------------------+------+-----+---------+----------------+
| id     | int(10) unsigned | NO   | PRI | NULL    | auto_increment | 
| string | varchar(256)     | NO   | MUL |         |                | 
+--------+------------------+------+-----+---------+----------------+

Example

+----+-----------------------------+
| id | string                      |
+----+-----------------------------+
|  1 |                             | 
|  2 | deactivated                 | 
|  3 | workbench                   | 
|  4 | org.eclipse.ui.workbench    | 
|  5 | 3.4.0.I20080205-0010        | 
|  6 | activated                   | 
|  7 | closed                      | 
|  8 | started                     | 
|  9 | bundle                      | 
| 10 | org.eclipse.core.filesystem | 
+----+-----------------------------+