
Platform-releng/How to do miscellaneous releng tasks

There are times when some quirky thing out of the ordinary has to be done. This page captures some of them. They are likely to get out of date easily, or details may change from case to case, but I thought they would at least serve as "hints" in case others ever have to do these things (and they serve as reminders to me :). They are likely too quirky and volatile to be part of a FAQ, but might evolve into a "procedures document". I try to capture them every time I do one that I find confusing or hard to keep straight, or when someone asks about one, so this is not a complete or exhaustive document, but again, hints. Note that most of these procedures require shell access to build.eclipse.org.

How to restart Hudson tests (method 1)

[November 28, 2012]

[Modified on February 17, 2015, since I, myself (David Williams), never use this method any more, but instead use "method 2". This "method 1" might still work ... but "method 2" would normally be easier, especially if it is "just one platform" that needs to be re-run.]

Sometimes tests have to be restarted and re-run. This is especially true if something goes wrong with Hudson, say, and it is restarted during the middle of a test run.

Hudson tests can be re-run directly from the Hudson web interface; just provide the buildId and eclipseStream and the scripts will figure out where to get the build from "download.eclipse.org". This works because there is a cronjob running that knows how to efficiently "look for results" and, if it finds any, will collect them up and summarize them on the main download page.

This could be done programmatically as well. This example is specific to a Kepler I-build, but the idea would be the same for others. It assumes the build is complete and on "downloads", and this is just to run the tests on Hudson. It might be easier to do programmatically if you had to restart all three tests, for example; otherwise, the web pages are pretty easy.

The file to start tests is in

/shared/eclipse/eclipse4I/build/supportDir/org.eclipse.releng.eclipsebuilder/testScripts

To do the retest, a file named startTests.sh is executed from the command line.

But this file needs two parameters: the buildId to test, and the eclipseStream that the build is from. You could edit the sh file directly, but it is best to edit a file named buildParams.shsource. The contents of that file, for a Kepler re-test, would be something like

buildId=I20120911-1000
eclipseStream=4.3.0

Then, when startTests.sh is executed, it will read the values from that file.

(I think startTests.sh should be run from a screen shell, or otherwise allowed to continue running even if you log off or lose connection.)
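The steps for "method 1" can be sketched as a small shell helper. This is only a sketch under assumptions: the function name write_build_params is hypothetical (not part of the releng scripts), and using nohup is just one way to keep the script running after a logoff, as an alternative to a screen shell.

```shell
# Hypothetical helper (not part of the releng scripts): write the two
# values that startTests.sh reads into buildParams.shsource in a directory.
write_build_params() {
  dir=$1
  build_id=$2
  stream=$3
  printf 'buildId=%s\neclipseStream=%s\n' "$build_id" "$stream" \
    > "$dir/buildParams.shsource"
}

# Intended use on the build machine (left commented out on purpose):
#   cd /shared/eclipse/eclipse4I/build/supportDir/org.eclipse.releng.eclipsebuilder/testScripts
#   write_build_params . I20120911-1000 4.3.0
#   nohup ./startTests.sh > startTests.out 2>&1 &
```

The output file name startTests.out in the commented usage is an arbitrary choice for illustration.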

How to restart Hudson tests (method 2)

[February 17, 2015]

Tests can reliably be re-run, even a "long time" after the initial run (assuming the build is still on "downloads"), because we save all the relevant data on "downloads", and the test's "input parameters", as a whole, specify exactly what to run, and what to use to "publish" the results. Note: currently, we can only run "M-, N-, and I-builds" with the two-part time stamp. We can not run, for example, the tests from "S-4.5M4-201412151800". (But, normally, S-4.5M4-201412151800 corresponds exactly to I20141215-1800, so that would be the build to use to re-run tests from a milestone.)

The test jobs are all run on the "shared Hudson" instance (no "HIPP instance" for production tests) and are listed at
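As an illustration of the build ID restriction above, here is a minimal sketch of two checks. Both function names are hypothetical. The second function only encodes the "normally corresponds exactly" rule of thumb from the note, so its result should be confirmed against the actual build before use.

```shell
# Succeeds only for M-, N-, or I-build IDs with the two-part time stamp,
# such as I20141215-1800.
is_rerunnable_build_id() {
  printf '%s\n' "$1" | grep -Eq '^[IMN][0-9]{8}-[0-9]{4}$'
}

# Assumption: a milestone name ends in a 12-digit time stamp that matches
# the I-build it was promoted from (usually true, but not guaranteed).
milestone_to_ibuild() {
  ts=${1##*-}                       # text after the last "-"
  case $ts in
    ''|*[!0-9]*) return 1 ;;        # must be digits only
  esac
  [ "${#ts}" -eq 12 ] || return 1   # YYYYMMDDHHMM
  printf 'I%s-%s\n' \
    "$(printf '%s' "$ts" | cut -c1-8)" \
    "$(printf '%s' "$ts" | cut -c9-12)"
}
```

For example, milestone_to_ibuild S-4.5M4-201412151800 prints I20141215-1800, which is the form that is_rerunnable_build_id accepts.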

 https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/ 

Hopefully the test "names" are self explanatory, such as 'ep45I-unit-win32' for the unit tests on a Windows 32 bit machine, for the Eclipse 4.5 I-builds. A test such as ep45N-unit-win32 is the "N-build" counterpart. All these test jobs are pretty much identical, but are run as different jobs for two reasons: 1) It improves the automatic history "book keeping". So, for example, if the number of test failures increases or decreases, then you are comparing "apples to apples". 2) There are times that "machine restrictions" apply; for example, on Windows, we allow only one build to run at a time, whereas on the Mac, we allow more than one to run.
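The naming pattern in the two examples above can be written down as a small sketch. The function name test_job_name is hypothetical, and the pattern is inferred only from 'ep45I-unit-win32' and 'ep45N-unit-win32'; other platform suffixes and other job kinds should be checked against the Hudson view rather than assumed.

```shell
# Compose a unit-test job name from stream, build type, and platform.
# Pattern inferred from the documented examples only.
test_job_name() {
  stream=$1      # e.g. 4.5 (a third digit, as in 4.4.2, is ignored)
  build_type=$2  # I, M, or N
  platform=$3    # e.g. win32
  major=${stream%%.*}
  rest=${stream#*.}
  minor=${rest%%.*}
  printf 'ep%s%s%s-unit-%s\n' "$major" "$minor" "$build_type" "$platform"
}
```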

Permissions

Anyone in "eclipse.platform.releng" group can run/edit the test jobs. And others can be added to the "project security", by user id, if the need arises, such as for "backup".

Preparation

To "re-run" a test, you need three pieces of information, the buildId, such as M20150204-1700, the 3-digit build stream, such as 4.4.2, and the "hash tag" of the aggregator for that build, such as 115d147f542bfcfeeba452946993c2f2578e85a8.

If the build ran once (i.e. is in "history"), these values are in the "parameters" field of the existing test attempt. Even if lost from history, the values can be obtained from the download directory.

To Re-run

You need to login to the "shared Hudson" (and, for that one, it is your committer ID, and password, not your email, as it is on HIPP instances). Click the "Build now" link, and you will be presented with a form to fill-in the 3 values from above. Click on "ok" (Labeled 'Build'), and check back to see if it's running! (You should at least see it "queued up" if it can not run right away, due to the test machine being busy).

NOTE: once a test-run is finished on Hudson, there are some other cron jobs that run under my id (david_williams), since ultimately the upload to "downloads" must be done by a committer id (or other id, as arranged by webmaster, as they have done for HIPP instances). These cron jobs "check for test results" every 10 minutes or so, and should be fully automated. If/when that ever has to change ... there will be another "job transition" document for that! These cron jobs that check for "test results" depend on the Hudson job named "ep-collectResults", which is a very simple job that has write access to the directory '/shared/eclipse/sdk/testjobdata/'. That "ep-collectResults" job writes a small data file there, and the cronjob checks that directory for "new work to do", and if found, runs the job to "summarize" the results and push them up to "downloads". The "ep-collectResults" job rarely ever fails. But, if you think it has, you can check its console, as you can for any other Hudson job.

How to re-collect Hudson tests

[November 28, 2012]

Occasionally the tests run fine and you can "see" the test results on Hudson, but they are not summarized and integrated with the build's download page. This can happen, for example, if Hudson is busy and returns a "502" error while trying to fetch the zip file of the results.

There is a way to "manually" retry this "fetch and summarize" job. We currently try only once, because occasionally there are test failures that result in "infinite retries" for some types of errors.

In brief, look in

/shared/eclipse/sdk/testjobdata

There are data files there named 'testjobdata'<timestamp>'.txt'. There would be three of them generated for each build ... one for each platform tested. They are prefixed according to their state (no prefix meaning "has not been processed yet"). For example,

testjobdata201211280639.txt
RAN_testjobdata201211280639.txt
RUNNING_testjobdata201211280639.txt
ERROR_testjobdata201211280639.txt
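The prefix convention can be read off mechanically. The following is a minimal sketch, with a hypothetical function name job_state; the state words it prints (pending, ran, running, error) are labels chosen here for illustration, not names used by the releng scripts.

```shell
# Print the processing state implied by a data file's name.
job_state() {
  case ${1##*/} in                      # look at the file name only
    RAN_testjobdata*.txt)     echo ran ;;
    RUNNING_testjobdata*.txt) echo running ;;
    ERROR_testjobdata*.txt)   echo error ;;
    testjobdata*.txt)         echo pending ;;   # no prefix: not processed yet
    *)                        echo unknown ;;
  esac
}
```

For instance, job_state /shared/eclipse/sdk/testjobdata/ERROR_testjobdata201211280639.txt prints error.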

There are also files named 'collection-out.txt' and 'collection-err.txt', which give some logs of standard out and standard error from the "collect and summarize" jobs that ran. If you see a recoverable error in there (e.g. we received a 502 from Hudson) you can re-try the job just by renaming the file. For example, rename 'ERROR_testjobdata201211280639.txt' back to 'testjobdata201211280639.txt' (filenames matching "testjobdata*" are processed by the usual cron job, which runs every 10 minutes).

The cronjob that runs is /shared/eclipse/sdk/testdataCronJob.sh. In theory, a releng committer could run this directly (instead of waiting for the cronjob, or if the cronjob itself is broken) but occasionally, in the past, I've been surprised that some permissions aren't right (and this is seldom tested).
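The rename described above can be wrapped in a small, cautious helper. This is a sketch under assumptions: the function name requeue_testjob is hypothetical, and the refusal to overwrite an existing unprefixed file is a safety choice made here, not documented behavior of the cron job.

```shell
# Rename ERROR_testjobdata<stamp>.txt back to testjobdata<stamp>.txt so the
# usual cron job picks it up again on its next pass.
requeue_testjob() {
  dir=$1     # e.g. /shared/eclipse/sdk/testjobdata
  stamp=$2   # e.g. 201211280639
  src="$dir/ERROR_testjobdata${stamp}.txt"
  dst="$dir/testjobdata${stamp}.txt"
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "already queued: $dst" >&2
    return 1
  fi
  mv -- "$src" "$dst"
}
```

A call such as requeue_testjob /shared/eclipse/sdk/testjobdata 201211280639 would then do the rename from the example, after which the result should show up within one cron cycle.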

The scripts in '/shared/eclipse/sdk' are stored in git in '/org.eclipse.releng.eclipsebuilder/scripts/sdk' but there is no "checkout/checkin" going on automatically ... they are stored there for safety and history.

How to see tests results on Hudson

[December 10, 2012]

Our jobs on Hudson are collected in the Eclipse and Equinox view. Tests based on 3.x builds are prefixed with ep3 and tests based on 4.x are prefixed with ep4. The view shows the status of the last job run (or the current job running). To see history, you need to click on one job. To see which test job corresponds to which build, you need to "drill down" and look at "Parameters" of each job.

Below are some example screen shots.


The first shows the history of one job.

A progress bar shows a job in progress (and its icon will be blinking).
A yellow dot icon means the job finished but there were test failures (normal for our current tests).
A grey dot means the job started but was cancelled (could have been cancelled on purpose, or might have been that Hudson was restarted).
A red dot means there was an error that prevented the tests from running (such as they could not be installed).



Job History for 4.x based Windows 32 bit tests


The following screen shot shows the results of a normal job. You can see the tests ran, and had the usual 100 or so failures. You can click on "TestResults" on the left nav bar to see the whole list of tests ... we have about 80,000.


One Job


To see exactly which build was tested, you need to click on the "Parameters" link on the left nav bar to see the buildId and eclipseStream.


Job Parameters



How to change red 'test results' to green

As explained in bug 387066 there are times when the link at top of download page is red (indicating errors occurred) when in fact there were no errors in the JUnit tests. Until that's fixed, there is a way to "manually" set the color -- without literally editing the PHP page. If a file named 'overrideTestColor' exists in the drop directory, the color of the link will automatically be green. This file can be put in place without necessarily having shell access, by anyone with proper access and write permissions to Eclipse 'downloads'. For example,

  rsync -e ssh overrideTestColor build:/home/data/httpd/download.eclipse.org/eclipse/downloads/drops4/<buildId> 

where <buildId> is the id such as "I20140526-2000" (the contents of 'overrideTestColor' don't matter). Perhaps easier is just to use

  ssh build touch /home/data/httpd/download.eclipse.org/eclipse/downloads/drops4/<buildId>/overrideTestColor

(Depending on how you have your SSH config file set-up, you may need to spell out "build.eclipse.org" and specify your committer Id, instead of just using "build").
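To avoid typing the long path by hand, the touch command can be generated from the buildId. This is only a sketch: the function name override_test_color_cmd is hypothetical, and it prints the command rather than running it, so it can be reviewed first. The second argument is whatever host alias (or user and host) your own SSH setup needs.

```shell
# Print the ssh command that would create the overrideTestColor file
# for the given buildId. Nothing is executed remotely.
override_test_color_cmd() {
  build_id=$1
  host=${2:-build}   # defaults to the "build" alias used above
  printf 'ssh %s touch /home/data/httpd/download.eclipse.org/eclipse/downloads/drops4/%s/overrideTestColor\n' \
    "$host" "$build_id"
}
```

For example, override_test_color_cmd I20140526-2000 prints the same command as above with the buildId filled in.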

How to cancel an on-going build

Permissions

The Eclipse and Equinox builds are run from the "e4Build" id, so anyone who can log in to a build-machine shell with that ID can cancel an ongoing build. (Contact David Williams or Paul Webster for access to e4Build, but you'll need to contact webmaster@eclipse.org to get shell access.)

Confirm Build is actually running

To confirm a specific type of build is actually running, I typically use a command similar to

$ ps -ef | grep "eclipse/builds/4I"

Each build (I-, M-, N-) will typically have 3 to 5 processes running at once, and "like-build processes" will all match a pattern similar to that in the above grep. To see "all" of the eclipse builds running (I- and M- and N-) you would just use something similar to

$ ps -ef | grep "eclipse/builds"

Note: in some cases, the build might be in different stages, such as "just cloning from git", or "compiling", or "publishing". These can usually be determined from the process names, but, if anyone wanted to see the details of what is happening, there is a master log for each build-type, so you can "tail" that file with a command similar to

tail -f /shared/eclipse/builds/mb4I.out.log
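One quirk of the "ps | grep" commands above is that the grep process itself can show up in the results, since its own command line contains the search text. A common workaround is to put the first letter of the pattern in brackets. The sketch below wraps that idiom in a filter; the function name build_processes is hypothetical.

```shell
# Filter "ps -ef" output (read from stdin) down to build processes.
# Writing the pattern as "[e]clipse/..." matches "eclipse/..." in other
# processes but not the literal "[e]clipse/..." in grep's own command line.
build_processes() {
  build_type=${1:-}   # e.g. 4I; empty means all Eclipse builds
  grep -- "[e]clipse/builds/${build_type}"
}

# Intended use on the build machine (left commented out on purpose):
#   ps -ef | build_processes 4I
#   ps -ef | build_processes
```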

To kill all processes associated with a build

You could use "kill <processId>", where processId is obtained from above "ps" queries.

It is easier, though, to kill them all at once, with a command similar to

pkill -f eclipse/builds/4I

Note: kill -9 should not be needed (for a "forceful" shutdown), and best not to use that, if not needed.

How to reschedule a build

The Eclipse production builds are currently (as of 2/17/2015) all run from cron jobs. These are in the "crontab" of the e4Build user (so, you need to have shell access with 'e4Build', as above).

As always, you edit a crontab with "crontab -e". The jobs in that file are all fairly well documented, so that existing schedules can be changed. By convention, we do "rebuilds" in a section that is labeled "# rebuilds".

How to run a test-build

A "test build" is one that goes through all the normal processes of a "production build" except that a) it is not signed, b) the repo is not "tagged", c) the unit tests are not run, and d) the results just stay on the build machine (not uploaded to "downloads").

To run such a build, you need to run it under the "e4Build" id (so that "permissions" are correct) and it likely has to be started from an SSH shell. The same "start build" scripts that are used for cron jobs are used for test builds, except that a '-t' parameter is passed ... which sets a variable or two that signifies a "test build" so the appropriate actions are taken (or, not taken). These scripts can be found under

/shared/eclipse/builds

and have fairly self explanatory names, such as mb4I.sh, mb4M.sh or mb4N.sh. So, to run a test I-build, navigate to "/shared/eclipse/builds", and, with e4Build id, issue a command such as

 ./mb4I.sh -t

You, and others, can still see the results from the web, by navigating to a directory under

http://build.eclipse.org/eclipse/builds/

Other releng tasks

See Platform-releng/Platform_Build_Automated#Routine_release_engineering_tasks_for_builds for other, more routine releng tasks.