Common Build Infrastructure/Managing Hudson
To prevent several people from trying to restart the same slave at once, announce and coordinate all such activity on the cross-project mailing list.
Any specific 'tweaks' (or new jobs) should be documented in a bug (https://bugs.eclipse.org).
Restarting Hudson
This is the process for authorized committers to restart Hudson. A restart is required if the master node is failing or if multiple slaves are misbehaving.
1) Log in to the web interface (https://hudson.eclipse.org/shared/). NOTE: If Hudson is so unresponsive that the web interface doesn't load, get the Webmaster involved.
2) Click the 'Manage Hudson' link in the left-hand menu.
3) Select 'Prepare Hudson for shutdown'. This lets you enter a short message explaining why Hudson is being shut down, and it prevents any new jobs from starting.
4) Either wait for any running jobs to finish or cancel them.
5) Under 'Manage Hudson', click 'Plugin Manager'.
6) Select the 'Installed Plugins' tab.
7) At the bottom of that page, click the 'Restart when no jobs are running' button.
8) Wait for Hudson to restart itself.
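Step 4 (waiting for running jobs to finish) can also be checked from a script instead of by watching the dashboard. The following is a minimal Python sketch, assuming the instance exposes Hudson's JSON remote API at `<base>/computer/api/json` and that the response carries a top-level `busyExecutors` count. Both the endpoint and the field name are assumptions to verify against https://hudson.eclipse.org/shared/, and a secured instance may require you to log in before the fetch works. The sketch only reads state; the shutdown and restart themselves are still done through the web interface.

```python
import json
import time
import urllib.request

# Assumed base URL of the shared instance (taken from step 1).
BASE_URL = "https://hudson.eclipse.org/shared"


def computer_api_url(base_url: str) -> str:
    """Return the URL of the (assumed) node-summary JSON endpoint."""
    return base_url.rstrip("/") + "/computer/api/json"


def no_jobs_running(computer_summary: dict) -> bool:
    """True only when the summary explicitly reports zero busy executors.

    A missing "busyExecutors" field (an assumed name) counts as "not idle",
    so a malformed response never signals that a restart is safe.
    """
    return computer_summary.get("busyExecutors") == 0


def fetch_computer_summary(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Fetch and decode the node summary; may need authentication."""
    with urllib.request.urlopen(computer_api_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


def wait_until_idle(fetch, poll_seconds: float = 30.0, max_polls: int = 60,
                    sleep=time.sleep) -> bool:
    """Poll until no jobs are running; return False if we give up."""
    for _ in range(max_polls):
        if no_jobs_running(fetch()):
            return True
        sleep(poll_seconds)
    return False
```

Passing `fetch` and `sleep` in as parameters keeps the polling logic testable without network access; against the live instance you would call `wait_until_idle(fetch_computer_summary)` after step 3 and before step 7.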
Restarting a specific slave
The Windows slave requires Webmaster assistance to restart; however, the Unix slaves can be 'soft' restarted as follows:
1) Log in to the Hudson web interface.
2) Select the node from the list of nodes on Hudson's main page.
3) Click the 'Mark node temporarily offline' button in the upper right corner of the page and provide a short message explaining why you're about to restart the slave. (The button then changes to the reverse operation, "mark this node online".)
4) Wait for any jobs to finish or cancel them.
5) Click the 'disconnect' link in the left menu.
6) Once the node is disconnected, wait about 30 seconds and click the 'Launch slave agent' button just under the node name in the main window.
- [User experience (David Williams, circa 7/2012): I don't always see a "Launch slave agent" button. Once I click "disconnect" (step 5) and confirm with a reason message, the disconnect button goes away. If I refresh occasionally, the disconnect button eventually comes back. At that point I press "mark this node online" (the button from step 3) and everything starts up. Another observation (perhaps it depends on the platform): sometimes the "disconnect" button does not come back, even after 30 or 60 seconds, but if I then press "mark this node online", a "Launch slave agent" button appears, and I can click it to start things up.]
7) Watch the login process and check that everything looks OK (i.e., no errors). The "logs" link in the right-hand navigation bar is a useful way to watch the login process.
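For reference, the soft-restart steps can be written down as an ordered plan. This Python sketch only builds the plan and performs no requests. The URL suffixes (`toggleOffline`, `doDisconnect`, `launchSlaveAgent`, `log`) and the `offlineMessage` parameter are assumptions about which Hudson URLs sit behind the buttons and links named in the steps; confirm them on the node's page before scripting against them. It covers only the Unix slaves, since the Windows slave needs the Webmaster.

```python
from typing import List, NamedTuple, Optional
from urllib.parse import quote


class Step(NamedTuple):
    action: str                    # "POST", "WAIT_IDLE", "SLEEP" or "CHECK"
    url: Optional[str] = None      # assumed Hudson URL behind the button/link
    seconds: Optional[int] = None  # only used by SLEEP
    params: Optional[dict] = None  # form parameters for POST steps


def node_url(base_url: str, node_name: str) -> str:
    """URL of a node's page; node names may contain spaces, so encode them."""
    return f"{base_url.rstrip('/')}/computer/{quote(node_name, safe='')}/"


def soft_restart_plan(base_url: str, node_name: str, reason: str,
                      settle_seconds: int = 30) -> List[Step]:
    """Ordered plan mirroring steps 3 to 7 of the soft restart."""
    if not reason.strip():
        raise ValueError("step 3 asks for a short reason message")
    node = node_url(base_url, node_name)
    return [
        # Step 3: 'Mark node temporarily offline' with a short reason.
        Step("POST", node + "toggleOffline", params={"offlineMessage": reason}),
        # Step 4: wait until the node's jobs finish (or cancel them).
        Step("WAIT_IDLE", node + "api/json"),
        # Step 5: the 'disconnect' link.
        Step("POST", node + "doDisconnect"),
        # Step 6: give the agent time to go away before relaunching.
        Step("SLEEP", seconds=settle_seconds),
        # Step 6 (cont.): 'Launch slave agent'. If the button is missing, the
        # user note says marking the node online can make it appear.
        Step("POST", node + "launchSlaveAgent"),
        # Step 7: watch the node's log for errors.
        Step("CHECK", node + "log"),
    ]
```

Keeping the plan as data (rather than firing requests directly) makes it easy to print and review before anything touches a shared slave, which fits the advice to coordinate restarts on the mailing list first.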
Creating a new job
Before creating a new job, you need:
- A job name
- The id of a committer who will 'own' the job
- Extra committer ids
- A 'source' job to copy
- A specific job type
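The checklist above can be captured as a small record so that missing pieces are caught before anyone opens the 'New job' page. This is a hypothetical Python sketch: `JobRequest` and its fields are illustrative names, not part of Hudson. It treats the source job as optional and falls back to the default project type ('Build an open source project') named in the job-creation steps, since those steps only copy a source job when the project has provided one.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Default project type named in the job-creation steps.
DEFAULT_JOB_TYPE = "Build an open source project"


@dataclass
class JobRequest:
    name: str
    owner: str                                    # committer who 'owns' the job
    extra_committers: List[str] = field(default_factory=list)
    source_job: Optional[str] = None              # job to copy, if provided
    job_type: str = DEFAULT_JOB_TYPE

    def problems(self) -> List[str]:
        """List what is still missing before the job can be created."""
        issues = []
        if not self.name.strip():
            issues.append("missing job name")
        if not self.owner.strip():
            issues.append("missing owning committer id")
        return issues

    def committers(self) -> List[str]:
        """Owner first, then extra committers, each id once."""
        ordered: List[str] = []
        for cid in [self.owner, *self.extra_committers]:
            cid = cid.strip()
            if cid and cid not in ordered:
                ordered.append(cid)
        return ordered
```

A request with an empty `problems()` list and a `committers()` list in hand is everything the steps need: the name for step 3, the optional source job for step 3.a, and the ids for step 6.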
1) Log in to Hudson's web interface.
2) Click the 'New job' link in the left-hand menu.
3) Provide the job name and select the project type; use the default type ('Build an open source project') if the project has not specified one.
3.a) If the project has provided a source job, select 'Copy existing job' and paste the source job's name into the 'Copy from' text box.
4) Press OK.
5) Once the job configuration loads, scroll down to the 'Security' section.
6) Add the owning committer (and any extra committers, one at a time) via the 'User/group to add' text box, clicking 'Add' for each entry. Don't press Enter on your keyboard to add an entry; if you do, you'll have to do this step again.
7) Set the permissions for each user. On most jobs, that means every permission except 'Extended read'.
8) Remove your own id from the list.
9) Scroll to the bottom of the page and click Save.
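Steps 6 to 8 amount to building a small permission matrix. This Python sketch shows the intended end state: every requested committer gets every job permission except 'Extended read', each id appears once, and your own id is left out. Apart from 'Extended read', the permission names are hypothetical placeholders; match them to the column headers in the job's 'Security' section. The function only computes the matrix; the entries are still added through the web form.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical column names for the per-job 'Security' matrix. Only
# "Extended read" is named on this page; the rest are assumptions.
JOB_PERMISSIONS = (
    "Read", "Extended read", "Build", "Workspace",
    "Configure", "Delete", "Tag",
)
EXCLUDED = {"Extended read"}  # step 7: everything except 'Extended read'


def job_security_matrix(committers: Iterable[str], editor_id: str,
                        permissions: Sequence[str] = JOB_PERMISSIONS
                        ) -> Dict[str, List[str]]:
    """Map each committer id to the permissions it should be granted."""
    granted = [p for p in permissions if p not in EXCLUDED]
    matrix: Dict[str, List[str]] = {}
    for user in committers:
        user = user.strip()
        # Step 6: add each id once; step 8: leave your own id out.
        if user and user != editor_id and user not in matrix:
            matrix[user] = list(granted)
    return matrix
```

For example, `job_security_matrix(["jdoe", "asmith"], editor_id="webmaster1")` yields one row per committer with the same permissions checked, which is what the form should look like just before you click Save.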