Hudson-ci/features/Future Enhancements

Hudson Future Enhancements

(Proposed by: Stuart Lorber)

Job Step Description Field

Add an optional description field for every job step. There’s a description field at the top of a job configuration but that is really for a job overview.

A job configuration is like a Java class; there is a description of the class’s function at the top of the class but there is also a description (or should be) for every method.

It would be helpful, when coming back to a job configuration later or picking up one that someone else created, to be able to read the description and understand what each step does.

This is all about maintainability.

Note: This has been enabled for Ant, shell, batch and Maven steps. It should be enabled for all builders.

Disabling Job Steps

It should be possible to disable job steps. This often comes up when debugging a job script. Currently the only way to NOT run a job step is to delete it. I often find myself having to clone a job, delete steps, debug, and then look at the original job to add steps back into the copied script, rerunning and debugging as I go. If changes are made I need to put those changes back into the original job. I can’t simply delete the old job and rename the new job to the old job’s name, because job references would be broken or incorrect.

It would be nice to be able to disable a step so these manual steps weren’t necessary.

If you could disable steps you could even duplicate a step and try different configurations for that step. For instance, you might want to try a step with different parameters. It would be nice to be able to duplicate this step and then be able to tweak the copy, disable the original step and rerun the job.

This feature would be used in almost every new job stream we set up and often when doing job modifications.

Note: This has been enabled for Ant, shell, batch and Maven steps. It should be enabled for all builders.
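The disable-and-duplicate workflow described above can be sketched as follows. This is only an illustrative model, not Hudson code: the `Step` class, its `enabled` and `description` fields, and `run_job` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """Hypothetical model of a job step; not a Hudson class."""
    name: str
    action: Callable[[], None]
    enabled: bool = True
    description: str = ""


def run_job(steps: List[Step]) -> List[str]:
    """Run the enabled steps in order and return the names of the steps that ran."""
    executed = []
    for step in steps:
        if not step.enabled:
            continue  # a disabled step stays in the configuration but is skipped
        step.action()
        executed.append(step.name)
    return executed


# Duplicate a step, tweak the copy, and disable the original instead of deleting it.
original = Step("ant-build", action=lambda: None, description="Build with default targets")
variant = replace(original, name="ant-build-debug", description="Same build, different parameters")
steps = [replace(original, enabled=False), variant]
print(run_job(steps))
```

Because the disabled step remains in the list, re-enabling it is a one-field change and no job has to be cloned or renamed.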

Setting Hudson Job’s Exit Status

Currently the only way we have to set a job as unstable is based on JUnit test results.

In some cases, for instance, we run Rational Functional Tester over some of our products’ UIs. These produce their own proprietary html-based output to show successes and failures.

The developer who wrote the Ant scripts to run these tests looks at reported failures and executes a “fail” in the Ant script if there are any failures or exceptions. This worked for our old CI system but a Hudson job will fail and any test results will not be archived. Therefore there are no test results to review.

So, could Hudson either:

1. Have some way to set an exit status that’s accessible, for instance, based on the existence of a file or an Ant exit status?
2. Add additional attributes:
- a. Add an attribute to a step in a job to allow the job to continue if a step fails.
- b. Have an accompanying attribute that tells a subsequent step to execute if the previous step failed.

This might add a kind of if/then logic flow within a job.

3. Add an attribute to a step in a job or as a post build step that allows the user to set the exit status of the Hudson job based on a failed step. This would allow, for instance, a job to still archive job artifacts if something in the job fails. This might also be a good debug tool during job setup.
4. Add a post build step (similar to “delete workspace after build completes” with its sub-options) that allows artifacts to be archived even if the job fails. This may cause its own problems because the job might fail during the archive steps if the artifacts were not generated. This might cause more confusion than it’s worth.
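Options 2a, 2b and 3 could interact roughly as in the sketch below. It is a minimal model under stated assumptions: the attribute names (`continue_on_failure`, `only_if_previous_failed`, `status_on_failure`) and the status strings are invented for illustration and are not existing Hudson settings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed ordering of job results, from best to worst.
SEVERITY = ["SUCCESS", "UNSTABLE", "FAILURE"]


@dataclass
class Step:
    """Hypothetical job step; the attribute names are illustrative only."""
    name: str
    action: Callable[[], bool]              # returns True on success
    continue_on_failure: bool = False       # option 2a
    only_if_previous_failed: bool = False   # option 2b
    status_on_failure: str = "FAILURE"      # option 3, e.g. "UNSTABLE"


def run_job(steps: List[Step]) -> Tuple[str, List[str]]:
    """Return the final job status and the names of the steps that ran."""
    status = "SUCCESS"
    executed = []
    previous_failed = False
    for step in steps:
        if step.only_if_previous_failed and not previous_failed:
            continue  # conditional step is skipped when the previous step succeeded
        succeeded = step.action()
        executed.append(step.name)
        previous_failed = not succeeded
        if not succeeded:
            # Keep the worst status seen so far.
            status = max(status, step.status_on_failure, key=SEVERITY.index)
            if not step.continue_on_failure:
                break
    return status, executed


steps = [
    Step("functional-tests", lambda: False, continue_on_failure=True, status_on_failure="UNSTABLE"),
    Step("collect-failure-report", lambda: True, only_if_previous_failed=True),
    Step("archive-results", lambda: True),
]
result = run_job(steps)
print(result)
```

In this model a failing test step marks the job unstable rather than failed, so the later archive step still runs and the results remain available for review.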

“Copy Artifacts From Another Project” Retry

This is the same issue as we have with SVN checkouts.

When we hit our main job flow we have 20 or more jobs running and they all copy a large number of large artifacts to their workspaces. We’re running our jobs on a large number of slaves. Our build system has the slaves and our Hudson server on their own 1 Gbit switch so we’re able to move a lot of data quickly. Each of these jobs may be copying 2 to 4 (or more) gigabytes of data.

Our Hudson server is only about 6 months old and, like our SVN server, is a decent machine.

I’m not sure of the proper behavior here.

Some options might be:

Delete the objects that have been already copied in that step and try again.
Keep track of which objects were successfully copied and upon retry delete objects that were unsuccessfully copied and retry the copy on any objects matching the copy criteria that have not yet been copied.

Can there be a configurable timeout option? Would a timeout option be an easy way to keep this issue from occurring as often?
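The second option (track what was copied, clean up what was not, and retry only the remainder) could look something like the sketch below. The function and parameter names are hypothetical, and the actual copy and cleanup operations are passed in as callables so the retry logic stays independent of how artifacts are transferred. A timeout is assumed to surface as an exception from the copy call.

```python
from typing import Callable, Iterable, Set


def copy_with_retry(
    artifacts: Iterable[str],
    copy_one: Callable[[str], None],
    delete_partial: Callable[[str], None],
    max_attempts: int = 3,
) -> Set[str]:
    """Copy every artifact, retrying only those that have not been copied yet.

    copy_one is expected to raise OSError on failure (TimeoutError is a
    subclass of OSError). Returns the artifacts that could not be copied.
    """
    pending = list(artifacts)
    for _ in range(max_attempts):
        failed = []
        for artifact in pending:
            try:
                copy_one(artifact)
            except OSError:
                delete_partial(artifact)  # remove the incomplete copy before retrying
                failed.append(artifact)
        pending = failed
        if not pending:
            break
    return set(pending)


# Simulated copy that times out once for one artifact (made-up file names).
attempts = {}


def flaky_copy(name: str) -> None:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "installer.zip" and attempts[name] == 1:
        raise TimeoutError("simulated copy timeout")


removed = []
remaining = copy_with_retry(["installer.zip", "docs.tar"], flaky_copy, removed.append)
print(remaining, attempts, removed)
```

Only the artifact that failed is copied a second time; artifacts that already arrived intact are left alone.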

MultiJob Project Column Headings And Status Information

There are two columns that provide insufficient information.

“Last Success” shows a green or yellow status but there is no timestamp like on the main Hudson page.

“Last Failure” shows a red status if any existing builds failed. Since there is no timestamp the user has no way of knowing when the last failure occurred without looking at the job page that shows the build history.

We normally keep 5 builds of these types of jobs so when the user goes to the jobs page they’ll see which build failed. However, if more than one “page” of builds is kept the user would have to click “More…” to see which build failed.

The “Last Success” and “Last Failure” columns should show timestamps instead of green, yellow and red balls.
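As a rough illustration, the two timestamps could be derived from the build history as shown below. The tuple layout and the sample builds are made up for the example; “Last Success” is assumed to cover both successful (green) and unstable (yellow) builds, as described above.

```python
from datetime import datetime
from typing import Iterable, Optional, Set, Tuple

# (build number, result, finish time); a hypothetical shape for a build record.
Build = Tuple[int, str, datetime]


def last_finished(builds: Iterable[Build], results: Set[str]) -> Optional[datetime]:
    """Return the most recent finish time among builds with one of the given results."""
    times = [finished for _, result, finished in builds if result in results]
    return max(times) if times else None


# Made-up sample history.
builds = [
    (101, "SUCCESS", datetime(2015, 3, 1, 9, 0)),
    (102, "FAILURE", datetime(2015, 3, 2, 9, 0)),
    (103, "UNSTABLE", datetime(2015, 3, 3, 9, 0)),
]
last_success = last_finished(builds, {"SUCCESS", "UNSTABLE"})
last_failure = last_finished(builds, {"FAILURE"})
print(last_success, last_failure)
```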

Job Configurations

A section of a configuration can be expanded (e.g. “Advanced”) but cannot be collapsed. This is also the case within a step or section of a step; for instance, “Advanced” can be clicked on an Ant step but cannot be collapsed afterwards.

Expanding a section allows the user to see greater detail but keeping these sections expanded just makes the configuration longer and harder to follow.

UI Navigation Enhancements

These notes are based on our configuration and address issues of scalability. We are finding that Hudson does not scale well enough to manage all of the jobs and nodes we currently have, let alone those we plan to add for the remainder of 2015 and heading into 2016.

We are producing documentation to manage our build environment. Much of this documentation would be unnecessary with changes to the information Hudson could provide.

General View

When a user connects to Hudson without logging in they see certain options/features:

People
Build History
Job Relationship
Check File Fingerprint
Disk usage (if plugin is installed).

In addition they see the Build Queue and Build Executor Status (Node List).

This information is not applicable to anyone who hasn’t logged in.

The only information a user who is not logged in should see is the System Message and any public jobs (team-based).

After logging in a system administrator sees some more options including “Manage Teams”. “Manage Teams” should be an option under “Manage Hudson”.

The average user does not need to see the Build Queue or Node List. It would be easier for a user to see a list of jobs that are queued and a list of jobs that are processing.

Main Dashboard

The main dashboard has never changed.

It has a lot of information that includes links for system configuration, a list of all nodes that shows what jobs are running and a list of jobs with many columns of information.

Most people don’t need to see any of the configuration options in the upper left corner. “New Job” may be the only option in this section that would be used by a non-system administrator.

The average user does not need to see the Build Queue or Node List and what’s running on a particular node. They can see from the list of jobs if the job is running and get any pertinent information by clicking on the job. They only need to know if the job is running and if it’s finished.

Node information doesn't need to be on the main dashboard page (to be addressed later).

It would help to have a sort function on the main dashboard page to allow jobs that are currently building or are queued to run to appear at the top of the screen.

Another option would be to have a plugin that allows a split screen with a separate section at the top for running jobs and queued jobs. Information on queued jobs is as important to know as what jobs are currently running.
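The sort described above could be as simple as the following sketch. The `Job` type and the state names are assumptions made for the example, not Hudson's data model.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    state: str  # "BUILDING", "QUEUED" or "IDLE"; hypothetical state names


# Building jobs first, then queued jobs, then everything else.
_ORDER = {"BUILDING": 0, "QUEUED": 1}


def active_first(jobs: List[Job]) -> List[Job]:
    """Sort jobs so that building and queued jobs appear at the top, by name within each group."""
    return sorted(jobs, key=lambda job: (_ORDER.get(job.state, 2), job.name))


jobs = [
    Job("docs", "IDLE"),
    Job("installer", "QUEUED"),
    Job("core-build", "BUILDING"),
    Job("api-tests", "IDLE"),
]
print([job.name for job in active_first(jobs)])
```

The same key could drive the split-screen idea: the first two groups form the top section and the remainder forms the regular job list.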

Filtering / Organizing Job Information

Questions from users often come up - like “where can I find the installer for ‘X’?”

We've created tabs and organized jobs and job names logically but the use of these various lists becomes confusing when you have 150+ jobs. (We’re at about 150 and growing rapidly.)

In addition to the number of jobs we have about 25 teams. These teams are based on functional group as well as product and product version. We use this scheme partly for proper authorization and partly for organization. The management side is not that difficult but the delivery of information and artifacts is confusing to the user. This is also related to the original Hudson 2.x UI that probably did not assume this number of jobs and job visibility.

We're looking for ways to make a person’s view of their environment as clean and clear as possible based on a growing number of teams and jobs.

In addition, the Hudson dashboard is very wide. We’ve added tabs to help break up jobs into views based on various criteria, but the dashboard still requires horizontal scrolling.

We’ve looked at our old 2.2.1 Hudson server that has only 3 tabs. The dashboard is still too wide to fit on a standard sized screen. This requires the user to scroll back and forth to see all the available information. An option might be to allow the user to toggle a “minimized” view that does not display most of the job information columns.

UI Navigation Enhancements

Main Dashboard

As mentioned earlier, the node list on the main dashboard can become more trouble than it’s worth when there are a large number of nodes. We’re planning to have around 70 nodes by the end of 2015. Finding any useful information on a list that long, even with logically named nodes, is very difficult. When labels in the node configurations logically group nodes into pools, this list becomes even less valuable. There is more noise than information.

Add a filter to allow the user to see only those nodes that are currently running jobs.

Node List

This screen has some valuable information. It does not show running jobs but it does show information about the slave. For us the most valuable information here is the Clock Difference. It helps us diagnose problems with source checkouts, and it led us to install proper “time server” sync software on our Windows VM slaves and to automate hourly time syncs with time servers on our Linux VM slaves.

It also provides free disk space information and the operating system of the slave.

It does not show information about node groups.

It does not show which physical machine is hosting a virtual machine.

Node Detail

Defining a node allows, among other things, the assignment of a “label”. This label can be used to create “node groups”.

When a job is assigned to a node group there is no panel to show where a job is assigned. The only way to know where the job is assigned is to go back into the job configuration.

It would be nice to have a panel similar to the node list panel shown above in #2 that would be a list of node groups rather than nodes.

Clicking on one of these node groups could open up a panel (as shown above) that shows the nodes assigned to that node group and the jobs assigned to that node group.

An alternative, which would be nicer, would be to allow the user to expand a node group from the list of node groups. This would show you the node and job information. This would allow the user to get a better overview of their system.
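The data behind such a node group panel could be assembled roughly as follows. This is a sketch only: the input shapes (a mapping of node name to labels, and of job name to assigned label) and all sample names are assumptions, not how Hudson stores this information.

```python
from collections import defaultdict
from typing import Dict, List


def node_group_overview(
    node_labels: Dict[str, List[str]],
    job_labels: Dict[str, str],
) -> Dict[str, Dict[str, List[str]]]:
    """Group nodes and jobs by label so each node group lists its nodes and its jobs."""
    overview = defaultdict(lambda: {"nodes": [], "jobs": []})
    for node, labels in node_labels.items():
        for label in labels:
            overview[label]["nodes"].append(node)
    for job, label in job_labels.items():
        overview[label]["jobs"].append(job)
    # Sort for a stable, readable display.
    return {
        label: {kind: sorted(names) for kind, names in entry.items()}
        for label, entry in sorted(overview.items())
    }


# Made-up sample configuration.
node_labels = {"win-vm-01": ["windows"], "win-vm-02": ["windows"], "linux-vm-01": ["linux"]}
job_labels = {"installer-build": "windows", "unit-tests": "linux"}
overview = node_group_overview(node_labels, job_labels)
print(overview)
```

Expanding a node group in the list would then just mean rendering the `nodes` and `jobs` entries for that label.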

Node Views

We would like to have a panel that displays a logical grouping of slaves.

This display would be user defined.

It would allow users to group nodes together in a logical view to help give an overview of their slave farm configuration.

Our use case would be that we have physical slaves that run VirtualBox VMs. We do not run any jobs directly on the physical machines.

Right now this information is on a wiki that is updated as new VMs are created or VMs are deleted or moved to another slave.

It would make sense to have this information in a location that is on the system that manages the definition of nodes.

This information should be available to non-system administrators because developers need to get onto a slave / VM to update software or clean up artifacts.

Allowing a user to define multiple overview panels would give the user the flexibility to view their environment in different ways.

Email Configuration

When using the team functionality, a job's email notification list should be defined as part of the team definition. This would allow a team's membership to be changed quickly and remove the requirement to modify each job's configuration as a team's members change.

If a job's configuration has email notification enabled and the list is blank the list should default to the team's default email list.
If a job's configuration has email notification disabled no emails are sent.
If a job's configuration has email notification enabled and the list has entries, the email notification for this job should use this list, overriding the team's default email list.
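The three rules above amount to a simple fallback, sketched here with hypothetical function and parameter names and made-up addresses:

```python
from typing import List


def resolve_recipients(enabled: bool, job_list: List[str], team_default: List[str]) -> List[str]:
    """Return the addresses to notify for a job under the rules above."""
    if not enabled:
        return []                  # notification disabled: no emails are sent
    if job_list:
        return list(job_list)      # job-level list overrides the team default
    return list(team_default)      # blank job-level list falls back to the team default


team = ["team-a@example.com"]
print(resolve_recipients(True, [], team))
print(resolve_recipients(True, ["owner@example.com"], team))
print(resolve_recipients(False, ["owner@example.com"], team))
```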

SCM Credentials

Use logic similar to the email flow stated above.

Define the SCM credentials to use at the team level to validate SCM entries in a job's config.

Use these credentials, by default, so the user does not have to enter them, for instance, after a "copy team" action.

Multi-Job Promotion

Add the option when promoting a multi-job to promote all jobs called in that job's config.

Add the option to set the build text for these jobs (should support html markup).

Notice: This Wiki is now read only and edits are no longer possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
