
Difference between revisions of "Hudson-ci/features/Future Enhancements"

 
 
=== Source Control Retry ===
 
 
We occasionally have jobs fail because of timeouts during SVN checkouts.  Is it possible to add a retry option?  This option could allow the user to designate the number of retries and have the retry action either follow the “Check-out Strategy” designated in the job configuration or allow the user to specify the retry “Check-out Strategy”.
 
 
[[Image:scm-retry.png]]
 
 
 
This may affect us more than most because, when we kick off the main part of our job flows, we have about 20 or more test jobs and some installer jobs, so there’s a lot of source code to check out at the same time.
 
 
Our SVN server is only about a year old and is a decent machine so we’re not able to fix the problem by upgrading that machine.
 
 
Can there be a configurable timeout option?  Would a timeout option be an easy way to reduce this issue from occurring?
 
  
 

Latest revision as of 07:05, 18 June 2014

Hudson Future Enhancements

(Proposed by: Stuart Lorber)

Job Step Description Field

Add an optional description field for every job step. There’s a description field at the top of a job configuration but that is really for a job overview.

A job configuration is like a Java class; there is a description of the class’s function at the top of the class but there is also a description (or should be) for every method.

It would be helpful to be able to come back to a job configuration later or come back to a job configuration that someone else created and look at the description to understand what the step does.

This is all about maintainability.

Disabling Job Steps

Job steps should be able to be disabled. This often comes up when debugging a job script. The only way to NOT run a job step is to delete it. I often find myself having to clone a job, delete steps, debug, look at the original job to start adding steps back into the copied job, and then rerun and debug again. If changes are made I need to put those changes back into the original job. I don’t delete the old job and rename the new job to the old job’s name because job references would end up broken or incorrect.

It would be nice to be able to disable a step so these manual steps weren’t necessary.

If you could disable steps you could even duplicate a step and try different configurations for that step. For instance, you might want to try a step with different parameters. It would be nice to be able to duplicate this step and then be able to tweak the copy, disable the original step and rerun the job.

This feature would be used in almost every new job stream we set up and often when doing job modifications.

Setting Hudson Job’s Exit Status

Currently the only way we have to set a job as unstable is based on JUnit test results.

In some cases, for instance, we run Rational Functional Tester over some of our products’ UIs. These produce their own proprietary html-based output to show successes and failures.

The developer who wrote the Ant scripts to run these tests looks at reported failures and executes a “fail” in the Ant script if there are any failures or exceptions. This worked for our old CI system but a Hudson job will fail and any test results will not be archived. Therefore there are no test results to review.

So, could Hudson either:

  1. Have some way to set an exit status that’s accessible, for instance, based on the existence of a file or an Ant exit status?
  2. Add additional attributes:
     a. Add an attribute to a step in a job to allow the job to continue if a step fails.
     b. Have an accompanying attribute that tells a subsequent step to execute if the previous step failed.
     This might add a kind of if/then logic flow within a job.
  3. Add an attribute to a step in a job, or a post build step, that allows the user to set the exit status of the Hudson job based on a failed step. This would allow, for instance, a job to still archive job artifacts if something in the job fails. This might also be a good debug tool during job setup.
  4. Add a post build step (similar to “delete workspace after build completes” with its sub-options) that allows artifacts to be archived even if the job fails. This may cause its own problems because the job might then fail during the archive step if the artifacts were never generated. This might cause more confusion than it’s worth.
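
The step attributes requested above can be sketched roughly as follows. This is a minimal Python illustration, not Hudson code: the `Step` class, its field names, and the status strings are all hypothetical and only show how "continue on failure", "run if the previous step failed", and "set the job status from a failed step" might interact.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical status ordering: a worse status never gets replaced by a better one.
SEVERITY = {"SUCCESS": 0, "UNSTABLE": 1, "FAILURE": 2}


@dataclass
class Step:
    name: str
    action: Callable[[], bool]             # returns True on success
    continue_on_failure: bool = False      # job keeps going if this step fails
    only_if_previous_failed: bool = False  # run only when the last executed step failed
    failure_status: str = "FAILURE"        # status the job takes if this step fails


def run_job(steps: List[Step]) -> Tuple[str, List[str]]:
    """Run steps in order and return (job status, names of executed steps)."""
    status = "SUCCESS"
    previous_failed = False
    executed = []
    for step in steps:
        if step.only_if_previous_failed and not previous_failed:
            continue  # conditional step skipped; previous_failed is left unchanged
        executed.append(step.name)
        ok = step.action()
        previous_failed = not ok
        if not ok:
            if SEVERITY[step.failure_status] > SEVERITY[status]:
                status = step.failure_status
            if not step.continue_on_failure:
                break
    return status, executed


# Example: failing UI tests mark the job unstable, yet results are still archived.
steps = [
    Step("run-ui-tests", lambda: False, continue_on_failure=True,
         failure_status="UNSTABLE"),
    Step("collect-failure-logs", lambda: True, only_if_previous_failed=True),
    Step("archive-results", lambda: True),
]
status, executed = run_job(steps)
```

In this sketch the failing test step downgrades the job to unstable instead of aborting it, so the archive step still runs and there are test results to review.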

“Copy Artifacts From Another Project” Retry

This is the same issue as we have with SVN checkouts.

When we hit our main job flow and have those 20 or more jobs running, each of them copies a large number of large artifacts. We’re running our jobs on a large number of slaves, and our build system has all slaves and our Hudson server on their own 1 Gbit switch, so we’re able to move a lot of data quickly. Each of these jobs may be copying 2 to 4 (or more) gigabytes of data.

Our Hudson server is only about 6 months old and like our SVN server it’s a decent machine.

I’m not sure of the proper behavior here.

Some options might be:

  1. Delete the objects that have been already copied in that step and try again.
  2. Keep track of which objects were successfully copied and upon retry delete objects that were unsuccessfully copied and retry the copy on any objects matching the copy criteria that have not yet been copied.
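
The second option could look something like the sketch below. It is written in Python purely for illustration and assumes hypothetical `copy_one` and `remove_partial` callbacks (neither is a real Hudson or plugin API) plus a user-configurable `max_retries`. Successfully copied artifacts are remembered, and only the ones that failed are cleaned up and retried.

```python
from typing import Callable, Iterable, Set, Tuple


def copy_with_retry(
    artifacts: Iterable[str],
    copy_one: Callable[[str], None],        # hypothetical: raises OSError on failure
    remove_partial: Callable[[str], None],  # hypothetical: deletes a half-copied file
    max_retries: int = 3,
) -> Tuple[Set[str], Set[str]]:
    """Copy artifacts, retrying only the failed ones. Returns (copied, still_failed)."""
    pending = list(artifacts)
    copied: Set[str] = set()
    for _attempt in range(1 + max_retries):  # first attempt plus the retries
        failed = []
        for name in pending:
            try:
                copy_one(name)
                copied.add(name)
            except OSError:
                remove_partial(name)  # drop the unsuccessful copy before retrying
                failed.append(name)
        pending = failed
        if not pending:
            break
    return copied, set(pending)


# Example with a simulated copy that times out once for one artifact.
attempts = {}


def flaky_copy(name: str) -> None:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "installer.zip" and attempts[name] == 1:
        raise OSError("simulated timeout")


removed = []
copied, failed = copy_with_retry(["tests.jar", "installer.zip"], flaky_copy, removed.append)
```

Compared with the first option (delete everything and start over), this avoids re-copying gigabytes of data that already arrived intact, at the cost of having to track per-artifact state.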

Can there be a configurable timeout option? Would a timeout option be an easy way to reduce this issue from occurring?

MultiJob Project Column Headings And Status Information

Multijob.png

There are two columns that provide insufficient information.

“Last Success” shows a green or yellow status but there is no timestamp like on the main Hudson page.

“Last Failure” shows a red status if any existing builds failed. Since there is no timestamp the user has no way of knowing when the last failure occurred without looking at the job page that shows the build history.

We normally keep 5 builds of these types of jobs so when the user goes to the jobs page they’ll see which build failed. However, if more than one “page” of builds is kept the user would have to click “More…” to see which build failed.

The green, yellow and red balls for the “Last Success” and “Last Failure” should be timestamps.

Job Configurations

A section of a configuration can be expanded (i.e. “Advanced”) but cannot be collapsed. This is also the case within a step or section of a step; for instance, “Advanced” can be clicked on an Ant step but cannot be collapsed.

Expanding a section allows the user to see greater detail but keeping these sections expanded just makes the configuration longer and harder to follow.

Job Configuration Source Control

There is a plugin that provides an audit trail for who changes a job configuration. Should this be expanded into some kind of source control system?

UI Navigation Enhancements

These notes are based on our configuration and address issues of scalability. We are finding that Hudson does not scale well to manage all of the jobs and nodes we currently have, let alone those we plan to add over the remainder of 2014.

We are producing documentation to manage our build environment. Much of this documentation would be unnecessary with changes to the information Hudson could provide.

General View

When a user connects to Hudson without logging in they see certain options/features:

  1. People
  2. Build History
  3. Job Relationship
  4. Check File Fingerprint
  5. Disk usage (if plugin is installed).

In addition they see the Build Queue and Build Executor Status (Node List).

I don’t think any of this information is applicable to anyone who hasn’t logged in.

The only information they should see is the System Message and any public jobs they have access to (team based).

After logging in a system administrator sees some more options including “Manage Teams”. Why isn’t “Manage Teams” an option under “Manage Hudson”?

The average user does not need to see the Build Queue or Node List.

Main Dashboard

The main dashboard hasn’t changed since I started working with it.

It has a lot of information, including links for system configuration, a list of all nodes that shows what jobs are running, and a list of jobs with many columns of information. (Am I missing anything?)

Most people don’t need to see any of the configuration options in the upper left corner. “New Job” may be the only option in this section that would be used by a non-administrator.

The average user does not need to see the Build Queue or Node List and what’s running on a particular node. They can see from the list of jobs if the job is running and get any pertinent information by clicking on the job. They need to know if the job is running and if it’s finished.

Maybe this node information shouldn’t be on the main dashboard page (to be addressed later).

It might be nice to be able to have a sort on the main dashboard page to allow jobs that are currently building to appear at the top of the screen.

Maybe there can be a plugin that allows a split screen with a separate section at the top for running jobs and queued jobs. Information on queued jobs is probably as important to know as what jobs are currently running.

Filtering / Organizing Job Information

I constantly get questions like “where can I find the installer for ‘X’?”

I’ve tried very hard to create tabs and organize jobs and job names logically but the use of these various lists becomes confusing when you have 100+ jobs. (We’re at about 100 and growing rapidly.)

In addition to the number of jobs we have about 25 teams. These teams are based on functional group as well as product and product version. We use this scheme partly for proper authorization and partly for organization. The management side is not that difficult but the delivery of information and artifacts is confusing to the user. I think this is also related to the original Hudson 2.x UI, which probably did not anticipate this number of jobs or this kind of per-team job visibility.

I’ve tried to set up tabs that logically break up jobs into product group. In many cases a person will not have access to any jobs under that tab but the tab is there anyway and the person gets confused on seeing empty tabs.

In general I’m looking for ways to make a person’s view of their environment as clear as possible based on a growing number of teams and jobs.

In addition, the Hudson dashboard is very wide. I’ve added tabs to help break up jobs into views based on various criteria. However, most people see tabs that are empty because of job authorization. Would it be possible to hide tabs that a user does not have access to?
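
Hiding empty tabs amounts to filtering each view by the jobs the current user is allowed to read. A minimal Python sketch, assuming the views and the user's readable jobs are available as plain collections (the function name and the sample view and job names are hypothetical, not Hudson's API):

```python
from typing import Dict, List, Set


def visible_views(views: Dict[str, List[str]], readable_jobs: Set[str]) -> Dict[str, List[str]]:
    """Return only the views that contain at least one job the user can read."""
    result = {}
    for view_name, jobs in views.items():
        allowed = [job for job in jobs if job in readable_jobs]
        if allowed:  # a view with no readable jobs is dropped entirely
            result[view_name] = allowed
    return result


# Hypothetical configuration: three tabs, a user with access to two jobs.
views = {
    "Product A": ["a-build", "a-tests"],
    "Product B": ["b-build"],
    "Installers": ["a-installer", "b-installer"],
}
shown = visible_views(views, readable_jobs={"a-build", "a-installer"})
```

Here the user would see only the "Product A" and "Installers" tabs, each listing just the jobs they have access to.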

I’ve looked at our old 2.2.1 Hudson server that has only 3 tabs. The dashboard is still too wide to fit on a standard sized screen. This requires the user to scroll back and forth to see all the available information. Maybe the user can toggle a “minimized” view that does not display most of the job information columns.


Main Dashboard

Nodes.png

As mentioned earlier, this list can become more trouble than it’s worth when there are a large number of nodes. We’re planning to have around 70 nodes by the end of 2014. Finding any useful information on a list that long, even with logically named nodes, is very difficult. Once labels in the node configurations are used to logically group nodes into pools, this list becomes even less valuable.

As our list of nodes grows I find this list less and less useful. We will quickly hit a point when this list will be useless; there will be more noise than information.

Add a filter to allow the user to see only those nodes that are currently running jobs. In some cases this would help because during the day we only run a small subset of jobs. During a full job run we might have 20 nodes active; however 20 nodes is better than 70.

Node List

Node-list.png

This screen has some valuable information. It does not show running jobs but it does show information about the slave. For me the most valuable information here is the Clock Difference. It has helped us diagnose problems with source checkouts and led us to install proper “time server” sync software on our Windows VM slaves and to automate hourly time syncs with time servers on our Linux VM slaves.

It also provides free disk space information (which I have yet to investigate) and the operating system of the slave.

It does not show information about node groups.

It does not show information about what physical machine would be hosting a virtual machine.

Node Detail

Node-detail.png

Defining a node allows, among other things, the assignment of a “label”. This label can be used to create “node groups”.

When a job is assigned to a node group there is no panel to show where a job is assigned. The only way to know where the job is assigned is to go back into the job configuration.

It would be nice to have a panel similar to the node list panel shown above in #2 that would be a list of node groups rather than nodes.

Clicking on one of these node groups could open up a panel (as shown above) that shows both the nodes and the jobs assigned to that node group.

An alternative, which would be nicer, would be to allow the user to expand a node group from the list of node groups. This would show you the node and job information. This would allow the user to get a better overview of their system.
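
The data behind such a node group panel could be derived from information Hudson already has: the labels on each node and the label each job is tied to. A rough Python sketch under that assumption (the function, the dictionary layout, and the sample node and job names are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List


def build_node_groups(
    node_labels: Dict[str, List[str]],  # node name -> labels assigned to it
    job_labels: Dict[str, str],         # job name -> label the job is restricted to
) -> Dict[str, Dict[str, List[str]]]:
    """Group nodes and jobs by label so each label can be shown as one expandable row."""
    groups = defaultdict(lambda: {"nodes": [], "jobs": []})
    for node, labels in node_labels.items():
        for label in labels:
            groups[label]["nodes"].append(node)
    for job, label in job_labels.items():
        groups[label]["jobs"].append(job)
    return dict(groups)


# Hypothetical slave farm: two Linux VMs, one Windows VM.
node_labels = {
    "linux-vm-01": ["linux"],
    "linux-vm-02": ["linux"],
    "win-vm-01": ["windows"],
}
job_labels = {"product-tests": "linux", "installer-build": "windows"}
groups = build_node_groups(node_labels, job_labels)
```

Expanding the "linux" group in this example would list both Linux VMs and the one job assigned to that label, without opening any job configuration.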

Node Views

I would like to have a panel that displays a logical grouping of slaves.

This display would be user defined.

It would allow users to group nodes together in a logical view to help give an overview of their slave farm configuration.

Our use case would be that we have physical slaves that run VirtualBox VMs. We do not run any jobs directly on the machine.

Right now this information is on PowerPoint printouts I keep updating and tacking to a whiteboard in my cube as well as a wiki.

It would make sense to have this information in a location that’s on the system that manages the definition of nodes.

This information should be available to non-system administrators because developers need to get onto a slave / VM to update software or clean up artifacts.

Node2.png Node3.png

Allowing a user to define multiple overview panels would give the user the flexibility to view their environment in different ways.