
Hudson-ci/Planning/Long Term Build History

Revision as of 06:05, 13 November 2011 by Henrik.hlyh.dk


Business case

Our Hudson instance(s) have to support multiple projects and deployment pipelines. These projects (and our managers) have an interest in measuring how well the builds perform and how smoothly the deployment pipeline executes. This involves extracting certain metrics and logs for further analysis, such as:

  • Time to run a build, success rate, and time to fix a broken build
  • How long a certain part of the deployment pipeline takes, e.g. how long the create domains step takes
  • How long a full environment deployment takes, and the feedback time from check-in to verified deployment
  • Success rate for deployments and time to fix
  • Reasons for deployments failing
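Several of these metrics are simple aggregates over the build history. The following is a minimal sketch, assuming a chronologically ordered list of finished builds and a hypothetical `BuildRecord` type (not a Hudson API). It defines time to fix as the span from the start of the first failing build in a streak to the end of the next successful build; that is one reasonable reading, not an established definition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal view of one finished build; not a Hudson API type.
final class BuildRecord {
    final int number;
    final boolean success;
    final long startMillis;
    final long durationMillis;

    BuildRecord(int number, boolean success, long startMillis, long durationMillis) {
        this.number = number;
        this.success = success;
        this.startMillis = startMillis;
        this.durationMillis = durationMillis;
    }

    long endMillis() {
        return startMillis + durationMillis;
    }
}

final class BuildMetrics {
    private BuildMetrics() {}

    /** Fraction of builds that succeeded; 0.0 for an empty history. */
    static double successRate(List<BuildRecord> builds) {
        if (builds.isEmpty()) {
            return 0.0;
        }
        long ok = builds.stream().filter(b -> b.success).count();
        return (double) ok / builds.size();
    }

    /**
     * One entry per broken streak: from the start of the first failing build
     * to the end of the next successful build. Builds must be in chronological
     * order; a streak that is still broken at the end is not counted.
     */
    static List<Long> timesToFix(List<BuildRecord> builds) {
        List<Long> result = new ArrayList<>();
        Long brokenSince = null;
        for (BuildRecord b : builds) {
            if (!b.success && brokenSince == null) {
                brokenSince = b.startMillis;               // streak begins
            } else if (b.success && brokenSince != null) {
                result.add(b.endMillis() - brokenSince);   // streak fixed
                brokenSince = null;
            }
        }
        return result;
    }
}
```

Both metrics need the full history window in order to be meaningful, which is why the retention period below matters.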

To produce statistics for the reporting above, we need to keep a certain amount of history, preferably two weeks or more.


Problem

Based on our observations, one of the biggest impediments to Hudson's scalability is the amount of history kept. This is because all builds are loaded into memory along with a considerable amount of metadata. Memory consumption is not the only thing that suffers, however; the issues we are seeing are:

  • Hudson generally runs slower as history builds up, and the user interface becomes less responsive
  • Sometimes the Hudson master starts using 100% CPU (or at least 100% of one CPU core) even though no builds are executing on the master; only a restart recovers it
  • Restart time increases from 2 minutes to upwards of 15-20 minutes
  • Plugin changes become increasingly painful, since any removed or changed plugins fill the log file with class-not-found exceptions during deserialization. Handling these errors is most likely a contributor to the increased restart time.

The last problem can be worked around by removing the part of the build history that contains the offending XML fragments, but that defeats the purpose of keeping a longer history.


Suggested solution #1 (by henrik)

Currently the LogRotator supports three levels of history:

  1. Everything kept
  2. Artefacts removed but metadata and log kept
  3. Build removed
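To make the three levels concrete, here is a hypothetical, age-only model of them. It is not the LogRotator API; `RetentionLevel`, `AgeBasedRotation`, `artefactDaysToKeep` and `daysToKeep` are illustrative names, and the real rotator can also rotate by build count.

```java
// Hypothetical, age-only model of the three levels above.
// Not the LogRotator API; names are illustrative only.
enum RetentionLevel {
    EVERYTHING_KEPT,   // 1. artefacts, metadata and log kept
    ARTEFACTS_REMOVED, // 2. metadata and log kept, artefacts deleted
    BUILD_REMOVED      // 3. build deleted
}

final class AgeBasedRotation {
    private final int artefactDaysToKeep; // artefacts older than this are deleted
    private final int daysToKeep;         // builds older than this are deleted

    AgeBasedRotation(int artefactDaysToKeep, int daysToKeep) {
        if (artefactDaysToKeep > daysToKeep) {
            throw new IllegalArgumentException("artefacts cannot outlive their build");
        }
        this.artefactDaysToKeep = artefactDaysToKeep;
        this.daysToKeep = daysToKeep;
    }

    /** Level a build of the given age (in whole days) ends up at. */
    RetentionLevel levelFor(int ageInDays) {
        if (ageInDays > daysToKeep) {
            return RetentionLevel.BUILD_REMOVED;
        }
        if (ageInDays > artefactDaysToKeep) {
            return RetentionLevel.ARTEFACTS_REMOVED;
        }
        return RetentionLevel.EVERYTHING_KEPT;
    }
}
```

With a two-week window (`new AgeBasedRotation(3, 14)`), every build at level 1 or 2 is still a build Hudson loads into memory, so the window needed for reporting directly drives the scalability problems listed above.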


One solution could be to add another level to the build history called archived. At this level, the metadata and the log are kept on disk but are not cached in memory. When the metadata is needed, the build is loaded into a fixed-size LRU cache.
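A minimal sketch of the archived level, assuming the fixed-size LRU cache is built on `java.util.LinkedHashMap` in access order (a standard JDK idiom, not something this proposal prescribes). `ArchivedBuildCache` and its `loader` function, which stands in for reading a build's metadata back from disk, are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Fixed-size LRU cache for "archived" builds. Metadata lives on disk and is
 * only loaded on demand. Assumes the loader never returns null.
 */
final class ArchivedBuildCache<K, V> {
    private final Function<K, V> loader; // hypothetical: reads metadata from disk
    private final Map<K, V> entries;
    private int loads = 0;               // number of cache misses (disk reads)

    ArchivedBuildCache(int capacity, Function<K, V> loader) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.loader = loader;
        // accessOrder = true: iteration runs from least to most recently used,
        // so the "eldest" entry is the least recently used one.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns cached metadata, loading (and possibly evicting) on a miss. */
    synchronized V get(K key) {
        V value = entries.get(key); // a hit also marks the entry as most recent
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }

    /** containsKey does not count as an access, so it leaves LRU order alone. */
    synchronized boolean isCached(K key) {
        return entries.containsKey(key);
    }
}
```

Because at most `capacity` archived builds are resident at once, memory use no longer grows with the length of the retained history; the trade-off is a disk read whenever a report touches an evicted build.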

Combined with more intelligent log rotation, this should greatly reduce the number of builds kept in memory. For better log rotation, I would like the following:

  • Log rotation should be an extension point that plugin developers can implement
  • The build promotion feature (provided by the build-promotion-plugin) should be enhanced to support the notion of relative quality levels. This would allow eviction rules such as "archive all builds with a lower promotion level", "keep only 3 builds of this level", or "keep the artefacts of this build until a new build of the same promotion level appears".
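The two wishes can be sketched together: a pluggable policy interface, and one policy that applies the example rules using ordered promotion levels. Everything here is hypothetical: `EvictionPolicy`, `PromotionAwarePolicy`, `PromotedBuild` and `Retention` are illustrative names, promotion levels are modeled as integers where higher means better quality, and "lower promotion level" is read as "below a configured minimum". None of this reflects the build-promotion-plugin's actual API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical retention outcomes, including the proposed "archived" level.
enum Retention {
    KEEP_ALL,       // artefacts, metadata and log kept
    DROP_ARTEFACTS, // metadata and log kept, artefacts deleted
    ARCHIVE         // metadata and log kept on disk only
}

// Hypothetical build view: promotion levels are ordered ints, higher = better.
final class PromotedBuild {
    final int number;
    final int promotionLevel;

    PromotedBuild(int number, int promotionLevel) {
        this.number = number;
        this.promotionLevel = promotionLevel;
    }
}

/** Hypothetical extension point that plugins could implement. */
interface EvictionPolicy {
    /** Maps build number to retention; the history is ordered newest first. */
    Map<Integer, Retention> decide(List<PromotedBuild> newestFirst);
}

/**
 * Combines three of the example rules:
 *  - archive builds below a minimum promotion level,
 *  - keep only keepPerLevel builds of each level (older ones are archived),
 *  - keep artefacts only until a newer build of the same level appears.
 */
final class PromotionAwarePolicy implements EvictionPolicy {
    private final int minLevel;
    private final int keepPerLevel;

    PromotionAwarePolicy(int minLevel, int keepPerLevel) {
        this.minLevel = minLevel;
        this.keepPerLevel = keepPerLevel;
    }

    @Override
    public Map<Integer, Retention> decide(List<PromotedBuild> newestFirst) {
        Map<Integer, Integer> seenPerLevel = new HashMap<>();
        Map<Integer, Retention> decisions = new LinkedHashMap<>();
        for (PromotedBuild b : newestFirst) {
            // 1 for the newest build of this level, 2 for the next, and so on.
            int rank = seenPerLevel.merge(b.promotionLevel, 1, Integer::sum);
            Retention r;
            if (b.promotionLevel < minLevel || rank > keepPerLevel) {
                r = Retention.ARCHIVE;
            } else if (rank == 1) {
                r = Retention.KEEP_ALL;       // newest of its level keeps artefacts
            } else {
                r = Retention.DROP_ARTEFACTS; // superseded by a newer same-level build
            }
            decisions.put(b.number, r);
        }
        return decisions;
    }
}
```

Keeping the rules behind an interface lets each installation plug in its own policy without touching Hudson core, which is the point of making log rotation an extension point.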


Suggested solution #2 (by henrik)

Another approach could be a plugin that listens to build events and pushes the relevant metrics to an SQL database. With a published database schema, Hudson users could extract any information they need and use whatever graphing tool they like to visualize the data, without having to wait for Hudson to implement the exact graph they need or having to develop their own plugin.
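A sketch of the plugin side of this approach follows. `BuildFinished`, `MetricsSink`, `MetricsPublisher` and the `build_metrics` schema are hypothetical, and how the plugin actually subscribes to Hudson build events is left out. The JDBC sink uses only standard `java.sql` calls; the in-memory sink lets the forwarding logic be exercised without a database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical event received when a build finishes (not a Hudson API type).
final class BuildFinished {
    final String job;
    final int number;
    final String result;        // e.g. "SUCCESS" or "FAILURE"
    final long startMillis;     // epoch millis
    final long durationMillis;

    BuildFinished(String job, int number, String result, long startMillis, long durationMillis) {
        this.job = job;
        this.number = number;
        this.result = result;
        this.startMillis = startMillis;
        this.durationMillis = durationMillis;
    }
}

/** Destination for metric rows. */
interface MetricsSink {
    void write(BuildFinished e) throws SQLException;
}

final class JdbcMetricsSink implements MetricsSink {
    /** One possible published schema (hypothetical table and column names). */
    static final String DDL =
        "CREATE TABLE build_metrics ("
        + " job VARCHAR(255) NOT NULL,"
        + " build_number INT NOT NULL,"
        + " result VARCHAR(32) NOT NULL,"
        + " started_at BIGINT NOT NULL,"
        + " duration_ms BIGINT NOT NULL,"
        + " PRIMARY KEY (job, build_number))";

    private static final String INSERT =
        "INSERT INTO build_metrics (job, build_number, result, started_at, duration_ms)"
        + " VALUES (?, ?, ?, ?, ?)";

    private final Connection connection;

    JdbcMetricsSink(Connection connection) {
        this.connection = connection;
    }

    @Override
    public void write(BuildFinished e) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(INSERT)) {
            ps.setString(1, e.job);
            ps.setInt(2, e.number);
            ps.setString(3, e.result);
            ps.setLong(4, e.startMillis);
            ps.setLong(5, e.durationMillis);
            ps.executeUpdate();
        }
    }
}

/** Keeps rows in memory; useful for tests. */
final class InMemoryMetricsSink implements MetricsSink {
    final List<BuildFinished> rows = new ArrayList<>();

    @Override
    public void write(BuildFinished e) {
        rows.add(e);
    }
}

/** Plugin side: forwards each finished build, never failing the build itself. */
final class MetricsPublisher {
    private final MetricsSink sink;

    MetricsPublisher(MetricsSink sink) {
        this.sink = sink;
    }

    /** Returns false if the metric could not be stored. */
    boolean onFinished(BuildFinished e) {
        try {
            sink.write(e);
            return true;
        } catch (SQLException ex) {
            return false; // a real plugin would log this
        }
    }
}
```

Because reporting queries then run against the database rather than Hudson's in-memory build history, this approach could also let the Hudson instance itself keep a much shorter history.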
