
SimRel/Simultaneous Release Engineering

This page outlines the main steps that the "Simultaneous Release Engineer" must perform at the various stages of the release and in some special cases.

It also includes "general interest" sections such as an overview of the repositories used and the concepts behind the Jenkins jobs.

This documentation is meant to be an overview or orientation. In the Jenkins jobs themselves and in the scripts discussed here there is much more detail on specifics.

Please add or modify this wiki page if omissions or errors are noticed so that over time it will get better and stay current and accurate.

Repositories

Branches

All releases (milestones, release candidates and final release) are built from the 'master' branch of org.eclipse.simrel.build. Since the Eclipse IDE Photon release there are no update releases anymore and hence no <name>_update branches.

Tools and Utility scripts

The "tools and utilities" used for the build are from org.eclipse.simrel.tools, and there we always use "master" with variables for things like "trainName", which affect the URLs generated, etc.

Of all the tools and utilities the most important is the build.xml. This will build the "data" repository (it assumes it is already checked out correctly by Jenkins, or user if running locally). Technically it will run by simply invoking "ant" (Ant uses "build.xml" by default).

By specifying some properties that are specific to "Eclipse.org" infrastructure, via the "production.properties" file, the build can be more efficient and reliable. For example, the build.xml script converts all the "https://download.eclipse.org" URLs to their local file system equivalents: "file:///home/data/httpd/download.eclipse.org". In theory we'd like to think "it should not matter much", but it does seem to matter -- I suspect because, with such a large number of repositories and artifacts (and so many aggregation builds!), p2 (via the CBI aggregator) hits the 'http' server very hard.
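The URL conversion described above can be illustrated with a small sketch. This is not the logic in build.xml itself; the function name `to_local_file_url` is hypothetical, and only the two prefixes are taken from the text above.

```python
# Prefixes quoted in the text above; everything else is illustrative.
HTTP_PREFIX = "https://download.eclipse.org"
FILE_PREFIX = "file:///home/data/httpd/download.eclipse.org"


def to_local_file_url(url: str) -> str:
    """Map a download.eclipse.org URL to its local file-system equivalent.

    URLs on any other host are returned unchanged.
    """
    if url == HTTP_PREFIX or url.startswith(HTTP_PREFIX + "/"):
        return FILE_PREFIX + url[len(HTTP_PREFIX):]
    return url
```

Matching on the prefix followed by "/" avoids rewriting an unrelated host that merely starts with the same characters.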

The org.eclipse.simrel.tools repository also includes many useful utility scripts that are not necessarily used often, but are needed by the entire process of "doing a release".

Jenkins Jobs

There are two jobs that run the aggregator build:


The Gerrit validation job automatically runs on new Gerrit reviews and checks only that the requirements and version constraints all fit together. If the validation is successful, the change can be merged. Then the full build runs (it polls for changes every 10 minutes).

It is helpful to use this two-step approach to catch basic errors early and fast. Some errors only appear in the full build, but usually a lot less often.

This usually works, but not always, depending on how the build was broken -- for example "repository not found" can affect the whole build at any point in time, since if someone deletes a repository that is mentioned in their aggrcon file then there is nothing the aggregator or Jenkins can do. But people should really not be doing that, and a project usually requires some education on correct procedures if it happens frequently. Typically, if it happens at all, it was just an accident based on a typo or something, now that most projects have been educated. :)

Releng jobs

In addition to the above two jobs, there are also "releng jobs":

Process steps

Routine Aggregation Builds

Most of the time, the release engineer simply needs to keep an eye on the builds and, if one fails, investigate to the point of knowing whether a project did something wrong or whether the Jenkins job itself is failing for some other reason. The former cases (project issues) are usually documented in the Simultaneous Release FAQ in the Common errors and what to do about them section. The release engineer's role in that case is simply to communicate with the project and make sure they are "working on it". In some cases, such as when someone has "broken the build" and then already gone home for the night, a contribution might need to be disabled until the project fixes their issues. For Gerrit validation builds, such proactive communication is not necessary. It is required for the full aggregation build since if someone "breaks the build" it could prevent others from contributing.

In the other main case, that is, Jenkins job issues, the errors are usually something strange, such as "lost connection", or a corrupt clone of the repository. In most cases, if the problem is not obvious from reading the log, the procedure to follow is:

  • a) simply try again, and see if the same error occurs,
  • b) if it does, try "manually" cleaning the workspace via the web interface and see if the error still occurs,
  • c) if it does, then try restarting the Jenkins instance and see if the error still occurs, and
  • d) if it does, then actually start detailed debugging to see what the issue is.

There may be some cases where it is not clear whether a failure is a project issue or an infrastructure issue. In those cases, the first step is usually to communicate with the project to see if they know what the issue is or if they are working on it.

Notifications
The SimRel release engineer needs to be added to the Jenkins jobs email notifications to be notified of any failures.


The other thing to do "continuously" is to monitor the cross-project mailing list and the cross-project Bugzilla component, to see if anyone has an issue with the build that the release engineer needs to help with. Sometimes it may be more of a "Planning Council chair person's" question, or even a "peer-to-peer" project question, but it is best to always ask if in any doubt.

Milestones and Release Candidates

Overview

Action Release
Milestones & RC1 RC2

1. a week before the scheduled time, check *.aggrcon files
2. monitor cross-project list for "extension" requests
3. "staging is complete"

4. "promote to releases"

yes

5. "make visible"

yes yes*

6. On the release day

  • Run "checkMirrors.sh" locally a few hours before "makeVisible" job
  • Check that "makeVisible" job runs successfully
  • Send a note to cross-project mailing list that Milestone/Release Candidate/Release is available
    • Provide links to releases repository and EPP builds

7. After the release

  • Re-enable the aggregation
  • Check repo reports
yes

yes*) Schedule the "makeVisible" job not for the RC2 release date (usually Friday), but for the GA release date (usually the Wednesday after the "quiet period").

Details

  • A week or so before the scheduled time, check the *.aggrcon files to make sure no projects or features are disabled (enabled="false") and if so, send a reminder to the cross-project list asking if the project is aware of that and help resolve the issue, if any.
  • As dictated by the schedule (such as see 2019-06 schedule) monitor the mailing lists to see if anyone has asked for an "extension" to the scheduled time.
  • When staging is complete (i.e., no extensions requested, and no jobs running) announce on the cross-project list that "staging is complete" and disable the aggregation job.
  • Shortly after the announcement that "staging is complete" use the job named simrel.releng.promoteToReleases to copy what is in staging to the appropriate releases directory. This allows mirroring of the artifacts to begin so that a number of mirrors will be available at the time it is "made visible".
  • We do not "make visible" RC2 at the time RC2 is done -- since that is really the "final build". We only make it visible on the final release day (usually a Wednesday).
  • Schedule the simrel.releng.makeVisible job to run at the scheduled time (usually 9:30 on Friday, for a 10:00 availability -- the extra 30 minutes being used to sanity check things, and make sure all is well).
  • This requires not only the time be set, but also the default "trainName" and "checkpoint" build parameters, since it is not an "interactive" job.
  • Monitor that "makeVisible" job at the time it is scheduled to run, along with the EPP counterpart. Simultaneously chat with the EPP project lead (or release engineer) to make sure all is well from that end. Also run a short, manual "check for updates" action from the Eclipse IDE itself as confirmation that all is as expected after the "makeVisible" job runs. Note: there is also a simrel.releng.sanityCheckComposites job that is intended to run automatically (or can be run manually), but the point of the short, manual "check for updates" step is to confirm things work when not on the Eclipse.org infrastructure.
  • During the day or hours before "making visible", run the "checkMirrors.sh" script (on a non-infrastructure machine and network) to make sure the mirrors are populating. If it appears that, by the time of "making visible" for general availability, there will be fewer than 3 or 4 mirrors, it is best to discuss with the webmaster whether the "make visible" step should be postponed, or whether the mirror synchronization can be sped up. Note that the checkMirrors script typically requires a manual edit for each new repository that is being "made visible". It is also best to include some downloadable artifacts in that query (such as one or two EPP artifacts) in addition to the repository directory.
  • Send a note to cross-project list that XYZ is available.
  • Re-enable any jobs that were disabled (assuming it's not the "final" release).
  • After each milestone or release candidate it is best to check the "repo reports" to see if there are any especially egregious errors or omissions. Some examples might be if a project is not signing any of their jar files or if the "versions" of bundles or features decrease when compared with the reference repository. (See bug 500224 for info on the "reference repository". I *think* I have fixed the routine cases but every "major release" the reference repository will need to be manually edited in the scripts, until that bug is fixed, and even then, a property will need to be updated.)
Responsibility
Note that projects (Projects Leads and PMCs) are technically responsible for the quality of the repository, not the release engineer! It helps if the release engineer encourages them and reminds them to look! :) At least until someone improves the tests to cause "failures" for cases that should be failures.
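The first check in the Details list (looking for enabled="false" in the *.aggrcon files) could be scripted along the following lines. This is a minimal sketch, not an existing utility from org.eclipse.simrel.tools: the function names `find_disabled` and `report` are hypothetical, and the only assumption about the file format is that an aggrcon file is XML in which a disabled item carries the attribute enabled="false".

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_disabled(aggrcon_text: str) -> list[str]:
    """Return a short description of every element carrying enabled="false"."""
    root = ET.fromstring(aggrcon_text)
    disabled = []
    for element in root.iter():  # iter() includes the root element itself
        if element.get("enabled") == "false":
            # Strip any "{namespace}" prefix ElementTree adds to the tag.
            tag = element.tag.rsplit("}", 1)[-1]
            label = (element.get("label") or element.get("name")
                     or element.get("location") or "")
            disabled.append(f"{tag} {label}".strip())
    return disabled


def report(directory: str) -> dict[str, list[str]]:
    """Map each *.aggrcon file name in `directory` to its disabled elements."""
    results = {}
    for path in sorted(Path(directory).glob("*.aggrcon")):
        hits = find_disabled(path.read_text(encoding="utf-8"))
        if hits:
            results[path.name] = hits
    return results
```

Calling `report(".")` from a checkout of the build repository would list only the files that need a reminder on the cross-project list; an empty result means nothing is disabled.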


Shortly before final build

A week or so before the final build, it is best to remind everyone (via cross-project list) what the schedule is, and to point them to (or create) a "Final Daze" document.

During quiet week before general availability

  • Make sure the Info Center is created.
  • Run the "promoteToRelease" script, if not already done. (It is best to wait until quiet week, since someone might ask for a rebuild prior to that, and there should still be enough time to mirror.)

See also the SimRel/Release_Checklist.

Shortly after general availability

  • Best to tag the two repositories with a "human readable tag", such as "2019-06". This will make future comparisons, if needed, easier.
    • Note: Just because it is "tagged" it may not be reproducible since it depends on the projects having the correct permanent URL in the aggrcon file. In the past, projects have been encouraged to update that URL, but not all do, and it is not typically double checked, or anything.
    • Note: the commit hash of every build is saved away in a file under "buildinfo" for each repository we create.

And then ...

Do it all again! :)

Keep Jenkins builds and web pages up to date

SimRel

//TODO: update this section - Start

The build.xml file (and probably others) have some items that should "stay current". Such as the version of the platform we use to "do the build" or "run the tests". These are specified very specifically, instead of just "getting the latest" in order to have builds that are more reliable and reproducible. But it does mean whenever a new build of the platform comes out, we should update the URLs, etc. from where we get the platform.

In the simrel.aggrcon file itself, there is a property named 'buildRoot' (or 'Build Root' if using the property page GUI from the Aggregation Editor) that should be updated every major release to something like 'buildresultsPhoton' or similar. The "buildresult" part is used since the .gitignore file says to ignore '/buildresult*' files, and the last part, such as the release name 'Photon', is used so that there can be different areas for the current stream and the update stream.

The "web page" part of this task needs to be done every release. At the top of the SimRel Jenkins instance, the description contains some links to the "repo reports", which serve as the historical record of links for previous releases. The latter needs to be added every release.

//TODO: update this section - End

Aggregator and Analyzers

The other builds that need to be "kept up to date" are the CBI Aggregator builds, and p2repo Analyzer builds. See the "p2RepoRelated builds". These are mostly just a matter of updating the pre-reqs and targets so they stay current with the latest released version of the platform and whatever else in our prereqs changed with the latest Simultaneous Release.

The Bugzilla components for these items also need to be "watched" for major problems or contributions. While this is mostly a "CBI community activity" (i.e. not the Simultaneous Release engineer's responsibility) in practice the release engineer needs to pay attention since a carelessly made change could impact the Simultaneous Release.

Remove inactive projects

This activity is needed primarily after M3, which is the deadline for projects to declare if they plan to participate or not. But, it can come up at other times, if a project states that they have changed their mind and will not participate. The release engineer needs to be involved since, presumably, if a project is no longer interested in participating, there is no one particularly interested in making sure their contribution file is removed. The actual list of projects to remove is worked out by collaborating with the Planning Council (and, Wayne).

It is best to start with simply disabling the contribution in the aggrcon file. This can be done by adding 'enabled="false"' to the '<contribution' element. Then run "validate aggregation" to see if the removal of those projects breaks anyone else. If it does, coordinate via Bugzilla and the cross-project list on what the affected projects want to do about it.

Once you are ready to physically remove the files related to the contribution, it is best to "re-enable them" (in your workspace) and use the CBI Aggregator Editor to remove them. Ideally this will also remove any stray features that are in custom categories, but, it may not always, in which case it needs to be cleaned up "manually" in the simrel.aggr file. Once the contribution has been removed from the simrel.aggr file, the actual aggrcon file can be deleted.

Remove inactive committers

Roughly once per year, inactive committers should be removed from the "callisto-dev" group. This is primarily a "server hygiene" sort of task. Technically there is no problem letting the group membership grow larger and larger, but as always, best that people only have permission where they really need it.

The principle for deciding whether a committer is inactive is whether they have committed anything since the previous major release (so, roughly, they have not committed anything in a year and 3 months).

There are some scripts in 'org.eclipse.simrel.tools' under 'reportUtilities' that can help with the git queries to determine who has been active and who has not. The hard part is that occasionally someone does "commits" with different email Ids. That is why we have the ".mailmap" file in the 'org.eclipse.simrel.build' project, so that over-all we can keep a record of "who is who" as "many to one" mappings are found.
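The idea of combining commit history with a "many to one" identity mapping can be sketched as follows. This is only an illustration, not one of the 'reportUtilities' scripts: the function name `inactive_committers` and its parameters are hypothetical, and the `aliases` dictionary merely stands in for the role the ".mailmap" file plays.

```python
from datetime import date


def inactive_committers(members, commits, aliases, cutoff):
    """Return group members with no commit on or after `cutoff`.

    members: canonical committer ids in the group
    commits: iterable of (email, commit_date) pairs, e.g. parsed from "git log"
    aliases: many-to-one mapping of alternate emails to a canonical id
    cutoff:  date of the previous major release
    """
    active = set()
    for email, commit_date in commits:
        if commit_date >= cutoff:
            # Fold alternate email ids into the canonical committer id.
            active.add(aliases.get(email, email))
    return sorted(set(members) - active)
```

The result is only a *proposed* list; as described below, it still needs review by the committers themselves, since a missing alias would make an active committer look inactive.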

Also, it is important that a bug be opened and the "*proposed* list of committers to remove" be posted there to give people a chance to say if our scripts (or .mailmap) are wrong, or if, even though they have been inactive, they still need write access. Once the dust settles from that bug (give it 2 to 4 weeks) and a firm list of removals is known, the actual list of committer ids to remove can be given to the webmaster for removal from the Linux group.

How to do a re-spin

Overview

A respin should be an absolute exception and not a regular thing! A "respin" means to redo a repository (typically a milestone or an actual release) at some point significantly after its initial deadline has passed, when more care and control is desired over the input and output.

A common reason for needing one is that it might be discovered during "quiet week" that there is a serious bug in one component that has "cross project" implications (such as functional issues that prevent PHP and XML from both being run in the same workspace). Another example is if a third-party bundle has been included that was later found to be "unacceptable" from a licensing point of view.

Responsibility
It is the Planning Council (not "release engineering") that decides if a respin is warranted but typically they would want the input of release engineering as well. Also note, if a respin is done near the end of quiet week, this usually implies an automatic delay of one-week for the general availability -- i.e. no need to do an all-nighter. :)


The method by which care and control is achieved is that the previous candidate repository provides the "input" for all of the projects (via aggrcon files) except for the one or two projects that are contributing to a respin. This is done since some of the URLs, or the contents at the URLs, may have changed since the candidate release, either intentionally or accidentally. In a perfect world, each project would maintain their repositories and their aggrcon files such that the candidate build could be reproduced exactly, but there are always a few projects that do not, so it's easier to "force" the exact same build by changing the input source, rather than trying to get everyone lined up to have the correct files and repositories to reproduce the previous candidate release.

Steps

  • create a branch of org.eclipse.simrel.build project from the commit hash (or tag) of the release for which we are doing a rebuild. Name it something obvious like "2019-06_respin_branch". The important part is that all the feature ids and versions match exactly what was built before. (We will be re-doing the URLs). Note: the commit hash of every build is saved away in a file under "buildinfo" for each repository we create.
  • With that branch loaded in your workbench, run a utility in your workbench, which is in the org.eclipse.simrel.tools repository (master branch) in a directory named transformToOneRepo. In that directory is an XSL file named changeAllRepos.xsl and an Ant file, which runs the XSL Transform, named changeAllRepos.xml.
  • Before running the utility, specify two parameters on the "command line" of the Ant job, so to speak, from the "external tools" configuration, under the JRE tab:
    • newRepository: The first parameter changes each repo in each aggrcon file (by using the XSL file) to point to the specific, existing repository that we are rebuilding.
      • Example: -DnewRepository=https://download.eclipse.org/releases/neon/201609281000/
    • javax.xml.transform.TransformerFactory: The second parameter selects the precise XSL Transformer to use. This parameter may not always be required; it depends on the JRE you are using.
      • Example: -Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl

The above is all merely "preparation". After running the Ant file (which runs the XSL Transform) it is recommended to commit that change to the newly created branch so that the next commit will cleanly show "what is changing".
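To make the effect of that preparation step concrete, here is a rough sketch of what "pointing every repository at one location" amounts to. It is not a replacement for changeAllRepos.xsl, and it rests on an assumption about the aggrcon format that should be checked against a real file: that each input repository is an element named `repositories` with a `location` attribute. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def point_all_repositories_at(aggrcon_text: str, new_repository: str) -> str:
    """Return the aggrcon XML with every repository location replaced.

    Assumes (unverified here) that repositories are <repositories location="...">
    elements; feature ids and version ranges are left untouched.
    """
    root = ET.fromstring(aggrcon_text)
    for element in root.iter("repositories"):
        element.set("location", new_repository)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree may rewrite namespace prefixes when serializing, which is one reason to prefer the provided XSL transform for the real files and to treat this only as an explanation of the intent.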

  • As the next step, you want to change the one or two aggrcon files for the projects that are contributing to the respin. Those one or two files will point to the new URL that has their fix. Usually both "versions" and the repository location have to be changed in the aggrcon file. The project(s) at least provide the data to use if not actually make the change themselves. Commit that change, and make sure only the desired differences exist.
  • Once all that is done, it is good to "validate" and "validate aggregation" using the CBI aggregator editor in the IDE to make sure the basics are correct.
  • Create a new Jenkins job that uses the newly created branch. Typically, a copy of an existing job is the only Jenkins job that is required. After the copy is made, edit its configuration to modify the branch that is checked out by Jenkins. Run that job manually, and let it trigger a "promoteToStaging" as usual.
  • Once that new "staging" repository has been created, a "p2Diff" should be run comparing the staging repository with the previous candidate to confirm the only things changed were what was expected to change. (In reality, occasionally one or two other things might change, simply because p2 is not completely deterministic and has some heuristics to avoid "near infinite optimization". But if in any doubt, ask the projects or the cross-project list if anyone is concerned -- typically the unexpected changes are "good", such as when the candidate repo has two versions of a bundle, but the respin repository has only one version of that bundle.)
  • After this new staging repository is confirmed accurate, the previously described steps to "promote" and "make visible" are followed, according to whatever schedule the Planning Council came up with for the respin. Typically, the EPP packages are re-created also, and typically at least some projects do some more functional testing (but there are no firm rules or "signoff" process that applies to all cases).
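The comparison step in the list above boils down to "which units changed that were not supposed to?". The sketch below illustrates that check on plain dictionaries; it is not the p2Diff tool and does not read p2 repositories. The function name and the input shape (unit id mapped to the set of versions present) are assumptions made for illustration.

```python
def unexpected_changes(previous, respin, expected_ids):
    """List unit ids whose versions differ but were not expected to change.

    previous, respin: dicts mapping a unit id to the set of versions present
    expected_ids:     ids belonging to the projects contributing to the respin
    """
    changed = {
        unit_id
        for unit_id in previous.keys() | respin.keys()
        if previous.get(unit_id, set()) != respin.get(unit_id, set())
    }
    return sorted(changed - set(expected_ids))
```

An entry in the result is not necessarily a problem (for example, a duplicate version of a bundle disappearing), but each one is worth a question to the owning project or the cross-project list.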
