SimRel/Simultaneous Release Engineering

This page is to outline the main steps that the "Simultaneous Release Engineer" must do for various stages of the release and for some special cases.

It also includes "general interest" sections such as an overview of the repositories used and the concepts behind the "multi-step" Jenkins jobs.

This documentation is meant to be an overview or orientation. In the Jenkins jobs themselves and in the scripts discussed here there is much more detail on specifics.

Please add or modify this wiki page if omissions or errors are noticed so that over time it will get better and stay current and accurate.

Repositories

Branches

Milestones and initial releases are built from the 'master' branch of org.eclipse.simrel.build. Update/Service releases are built from the <name>_update branch (e.g. Oxygen_update). The <name>_update branch is created from master in late June or early July, as we transition from building the "main" release, to building its corresponding "update" releases.
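A minimal sketch of creating that branch, assuming the usual Git remote setup (the branch name is just the Oxygen example from above):

 # Create the update branch from master once the main release is done,
 # and publish it so committers can switch to it (example branch name).
 git checkout master
 git pull
 git checkout -b Oxygen_update
 git push origin Oxygen_update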


Note: Naming the <last release> branch
The branch used to be called <name>_maintenance (e.g. Neon_maintenance). Since the Oxygen release it is called <name>_update, since we now present the builds following the main release as "updates" and not as strict, minimal-change "maintenance" builds.


Note: Never use master branch
There has been a discussion (see Bug 519843) about never using a 'master' branch and always starting with N+1_update by branching N_update. That way, committers would never have to change branches from "master" to "X_update" when working on that stream as we go from the "initial release" phase to the "update" phase. This has been declined, since committers will always have to switch between streams no matter what the branches are called.


Tools and Utility scripts

The "tools and utilities" used for the build are from org.eclipse.simrel.tools and there we always use "master", with variables for things like "trainName" which affect the URLs generated, etc.

Of all the tools and utilities the most important is the build.xml. This will build the "data" repository (it assumes the repository has already been checked out correctly by Jenkins, or by the user if running locally). Technically it will run by simply invoking "ant" (Ant uses "build.xml" by default).

By specifying some properties that are specific to "Eclipse.org" infrastructure, via the "production.properties" file, the build can be more efficient and reliable. For example, the build.xml script converts all the "http://download.eclipse.org" URLs to their local file system equivalents: "file:///home/data/httpd/download.eclipse.org". In theory we'd like to think "it should not matter much", but it does seem to matter -- I suspect because, with such a large number of repositories and artifacts (and so many aggregation builds!), p2 (via the CBI aggregator) hits the 'http' server very hard.
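For orientation, a local run looks roughly like the following. The -propertyfile flag is only one possible way to pass production.properties and is an assumption here, as is the directory layout; the exact invocation used on Eclipse.org is defined in the Jenkins jobs themselves:

 # Minimal sketch of a local run, assuming org.eclipse.simrel.tools (which
 # provides build.xml) and org.eclipse.simrel.build are already checked out,
 # as described above. Paths and flags are illustrative.
 cd org.eclipse.simrel.tools        # directory containing build.xml (assumed layout)
 ant                                # Ant uses build.xml by default
 # Passing the Eclipse.org-specific properties (one possible way):
 # ant -propertyfile production.properties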

The org.eclipse.simrel.tools repository also includes many useful utility scripts that are not necessarily used often, but are needed by the entire process of "doing a release".

Jenkins Jobs

There are three main steps to a complete "run" of related Jenkins jobs: a validation build, a cached build, and a clean build (each described below).

The successful completion of one job triggers the next job in the sequence. It is done this way entirely to provide quicker feedback to those making contributions, and each job corresponds to the similarly named function in the CBI Aggregator.

Also, it is helpful to use this three-step approach because different errors may show up at each step. Typically, the largest errors are spotted in the quickest, 'validation' job. There are different errors that can show up in 'cached builds' and 'clean builds' jobs which are typically more subtle and which occur less frequently.

These three jobs are meant to be related by the exact commit hash used for the initial "Validation" job. Hence, that "commit" is passed from one job to the next, by the magic of Jenkins. The reason for this is simply to increase the odds that a contribution that validated successfully will create a new staging repository. If someone else comes along afterwards and contributes something that "breaks the build", we do not want that first contribution to get held up simply because someone after them broke the build. [This usually works, but not always, depending on how the build was broken -- for example, "repository not found" can affect the whole build (all the jobs) at any point in time, since if someone deletes a repository that is mentioned in their aggrcon file then there is nothing the aggregator or Jenkins can do. But people should really not be doing that, and they usually require some education on correct procedures if that happens frequently from the same project. Typically, if it happens at all, it was just an accident based on a typo or something, now that most projects have been educated. :)]

Validation Build

A Validation Build is the quickest, as it checks only that the requirements and version constraints all fit together.

The 'validate' jobs are triggered in two ways: first, by any change in the 'build' repository (it is currently set to check for changes every 5 minutes); second, by a periodic trigger that causes the validate job to run every Monday morning. The purpose of this periodic trigger is simply to help catch cases early where someone has changed (removed) their repository but forgotten to update their aggrcon file. The validate job can be triggered manually by any SimRel committer. All the other jobs (cached build, clean build, and promote to staging) are triggered by a previous build job and purposely cannot be triggered manually (except by the release engineer).

Validation Gerrit Build

The Validation Gerrit Build (e.g. simrel.photon.runaggregator.VALIDATE.gerrit) is exactly the same as the Validation Build, except it runs from the Gerrit refspec, instead of the tip of the branch. This is very useful since most errors with contributions will show up during the "validation job", so this prevents something being committed to the branch that would "break the build" and allows the committers to fix their contribution before that point.

The Gerrit jobs are triggered by pushes to refs/for/<branch>.
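For example, a committer who wants a change validated before it lands on the branch pushes it to Gerrit rather than directly to the branch (standard Gerrit usage; the branch name is illustrative):

 # Push a change for review; this triggers the VALIDATE.gerrit job
 # instead of the regular branch build (branch name is an example).
 git push origin HEAD:refs/for/master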

Cached Build

A Cached Build is fairly fast since even though it "downloads the artifacts", it does so only if they do not already exist in its cache, so typically the download time is a LOT faster than for a Clean Build.

Clean Build

A Clean Build, as the name implies, removes any previously cached information or artifacts and builds the repository "from scratch". And it takes a long time -- roughly 2 hours, even when running on "Eclipse.org" infrastructure (longer if running remotely).

The "clean build" takes so long that if many people are contributing around the same time (which is common right before the deadlines), the "clean build" can get backed up, resulting in a very long queue that can take a day or so to run every project's commit. That is why there is a small Groovy script, called 'clearCache', that runs at the start of every "clean build". If, at the start of a clean build, that script finds there are other "clean builds" waiting in the queue, then it simply cancels the current clean build before it starts and allows the next "clean build" in the queue to run, which again checks if any others are waiting in the queue. Once a clean build gets started, however, it runs to completion. That is, no job is "interrupted" when a new one comes into the queue. While this means we do not have a perfect one-to-one mapping of "each commit" getting a "complete Jenkins build", in most cases it is pretty close, and if several commits all passed the "cached build" step, then chances are they are all "good to go" for the "clean build" step (that is, running each separately would not find any "new" errors, and they do not often interfere with each other at that point).
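The actual 'clearCache' script is a small Groovy build step in the Jenkins job; purely as an illustration of the idea, a rough shell equivalent against the Jenkins queue API might look like the following (the URL and job-name fragment are placeholders, not the real script):

 # Illustration only -- the real check is done by the 'clearCache' Groovy step.
 # If another clean build is already queued, give up and let it run instead.
 QUEUED=$(curl -s "https://ci.eclipse.org/simrel/queue/api/json" \
   | grep -o "runaggregator.BUILD__CLEAN" | wc -l)
 if [ "$QUEUED" -gt 0 ]; then
   echo "Another clean build is waiting in the queue; cancelling this one."
   exit 1
 fi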

Note: Changes to base platform
If there are changes to the base platform or any of the tools that are installed into it, then the Jenkins workspace needs to be manually cleaned, from Jenkins's web app page. This is because the currently installed instances are never reinstalled, if they already exist, for efficiency.


Releng jobs

In addition to the above 4 jobs, there are also several "releng jobs". These are typically run manually (such as simrel.releng.promoteToReleases) or at a pre-specified day and time (e.g. simrel.releng.makeVisible).

Process steps

Routine Aggregation Builds

Most of the time, the release engineer simply needs to keep an eye on the builds and, if one fails, investigate to the point of knowing whether a project did something wrong or the Jenkins job itself is failing for some other reason. The former cases (project issues) are usually documented in the Simultaneous Release FAQ in the Common errors and what to do about them section. The release engineer's role in that case is simply to communicate with the project and make sure they are "working on it". In some cases, such as when someone has "broken the build" and then already gone home for the night, a contribution might need to be disabled until the project fixes their issue. For Validation_Gerrit jobs, such proactive communication is not necessary. It is required for the others since, if someone "breaks the build", it could prevent others from contributing.

In the other main case, that is, Jenkins job issues, the errors are usually something strange, such as "lost connection", or a corrupt clone of the repository. In most cases, if the problem is not obvious from reading the log, the procedure to follow is:

  • a) simply try again, and see if the same error occurs,
  • b) if it does, try "manually" cleaning the workspace via the web interface and see if the error still occurs,
  • c) if it does, then try restarting the Jenkins instance and see if the error still occurs, and
  • d) if it does, then actually start detailed debugging to see what the issue is.

There may be some cases where it is not clear whether a failure is a project issue or an infrastructure issue, and in those cases the first step is usually to discuss or communicate with the project to see if they know what the issue is or if they are working on it.

Note: Notifications
The release engineer needs to not only be listed as "build master" in the simrel.aggr file (which will cause them to be CC'd for any build failure) but also subscribe to the RSS feed from the Jenkins jobs. This is because the aggregator itself will not send mail for all failures, even some originating from the aggregator (such as for "inconsistent model"), and certainly will not send mail from failures due to infrastructure problems. Both sources of mail need to be "continuously" monitored.


The other thing to do "continuously" is to monitor the cross-project mailing list and the cross-project Bugzilla component, to see if anyone has an issue with the build that the release engineer needs to help with. Sometimes it may be more of a "Planning Council chairperson's" question, or even a "peer-to-peer" project question, but it is best to always ask if in any doubt.

Routine Milestones and Release Candidates

Overview

Each numbered action below applies (or not) to the main release and to the update release; where checkpoints differ, the values are given per checkpoint (Milestones & RC1-RC3 / RC4 for the main release, RC1-RC3 / RC4 for the update release):

1. A week before the scheduled time, check *.aggrcon files
   Applies to all checkpoints: yes

2. Monitor the cross-project list for "extension" requests
   Applies to all checkpoints: yes

3. Announce "staging is complete"
   Applies to all checkpoints: yes

4. "Promote to releases"
   Main release: yes / yes* -- Update release: no / yes*

5. On the scheduled day
  • Run "checkMirrors.sh" locally a few hours before the "makeVisible" job
  • Check that the "makeVisible" job runs successfully
  • Send a note to the cross-project mailing list that the Milestone/Release Candidate is available
    • Provide links to the releases repository and EPP builds
   Main release: yes / yes* -- Update release: no / yes*

6. After the release
  • Re-enable the Validation Build job (except after the "final" build)
  • Check repo reports
   Applies to all checkpoints: yes

*) Schedule the "makeVisible" job not for the RC4 release date (usually Friday), but for the GA release date (usually the Wednesday after the "quiet week").

Details

These are some items done specifically for "milestones" and "release candidates".

Note: Update releases do not have milestones
As of this writing, for update releases we do not have any milestones, only "release candidates".


Note: Only milestones and RCs of the main release are promoted to the releases repository
It is only for the "main" release that we put milestones and release candidates in ".../releases/<trainName>". We do not do so for the update releases since that URL is already in use for the official release.


Another minor point: we do not promote (i.e. make visible) RC4 at the time RC4 is done -- since that is really the "final build". We only promote it (i.e. "make it visible") on the final release day.

  • A week or so before the scheduled time, check the *.aggrcon files to make sure no projects or features are disabled (enabled="false") and if so, send a reminder to the cross-project list asking if the project is aware of that and help resolve the issue, if any.
  • As dictated by the schedule (such as see Oxygen schedule) monitor the mailing lists to see if anyone has asked for an "extension" to the scheduled time.
  • When staging is complete (i.e., no extensions requested, and no jobs running) announce on the cross-project list that "staging is complete" and disable the "Validation" job. (Disabling the Validation job usually suffices, since it triggers all the subsequent jobs, but you can disable the "promote to staging" one too, if you are paranoid about it :) -- it is the "promoteToStaging" job that might mess up the EPP build, because the EPP builds are done against the staging repository.)
  • [NOTE: this step is done only for the "main" build, not "update releases" -- well, it is done for "RC4" of the update release, since that is the "final release".] Shortly after the announcement that "staging is complete", use the job named simrel.releng.promoteToReleases to copy what is in staging to the appropriate releases directory (a conceptual sketch follows this list). This allows mirroring of the artifacts to begin, so that a number of mirrors (though usually not all) will be available at the time it is "made visible".
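The copy itself is done by the simrel.releng.promoteToReleases job; conceptually it amounts to something like the following on the download server (the paths and timestamped directory name are illustrative, not the job's actual commands):

 # Conceptual illustration only -- the promoteToReleases Jenkins job does this.
 # Paths and the timestamped directory name are made up for the example.
 rsync -a /home/data/httpd/download.eclipse.org/staging/photon/ \
          /home/data/httpd/download.eclipse.org/releases/photon/201806271001/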

Make visible job

  • Schedule the simrel.releng.makeVisible job to run at the scheduled time (usually 9:30 on Friday, for a 10:00 availability -- the extra 30 minutes being used to sanity check things and make sure all is well).
  • This requires not only the time be set, but also the default "trainName" and "checkpoint" build parameters, since it is not an "interactive" job.

Check Mirrors

  • During the day or hours before "making visible", run the "checkMirrors.sh" script (on a non-infrastructure machine and network) to make sure the mirrors are populating (see the sketch after this list). If it appears that, by the time of "making visible" for general availability, there will be fewer than 3 or 4 mirrors, it is best to discuss with the webmaster whether the "make visible" step should be postponed, or whether the mirror synchronization can be sped up. Note: the checkMirrors script typically requires a manual edit for each new repository that is being "made visible". And it is best to include some downloadable artifacts in that query (such as one or two EPP artifacts) in addition to the repository directory.
  • Monitor the "makeVisible" job at the time it is scheduled to run, along with its EPP counterpart. Simultaneously chat with the EPP project lead (or release engineer) to make sure all is well from that end. Also run a short, manual "check for updates" from the Eclipse IDE itself as confirmation that all is as expected after the "makeVisible" job runs. Note: there is also a simrel.releng.sanityCheckComposites job that is intended to run automatically (or can be run manually), but the point of the short manual "check for updates" step is to confirm things work when not on the Eclipse.org infrastructure.
  • Send a note to cross-project list that XYZ is available.
  • Re-enable any jobs that were disabled (assuming not the "final" release).
  • After each milestone or release candidate it is best to check the "repo reports" to see if there are any especially egregious errors or omissions. Some examples might be a project not signing any of their jar files, or the "versions" of bundles or features decreasing when compared with the reference repository. (See bug 500224 for info on the "reference repository". I *think* I have fixed the routine cases, but every "major release" the reference repository will need to be manually edited in the scripts, until that bug is fixed, and even then a property will need to be updated.) Note that projects (Project Leads and PMCs) are technically responsible for the quality of the repository, not the release engineer, but it helps if the release engineer encourages them and reminds them to look! :) At least until someone improves the tests to cause "failures" for cases that should be failures.
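The real checkMirrors.sh lives in org.eclipse.simrel.tools; as a rough stand-in, the standard find-a-mirror service can be queried for how many mirrors already carry a given file (the repository path shown is only an example):

 # Count mirrors that already serve an artifact from the new repository.
 # The file path is illustrative; checkMirrors.sh does this more thoroughly.
 curl -s "https://www.eclipse.org/downloads/download.php?file=/releases/photon/201806271001/content.jar&format=xml" \
   | grep -o "<mirror " | wc -l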

Shortly before final build

A week or so before the final build, it is best to remind everyone (via cross-project list) what the schedule is, and to point them to (or create) a "Final Daze" document.

During quiet week before general availability

  • Make sure the Info Center is created.
  • Run the "promoteToReleases" job, if not already done. (It is best to wait until quiet week, since someone might ask for a rebuild prior to that, and there should still be enough time to mirror.)

Shortly after general availability

  • It is best to tag the two repositories with a "human readable tag", such as "Neon.2", so future comparisons, if needed, will be easier (see the sketch after this list). Note: just because it is "tagged" does not mean it is reproducible, since that depends on the projects having the correct permanent URL in the aggrcon file. In the past, projects have been encouraged to update that URL, but not all do, and it is not typically double checked. Note: the commit hash of every build is saved away in a file under "buildinfo" for each repository we create.
  • After an "initial release", the master branch must be forked to become <trainName>_update, and the change announced on the cross-project list. [Note: details of this item may change slightly if the procedures are changed, as described under the 'Repositories' section above.]
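A minimal sketch of the tagging step, assuming the commit hash has been looked up from the "buildinfo" file mentioned above (the tag name and hash are illustrative):

 # Tag the build repository with a human-readable tag and publish it.
 # Repeat for the second repository; values shown are examples only.
 git tag -a Neon.2 abc1234 -m "Neon.2"
 git push origin Neon.2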

And then ...

Do it all again! :)

Keep Jenkins builds and web pages up to date

SimRel

Currently, our Jenkins jobs contain the name of the release that they are for. That is just for clarity to users (committers) looking at the page. So, once per year, when one stream ends and another begins, the jobs need to be "copied" with new names, and then their configuration changed to point to the correct branches to build.
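Copying can be done from the Jenkins web UI ("New Item" with "Copy from"), or, as a sketch, with the Jenkins CLI (the job names are illustrative, and authentication options are omitted):

 # Clone last year's job under the new release's name, then edit its
 # configuration to point at the new branch (names are examples only).
 java -jar jenkins-cli.jar -s https://ci.eclipse.org/simrel/ \
   copy-job simrel.oxygen.runaggregator.VALIDATE simrel.photon.runaggregator.VALIDATE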

Additionally, the build.xml file (and probably others) has some items that should "stay current", such as the version of the platform we use to "do the build" or "run the tests". These are specified very specifically, instead of just "getting the latest", in order to have builds that are more reliable and reproducible. But it does mean that whenever a new build of the platform comes out, we should update the URLs, etc. from where we get the platform.

In the simrel.aggrcon file itself, there is a property named 'buildRoot' (or 'Build Root' if using the property page GUI from the Aggregation Editor) that should be updated every major release to something like 'buildresultsPhoton' or similar. The "buildresult" part is used because the .gitignore file ignores '/buildresult*' files, and the last part, such as the release name 'Photon', is used so that there can be different areas for the current stream and the update stream.

The "web page" part of this task needs to be done every release. At the top of the SimRel Jenkins instance (https://ci.eclipse.org/simrel/) there are some links in the description to the "repo reports" for the current last successful "clean builds", as well as a historical record of links for previous releases. The latter needs to be added every release. The former needs updating only once a year or so, just to correct the names and the exact URLs that are pointed to.

Aggregator and Analyzers

The other builds that need to be "kept up to date" are the CBI Aggregator builds, and p2repo Analyzer builds. See the "p2RepoRelated builds". These are mostly just a matter of updating the pre-reqs and targets so they stay current with the latest released version of the platform and whatever else in our prereqs changed with the latest Simultaneous Release.

The Bugzilla components for these items also need to be "watched" for major problems or contributions. While this is mostly a "CBI community activity" (i.e. not the Simultaneous Release engineer's responsibility) in practice the release engineer needs to pay attention since a carelessly made change could impact the Simultaneous Release.

Remove inactive projects

This activity is needed primarily after M4, which is the deadline for projects to declare if they plan to participate or not. But, it can come up at other times, if a project states that they have changed their mind and will not participate. The release engineer needs to be involved since, presumably, if a project is no longer interested in participating, there is no one particularly interested in making sure their contribution file is removed. The actual list of projects to remove is worked out by collaborating with the Planning Council (and, Wayne).

It is best to start by simply disabling the contribution in the aggrcon file. This can be done by adding 'enabled="false"' to the '<contribution' element. Then run a "validation aggregation" to see if the removal of those projects breaks anyone else. If it does, coordinate via Bugzilla and the cross-project list on what the affected projects want to do about it.

Once you are ready to physically remove the files related to the contribution, it is best to "re-enable them" (in your workspace) and use the CBI Aggregator Editor to remove them. Ideally this will also remove any stray features that are in custom categories, but it may not always do so, in which case they need to be cleaned up "manually" in the simrel.aggr file. Once the contribution has been removed from the simrel.aggr file, the actual aggrcon file can be deleted.

Remove inactive committers

Roughly once per year, after the first "update release" in September, inactive committers should be removed from the "callisto-dev" group. This is primarily a "server hygiene" sort of task. Technically there is no problem letting the group membership grow larger and larger, but, as always, it is best that people only have permission where they really need it.

The principle for deciding if a committer is inactive is whether they have committed anything since the previous major release (so, roughly, they have not committed anything in a year and 3 months).

There are some scripts in 'org.eclipse.simrel.tools' under 'reportUtilities' that can help with the git queries to determine who has been active and who has not. The hard part is that occasionally someone does "commits" with different email IDs. That is why we have the ".mailmap" file in the 'org.eclipse.simrel.build' project, so that overall we can keep a record of "who is who" as "many to one" mappings are found.
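As a starting point for such a query (the reportUtilities scripts do more), git itself can list everyone who has committed since a given date, with .mailmap applied automatically; the date below is illustrative:

 # List unique authors active in the build repository since a given date.
 # git log applies .mailmap to %aN/%aE, which helps with multiple email IDs.
 cd org.eclipse.simrel.build
 git log --since="2017-06-28" --format="%aN <%aE>" | sort -u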

Also, it is important that a bug be opened and the *proposed* list of committers to remove be posted there, to give people a chance to say if our scripts (or .mailmap) are wrong, or if, even though they have been inactive, they still need write access. Once the dust settles from that bug (give it 2 to 4 weeks) and a firm list of removals is known, the actual list of committer IDs to remove can be given to the webmaster for removal from the Linux group.

How to do a re-spin

Overview

It is not done often, but usually at least one per year is needed. A "respin" means to redo a significant repository (typically a milestone or an actual release) at some point significantly after its initial deadline has passed, when more care and control is desired over the input and output. A common reason for needing one is that it might be discovered during "quiet week" that there is a serious bug in one component that has "cross project" implications (such as a functional issue that might prevent PHP and XML from both being run in the same workspace). Another example is if a third-party bundle has been included that was later found to be "unacceptable" from a licensing point of view. Note: it is the Planning Council (not "release engineering") that decides if a respin is warranted, but typically they would want the input of release engineering as well. Also note, if a respin is done near the end of quiet week, this usually implies an automatic delay of one week for the general availability -- i.e. no need to do an all-nighter. :)

The method by which care and control is achieved is that the previous candidate repository provides the "input" for all of the projects (via aggrcon files) except for the one or two projects that are contributing to the respin. This is done since some of the URLs, or the contents at the URLs, may have changed since the candidate release, either intentionally or accidentally. In a perfect world, each project would maintain their repositories and their aggrcon files such that the candidate build could be reproduced exactly, but there are always a few projects that do not, so it is easier to "force" the exact same build by changing the input source, rather than trying to get everyone lined up to have the correct files and repositories to reproduce the previous candidate release.

Steps

  • create a branch of org.eclipse.simrel.build project from the commit hash (or tag) of the release for which we are doing a rebuild. Name it something obvious like "Neon.4_respin_branch". The important part is that all the feature ids and versions match exactly what was built before. (We will be re-doing the URLs). Note: the commit hash of every build is saved away in a file under "buildinfo" for each repository we create.
  • With that branch loaded in your workbench, run a utility which is in the org.eclipse.simrel.tools repository (master branch) in a directory named transformToOneRepo. In that directory is an XSL file named changeAllRepos.xsl and an Ant file, which runs the XSL Transform, named changeAllRepos.xml.

  • Before running the utility, specify two parameters on the "command line" of the Ant job, so to speak: from the "external tools" configuration, under the JRE tab:
- newRepository: The first parameter changes each repo in each aggrcon file (by using the XSL file) to point to the specific, existing repository that we are rebuilding.
- Example: -DnewRepository=http://download.eclipse.org/releases/neon/201609281000/
- javax.xml.transform.TransformerFactory: The second parameter is for the precise XSL Transformer to use. This parameter may not always be required. It depends on the JRE you are using.
- Example: -Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl
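If you prefer to run it outside the IDE, the same two settings can presumably be passed on the command line; this is an assumption based on the parameters above, not a documented invocation, and the URL is just the example from above:

 # Command-line equivalent of the "external tools" launch described above.
 # The TransformerFactory setting is a JVM system property, so it is passed
 # via ANT_OPTS rather than as an Ant -D property.
 cd org.eclipse.simrel.tools/transformToOneRepo
 export ANT_OPTS="-Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl"
 ant -f changeAllRepos.xml \
   -DnewRepository=http://download.eclipse.org/releases/neon/201609281000/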

The above is all merely "preparation". After running the Ant file (which runs the XSL Transform) it is recommended to commit that change to the newly created branch so that the next commit will cleanly show "what is changing".

  • As the next step, change the one or two aggrcon files for the projects that are contributing to the respin. Those one or two files will point to the new URL that has their fix. Usually both the "versions" and the repository location have to be changed in the aggrcon file. The project(s) should at least provide the data to use, if they do not make the change themselves. Commit that change, and make sure only the desired differences exist.
  • Once all that is done, it is good to "validate" and "validate aggregation" using the CBI aggregator editor in the IDE to make sure the basics are correct.
  • Create a new Jenkins job that uses the newly created branch. Typically, a copy of an existing "BUILD__CLEAN" job is the only Jenkins job that is required. After the copy is made, edit its configuration to modify the branch that is checked out by Jenkins. Run that job manually, and let it trigger a "promoteToStaging" as usual.
  • Once that new "staging" repository has been created, a "p2Diff" should be run comparing the staging repository with the previous candidate, to confirm that the only things changed were what was expected to change. (In reality, occasionally one or two other things might change, simply because p2 is not completely deterministic and has some heuristics to avoid "near infinite optimization". But if in any doubt, ask the projects or the cross-project list whether anyone is concerned -- typically the unexpected changes are "good", such as the candidate repo may have had two versions of a bundle, while the respin repository has only one version of that bundle.)
  • After this new staging repository is confirmed accurate, the previously described steps to "promote" and "make visible" would be followed, according to whatever schedule the Planning Council came up with for the respin. Typically, the EPP packages are re-created also, and typically at least some projects do some more functional testing (but there are no firm rules or "signoff" process that applies to all cases).
