Difference between revisions of "Hudson-ci/features/Restart Within Hudson"

From Eclipsepedia

Revision as of 20:18, 5 July 2013

Restart Within Hudson

Configuration changes that require Hudson restart to take effect should provide a Restart link or button.

Unfortunately, the CLI restart and safe-restart methods, which call Hudson.restart and Hudson.safeRestart, respectively, don't handle many of the necessary use cases by default. A Hudson restart link based on these methods would fail too often. It is one thing to leave an opening for plugins to implement, but quite another to depend on the kindness of plugins for basic Hudson operations. The default Lifecycle implementation needs to be in some sense universal, even if it requires the cooperation of system administrators.

Requirements

In the following, "correct restart" means restart specifically tailored to the environment in which the Hudson instance is running. "System administrator" is a person who provisions and deploys the Hudson instance. "Hudson admin" is a person with admin privileges in Hudson. Sometimes these are the same people; sometimes not.

  • The current Lifecycle API must be preserved, for compatibility with existing plugin extensions. The new default mechanism will only be invoked if the hudson.lifecycle system property is not specified.
  • It must be possible for a system administrator to preconfigure Hudson for correct restart. This is particularly important for "no-admin" uses of Hudson.
  • Restart must require Hudson admin privileges (ACL.SYSTEM).

Design Approach

Two designs are being actively pursued.

The first, Restart Command, allows a system administrator to specify a command or provide a script during Hudson startup. This is best suited to Hudson running in containers such as Tomcat, which provide easy application restart from the command line. The second, Soft Restart, is more general but more likely to cause problems; it involves re-invoking a subset of the startup sequence to create new instances of Hudson classes, plugins, etc. rooted in a different classloader.

Restart Command

Correct restart is a multi-dimensional problem, different for each OS, service implementation and container (Tomcat, Jetty, GlassFish, etc.). The Lifecycle extension point is not well suited for multi-dimensional invocation, e.g., the OS is X AND the container is Y AND (the container does not support single application restart OR the application name/war file location in the container is Z). While certain tricks, like replacing the Hudson WAR file, work to restart the application in many containers, to cover all the possibilities, Lifecycle would need an extension for every possible combination.

Yet every system administrator already knows, or can easily discover, a command or script to restart any running Hudson instance. The best restart mechanism, and the one most likely to be correct, would leverage those commands and scripts rather than try to replace them with Java code. The feature described below provides a generic Lifecycle that does so.

Two new ways are provided for the system administrator to configure correct restart:

  • by providing a hudson-restart file in the $HUDSON_HOME provisioned for the Hudson instance and
  • by specifying a restart command on the command line when Hudson is invoked.

Command

In a nutshell, restart within Hudson will invoke a system administrator-supplied command.

Since the command is presumed to restart Hudson, it may not return at all. If it does return, it is expected to return successfully. If the restart command fails, restart will fail.

If the restart command is not specified, the existing default Lifecycle mechanism will be invoked.

If the hudson.lifecycle system property is specified, the restart command, if any, will not be used.
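The contract described above can be sketched in bash. This is an illustration only, not the Hudson implementation: the function name invoke_restart_command, its messages and its exit codes are hypothetical. It runs the supplied command and treats a non-zero exit status as a failed restart.

```bash
#!/bin/bash
# Hypothetical helper (not part of Hudson): run the administrator-supplied
# restart command, passed as the function's arguments.
invoke_restart_command() {
  if [ "$#" -eq 0 ]; then
    # No restart command specified; a caller would fall back to the
    # existing default Lifecycle mechanism.
    echo "no restart command specified" >&2
    return 2
  fi
  # The command may never return, because it is expected to stop this process.
  if "$@"; then
    echo "restart command returned successfully"
    return 0
  fi
  # A non-zero exit status means the restart failed.
  echo "restart command failed" >&2
  return 1
}
```

For example, invoke_restart_command ~/stop-hudson.bash would run the stop script from the example later on this page.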

Default Restart Script

To allow system administrators to pre-configure Hudson for correct restart, if a restart script is present in HUDSON_HOME, the initial value of the restart command will be set to invoke the script. The script must be named hudson-restart[.extension].

The script or program may have an extension, e.g., .bat on Windows or .sh, .bash, .py, etc. on Unix, but it doesn't matter what it is; in all cases, it will be invoked as a program.

The script must be executable.

The script should be specific to the environment in which the HUDSON_HOME is used.

Restart System Property

Even if a restart script is present in HUDSON_HOME, a system administrator may change the restart command to do something else, effectively ignoring the script. The script is overridden by the command line option:

-Dhudson.restart=value

The value of the hudson.restart system property will be used to initialize the restart command. The correct order of initialization is:

  • If -Dhudson.restart is specified, use that.
  • Otherwise, if a restart script is present, use a command that invokes the script.
  • Otherwise, the restart command is not specified.
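The order of initialization above can be sketched as a bash function. This is a sketch under stated assumptions, not Hudson's code: the function name resolve_restart_command and its two parameters are hypothetical, and the script name follows the hudson-restart.bash file used in the example on this page.

```bash
#!/bin/bash
# Hypothetical helper (not part of Hudson): print the initial restart command.
# $1: value of the hudson.restart system property (empty if not specified)
# $2: the HUDSON_HOME directory
# Returns 1 and prints nothing if the restart command is not specified.
resolve_restart_command() {
  local property_value="$1" hudson_home="$2" candidate
  # 1. If -Dhudson.restart is specified, use that.
  if [ -n "$property_value" ]; then
    printf '%s\n' "$property_value"
    return 0
  fi
  # 2. Otherwise, use an executable hudson-restart[.extension] script if present.
  for candidate in "$hudson_home"/hudson-restart "$hudson_home"/hudson-restart.*; do
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # 3. Otherwise, the restart command is not specified.
  return 1
}
```

A caller could then write restart_command="$(resolve_restart_command "$value" "$HUDSON_HOME")" and treat a non-zero status as "not specified".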

Implementation

A new lifecycle, hudson.lifecycle.RestartCommandLifecycle, will be added to Hudson.

hudson.lifecycle.Lifecycle will be modified to use the RestartCommandLifecycle for restart if:

  • a restart script or command has been specified, and
  • the hudson.lifecycle system property is not defined.
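The selection rule can be illustrated with a small bash function. The function name select_lifecycle and the "default" label are hypothetical and stand in for logic that would live in hudson.lifecycle.Lifecycle; only the class name RestartCommandLifecycle comes from this page.

```bash
#!/bin/bash
# Hypothetical helper (not part of Hudson): print which lifecycle handles restart.
# $1: value of the hudson.lifecycle system property (empty if not defined)
# $2: the restart command (empty if no script or command was specified)
select_lifecycle() {
  local lifecycle_property="$1" restart_command="$2"
  if [ -n "$lifecycle_property" ]; then
    # An explicit hudson.lifecycle wins; any restart command is not used.
    printf '%s\n' "$lifecycle_property"
  elif [ -n "$restart_command" ]; then
    printf '%s\n' "hudson.lifecycle.RestartCommandLifecycle"
  else
    # Placeholder label for the existing default Lifecycle mechanism.
    printf '%s\n' "default"
  fi
}
```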

The implementation will log a warning message if both xx and a restart script or command have been provided.

Soft Restart

Another way of looking at the restart problem is to observe that most of it deals with the container, service or command that launched Hudson. These issues would be sidestepped if Hudson could be restarted in the JVM it is currently running in.

A major advantage of such an approach would be that the restart would not be detectable from the outside. The PID, servlet connection, etc. would be unchanged. This would make life simpler for certain kinds of high availability environments.

The major disadvantage, of course, is that it might cause massive memory leaks and thrown exceptions.

  • For starters, if permgen cannot be successfully garbage collected, the JVM will likely fail after restart with a "java.lang.OutOfMemoryError: PermGen space" error. Or the influx of newly loaded classes and instances might force an OutOfMemoryError in the heap.
  • The strategy depends on plugins:
    1. Not being able to find a way to lock themselves and their storage in memory,
    2. Obediently stopping whatever operations they have in progress and freeing resources like files and sockets.

This type of restart should not be attempted except during safe-restart.

Implementation

The initial implementation will modify org.eclipse.hudson.init.InitialSetup.invokeHudson() to create an outer class loader used to create all Hudson classes and instances not including those used by the initialization sequence up to that point. It will then use this class loader to create and invoke the thread that initializes the Hudson instance.

A soft-restart-plugin will be developed, with class org.hudsonci.plugins.SoftRestartLifecycle extends hudson.lifecycle.Lifecycle. The plugin will be installed like any other, and used only if the hudson.lifecycle system property is defined to name it prior to Hudson startup. Thus, the availability of soft restart will be entirely under the control of the system administrator. E.g.,

$ <launch Hudson with the JVM option -Dhudson.lifecycle=org.hudsonci.plugins.SoftRestartLifecycle>

Example

This simple example shows how one might use the restart command to reliably restart Hudson no matter how it is started, with the added virtue that if Hudson exits for any reason, it will automatically restart. It involves four scripts, start-hudson, run-hudson, stop-hudson and hudson-restart, shown below in bash pseudo-code.

File ~/start-hudson.bash

#!/bin/bash
<command to run hudson.war or a server that runs hudson.war>
# this script always fails!
exit 1

File ~/run-hudson.bash

#!/bin/bash
until ~/start-hudson.bash; do
  echo "Restarting hudson"
  sleep 1
done

File ~/stop-hudson.bash

#!/bin/bash
<command to kill running hudson process or stop server running hudson>

Since the run-hudson script always restarts Hudson if it exits, simply causing Hudson to exit will restart it. So add the following script to the HUDSON_HOME that Hudson will use when started by the start-hudson script.

File $HUDSON_HOME/hudson-restart.bash

#!/bin/bash
~/stop-hudson.bash

Notes:

  1. This technique of having another process restart Hudson is more reliable than having a child process of Hudson (the hudson-restart script) kill Hudson and relaunch it.
  2. While some servers allow you to restart a single application in the same JVM, it is risky to do so. Unless all instances and classes of the previously running Hudson are garbage-collected, several restarts will exhaust permgen. It is more reliable to relaunch the entire server. Therefore it is a good practice to run only one application per server, so other applications aren't affected.
  3. If the run-hudson script always restarts Hudson, how do you stop Hudson gracefully? As written, you must first kill -9 the process running the run-hudson script and then kill (no -9) the process running the start-hudson script.

Restart Links or Buttons

Restart links should not be shown unless Lifecycle.get().canRestart() returns true. Otherwise, a message like "Restart required for changes to take effect" should be displayed.

A restart link should show the single word Restart and should call Hudson.softRestart.

Plugin Manager

The Plugin Manager will show a restart link if a) a plugin that requires restart is updated or loaded, and b) Lifecycle.get().canRestart() returns true.