Stardust/Knowledge Base/Infrastructure System Administration Maintenance/Daemons/DaemonWatchdog
The Stardust daemons can be started from within the Administration Portal, via the API, or with the console command tool. Every daemon runs as a separate thread that periodically checks whether any actions need to be performed. In some situations these daemon threads stop running, for example after a server crash or another critical issue, and the daemons then have to be restarted manually. A cron job that checks the daemons' health status via the console command and restarts any daemon that is no longer running can automate this operational task, but it requires external tools. The Daemon Watchdog is a more application-integrated approach: it checks the daemons' health status and restarts daemons after abnormal crashes.
The Watchdog consists of a Timer Bean and a bit of Spring configuration. It can be deployed with a Spring-managed Stardust runtime transparently. The Watchdog provides two configuration parameters:
Parameter delay defines the time in milliseconds that the Watchdog waits before it performs the first health check.
Parameter period defines the time in milliseconds between two consecutive health checks.
Keep the delay parameter at two minutes (120000 ms) or more to ensure that the entire Spring application context is bootstrapped before the Watchdog runs for the first time.
<bean name="stardustDaemonWatchDog" class="com.infinity.bpm.clustering.DaemonWatchDog">
   <property name="forkingService">
      <bean parent="carnotForkingService" />
   </property>
</bean>

<bean id="stardustDaemonWatchDogScheduler" class="org.springframework.scheduling.timer.TimerFactoryBean">
   <property name="scheduledTimerTasks">
      <list>
         <bean class="org.springframework.scheduling.timer.ScheduledTimerTask">
            <!-- wait 2 minutes before starting repeated execution -->
            <property name="delay" value="120000" />
            <!-- run every minute -->
            <property name="period" value="60000" />
            <property name="timerTask" ref="stardustDaemonWatchDog" />
         </bean>
      </list>
   </property>
</bean>
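The effect of the delay and period parameters can be tried out in isolation with a plain java.util.Timer. The following is only a minimal sketch, assuming that the Spring timer classes configured above schedule their task through java.util.Timer with an initial delay and a repeat period. The class WatchdogTimerSketch and the method awaitRuns are hypothetical names and are not part of Stardust or Spring.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class, not part of Stardust or Spring.
final class WatchdogTimerSketch {

    /**
     * Schedules a task with an initial delay and a repeat period (both in ms)
     * and waits until the task has run the requested number of times.
     * Returns false if the timeout elapses first.
     */
    static boolean awaitRuns(long delayMs, long periodMs, int runs, long timeoutMs) {
        final CountDownLatch latch = new CountDownLatch(runs);
        Timer timer = new Timer("watchdog-sketch", true); // daemon thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // A real watchdog would perform its health check here.
                latch.countDown();
            }
        }, delayMs, periodMs);
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel(); // stop the timer thread in every case
        }
    }

    public static void main(String[] args) {
        // First run after 200 ms, then one run every 100 ms.
        System.out.println(awaitRuns(200, 100, 3, 5000));
    }
}
```

With the values from the configuration above (delay 120000, period 60000), the first check would run two minutes after startup and then once per minute.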
The Watchdog checks the current status of every daemon (event daemon, timer trigger, mail trigger) in every partition. If a daemon is supposed to be running and its last execution lies further in the past than the periodicity (in seconds) configured for that daemon in the carnot.properties file (see example below), the Watchdog restarts it.
timer.trigger.Periodicity = 20
mail.trigger.Periodicity = 30
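The restart rule described above can be sketched as a simple comparison of the elapsed time with the configured periodicity. This is only an illustration of the rule as stated in this article, not the actual Watchdog source. The class DaemonStalenessCheck, the method needsRestart and its parameters are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration class, not taken from the Watchdog sources.
final class DaemonStalenessCheck {

    /**
     * Returns true if a daemon that is supposed to run has not executed
     * for longer than its configured periodicity (in seconds).
     */
    static boolean needsRestart(boolean shouldRun, Instant lastExecution,
                                int periodicitySeconds, Instant now) {
        if (!shouldRun) {
            return false; // daemons that were stopped on purpose are left alone
        }
        Duration sinceLastRun = Duration.between(lastExecution, now);
        return sinceLastRun.compareTo(Duration.ofSeconds(periodicitySeconds)) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // timer.trigger.Periodicity = 20, last run 45 seconds ago
        System.out.println(needsRestart(true, now.minusSeconds(45), 20, now)); // true
        // mail.trigger.Periodicity = 30, last run 10 seconds ago
        System.out.println(needsRestart(true, now.minusSeconds(10), 30, now)); // false
    }
}
```

Because the check compares against the periodicity itself, a Watchdog period that is much longer than a daemon's periodicity delays detection of an outage by up to one Watchdog period.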
In a clustered environment, it is recommended to start the daemons on only one node of the cluster to avoid unnecessary lock wait overhead in the database. The Daemon Watchdog, however, can be deployed with Stardust on every node of the cluster. The first Watchdog that detects a daemon outage initiates the restart. If the Stardust Daemon Queue is distributed and shared across all nodes of the cluster, it is guaranteed that the daemons are restarted on a node that is still alive.
The sources of the Daemon Watchdog are available from here and can be checked out as a project into your Eclipse IDE. You will need to ensure that your Maven installation is configured to access the Stardust and third-party repositories. To include the Watchdog, define a Java EE module dependency for your Stardust Dynamic Web project and make sure that the contextConfigLocation parameter in the web.xml includes classpath*:META-INF/config/spring/*-context.xml.
<context-param>
   <param-name>contextConfigLocation</param-name>
   <param-value>
      WEB-INF/config/ipp/spring/*-context.xml,
      classpath*:META-INF/config/spring/*-context.xml
   </param-value>
</context-param>