Gyrex/Development Space/ZooKeeper Interaction
This article provides insights into how Gyrex interacts with ZooKeeper.
Please read about Apache ZooKeeper first. This article uses ZooKeeper-specific terminology and goes straight into the details. You should also be familiar with:
- Equinox Extension Registry
- Gyrex Roles
- Equinox Applications (also known as Eclipse or OSGi applications)
Gyrex uses ZooKeeper to implement its core clustering functionality:
- Which node belongs to the cloud? (Membership)
- Which node offers which functionality in the cloud? (Coordination)
ZooKeeper also offers recipes for implementing additional functionality. Based on those recipes, the following additional capabilities are implemented in Gyrex:
- Cloud Preferences Scope, which stores preferences in ZooKeeper
- Queue Service (similar to Amazon SQS), which allows posting and receiving messages with a visibility timeout
- Lock Service, which allows creating distributed locks (based on ephemeral as well as persistent nodes)
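To illustrate what a visibility timeout means for the Queue Service, here is a minimal in-memory sketch. It does not use ZooKeeper and is not the Gyrex implementation; the class `VisibilityQueueSketch` and its methods are hypothetical names chosen for illustration, and the clock is passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory model of a queue with a visibility timeout.
// Assumption: a received message stays in the queue but is hidden from
// other consumers until the timeout expires or the message is deleted.
final class VisibilityQueueSketch {
    private static final class Entry {
        final String body;
        long invisibleUntil; // timestamp in millis; 0 means visible right away
        Entry(String body) { this.body = body; }
    }

    private final List<Entry> entries = new ArrayList<>();

    void post(String body) {
        entries.add(new Entry(body));
    }

    // Returns the first visible message and hides it for timeoutMillis.
    Optional<String> receive(long now, long timeoutMillis) {
        for (Entry e : entries) {
            if (e.invisibleUntil <= now) {
                e.invisibleUntil = now + timeoutMillis;
                return Optional.of(e.body);
            }
        }
        return Optional.empty();
    }

    // Removes every message with the given body after successful processing.
    boolean delete(String body) {
        return entries.removeIf(e -> e.body.equals(body));
    }
}
```

A consumer that crashes before calling `delete` does not lose the message: once the timeout has passed, `receive` hands it out again.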
In order to ease development, an embedded ZooKeeper is started automatically in development mode (which is the default operation mode).
The ZooKeeper gate is the central point in Gyrex which coordinates all communication with ZooKeeper. It is responsible for establishing and maintaining a ZooKeeper connection. Listeners can be registered with the gate in order to participate in the connection life cycle.
The gate itself is maintained by a gate manager. This is an Equinox IApplication which must be started in order to enable participation of the Gyrex instance in the cloud. The application is contributed as an extension to the Equinox Extension Registry. Additionally, a Gyrex Role extension is defined which ensures that the gate manager application is started when the role is activated.
As long as the gate manager application is running, it will automatically reconnect the gate when the connection to ZooKeeper is lost. Strictly speaking, the old gate is disposed and a new one is created, but all active listeners are retained and notified properly.
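The reconnect behavior described above can be sketched roughly as follows. This is an illustrative model only: `GateListenerSketch` and `GateManagerSketch` are hypothetical names, not the Gyrex API, and the actual ZooKeeper connection is replaced by a simple generation counter that stands in for "a new gate was created".

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener contract; the real Gyrex interface may differ.
interface GateListenerSketch {
    void gateUp(int gateGeneration);
    void gateDown(int gateGeneration);
}

// Hypothetical manager. It owns the listener list, so the listeners
// survive when one gate is disposed and replaced by another.
final class GateManagerSketch {
    private final List<GateListenerSketch> listeners = new CopyOnWriteArrayList<>();
    private int generation = 0;        // counts how many gates were created
    private boolean connected = false;

    void addListener(GateListenerSketch listener) {
        listeners.add(listener);
    }

    // Creates a fresh gate and notifies every retained listener.
    void connect() {
        generation++;
        connected = true;
        for (GateListenerSketch l : listeners) {
            l.gateUp(generation);
        }
    }

    // Simulates a lost connection: dispose the old gate, notify, reconnect.
    void connectionLost() {
        if (!connected) {
            return;
        }
        connected = false;
        for (GateListenerSketch l : listeners) {
            l.gateDown(generation);
        }
        connect();
    }

    int generation() {
        return generation;
    }
}
```

Keeping the listener list in the manager rather than in the gate is one simple way to make sure listeners do not need to re-register after a reconnect.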
The gate life-cycle consists of the following three states:
UP
This indicates a successful connection to ZooKeeper. A session has been established (or recovered) and the system can interact with ZooKeeper.
DOWN
This indicates that the instance is not connected to ZooKeeper and has no valid session. No interaction with ZooKeeper is possible.
RECOVERING
This indicates that the instance is in the process of reestablishing a connection to ZooKeeper and recovering a session. No interaction with ZooKeeper is possible.
If and only if the session was recovered successfully, the gate will go right into the UP state. In all other cases, the gate will go into the DOWN state. The timeout for RECOVERING is closely related to the ZooKeeper session timeout.
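The transitions between the gate states can be modeled as a small state machine. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, and the rule that a connection loss moves the gate from UP to RECOVERING is inferred from the state descriptions rather than taken from the Gyrex source.

```java
// Hypothetical model of the gate life-cycle; names are illustrative only.
final class GateLifecycleSketch {
    enum State { UP, DOWN, RECOVERING }

    private State state = State.DOWN;

    State state() {
        return state;
    }

    // A session has been established.
    void connected() {
        state = State.UP;
    }

    // Assumption: losing the connection while UP starts session recovery.
    void connectionLost() {
        if (state == State.UP) {
            state = State.RECOVERING;
        }
    }

    // Only a successfully recovered session leads back to UP;
    // every other outcome (including a timeout) leads to DOWN.
    void recoveryFinished(boolean sessionRecovered) {
        if (state != State.RECOVERING) {
            return;
        }
        state = sessionRecovered ? State.UP : State.DOWN;
    }

    // Interaction with ZooKeeper is only possible in the UP state.
    boolean canInteract() {
        return state == State.UP;
    }
}
```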
The Cloud State is responsible for capturing the state of the node in the cloud. Its life-cycle is bound to the start and stop of the cloud bundle. Once active, it hooks into the ZooKeeper gate and reacts to the connection events.
The following cloud states exist: ONLINE, OFFLINE, and INTERRUPTED.
Changes to the cloud state of a node are published as OSGi events to the corresponding topics.
ONLINE
This state indicates that the node operates successfully in the cloud. When the state changes to ONLINE, all configured cloud server roles are activated (which might start additional bundles and/or Equinox applications).
In order for a node to become online, all of the following conditions must be met:
- The ZooKeeper gate must have established a valid ZooKeeper connection (i.e., gate must be UP).
- The node must be approved.
OFFLINE
This state indicates that the node does not operate in the cloud. When the state changes to OFFLINE, all previously activated cloud server roles are deactivated (which might stop some bundles and/or Equinox applications).
In order for a node to become offline, at least one of the following conditions must be met:
- The ZooKeeper gate is DOWN.
- The node is not approved, i.e., it is pending.
INTERRUPTED
This state indicates that the node was previously online but lost the connection to the cloud. When the state changes to INTERRUPTED, some of the previously activated cloud server roles are deactivated (which might stop some bundles and/or Equinox applications).
In order for a node to become interrupted, all of the following conditions must be met:
- The ZooKeeper gate must be RECOVERING.
- The node must be approved.
- The previous node state was ONLINE.
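The conditions for the three cloud states can be condensed into a single decision function. The following is a sketch under stated assumptions, not the Gyrex implementation: `CloudStateSketch` and its enums are hypothetical names, and the case the rules above leave open (gate RECOVERING, node approved, previous state not ONLINE) is resolved here by keeping the previous state.

```java
// Hypothetical decision function for the cloud state of a node.
final class CloudStateSketch {
    enum Gate { UP, DOWN, RECOVERING }
    enum Cloud { ONLINE, OFFLINE, INTERRUPTED }

    private CloudStateSketch() {
    }

    static Cloud next(Cloud previous, Gate gate, boolean approved) {
        // OFFLINE: the gate is DOWN or the node is not approved (pending).
        if (gate == Gate.DOWN || !approved) {
            return Cloud.OFFLINE;
        }
        // ONLINE: the gate is UP and the node is approved.
        if (gate == Gate.UP) {
            return Cloud.ONLINE;
        }
        // The gate is RECOVERING and the node is approved.
        if (previous == Cloud.ONLINE) {
            return Cloud.INTERRUPTED;
        }
        // Assumption: not covered by the rules above, so keep the previous state.
        return previous;
    }
}
```

For example, an approved node that is ONLINE moves to INTERRUPTED when the gate starts recovering, and returns to ONLINE once the gate is UP again.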
ZooKeeper Based Services
ZooKeeper Based Services is a common implementation base for ZooKeeper-based functionality. It offers features such as active-state management, retries, and shutdown/close handling on disconnect. However, ZooKeeper Based Services are not intended to follow the ZooKeeper gate life-cycle. Typically, they are services created upon application request *if* the ZooKeeper gate is available, and they close/shut down automatically when they are no longer needed or the ZooKeeper connection is lost.
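A base class along these lines might look like the following sketch. It is not the Gyrex class: `ZkServiceSketch`, `doClose`, and `execute` are hypothetical names, and the retry policy (retry on any exception up to a fixed number of attempts) is a simplifying assumption. A real implementation would probably retry only on connection-related failures.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical base class for services that talk to ZooKeeper.
abstract class ZkServiceSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final int maxAttempts;

    ZkServiceSketch(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Active-state management: the service is active until it is closed.
    boolean isClosed() {
        return closed.get();
    }

    // Called when the service is no longer needed or the connection is lost.
    final void close() {
        if (closed.compareAndSet(false, true)) {
            doClose();
        }
    }

    // Hook for subclasses to release resources; runs at most once.
    protected void doClose() {
    }

    // Runs an operation, retrying on failure while the service is active.
    final <T> T execute(Callable<T> operation) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isClosed()) {
                throw new IllegalStateException("service is closed");
            }
            try {
                return operation.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

In this sketch, a disconnect handler would simply call `close()`, after which any further `execute` call fails fast instead of retrying against a dead connection.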
Examples of those services are: