This is a preliminary design for the PTP Resource Management proxy communication protocol. This protocol is used to communicate between the Resource Manager System in Eclipse and a lightweight proxy agent running on a target system. The primary purpose of the protocol is to support system monitoring, process launch, and process control activities.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].
Resource Manager System
The Resource Manager System (RMS) is an Eclipse plugin that manages interaction with arbitrary resource managers. A resource manager, in this context, is anything that provides program launch and monitoring services on a target system. Typically, a resource manager will be a job scheduler (e.g. LSF, LoadLeveler, PBS) running on a large multi-user system. Other types of resource managers include the Open Runtime Environment (ORTE), which is part of the OpenMPI distribution, and the MPICH2 runtime system. The RMS is responsible for populating an internal model in Eclipse, which provides a cached representation of the system and program state. Various user interface views are available to inspect and interact with this model. Details of the RMS are provided in a separate document, PTP/designs/rms.
The RMS communicates with proxy agents to gather information about the state of a target system. The proxy agent may be located on either a local or remote machine.
Proxy Agent Launch
The RMS is responsible for launching the proxy agent. On a local machine, this just involves executing a local process. To launch on a remote machine, the RMS must use an authenticated command service, such as ssh. Current plans are to utilize the Remote System Explorer (RSE) system to provide this remote proxy launch capability.
One instance of a communication channel between the RMS and a proxy agent is known as a session. A session supports communication with only a single proxy agent at a time. The mechanism used to effect communication between the RMS and a proxy agent is not defined in this document, but can be any bi-directional communications channel (e.g. TCP/IP sockets).
The communication protocol used between the RMS and the proxy agent is a simple text-based asynchronous command/event protocol. The RMS sends one or more commands to the proxy agent, which in turn will generate events that are returned to the RMS. One command may generate multiple events, but an event is always associated with a particular command. A command is not completed until either a corresponding ERROR or OK event is received.
Transaction IDs (TIDs) are numbers that are used to match commands and events. Since one command may generate multiple events, TIDs are essential in order to determine which command generated an event. This means that every event MUST have a TID that matches a corresponding command.
TIDs are only unique for uncompleted commands, not necessarily for the whole session.
Proxy agents SHOULD assume that a particular TID MAY be reused.
Proxy agents SHOULD NOT assume anything about the numbering or ordering of TIDs.
Any events received with an invalid TID (i.e. with no corresponding command) SHALL be discarded.
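The TID rules above can be sketched from the RMS side as a small bookkeeping class. This is a minimal illustration only: the class name, its methods, and the "INFO" event name in the usage note are hypothetical and not part of the protocol. Only the OK and ERROR completion events come from the text above.

```python
class TransactionTracker:
    """Tracks uncompleted commands by TID (hypothetical helper)."""

    def __init__(self):
        # Maps the TID of each uncompleted command to the events seen so far.
        self.pending = {}

    def send(self, tid):
        """Record that a command with this TID has been sent."""
        # A TID only has to be unique among uncompleted commands.
        if tid in self.pending:
            raise ValueError(f"TID {tid} is still in use")
        self.pending[tid] = []

    def receive(self, tid, event_type):
        """Return True if the event was accepted, False if it was discarded."""
        if tid not in self.pending:
            # Invalid TID: no corresponding command, so the event is discarded.
            return False
        self.pending[tid].append(event_type)
        if event_type in ("OK", "ERROR"):
            # The command is complete, so its TID may be reused.
            del self.pending[tid]
        return True
```

With this sketch, a command sent with TID 7 accepts any number of intermediate events (for instance a hypothetical "INFO" event) until an OK or ERROR event arrives, after which TID 7 is free to be used again.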
Element IDs are key to communication between the RMS and the proxy agent. Every element in the model (queue, node, job, etc.) has a unique ID. This ID is generated by the proxy agent. To ensure that the ID is unique, the RMS supplies a base ID as part of the initialization sequence. This base ID is guaranteed to be unique, and is used by the proxy agent to "uniquify" its generated IDs. The base ID is itself an element ID, so it cannot be used by the proxy agent as the ID of another element. It is possible to re-use an element ID, provided that the model element has been removed from the model. However, this is not recommended.
A proxy agent has two choices for the type of IDs it generates: numeric or string. The RMS does not care which is used. There are advantages and disadvantages to each type. Numeric and string IDs can be mixed.
Numeric Element IDs
Numeric IDs are typically used when the proxy agent is dealing with large numbers of objects (e.g. processes, nodes, etc.). They are more efficient to transmit, because the compact Range Set Notation can be used. When using numeric IDs, the proxy agent should try to generate "blocks" of IDs to make the range sets more efficient. A proxy agent might, for example, reserve a range of IDs for nodes, a different range for processes, etc.
Numeric IDs are computed using the base ID in such a way that the ID value will be unique. For example, one such computation would be to add the base ID to a monotonically increasing series of integers starting at 1.
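The example computation above can be sketched as a generator. The function name and the base ID value used in the usage note are hypothetical; the real base ID is whatever the RMS supplies during initialization.

```python
import itertools


def numeric_id_generator(base_id):
    """Yield base_id + 1, base_id + 2, ... as described above."""
    # itertools.count(1) is the monotonically increasing series starting at 1.
    for n in itertools.count(1):
        yield base_id + n
```

For a hypothetical base ID of 1000, the generator yields 1001, 1002, 1003, and so on. Because the IDs are consecutive, they compress well in Range Set Notation.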
String Element IDs
String IDs are an easy way to generate IDs, but are less efficient because they prevent using the Range Set Notation. These IDs are generated by creating a unique string and appending it to the base ID. The new ID is then guaranteed to be unique across the model.
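One possible way to build string IDs is sketched below. The function name, the prefix, and the use of a counter to make each string unique are assumptions made for illustration; any scheme that produces a unique string per element would do.

```python
import itertools


def string_id_generator(base_id, prefix):
    """Yield string IDs formed by appending a unique string to the base ID."""
    for n in itertools.count(1):
        # Assumption: prefix plus a counter is the "unique string".
        unique_part = f"{prefix}{n}"
        yield f"{base_id}{unique_part}"
```

For a hypothetical base ID of "42" and prefix "-node-", this yields "42-node-1", "42-node-2", and so on.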
Range Set Notation
This is a compact representation of sequences of numeric (unsigned integer) values. Ranges of consecutive values can be represented using the first and last values in the range, separated by a dash character (hex 2D). Ranges can then be grouped by separating them with commas (hex 2C). Spaces or other characters are not permitted in a range set.
For example, the range:
would be represented as:
Ranges in the set need not be in sorted order, and can be overlapping.
The range set:
represents the 12 consecutive numbers starting at 1.
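As an illustration of the notation, the following sketch encodes and decodes range sets. The function names are hypothetical, and writing a lone value without a dash is an assumption, since the text above only describes ranges given by a first and a last value. The decoder accepts unsorted and overlapping ranges and rejects spaces or other characters.

```python
import re

# Digits, optionally followed by "-digits", repeated with commas in between.
_RANGE_SET = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def encode_range_set(values):
    """Encode unsigned integers as a range set string."""
    ordered = sorted(set(values))
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current run while the next value is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        # Assumption: a lone value is written without a dash.
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def decode_range_set(text):
    """Expand a range set string into a sorted list of unique integers."""
    if not _RANGE_SET.fullmatch(text):
        raise ValueError(f"not a valid range set: {text!r}")
    values = set()
    for part in text.split(","):
        first, dash, last = part.partition("-")
        low = int(first)
        high = int(last) if dash else low
        values.update(range(low, high + 1))
    return sorted(values)
```

For instance, encode_range_set([1, 2, 3, 5, 7, 8, 9]) returns "1-3,5,7-9", and decode_range_set("5-8,1-6,9-12") returns the 12 consecutive numbers starting at 1.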
Commands and events consist of sequences of ASCII characters formatted into a message. A message is transmitted in the following format:
MESSAGE ⇒ LENGTH " " COMMAND_OR_EVENT
LENGTH and COMMAND_OR_EVENT are separated by a space (hex 20). The LENGTH is the length of the COMMAND_OR_EVENT portion of the message including the space. COMMAND_OR_EVENT is the actual text of the command or event.
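A minimal framing sketch follows. This section does not state how LENGTH is encoded, so the code assumes a fixed-width, 8-digit lowercase hexadecimal field by analogy with the String Format; that choice, the function names, and the placeholder body in the usage note are assumptions, not part of the design.

```python
def frame_message(body, length_digits=8):
    """Prefix a command or event body with its LENGTH field and a space."""
    # LENGTH covers the COMMAND_OR_EVENT text plus the separating space.
    length = len(body) + 1
    # Assumption: LENGTH is zero-padded hexadecimal, length_digits wide.
    return f"{length:0{length_digits}x} {body}"


def unframe_message(message):
    """Split a framed message and check that LENGTH matches the body."""
    length_text, separator, body = message.partition(" ")
    if not separator:
        raise ValueError("missing space after LENGTH")
    if int(length_text, 16) != len(body) + 1:
        raise ValueError("LENGTH does not match the message body")
    return body
```

Under these assumptions, frame_message("EXAMPLE") produces "00000008 EXAMPLE", and unframe_message recovers "EXAMPLE" from it.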
COMMAND_OR_EVENT ⇒ COMMAND | EVENT
The COMMAND_OR_EVENT portion of the message contains a header part followed by a sequence of arguments separated by spaces. Each argument is a string formatted using the String Format described below. The command format is described in more detail in the Commands section. The event format is described in more detail in the Events section.
Strings are transmitted using the following format:
STRING ⇒ LENGTH ":" CHARACTERS
where LENGTH is a fixed-length, 8-digit hexadecimal representation of the length of the string, ':' is a colon character (hex 3A), and CHARACTERS are the actual ASCII characters in the string. No string terminating character (e.g. NULL) is ever transmitted.
For example, the string "A String" would be formatted as:
00000008:A String
A zero-length string would be formatted as:
00000000:
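The String Format can be sketched as an encoder and decoder pair. The function names are hypothetical, and the use of lowercase hexadecimal digits is an assumption, since the text above does not specify the case.

```python
def encode_string(text):
    """Format text as an 8-digit hex LENGTH, a colon, and the characters."""
    data = text.encode("ascii")  # the protocol carries ASCII characters only
    return f"{len(data):08x}:{text}"


def decode_string(buffer, offset=0):
    """Decode one string starting at offset; return (text, next_offset)."""
    length = int(buffer[offset:offset + 8], 16)
    if buffer[offset + 8:offset + 9] != ":":
        raise ValueError("missing ':' after LENGTH")
    start = offset + 9
    end = start + length
    if end > len(buffer):
        raise ValueError("string is shorter than its LENGTH field")
    # No terminator is transmitted, so LENGTH alone marks the end.
    return buffer[start:end], end
```

Because each string carries its own length, arguments that contain spaces or colons can be decoded unambiguously. A space-separated argument list could be built as " ".join(encode_string(a) for a in args).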