Difference between revisions of "PTP/designs/3.x/rm proxy"

Revision as of 17:57, 15 June 2007

Overview

This is a preliminary design for the PTP Resource Management proxy communication protocol. This protocol is used to communicate between the Resource Manager System in Eclipse and a lightweight proxy agent running on a target system. The primary purpose of the protocol is to support system monitoring, process launch, and process control activities.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

Resource Manager System

The Resource Manager System (RMS) is an Eclipse plugin that manages interaction with arbitrary resource managers. A resource manager, in this context, is anything that provides program launch and monitoring services on a target system. Typically, a resource manager will be a job scheduler (e.g. LSF, LoadLeveler, PBS, etc.) running on a large multi-user system. Other types of resource managers include the Open Runtime Environment (ORTE), which is part of the OpenMPI distribution, and the MPICH2 runtime system. The RMS is responsible for populating an internal model in Eclipse which provides a cached representation of the system and program state. Various user interface views are available to inspect and interact with this model. Details of the RMS are provided in a separate document, PTP/designs/rms.

Resource Manager Proxy

The RMS communicates with proxy agents to gather information about the state of a target system. The proxy agent may be located on either a local or remote machine.

Proxy Agent Launch

The RMS is responsible for launching the proxy agent. On a local machine, this just involves executing a local process. To launch on a remote machine, the RMS must use an authenticated command service, such as ssh. Current plans are to utilize the Remote System Explorer (RSE) system to provide this remote proxy launch capability.

Proxy Session

One instance of a communication channel between the RMS and a proxy agent is known as a session. A session only supports communication with a single proxy agent at a time. The mechanism used to effect communication between the RMS and a proxy agent is not defined in this document, but can be any bi-directional communications channel (e.g. TCP/IP sockets, etc.).

Proxy Protocol

The communication protocol used between the RMS and the proxy agent is a simple text-based asynchronous command/event protocol. The RMS sends one or more commands to the proxy agent, which in turn will generate events that are returned to the RMS.

Some generic properties of the protocol include:

One command MAY generate multiple events.
Commands and events are matched using a transaction ID (tid). The tid in an event MUST match a corresponding command.
Completion of a command is indicated by either an ERROR or OK event with matching tid.
Tids need only be unique for uncompleted commands. Once a command is completed, its tid can be reused.
Any events received with an invalid tid SHOULD be discarded.
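The tid rules above can be sketched as a small bookkeeping table. This is only an illustration under stated assumptions: the class name `TransactionTable` and its methods are hypothetical and are not part of the PTP code base, and tids are assumed to be positive integers.

```python
class TransactionTable:
    """Tracks the tids of uncompleted commands (hypothetical helper)."""

    def __init__(self):
        self._pending = {}  # tid -> name of the uncompleted command

    def allocate(self, command):
        # A tid only needs to be unique among uncompleted commands,
        # so the smallest free positive integer is handed out again.
        tid = 1
        while tid in self._pending:
            tid += 1
        self._pending[tid] = command
        return tid

    def on_event(self, tid, event):
        """Return the command the event belongs to, or None to discard it."""
        command = self._pending.get(tid)
        if command is None:
            return None  # invalid tid: the event SHOULD be discarded
        if event in ("OK", "ERROR"):
            del self._pending[tid]  # command completed, tid may be reused
        return command
```

A command may produce several events before its OK or ERROR event arrives, so the entry is only removed on completion.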

Protocol Format

Commands and events consist of sequences of ASCII characters. Separate elements of a command or event are delimited by spaces (hex 20). Elements consist of numbers or strings.

Numbers (such as IDs, lengths, etc.) are always formatted as fixed length sequences of hexadecimal characters.

Strings are formatted as follows:

LENGTH : CHARACTERS

where LENGTH is the number of characters in the string (formatted as 8 hexadecimal characters), ':' is a colon character (hex 3A), and CHARACTERS are the actual ASCII characters in the string.

For example, the string "A String" would be formatted as:

00000008:A String

A zero length string would be formatted as:

00000000:
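A minimal sketch of the string format follows. The function names are hypothetical, and the choice of lowercase hexadecimal digits when encoding is an assumption, since the text does not specify the case of the digits. The decoder accepts either case.

```python
def encode_string(text):
    """Format text as LENGTH:CHARACTERS with an 8-digit hex length."""
    text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return "%08x:%s" % (len(text), text)


def decode_string(data, offset=0):
    """Decode one string starting at offset; return (text, next_offset)."""
    length = int(data[offset:offset + 8], 16)
    if data[offset + 8:offset + 9] != ":":
        raise ValueError("expected ':' after the length field")
    start = offset + 9
    end = start + length
    if end > len(data):
        raise ValueError("string is shorter than its declared length")
    return data[start:end], end
```

Because the length is explicit, a string may itself contain spaces or colons without confusing the element delimiter; the returned offset tells the caller where the next element begins.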

Protocol Phases

The proxy protocol is divided into a number of phases. A phase determines the legal commands that can be sent to the proxy agent. During a particular phase, illegal commands SHOULD be discarded. Note: this may be changed to SHOULD generate an ERROR event.

Phases follow a strict ordering. Transitions from one phase to the next occur when an OK event is received in response to a phase initiation command. A phase initiation command is a command that must be sent to initiate a particular phase. Once a phase has been initiated, any legal commands for that phase may be sent. The phase ordering is defined as follows:

INITIALIZE -> MODEL_DEF -> {START_EVENTS -> STOP_EVENTS}

The phases are defined in more detail in the following sections.

INITIALIZE

This is the first phase, and is used to initiate a communication session between the RMS and proxy agent, and agree on any protocol parameters that apply to this session.

Phase initiation command:

INIT

Legal commands:

none

DISCOVERY

The discovery phase is used to allow the proxy agent to inform the RMS of any dynamic property information. This information currently consists of attribute definitions and filter definitions which are described in more detail below.

Phase initiation command:

MODEL_DEF

Legal commands:

none

NORMAL

The normal phase is entered once the initialize and discovery phases are completed. This is the normal command/event processing phase.

Phase initiation command:

START_EVENTS

Legal commands:

SUBMIT_JOB
TERMINATE_JOB
STOP_EVENTS
QUIT

SUSPENDED

The suspended phase is used when the RMS needs to prevent the proxy agent from sending additional events.

Phase initiation command:

STOP_EVENTS

Legal commands:

START_EVENTS
QUIT
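The legal commands listed for each phase can be collected into a lookup table, sketched below. The names `LEGAL_COMMANDS`, `NEXT_INITIATION`, and `is_legal` are hypothetical. The INITIALIZE and DISCOVERY phases list no legal commands, yet the next phase must still be initiated somehow, so this sketch assumes that the initiation command of the following phase is always accepted, and that only INIT is accepted before any phase has been entered. Neither assumption is stated in the text.

```python
# Phase name -> commands listed as legal for that phase.
LEGAL_COMMANDS = {
    "INITIALIZE": set(),
    "DISCOVERY": set(),
    "NORMAL": {"SUBMIT_JOB", "TERMINATE_JOB", "STOP_EVENTS", "QUIT"},
    "SUSPENDED": {"START_EVENTS", "QUIT"},
}

# Assumption: the command that initiates the next phase is accepted
# even when the current phase lists no legal commands.
NEXT_INITIATION = {
    "INITIALIZE": "MODEL_DEF",
    "DISCOVERY": "START_EVENTS",
}


def is_legal(phase, command):
    """Return True if command may be sent; phase is None before INIT."""
    if phase is None:
        return command == "INIT"  # assumption: nothing else precedes INIT
    if command in LEGAL_COMMANDS[phase]:
        return True
    return command == NEXT_INITIATION.get(phase)
```

A proxy agent following the current wording would discard any command for which `is_legal` returns False.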

Phase Example

The following provides a simple example of phase transitions. Commands go from left to right, events from right to left. The command tid is shown in ()'s after the command or event name.

-- initialize phase --

INIT(1)         ->
                <- OK(1)

-- discovery phase --

MODEL_DEF(2)    ->
                <- ATTR_DEF(2)
                <- ATTR_DEF(2)
                <- OK(2)

-- normal phase --

START_EVENTS(3) ->
                <- NEW_MACHINE(3)
                <- NEW_NODE(3)
STOP_EVENTS(4)  ->
                <- OK(3)

-- suspended phase --

START_EVENTS(5) ->
                <- OK(4)

-- normal phase --

                <- NEW_QUEUE(5)
QUIT(6)         ->
                <- OK(5)
                <- OK(6)

Note that the first suspended phase is not entered until the OK event corresponding to the START_EVENTS command (tid 3) is received. Similarly, the second normal phase is not entered until after the OK event corresponding to the STOP_EVENTS command (tid 4) is received.

Commands

Commands are formatted as simple ASCII text strings. A proxy command consists of a header and a body, separated by a space (hex 20), as follows:

COMMAND_HEADER COMMAND_BODY

The command header consists of three fixed length strings separated by colons (hex 3A), so it is itself fixed length. The format of the header is:

COMMAND_ID : TRANSACTION_ID : NUMBER_OF_ARGUMENTS

where

COMMAND_ID is a number representing the command to be performed

TRANSACTION_ID is the transaction ID assigned to this command

NUMBER_OF_ARGUMENTS is the number of space-separated elements in the command body
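As a rough sketch, a command could be assembled as shown below. The field widths are assumptions (4 hexadecimal digits for COMMAND_ID and 8 each for TRANSACTION_ID and NUMBER_OF_ARGUMENTS), because the text only says that the three header fields are fixed length. The function name `format_command` is hypothetical, all arguments are treated as strings in the LENGTH:CHARACTERS format, and the separating space is omitted when the body is empty, which is also an assumption.

```python
def format_command(command_id, tid, args=()):
    """Build COMMAND_HEADER [COMMAND_BODY] under the assumed field widths."""
    # Assumed widths: 4 + 8 + 8 hex digits, joined by colons.
    header = "%04x:%08x:%08x" % (command_id, tid, len(args))
    if not args:
        return header  # assumption: no trailing space for an empty body
    # Each argument is encoded as an 8-digit hex length, ':', characters.
    body = " ".join("%08x:%s" % (len(arg), arg) for arg in args)
    return header + " " + body
```

With these widths the header is always 22 characters long, so a receiver can read the header first and learn from NUMBER_OF_ARGUMENTS how many body elements to expect.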

The following sections describe the currently defined commands.

QUIT

ID: 0

Arguments: none

Description: Terminate the proxy agent. This command will cause the proxy agent to terminate as soon as possible.

Events: OK

INIT [1]

MODEL_DEF [2]

START_EVENTS [3]

STOP_EVENTS [4]

SUBMIT_JOB [5]

TERMINATE_JOB [6]

MOVE_JOB [7]

Not yet implemented. This command is intended to allow jobs to be moved between queues.

CHANGE_JOB [8]

Not yet implemented. This command is intended to allow a job's status to be changed (e.g. place a hold on a job).

Wire protocol for a command: Commands are currently sent as transformed (tohex) ASCII strings.
The command message contains: {MESSAGE_LENGTH (8 byte hex integer), TRANSACTION_ID (8 byte integer), COMMAND (4 byte hex integer), COMMAND_ARGS}, where MESSAGE_LENGTH is the length of the message in bytes (excluding the MESSAGE_LENGTH item) and COMMAND_ARGS is a list of command arguments separated by spaces.

Initialize (CMD_INIT): Command to initialize the proxy. It has one argument, the wire protocol version number. After this command has been received, the proxy is ready to receive and process other commands from the RM. Initialization data may be passed on the command line when the proxy is run. The proxy asynchronously returns an OK/FAIL event on completion of the command.

Model definition (CMD_MODEL_DEF): Command to start the model definition sequence. The proxy responds with a series of ATTR_DEF, LAUNCH_DEF, and ELEMENT_DEF events. Attributes (see below) are meta-data describing data from the proxy that the RM is expected to receive and possibly display in the UI. The sequence is terminated by an OK/FAIL event.

Start events (CMD_START_EVENTS): Command to start sending events back to the RM. Initially the proxy sends back the full machine state, but sends only state changes as diffs thereafter.

Stop events (CMD_STOP_EVENTS): Command to halt the event stream to the RM. The proxy responds by stopping the event stream, sending an OK event for the start event transaction, and finally sending an OK event for the stop event transaction.

List event filters (CMD_LIST_FILTERS): Command to list the set of filters used to limit the event stream. The proxy responds by returning a filter list event and then an OK/FAIL event.

Set event filters (CMD_SET_FILTERS): Command to set the filters used to limit the event stream. The proxy responds with an OK/FAIL event.

Finish (CMD_QUIT): Command for the proxy to clean up and exit. The proxy responds with an OK/FAIL event.

Submit job (CMD_SUBMIT_JOB): Command to submit a job.

Others: Kill job, ...

Suggestions for resource manager commands (from Dave Wootton): Additional commands that might be useful include commands to query node availability, query resource pools, query job classes, query job queues, query what's running on the machine, query what's running on nodes, change job priority, cancel pending jobs (vs killing a running job), and disconnect from and reconnect to the proxy. If the proxy is expected to inform the resource manager of the complete state of the machine, queues, etc., at startup, some of these queries might not be required. NOTE that some of these will be included in the event stream.

Proxy Events

The proxy responds to commands from the RM by sending events back to the RM.

Protocol for events:
1. The proxy completes (or at least initiates) the command.
2. The proxy sends an event containing the transaction id back to the RM. Any data are returned with the event after the transaction id.

Wire protocol for an event: Events are currently sent as transformed (tohex) ASCII strings.
They contain: LENGTH(8 hex digits) TRANSACTION_ID(8 hex digits) CODE(4 hex digits) " " DATA.
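The event layout can be illustrated with the sketch below, which makes several assumptions the text does not confirm: the LENGTH, TRANSACTION_ID, and CODE fields are taken to be contiguous with no separators, LENGTH is taken to count everything after the LENGTH field (by analogy with MESSAGE_LENGTH for commands), and the tohex transformation is not modelled. The function names and the numeric event code used in the usage note are hypothetical.

```python
def format_event(tid, code, data=""):
    """Build LENGTH TRANSACTION_ID CODE ' ' DATA under the stated assumptions."""
    payload = "%08x%04x %s" % (tid, code, data)
    # Assumption: LENGTH counts every character after the LENGTH field.
    return "%08x%s" % (len(payload), payload)


def parse_event(message):
    """Split an event into (tid, code, data); raise ValueError if malformed."""
    length = int(message[:8], 16)
    payload = message[8:8 + length]
    if len(payload) != length:
        raise ValueError("event is shorter than its declared length")
    tid = int(payload[:8], 16)
    code = int(payload[8:12], 16)
    if payload[12:13] != " ":
        raise ValueError("expected a space before DATA")
    return tid, code, payload[13:]
```

For instance, `format_event(3, 1, "abc")` yields `00000010000000030001 abc` under these assumptions, and `parse_event` recovers `(3, 1, "abc")` from it.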

FAILURE_EVENT: Sent if a failure occurs while processing a command.

ATTR_DEF_EVENT: A list of attributes that the proxy may send as data.

LAUNCH_CONFIG_EVENT: A list of attributes needed to launch a job, e.g., every queue needs to send a list of the attribute ids needed to submit a job.

JOB_SUBMISSION_EVENT: Notification that a job has been submitted for execution. The job id will be returned as part of the event.

Others: Machine change, host change, queue change, job status change, ...

Attributes

Attributes are used so that data sent to the RM is self describing. Attributes are meta-data describing actual data. Attribute ids must be unique and are generated by the proxy. The attribute name must persist across instances of the proxy. An attribute has a:

   ATTR_ID:
   ATTR_NAME:
   ATTR_TYPE:
   ATTR_SNAME:
   ATTR_LNAME:
   ATTR_MIN_VALUE:
   ATTR_MAX_VALUE:
   ATTR_DEF_VALUE:
   ATTR_VALSl:

