This document describes a new resource manager architecture intended to simplify the development of new resource managers. For a detailed description of the overall PTP architecture, refer to the PTP 2.x Design Document.
In the existing architecture, a resource manager comprises two parts: a Java component that implements the client side of the Resource Manager Proxy Protocol, and an external component (usually written in C or Python) that implements the server side of the protocol and interacts with the runtime system on the target machine. In the new architecture, the resource manager comprises a single Java component that interacts with the runtime system via the command-line interface.
The resource manager is a Java component that interacts with a (potentially remote) runtime system via the command-line interface. The resource manager issues commands to perform activities and attempts to interpret the results of each command.
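As a rough illustration of issuing a command and capturing its result, the sketch below uses the standard `java.lang.ProcessBuilder` API. The class name `CommandRunner` and its `Result` type are hypothetical and are not part of PTP; the sketch also runs the command locally, whereas a real resource manager may need to reach a remote system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper (not a PTP class): runs one command line and captures its result. */
final class CommandRunner {

    /** Exit code and captured output of a finished command. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs the command on the local machine and waits for it to finish. */
    static Result run(List<String> commandLine) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(commandLine);
        builder.redirectErrorStream(true); // merge stderr into stdout so nothing is lost
        Process process = builder.start();
        // Drain the output before waiting, so a full pipe cannot block the child process
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        return new Result(exitCode, output);
    }
}
```

Interpreting `Result.output` is left to the caller, since the meaning of the text depends on the runtime system that produced it.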
When a resource manager is first started, it attempts to discover information about the target system by issuing a discover command. This command will attempt to discover information such as:
- The number of racks (machines) and nodes
- Attributes of the hardware
- Job queues and resource requirements
- User-definable parameters
The discover command is expected to be run only once, when the resource manager is first started.
The output format for the discover command is currently system dependent, but an XML-based format is under development and is expected to be used for new resource managers.
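One way to enforce the run-once expectation is to cache the discover output after the first start. The following is a minimal sketch under that assumption; `DiscoveryOnce` and the `issueDiscover` supplier are hypothetical names, and the output is kept as raw text because its format is system dependent.

```java
import java.util.function.Supplier;

/** Hypothetical sketch (not a PTP class): issues the discover command at most once. */
final class DiscoveryOnce {
    private final Supplier<String> issueDiscover; // assumed to run the discover command
    private boolean done;                         // true once discovery has run
    private String cachedOutput;                  // raw, system-dependent discover output

    DiscoveryOnce(Supplier<String> issueDiscover) {
        this.issueDiscover = issueDiscover;
    }

    /** Returns the discover output, running the command only on the first call. */
    synchronized String start() {
        if (!done) {
            cachedOutput = issueDiscover.get();
            done = true;
        }
        return cachedOutput;
    }
}
```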
Once discovery has completed, the resource manager will begin monitoring the status of the system and jobs using a monitor command. Two types of monitor commands will be supported:
- Periodic monitoring, where the command is issued on a regular basis
- Continuous monitoring, where the command is issued once and continues to run for the life of the resource manager session
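The two monitoring styles could be sketched as follows, using only standard `java.util.concurrent` and `java.io` classes. All names here (`MonitorModes`, `startPeriodic`, `startContinuous`, `issueCommand`, `onUpdate`) are hypothetical. The periodic variant re-issues the command after a fixed delay, while the continuous variant assumes the command has already been started and reads its output line by line until the stream ends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch (not a PTP class) of the two monitoring styles. */
final class MonitorModes {

    /** Periodic: issue the command repeatedly and hand each complete report to onUpdate. */
    static ScheduledFuture<?> startPeriodic(ScheduledExecutorService scheduler,
            Callable<String> issueCommand, Consumer<String> onUpdate, long delayMillis) {
        Runnable poll = () -> {
            try {
                onUpdate.accept(issueCommand.call());
            } catch (Exception e) {
                // Catch here: an exception escaping the task would cancel all later polls
                System.err.println("monitor poll failed: " + e);
            }
        };
        return scheduler.scheduleWithFixedDelay(poll, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Continuous: read the output of an already-running command until it closes. */
    static Thread startContinuous(InputStream commandOutput, Consumer<String> onUpdate) {
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(commandOutput, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    onUpdate.accept(line); // each line is treated as one update
                }
            } catch (IOException e) {
                System.err.println("monitor stream failed: " + e);
            }
        }, "monitor-reader");
        reader.setDaemon(true); // do not keep the JVM alive just for monitoring
        reader.start();
        return reader;
    }
}
```

Treating each output line as one update is an assumption made for brevity; a real monitor command may emit multi-line records.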
The monitor command will provide model update information, such as:
- Status changes to machines and nodes
- Status changes to queues, jobs and processes
- Other attribute changes to model elements
- New model elements (such as new nodes coming online)
The output format for the monitor command is currently system dependent, but an XML-based format is under development and is expected to be used for new resource managers.
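Independent of the output format, the kinds of model updates listed above can be reduced to setting an attribute on a model element and creating the element if it is not yet known. The sketch below illustrates that idea only; `ModelSketch`, its method names, and the flat string attributes are assumptions and do not reflect the actual PTP model classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not the PTP model): elements keyed by id, each with string attributes. */
final class ModelSketch {
    // element id -> (attribute name -> value)
    private final Map<String, Map<String, String>> elements = new LinkedHashMap<>();

    /**
     * Applies one update from the monitor command.
     * Returns true if the element was previously unknown (a new model element).
     */
    boolean apply(String elementId, String attribute, String value) {
        boolean isNew = !elements.containsKey(elementId);
        elements.computeIfAbsent(elementId, id -> new LinkedHashMap<>()).put(attribute, value);
        return isNew;
    }

    /** Returns the current value of an attribute, or null if the element or attribute is unknown. */
    String get(String elementId, String attribute) {
        Map<String, String> attributes = elements.get(elementId);
        return attributes == null ? null : attributes.get(attribute);
    }
}
```

For example, `apply("node0", "status", "UP")` on an empty model registers `node0` as a new element, and a later `apply("node0", "status", "DOWN")` records a status change on the existing element.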