PTP/designs/rm view

Revision as of 17:26, 27 October 2006 by Rsqrd.lanl.gov (Talk | contribs)

Overview

This is a preliminary design for the PTP Resource Management system. It covers the first-phase product, which is limited in scope to viewing the state of the resource manager: the machines, jobs, queues, and nodes that are under the resource manager's control. The classes currently in PTP will implement these new interfaces in addition to the interfaces they already implement.

File:Ptp resource manager view.pdf

Requirements

The Resource Management system is the Eclipse/PTP interface to a host's resource manager. The Resource Manager (RM) is responsible for determining the layout of the system represented by the host's resource manager (HRM). This includes determining which machines and nodes constitute the physical layout of the system, and their status. The RM is also responsible for determining the dynamic structure of the HRM, i.e. which Queues are available to the resource manager, and which Jobs are queued and running within those Queues. This dynamic structure also comprises Node allocation and Process information for each Job. Examples of HRMs are Torque, LSF, ORTE, and SLURM. Each of these will have a corresponding Eclipse/PTP RM.

The Resource Management system encompasses not only the interface to a host's resource manager, but also the specification of the environment in which parallel programs are built and launched. This environment may include settings needed to build and run parallel jobs, such as compilers and paths, e.g. LD_LIBRARY_PATH and include paths. This environment may be effected via module files.

The RM will consist of two parts. The first part will reside within the local Eclipse/PTP session. It will contain the interface presented to the user and maintain the local structures necessary to represent the physical and dynamic structure of the HRM. The second part will reside on the host and will be more closely tied to the HRM. This second part will form a (usually remote) proxy to the HRM. The local RM will forward requests to the HRM and receive asynchronous responses back from the HRM.

In its role of specifying parallel launch environments, the RM may be sensitive to changes in the HRM's version. To shield the Eclipse/PTP system from such changes, the local interface for the RM's portion of the launch configuration should consist of a set of typed attributes. The set of attributes is determined by querying the proxy to the HRM, and the user fills in their values. These typed attributes may include memory or time resource allocation limits, or anything else particular to the selected RM's HRM.

Package rm.core

For interfaces and abstract classes, the responsibilities and collaborations refer to concrete objects that are implementations of the interface or abstract class.

Interface: IRMResourceManager

Responsibilities
Proxy used to connect to the ResourceManagerHost's actual resource manager (ARM).
Retrieve lists of machines, nodes, jobs, processes, and queues from the ARM.
Notify registered objects that the lists have changed, either in composition or in their elements' attributes, due to changes propagated from the ARM.
Collaborations
ResourceManagerHost
IRMResourceManagerListener
RMNodesChangedEvent
RMJobsChangedEvent
RMQueuesChangedEvent
RMMachinesChangedEvent
RMStructureChangedEvent
IRMMachine, IRMNode, IRMJob, IRMQueue
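
The responsibilities above could translate into an interface along the following lines. This is only a sketch under stated assumptions: the method names (`getMachines`, `getJobs`, `addListener`, `removeListener`), the single-method shape of the listener, the use of `String` in place of the IRMMachine and IRMJob element types, and the `InMemoryResourceManager` class are all hypothetical and are not part of the design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener shape; the design only names the interface.
interface IRMResourceManagerListener {
    void jobsChanged(List<String> currentJobs);
}

// Hypothetical method names; element types are simplified to String.
interface IRMResourceManager {
    List<String> getMachines();
    List<String> getJobs();
    void addListener(IRMResourceManagerListener listener);
    void removeListener(IRMResourceManagerListener listener);
}

// Stand-in for a proxy-backed implementation: it keeps local copies of the
// lists and notifies registered listeners when a change arrives from the ARM.
class InMemoryResourceManager implements IRMResourceManager {
    private final List<String> machines = new ArrayList<>();
    private final List<String> jobs = new ArrayList<>();
    private final List<IRMResourceManagerListener> listeners =
            new CopyOnWriteArrayList<>();

    public List<String> getMachines() {
        return Collections.unmodifiableList(machines);
    }

    public List<String> getJobs() {
        return Collections.unmodifiableList(jobs);
    }

    public void addListener(IRMResourceManagerListener listener) {
        listeners.add(listener);
    }

    public void removeListener(IRMResourceManagerListener listener) {
        listeners.remove(listener);
    }

    // Hypothetical entry point, called when the proxy reports a new job.
    void jobAddedByArm(String jobId) {
        jobs.add(jobId);
        for (IRMResourceManagerListener listener : listeners) {
            listener.jobsChanged(getJobs());
        }
    }
}
```

Returning unmodifiable views keeps the locally cached lists under the resource manager's control, so that changes only enter through updates propagated from the ARM.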

Abstract Class: ResourceManagerFactory

Responsibilities
Subclasses create and load instances of IRMResourceManager
Dispose of any resources acquired by factory objects
Collaborations
IRMResourceManager implementations
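
As an illustration of the factory's two responsibilities, a minimal sketch might look like the following. The method names (`create`, `dispose`, `isDisposed`), the `String` argument, the reduced `IRMResourceManager` interface, and the `ExampleResourceManagerFactory` subclass are assumptions made for this example only.

```java
// Reduced stand-in for the interface of the same name.
interface IRMResourceManager {
    String getName();
}

// Hypothetical method names: the design only states the responsibilities.
abstract class ResourceManagerFactory {
    private boolean disposed = false;

    // Subclasses create (or load) a resource manager for the given name.
    abstract IRMResourceManager create(String name);

    // Release anything the factory acquired; subclasses may extend this.
    void dispose() {
        disposed = true;
    }

    boolean isDisposed() {
        return disposed;
    }
}

// Hypothetical subclass standing in for one kind of resource manager.
class ExampleResourceManagerFactory extends ResourceManagerFactory {
    @Override
    IRMResourceManager create(String name) {
        if (isDisposed()) {
            throw new IllegalStateException("factory has been disposed");
        }
        // The lambda implements the single method getName().
        return () -> name;
    }
}
```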


Class: ResourceManagerHost

Responsibilities
Determine which remote (or local) host's resource manager to proxy
Determine which resource manager on the host to proxy
Provide host's status
Collaborations
RMStatus

Interface: IRMMachine

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated machine
Set and provide specific attributes for a given attribute description
List all nodes associated with ARM's machine
Provide machine's status
Collaborations
IAttribute
IAttrDesc
RMStatus

Interface: IRMQueue

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated queue
Set and provide specific attributes for a given attribute description
List all nodes that may have jobs dispatched from this queue
Provide queue's status
Collaborations
IAttribute
IAttrDesc
RMStatus

Interface: IRMNode

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated node
Set and provide specific attributes for a given attribute description
List all jobs associated with ARM's node
List all queues that can run jobs on this node
Provide node's status
Collaborations
IAttribute
IAttrDesc
RMStatus

Interface: IRMJob

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated job
Set and provide specific attributes for a given attribute description
List all processes associated with ARM's job
Provide job's status
Collaborations
IAttribute
IAttrDesc
RMJobStatus

Interface: IRMProcess

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated process
Set and provide specific attributes for a given attribute description
Provide node on which the process runs
Collaborations
IAttribute
IAttrDesc

Enumeration: RMStatus

Responsibilities
Provide consistent labeling of element status
OK: element is up and able to accept jobs, etc.
DOWN: element is down; the reason will have to be provided in other attributes
UNAVAILABLE: element is unable to accept jobs, etc.; the reason will have to be provided in other attributes
ALLOCATED_OTHER: element is up but unable to accept jobs due to allocations by other users
UNKNOWN: the status is unknown
Collaborations
ResourceManagerHost, IRMMachine, IRMNode, IRMQueue
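
The status labels above map naturally onto a Java enum. The sketch below takes the constant names from the list; the `getDescription`, `canAcceptJobs`, and `fromLabel` members are hypothetical conveniences and are not part of the design.

```java
import java.util.Locale;

enum RMStatus {
    OK("element is up and able to accept jobs"),
    DOWN("element is down"),
    UNAVAILABLE("element is unable to accept jobs"),
    ALLOCATED_OTHER("element is up but allocated to other users"),
    UNKNOWN("status is unknown");

    private final String description;

    RMStatus(String description) {
        this.description = description;
    }

    String getDescription() {
        return description;
    }

    // Hypothetical helper: only OK elements accept new jobs.
    boolean canAcceptJobs() {
        return this == OK;
    }

    // Hypothetical helper: map a raw label to a constant,
    // falling back to UNKNOWN for null or unrecognised input.
    static RMStatus fromLabel(String label) {
        if (label == null) {
            return UNKNOWN;
        }
        try {
            return valueOf(label.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }
}
```

Falling back to UNKNOWN rather than throwing is one way to keep the view usable when a resource manager reports a state the enumeration does not cover.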

Enumeration: RMJobStatus

Responsibilities
Provide consistent labeling of job status
PENDING: job is pending in queue
RUNNING: job is running normally
SUSPENDED: job is suspended; the reason will have to be provided in other attributes
DONE: job has completed normally
EXIT: job has completed abnormally; the reason will have to be provided in other attributes
UNKNOWN: job status is unknown
Collaborations
IRMJob
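
A possible Java rendering of the job status labels is sketched below. The constants come from the list above; the `isFinished` and `needsReason` helpers are hypothetical and only encode what the descriptions say (DONE and EXIT are completed states, and SUSPENDED and EXIT expect a reason in other attributes).

```java
enum RMJobStatus {
    PENDING, RUNNING, SUSPENDED, DONE, EXIT, UNKNOWN;

    // Hypothetical helper: true once the job has completed,
    // whether normally (DONE) or abnormally (EXIT).
    boolean isFinished() {
        return this == DONE || this == EXIT;
    }

    // Hypothetical helper: true for states whose reason is
    // expected to be supplied in other attributes.
    boolean needsReason() {
        return this == SUSPENDED || this == EXIT;
    }
}
```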

Package rm.events

Interface: IRMResourceManagerListener

Responsibilities
Registration site for Observer pattern to allow objects to be notified of changes in the IRMResourceManager's state
Collaborations
IRMResourceManager
RMStructureChangedEvent, RMMachinesChangedEvent, RMNodesChangedEvent, RMJobsChangedEvent, RMQueuesChangedEvent

Abstract Class: ResourceManagerEvent

Responsibilities
Determine the type of change in the IRMResourceManager's state
The type can be ADDED, MODIFIED, or REMOVED
Collaborations
IRMResourceManager
RMStructureChangedEvent, RMNodesChangedEvent, RMJobsChangedEvent, RMQueuesChangedEvent, RMMachinesChangedEvent

Class: RMStructureChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has had major structure changes (table columns may need to be recreated)
Collaborations
none

Class: RMNodesChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed nodes
Collaborations
IRMNode

Class: RMJobsChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed jobs
Collaborations
IRMJob

Class: RMMachinesChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed machines
Collaborations
IRMMachine

Class: RMQueuesChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed queues
Collaborations
IRMQueue
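
The event classes in this package could be sketched as follows. Only two of the five concrete events are shown. The `ChangeType` enum name, the accessor and callback names, the use of `String` job identifiers in place of IRMJob, the choice of MODIFIED for structure changes, and the `RecordingListener` class are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The three change types named in the design; the enum name is hypothetical.
enum ChangeType { ADDED, MODIFIED, REMOVED }

abstract class ResourceManagerEvent {
    private final ChangeType type;

    protected ResourceManagerEvent(ChangeType type) {
        this.type = type;
    }

    ChangeType getType() {
        return type;
    }
}

class RMJobsChangedEvent extends ResourceManagerEvent {
    private final List<String> jobIds; // stand-in for IRMJob elements

    RMJobsChangedEvent(ChangeType type, List<String> jobIds) {
        super(type);
        // Defensive copy so the event cannot change after it is created.
        this.jobIds = Collections.unmodifiableList(new ArrayList<>(jobIds));
    }

    List<String> getJobIds() {
        return jobIds;
    }
}

class RMStructureChangedEvent extends ResourceManagerEvent {
    RMStructureChangedEvent() {
        super(ChangeType.MODIFIED); // assumption: structure changes count as MODIFIED
    }
}

// Hypothetical callback names, one per event class shown here.
interface IRMResourceManagerListener {
    void jobsChanged(RMJobsChangedEvent event);
    void structureChanged(RMStructureChangedEvent event);
}

// Example observer that records what it was told.
class RecordingListener implements IRMResourceManagerListener {
    final List<String> log = new ArrayList<>();

    public void jobsChanged(RMJobsChangedEvent event) {
        log.add("jobs " + event.getType() + " " + event.getJobIds());
    }

    public void structureChanged(RMStructureChangedEvent event) {
        log.add("structure changed");
    }
}
```

A view could implement the listener in the same way as `RecordingListener`, refreshing table rows in `jobsChanged` and rebuilding table columns in `structureChanged`.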

Package rm.attributes

Interface: IAttribute

Responsibilities
Maintain the relationship between an attribute's value and its description
Specify a strict weak ordering between itself and other attributes
Provide a string representation of the attribute
Collaborations
IAttrDesc

Interface: IAttrDesc

Responsibilities
Provide a string description of the attribute
Provide a name of the attribute
Know the actual type of the attribute
Create new attributes of the correct type
Collaborations
IAttribute
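
The relationship between the two attribute interfaces can be sketched as below, with the description acting as a factory for correctly typed attributes and the ordering expressed through `Comparable`. The generic signatures, all method names, and the `IntAttrDesc` class are assumptions for illustration; the design itself only lists responsibilities.

```java
// Hypothetical generic signatures for the two interfaces.
interface IAttrDesc<T extends Comparable<T>> {
    String getName();
    String getDescription();
    Class<T> getType();                   // the actual type of the attribute
    IAttribute<T> create(String rawValue); // create an attribute of that type
}

interface IAttribute<T extends Comparable<T>> extends Comparable<IAttribute<T>> {
    IAttrDesc<T> getDescriptor(); // link from the value back to its description
    T getValue();
    String getStringRep();
}

// Example description for integer-valued attributes.
final class IntAttrDesc implements IAttrDesc<Integer> {
    private final String name;
    private final String description;

    IntAttrDesc(String name, String description) {
        this.name = name;
        this.description = description;
    }

    public String getName() { return name; }
    public String getDescription() { return description; }
    public Class<Integer> getType() { return Integer.class; }

    public IAttribute<Integer> create(String rawValue) {
        final int value = Integer.parseInt(rawValue.trim());
        final IAttrDesc<Integer> self = this;
        return new IAttribute<Integer>() {
            public IAttrDesc<Integer> getDescriptor() { return self; }
            public Integer getValue() { return value; }
            public String getStringRep() { return self.getName() + "=" + value; }
            // Order attributes by their integer value.
            public int compareTo(IAttribute<Integer> other) {
                return Integer.compare(value, other.getValue());
            }
        };
    }
}
```

With this shape, a memory limit could be described once, for example `new IntAttrDesc("memLimitMB", "Memory limit in megabytes")`, and every value created from it carries its description and can be sorted against other values of the same description.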