Difference between revisions of "PTP/designs/rm view"

Latest revision as of 19:03, 1 November 2006

Overview

This is a preliminary design for the PTP Resource Management system. This is the design of the first phase product, which is limited in scope to viewing the state of the resource manager. This includes the machines, jobs, queues, and nodes that are under the resource manager's control. The classes currently in PTP will implement these new interfaces in addition to their previously implemented interfaces.

File:Ptp resource manager view.pdf

Requirements

The Resource Management system is the Eclipse/PTP interface into a host's resource manager. The Resource Manager (RM) is responsible for determining the layout of the system represented by the host's resource manager (HRM). This includes determining which machines and nodes constitute the physical layout of the system, and their status. The RM is also responsible for determining the dynamic structure of the HRM, i.e. which Queues are available to the resource manager and which Jobs are queued or running within those Queues. This dynamic structure also comprises Node allocation and Process information for each Job. Examples of HRMs are Torque, LSF, ORTE, and SLURM. Each of these will have a corresponding Eclipse/PTP RM.

The Resource Management system encompasses not only the interface into a host's resource manager, but also the specification of the environment in which parallel programs are built and launched. This environment may include aspects of building and running parallel jobs such as the choice of compilers and the setting of paths, e.g. LD_LIBRARY_PATH and include paths. This environment may be effected via module files.

The RM will consist of two parts. The first part will reside within the local Eclipse/PTP session. It will contain the interface presented to the user and maintain the local structures necessary to represent the physical and dynamic structure of the HRM. The second part will reside on the host, and be more intimately related to the HRM. This second portion will form a (usually remote) proxy to the HRM. The local RM will forward requests to the HRM and receive asynchronous responses back from the HRM.

In its role of specifying parallel launch environments, the RM may be sensitive to changes in the HRM's version. In order to shield the Eclipse/PTP system from changes in HRM versions, the local interface for the RM's portion of the launch configuration should consist of a set of typed attributes to be filled in by the user. The set of attributes is determined by querying the proxy to the HRM. These typed attributes may include memory or time resource allocation limits, or anything else particular to the selected RM's HRM.

Other requirements:

  1. Terminated jobs persist
  2. Support for disconnect/reconnect to proxy
  3. No synchronization issues between model and proxy (i.e. they never get out of sync)
  4. Ability to register listeners on model objects (e.g. to detect when a job exits)
  5. If stdout capture is supported by the resource manager
    • Ability to display stdout while connected to proxy and job is running
    • Preserve stdout while disconnected
    • Entire run stdout preserved on terminated jobs
  6. Efficiently refer to objects in fixed sets
    • For communication between Eclipse and proxy
    • e.g. nodes in a machine, procs in a job
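
One way to read requirement 6 is that a set of members of a fixed collection (nodes in a machine, processes in a job) can be named by index ranges rather than by listing every object. The following is a minimal sketch of that idea only. The class `IndexRanges`, its methods, and the `"0-3,7"` wire format are assumptions made for illustration and are not part of this design. Indices are assumed to be non-negative.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of the PTP design: encodes a set of
// non-negative indices as comma-separated runs such as "0-3,7,9-10".
final class IndexRanges {
    private IndexRanges() {}

    /** Collapse consecutive indices into "start-end" runs. */
    static String encode(SortedSet<Integer> indices) {
        StringBuilder out = new StringBuilder();
        Integer start = null; // first index of the current run
        Integer prev = null;  // last index seen so far
        for (int i : indices) {
            if (start == null) {
                start = i;
            } else if (i != prev + 1) {
                // Gap found: close the current run and start a new one.
                appendRun(out, start, prev);
                start = i;
            }
            prev = i;
        }
        if (start != null) {
            appendRun(out, start, prev);
        }
        return out.toString();
    }

    /** Expand an encoded string back into the full index set. */
    static SortedSet<Integer> decode(String encoded) {
        SortedSet<Integer> result = new TreeSet<>();
        if (encoded.isEmpty()) {
            return result;
        }
        for (String run : encoded.split(",")) {
            String[] ends = run.split("-");
            int lo = Integer.parseInt(ends[0]);
            int hi = ends.length > 1 ? Integer.parseInt(ends[1]) : lo;
            for (int i = lo; i <= hi; i++) {
                result.add(i);
            }
        }
        return result;
    }

    private static void appendRun(StringBuilder out, int start, int end) {
        if (out.length() > 0) {
            out.append(',');
        }
        out.append(start);
        if (end != start) {
            out.append('-').append(end);
        }
    }
}
```

With this encoding, a message about a job whose processes occupy a few contiguous blocks of nodes grows with the number of blocks rather than with the number of nodes.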

Package rm.core

For interfaces and abstract classes, the responsibilities and collaborations refer to concrete objects that are implementations of the interface or abstract class.

Interface: IRMResourceManager

Responsibilities
Proxy used to connect to the ResourceManagerHost's actual resource manager (ARM).
Retrieve lists of machines, nodes, jobs, processes, and queues from the ARM.
Notify registered objects that the lists have changed, either in composition or in their elements' attributes, due to changes propagated from the ARM
Collaborations
IRMResourceManagerHost
IRMResourceManagerListener
RMNodesChangedEvent
RMJobsChangedEvent
RMQueuesChangedEvent
RMMachinesChangedEvent
RMStructureChangedEvent
IRMMachine, IRMNode, IRMJob, IRMQueue

Abstract Class: ResourceManagerFactory

Responsibilities
Subclasses of this class create and load instances of IRMResourceManager
Dispose of any resources acquired by factory objects
Collaborations
IRMResourceManager subclasses


Class: ResourceManagerHost

Responsibilities
Determine which remote (or local) host's resource manager to proxy
Determine which resource manager on the host to proxy
Provide host's status
Collaborations
RMStatus

Interface: IRMMachine

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated machine
Set and provide specific attributes for a given attribute description
List all nodes associated with ARM's machine
Provide machine's status
Collaborations
IAttribute
IAttrDesc
RMStatus

Interface: IRMQueue

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated queue
Set and provide specific attributes for a given attribute description
List all nodes that may have jobs dispatched from this queue
Provide queue's status
Collaborations
IAttribute
IAttrDesc
RMStatus

Interface: IRMNode

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated node
Set and provide specific attributes for a given attribute description
List all jobs associated with ARM's node
List all queues that can run jobs on this node
Provide node's status
Collaborations
IAttribute
IAttrDesc
RMStatus

Interface: IRMJob

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated job
Set and provide specific attributes for a given attribute description
List all processes associated with ARM's job
Provide job's status
Collaborations
IAttribute
IAttrDesc
RMJobStatus

Interface: IRMProcess

Responsibilities
Provide the status information, i.e. attributes, for the ARM's associated process
Set and provide specific attributes for a given attribute description
Provide node on which the process runs
Collaborations
IAttribute
IAttrDesc

Enumeration: RMStatus

Responsibilities
Provide consistent labeling of element status
OK: element is up and able to accept jobs, etc.
DOWN: element is down; the reason will have to be provided in other attributes
UNAVAILABLE: element is unable to accept jobs, etc.; the reason will have to be provided in other attributes
ALLOCATED_OTHER: element is up but unable to accept jobs due to allocations by other users
UNKNOWN: the status is unknown
Collaborations
ResourceManagerHost, IRMMachine, IRMNode, IRMQueue
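
The status values above map naturally onto a Java enum. The sketch below is illustrative only: the constant names come from this design, while `getMeaning`, `acceptsJobs`, and `parse` are hypothetical helpers. `parse` assumes the proxy reports status as text, which this design does not specify.

```java
import java.util.Locale;

// Constant names follow the design; all methods are hypothetical.
enum RMStatus {
    OK("element is up and able to accept jobs"),
    DOWN("element is down"),
    UNAVAILABLE("element is unable to accept jobs"),
    ALLOCATED_OTHER("element is up but allocated to other users"),
    UNKNOWN("status is unknown");

    private final String meaning;

    RMStatus(String meaning) {
        this.meaning = meaning;
    }

    String getMeaning() {
        return meaning;
    }

    /** Only OK elements can accept jobs, per the descriptions above. */
    boolean acceptsJobs() {
        return this == OK;
    }

    /** Map free-form text to a status, falling back to UNKNOWN. */
    static RMStatus parse(String text) {
        if (text == null) {
            return UNKNOWN;
        }
        try {
            return valueOf(text.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }
}
```

Falling back to UNKNOWN keeps the local model usable if a newer HRM reports a status the RM does not recognize.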

Enumeration: RMJobStatus

Responsibilities
Provide consistent labeling of job status
PENDING: job is pending in queue
RUNNING: job is running normally
SUSPENDED: job is suspended; the reason will have to be provided in other attributes
DONE: job has completed normally
EXIT: job has completed abnormally; the reason will have to be provided in other attributes
UNKNOWN: job status is unknown
Collaborations
IRMJob
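
The job states can be sketched the same way. The constant names are taken from this design. The `isTerminated` helper is an assumption, added because the requirements state that terminated jobs persist, so code displaying the model may need to tell finished jobs from live ones.

```java
// Constant names follow the design; isTerminated is a hypothetical helper.
enum RMJobStatus {
    PENDING,   // waiting in a queue
    RUNNING,   // running normally
    SUSPENDED, // suspended; reason carried in other attributes
    DONE,      // completed normally
    EXIT,      // completed abnormally; reason carried in other attributes
    UNKNOWN;   // status could not be determined

    /** True for jobs that have finished, whether normally or abnormally. */
    boolean isTerminated() {
        return this == DONE || this == EXIT;
    }
}
```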

Package rm.events

Interface: IRMResourceManagerListener

Responsibilities
Registration site for Observer pattern to allow objects to be notified of changes in the IRMResourceManager's state
Collaborations
IRMResourceManager
RMStructureChangedEvent, RMMachinesChangedEvent, RMNodesChangedEvent, RMJobsChangedEvent, RMQueuesChangedEvent

Abstract Class: ResourceManagerEvent

Responsibilities
Determine the type of change in the IRMResourceManager's state
The type can be ADDED, MODIFIED, or REMOVED
Collaborations
IRMResourceManager
RMStructureChangedEvent, RMNodesChangedEvent, RMJobsChangedEvent, RMQueuesChangedEvent, RMMachinesChangedEvent

Class: RMStructureChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has had major structure changes (table columns may need to be recreated)
Collaborations
none

Class: RMNodesChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed nodes
Collaborations
IRMNode

Class: RMJobsChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed jobs
Collaborations
IRMJob

Class: RMMachinesChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed machines
Collaborations
IRMMachine

Class: RMQueuesChangedEvent

Superclass
ResourceManagerEvent
Responsibilities
Event created when the ARM has added, modified, or removed queues
Collaborations
IRMQueue
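
The listener and event classes in this package follow the Observer pattern. The sketch below shows how the pieces could fit together for job events only. The class names `ResourceManagerEvent`, `RMJobsChangedEvent`, and `IRMResourceManagerListener` and the ADDED/MODIFIED/REMOVED types come from this design. Every method name, the use of string job IDs in place of IRMJob objects, and the `ListenerRegistry` class (a stand-in for the notification part of IRMResourceManager) are assumptions.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

abstract class ResourceManagerEvent {
    enum Type { ADDED, MODIFIED, REMOVED }

    private final Type type;

    protected ResourceManagerEvent(Type type) {
        this.type = type;
    }

    Type getType() {
        return type;
    }
}

final class RMJobsChangedEvent extends ResourceManagerEvent {
    // The design collaborates with IRMJob; plain IDs stand in here.
    private final List<String> jobIds;

    RMJobsChangedEvent(Type type, List<String> jobIds) {
        super(type);
        this.jobIds = Collections.unmodifiableList(jobIds);
    }

    List<String> getJobIds() {
        return jobIds;
    }
}

// Only the job callback is shown; node, queue, machine, and structure
// callbacks would follow the same shape.
interface IRMResourceManagerListener {
    void jobsChanged(RMJobsChangedEvent event);
}

// Hypothetical stand-in for the registration side of IRMResourceManager.
final class ListenerRegistry {
    // Copy-on-write lets listeners add or remove themselves during a callback.
    private final List<IRMResourceManagerListener> listeners =
            new CopyOnWriteArrayList<>();

    void addListener(IRMResourceManagerListener listener) {
        listeners.add(listener);
    }

    void removeListener(IRMResourceManagerListener listener) {
        listeners.remove(listener);
    }

    void fireJobsChanged(RMJobsChangedEvent event) {
        for (IRMResourceManagerListener listener : listeners) {
            listener.jobsChanged(event);
        }
    }
}
```

A view would register a listener once and update itself on each callback, for example by removing rows when an event of type REMOVED arrives.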

Package rm.attributes

Interface: IAttribute

Responsibilities
Maintain the relationship between an attribute's value and its description
Specify a strict weak ordering of itself and other attributes
Provide a string representation of the attribute
Collaborations
IAttrDesc

Interface: IAttrDesc

Responsibilities
Provide a string description of the attribute
Provide a name of the attribute
Know the actual type of the attribute
Create new attributes of the correct type
Collaborations
IAttribute
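
The relationship between IAttribute and IAttrDesc can be illustrated with a small sketch. The two interface names come from this design. The generic parameter, all method names, and the concrete `IntAttrDesc` and `IntAttribute` classes are assumptions, and the "walltime" attribute in the usage note is a made-up example.

```java
// The description knows the attribute's name, text, and type, and acts
// as the factory for attributes of that type.
interface IAttrDesc<T extends Comparable<T>> {
    String getName();
    String getDescription();
    Class<T> getType();
    IAttribute<T> create(String text);
}

// The attribute ties a value to its description and is ordered by value.
interface IAttribute<T extends Comparable<T>> extends Comparable<IAttribute<T>> {
    IAttrDesc<T> getAttrDesc();
    T getValue();
    String getValueAsString();
}

final class IntAttrDesc implements IAttrDesc<Integer> {
    private final String name;
    private final String description;

    IntAttrDesc(String name, String description) {
        this.name = name;
        this.description = description;
    }

    public String getName() { return name; }
    public String getDescription() { return description; }
    public Class<Integer> getType() { return Integer.class; }

    public IAttribute<Integer> create(String text) {
        // Throws NumberFormatException if the text is not an integer.
        return new IntAttribute(this, Integer.parseInt(text.trim()));
    }
}

final class IntAttribute implements IAttribute<Integer> {
    private final IAttrDesc<Integer> desc;
    private final int value;

    IntAttribute(IAttrDesc<Integer> desc, int value) {
        this.desc = desc;
        this.value = value;
    }

    public IAttrDesc<Integer> getAttrDesc() { return desc; }
    public Integer getValue() { return value; }
    public String getValueAsString() { return Integer.toString(value); }

    public int compareTo(IAttribute<Integer> other) {
        return Integer.compare(value, other.getValue());
    }
}
```

For example, `new IntAttrDesc("walltime", "Wall-clock limit in minutes").create("30")` yields an attribute that sorts before one created from `"120"`. Sorting the raw strings would give the opposite order, which is one reason for the description to know the actual type.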
