PTP/designs/rm framework

Overview

This page describes the new design for the PTP resource manager (RM) monitoring/control framework. A new framework is needed primarily because the existing RM infrastructure (both model and UI) will not scale and is not flexible enough to encompass all the machines that PTP aims to target.

The purpose of this framework is to:

Collect and display monitoring information relating to the operation of a target system in a scalable manner
Provide job submission, termination, and other job-related operations
Support debugger launch and attach
Enable the collection and display of stdout, and the transmission of stdin, for running jobs (where supported by the target system)

Monitoring information will comprise:

The status and position of the user's jobs in queues
Job attribute information
Target system status and health information for arbitrary configurations
The physical/logical location of jobs on the target system
Predictive information about job execution

Key attributes of the framework include:

Support for arbitrary system configurations
Support for all existing resource managers
The ability to scale to petascale system sizes and beyond
Support for both user-installable and system-installable modes of operation
Automated installation for user-installable operation
Simple addition of support for new resource managers
No need for compiled proxy agents

Rationale

The existing RM design is documented in the PTP 2.x Design Document. The main issues with the existing RM design fall into the following areas:

Model scalability and flexibility
UI scalability
Complexity of adding new RM support

Model Scalability and Flexibility

PTP employs an MVC architecture for monitoring job and system status. The model represents the target system and the jobs that are running on that system. The model receives updates from the proxy agents running on the target system.

Currently, the model provides a fixed hierarchy in which machines are composed of nodes, and (resource manager) queues contain jobs. A job has one or more processes, which run on specific nodes.
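The fixed hierarchy described above can be pictured with a minimal sketch. This is not PTP's actual model code; the class and field names below are hypothetical and chosen only to mirror the machine/node and queue/job/process relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    rank: int
    node_name: str  # name of the node this process runs on


@dataclass
class Job:
    job_id: str
    processes: list[Process] = field(default_factory=list)


@dataclass
class Queue:
    name: str
    jobs: list[Job] = field(default_factory=list)


@dataclass
class Machine:
    name: str
    node_names: list[str] = field(default_factory=list)


def nodes_used_by(job: Job) -> set[str]:
    """Return the names of the nodes on which the job's processes run."""
    return {p.node_name for p in job.processes}


# Illustrative data only: one machine with two nodes, one queue with one job.
machine = Machine("cluster", ["n0", "n1"])
job = Job("42", [Process(0, "n0"), Process(1, "n0"), Process(2, "n1")])
queue = Queue("batch", [job])
```

Every level of the hierarchy is fixed in the types themselves, so there is no natural place for an intermediate grouping such as a rack or a midplane.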

One problem with this approach is that the model hierarchy is inflexible and cannot easily represent more complex architectures (e.g. BG). Although it is possible to map such an architecture onto machines/nodes, the user may wish to see the actual physical layout of the machine. Also, machines often have both physical and logical layouts, and both should be visible to the user.

Another issue is that the model currently represents the entire system down to the individual process level. This is clearly going to have scaling issues with node/core counts in the hundreds of thousands and process counts in the millions.

The model is really only required for visualizing the system and job status on a target system. A better approach is to have a model that is tailored for this visualization, and that only models the currently visible aspects of the system.
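One way to model only the currently visible aspects of the system is a generic tree whose children are loaded on demand. The sketch below is an assumption about how such a model could look, not the framework's design; LazyElement, make_rack and the loader callbacks are hypothetical names.

```python
from typing import Callable, Optional


class LazyElement:
    """A generic model element whose children are fetched only when expanded."""

    def __init__(self, name: str, status: str,
                 load_children: Optional[Callable[[], list["LazyElement"]]] = None):
        self.name = name
        self.status = status
        self._load_children = load_children
        self._children: Optional[list["LazyElement"]] = None

    def expand(self) -> list["LazyElement"]:
        """Load children on first use, e.g. when the user opens this element."""
        if self._children is None:
            self._children = self._load_children() if self._load_children else []
        return self._children

    def collapse(self) -> None:
        """Discard children so hidden parts of the system cost no memory."""
        self._children = None

    def materialized_count(self) -> int:
        """Count the elements currently held in memory under this element."""
        if self._children is None:
            return 1
        return 1 + sum(c.materialized_count() for c in self._children)


def make_rack(index: int) -> LazyElement:
    # Hypothetical loader: a real one would query the target system.
    return LazyElement(
        f"rack{index}", "up",
        lambda: [LazyElement(f"rack{index}-node{n}", "up") for n in range(4)],
    )


system = LazyElement("system", "up", lambda: [make_rack(i) for i in range(3)])
```

Because the tree is generic, the same element type can describe racks, midplanes, nodes or any other grouping, and the memory cost depends on what the user has opened rather than on the size of the machine.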

UI Scalability

UI scalability is also a significant concern if PTP is to support petascale (and beyond) systems. The current runtime views display the model using individual icons to represent machines, nodes, jobs, etc. This has been recognized as a potential scaling issue for some time, and although some optimizations have been made, recent scaling tests have shown that the UI will be a major issue in achieving effective scaling.

Scaling the UI will require a compact representation for displaying system status, combined with a drill-down approach for visualizing more detailed information. In addition, it should be possible to provide different views of the same system, such as a physical and a logical view. Finally, the UI should continue to link job information with system information in a way that is meaningful to the user.
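The combination of a compact representation and drill-down can be illustrated as follows. This is only a sketch under assumed data: the node states, the (rack, node) keys and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical node states keyed by (rack, node); values are illustrative.
NODE_STATES = {
    ("rack0", "n0"): "busy",
    ("rack0", "n1"): "busy",
    ("rack0", "n2"): "idle",
    ("rack1", "n0"): "down",
    ("rack1", "n1"): "idle",
}


def summarize_by_rack(states):
    """Compact view: one state histogram per rack instead of one icon per node."""
    summary = {}
    for (rack, _node), state in states.items():
        summary.setdefault(rack, Counter())[state] += 1
    return summary


def drill_down(states, rack):
    """Detailed view: per-node states for the single rack the user selected."""
    return {node: state for (r, node), state in states.items() if r == rack}
```

The top-level view then needs one entry per rack, and per-node detail is produced only for the rack the user selects.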

New RM Support

Adding support for a new RM is currently a fairly significant undertaking, requiring the development of a proxy agent that interacts with the job scheduler via commands or APIs. The agent must convert the system-specific information into a protocol using a low-level protocol API. On the client side, additional work is required to provide a launch configuration UI that will allow the user to select the parameters/attributes for the launch.

An approach that simplifies this process is highly desirable, as it would enable PTP to support a broader range of systems and, hopefully, expand adoption. The recently developed PBS RM has some features that help with this (much of the RM is job scheduler neutral), but the effort required is still significant. The goal here is to be able to add a new RM without any coding (e.g. via an XML description), or at least to reduce the coding to a very small component.
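To make the idea of an XML-described RM more concrete, the sketch below builds a submission command line from a small XML description. The schema (resource-manager, submit, attribute, flag) is invented for illustration and is not a format defined by PTP; qsub with -q and -N is used only as a familiar PBS example.

```python
import xml.etree.ElementTree as ET

# Hypothetical RM description; element and attribute names are invented here.
RM_XML = """
<resource-manager name="example-rm">
  <submit command="qsub">
    <attribute name="queue" flag="-q"/>
    <attribute name="job_name" flag="-N"/>
  </submit>
</resource-manager>
"""


def build_submit_command(rm_xml: str, values: dict, script: str) -> list:
    """Turn user-selected attribute values into an argv list for submission."""
    root = ET.fromstring(rm_xml)
    submit = root.find("submit")
    argv = [submit.get("command")]
    for attr in submit.findall("attribute"):
        name = attr.get("name")
        if name in values:  # only pass the attributes the user actually set
            argv += [attr.get("flag"), str(values[name])]
    argv.append(script)
    return argv
```

For example, build_submit_command(RM_XML, {"queue": "batch"}, "job.sh") yields the argument list for "qsub -q batch job.sh". The same description could in principle also drive the launch configuration UI, since it lists the attributes the user can set.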

Design

The RM framework is separated into control and monitoring components. Control and monitoring operations are independent and can operate without requiring interaction between the components.

RM Control

Control operations manage the submission of, and interaction with, user-initiated jobs. They include:

Job attribute discovery
Job submission
Job-related commands (e.g. termination)
Debug session launch
Stdin/stdout forwarding

Control using current proxy protocol

The control operations are not affected by the scaling problems described above, so existing RMs can continue to use the current proxy-protocol-based RM for control operations. To make new RMs easy to develop, it is important that the control part of a new RM can be added using only a few scripts and XML files.

RM Monitoring

The monitoring operations consist of:

System configuration discovery
System and job status change notification
View change notification

Monitoring based on LLView

RM monitoring could be largely based on LLView. LLView has scalable views and a Perl back-end that supports PBS and LoadLeveler and appears to be easily extensible to other RMs (including in languages other than Perl). The scaling problem in data communication is addressed by communicating information only up to a level of detail that does not produce too much data.

Things currently not supported:

Requesting additional information for specific racks/nodes/processes/jobs/...
Sending only the changes since the last update (currently the whole model is sent on each update)
Getting more frequent updates for some information. Currently all information is updated at the same interval, but the state of the user's own jobs might need to be updated more frequently.
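As an illustration of sending only the changes since the last update, the sketch below computes a delta between two status snapshots and applies it on the receiving side. It does not describe LLView's data format; the flat {element_id: status} snapshots and both function names are assumptions made for this example.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Return only what changed between two {element_id: status} snapshots."""
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": sorted(old.keys() - new.keys()),
        "changed": {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]},
    }


def apply_delta(old: dict, delta: dict) -> dict:
    """Rebuild the new snapshot on the client from the old one plus a delta."""
    result = {k: v for k, v in old.items() if k not in delta["removed"]}
    result.update(delta["added"])
    result.update(delta["changed"])
    return result
```

When most of the system is unchanged between updates, the delta is much smaller than the full model, and the client can still reconstruct the complete state.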

Some other considerations:

Feasibility of automated deployment of the user (vs. sysadmin) version of LLView
The role (if any) of the (Java) proxy in mediating between LLView and the client

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
