PTP/designs/rm framework

Revision as of 12:39, 19 November 2010 by G.watson.computer.org (Talk | contribs)

Overview

This page describes the new design for the PTP resource manager monitoring/control framework. A new framework is needed primarily because the existing RM infrastructure (both model and UI) will not scale and is not flexible enough to encompass all the machines that PTP aims to target.

The purpose of this framework is to:

  • Collect and display monitoring information relating to the operation of a target system in a scalable manner
  • Provide job submission, termination, and other job-related operations
  • Support debugger launch and attach
  • Enable the collection and display of stdout and the transmission of stdin for running jobs (where supported by the target system)

Monitoring information will comprise:

  • The status and position of the user's jobs in queues
  • Job attribute information
  • Target system status and health information for arbitrary configurations
  • The physical/logical location of jobs on the target system
  • Predictive information about job execution

Key attributes of the framework include:

  • Support for arbitrary system configurations
  • Support for all existing resource managers
  • The ability to scale to petascale system sizes and beyond
  • Support for both user-installable and system-installable modes of operation
  • Automated installation for user-installable operation
  • Simple addition of support for new resource managers
  • Elimination of the need for compiled proxy agents

Design

Outstanding Issues