PTP/designs/rm framework

Revision as of 13:30, 19 November 2010 by G.watson.computer.org (Talk | contribs)

This page describes the new design for the PTP resource manager monitoring/control framework.

The purpose of this framework is to:

  • Collect and display monitoring information relating to the operation of a target system in a scalable manner
  • Provide job submission, termination, and other job-related operations
  • Support debugger launch and attach
  • Enable the collection and display of stdout, and the transmission of stdin, for running jobs (where supported by the target system)

Monitoring information will comprise:

  • The status and position of users' jobs in queues
  • Job attribute information
  • Target system status and health information for arbitrary configurations
  • The physical/logical location of jobs on the target system
  • Predictive information about job execution

Key attributes of the framework include:

  • Support for arbitrary system configurations
  • Support for all existing resource managers
  • The ability to scale to petascale system sizes and beyond
  • Support for both user-installable and system-installable modes of operation
  • Automated installation for user-installable operation
  • Ease of adding support for new resource managers