Difference between revisions of "Aperi Use Cases"

 
(11 intermediate revisions by the same user not shown)
Line 1: Line 1:
= Introduction =
+
= In''Italic text''troduction =
 
'''Authors: John Tyrrell and Ted Slupesky'''
 
'''Authors: John Tyrrell and Ted Slupesky'''
  
Line 42: Line 42:
 
* ''Enforcement of policies''
 
* ''Enforcement of policies''
 
** ''Example: HBA, drivers, at defined levels''
 
** ''Example: HBA, drivers, at defined levels''
 +
 +
== Configuration Management ==
 +
 +
* '''View current configuration'''
 +
** '''Storage arrays, storage ports, cache, LUNs, etc.'''
 +
** '''Switches, zones, zone sets, zone aliases, ports,''' ''port bindings,'' '''wwn, pwwn, etc.'''
 +
** ''Routers, hubs, and ports''
 +
** '''Servers, HBAs, S/W Initiators, Hosts,''' ''applications,'' '''mount points, files, etc.'''
 +
** ''Device and server cluster configuration''
 +
** '''''View resource Activity log information'''''
 +
 +
* '''Modify current configuration'''
 +
** '''Create/delete/modify/activate zones, zone sets, etc.'''
 +
** '''LUN Management (Volume create/delete, LUN create/delete, expand, LUN masking/mapping, export LUN to host, mount LUN on host)'''
 +
** ''Change device and server cluster configuration''
 +
** '''''Maintain resource Activity log information'''''
 +
** ''Display “what if” impact of potential changes''
 +
 +
* ''Policy-based restriction on control operations''
 +
** ''Example: “drain mode” as lease expiration approaches''
 +
** ''Example: awareness of what group a user is in, and what group a device is in, and presenting a view of the device or limiting control operations on the device based on group membership''
 +
 +
* ''Policy-based provisioning (“200G of platinum storage”)''
 +
** ''Enforce configuration rules, such as one zone per HBA, or dual paths for hosts, dual paths required''
 +
 +
== Capacity Management ==
 +
 +
* '''Reporting'''
 +
** '''Storage Array'''
 +
*** '''Raw capacity of the box (free/used)'''
 +
*** '''Disks/sizes/spares/RAID configuration'''
 +
*** '''Volumes/LUNs/''' ''Shares''
 +
*** '''LUNs mapped/unmapped/sizes'''
 +
*** '''Cache'''
 +
*** '''Any other internal capacity (e.g. snapshots, reserved space, overhead)'''
 +
** '''Host'''
 +
*** '''Logical drive/mount point/files/size (total/free)'''
 +
*** ''File level detail (size, age, owner/create date-time, last referenced date-time, name, etc.)''
 +
*** ''Processor, memory''
 +
** '''Tape'''
 +
*** '''Libraries, media changers, cartridges, slots'''
 +
*** ''Utilization of the tapes''
 +
** ''Switches''
 +
*** ''Port capacity''
 +
** '''Other'''
 +
*** ''File level:  oldest, largest, owner data, use time/stamps''
 +
*** '''''Storage trend capacity planning (at all levels of HSM hierarchy, and switches)'''''
 +
*** ''User information for charge back, billing based on resource usage''
 +
 +
* ''Policy-based capacity management''
 +
** ''Auto-delete of files of certain type''
 +
** ''Auto-HSM based on policy''
 +
** ''Quota management''
 +
** ''Threshold management (e.g. freespace) – alert user, kick off HSM, provision volume, choose another resource pool''
 +
** ''ILM''
 +
 +
== Performance Management ==
 +
* ''Performance capacity planning and problem diagnosis''
 +
** ''Reporting, monitoring, and threshold-based alerting, end to end and with drill down''
 +
** ''Storage Array''
 +
*** ''Monitor R/W I/O rates, service times, q-depth (i/o rate * service time), cache hits, read vs write, bytes transferred, etc. at the LUN and file level''
 +
*** ''Monitor other device internal performance metrics (internal pathways)''
 +
*** ''Time waiting for write destage, etc.''
 +
** ''Tape''
 +
*** ''Drive usage, # mounts, drive throughput, drive wait time, ...''
 +
** ''Host''
 +
*** ''Monitor R/W I/O rates, response times, bytes transferred by file, LUN, drive mount point, application transaction, etc.''
 +
*** ''Monitor memory usage, L2 cache usage, CPU usage''
 +
*** ''Determine poor performance windows and the cause''
 +
** ''Switch (Fabric)''
 +
*** ''Monitor port stats, bytes transferred, CRC errors, buffer credits, ISL delays, etc.''
 +
** ''Application''
 +
*** ''Transactions, response times, bytes transferred, etc.''
 +
 +
* ''Management''
 +
** ''Monitor performance at user-specified periodicities (e.g., every 15 minutes)''
 +
*** ''Get statistics at same interval across devices for correlation''
 +
*** ''Keep continuous record/log of statistics''
 +
**** ''Policy-based aggregation of statistics over time''
 +
** ''Policy-based provisioning based on performance SLA''
 +
 +
== Availability Management ==
 +
* '''Identify problems and potential problems'''
 +
** '''Monitor all discovered elements'''
 +
** '''''Identify the problems that cause business outages, people time, etc. (identify root cause of problem)'''''
 +
** '''Log all events, and generate reports, etc.'''
 +
* ''Automatically avoid (or fix) problems that occur''
 +
** ''Define automation events based on discrete events, thresholds exceeded, etc. to avoid the failure before it happens''
 +
* '''''Alert responsible administrator/application owner about problems that occur'''''
 +
** '''''If a failure occurs, determine business impact, contact owner (based on business impact), and initiate recovery'''''
 +
* ''Manage data protection''
 +
** ''Scope: single filesystem, single SAN, multisite disaster recovery''
 +
** ''Apply policies to provisioning based on required data protection levels (provision on continuously-replicated storage, snapshot storage, ...)''
 +
** ''Apply policies to data protection mechanisms (frequency of snapshot or backup, lifetime)''
 +
** ''Invoke backup/replication/snapshot mechanisms as required''
 +
** ''Invoke recovery actions when failure occurs''
 +
** ''Cross-reference to other configuration policies that affect availability, such as dual-path enforcement''
 +
** ''Provide reports''
 +
*** ''Example: Are backups running OK?''
 +
*** ''Example: How much storage are my backups/snapshots taking? (cross-reference capacity planning)''
 +
*** ''Example: Vulnerability to failure (what’s not backed up)''

Latest revision as of 21:14, 9 March 2007

Introduction

Authors: John Tyrrell and Ted Slupesky

This is a living document intended to capture use cases that apply to the Aperi project. The document lists the applicable use cases, arranged according to the traditional disciplines of storage resource management. Each use case is marked up to indicate the degree to which we believe Aperi satisfies it, as follows:

  • bold text means Aperi currently satisfies the use case,
  • bold italic text means Aperi currently partially satisfies the use case, and
  • italic text means Aperi does not currently satisfy the use case.

Discovery Management

  • Identify the storage network, and applications
    • Support filtering for discovery
    • Support physical location of the resource
    • Resource ownership/management information
    • Multi-driven discovery cycle periodicities
    • User-driven “timeout” specifications for resources
  • Display the topology (elements/connections)
    • Supply different views
    • Topology drill-down
    • Visually display events for critical situations

Asset Management

  • Discover & display
    • Resource Type
    • Vendor, model, serial number, release level
    • Software and firmware levels if appropriate
    • Resource owner contact information
    • Management owner contact information
    • Vendor web site and support information
    • Lease expiration date information
    • Installation date/time/user information
    • Last upgrade date/time/user information
  • Inquiry/reporting
    • Example: How much of this vendor’s equipment do we have?
    • Example: How many things do I have at firmware level XXX?
    • Example: Are any systems left running Windows NT?
    • Note: These turn into ‘report on any of the attributes we collected up above in discover & display’
  • Enforcement of policies
    • Example: HBA, drivers, at defined levels
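
The inquiry examples above reduce to grouping and counting on attributes gathered during discovery. The following is a minimal sketch of that idea, not Aperi's API: the record layout, the `count_by` helper, and the vendor names are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical asset records; the keys loosely mirror the
# "Discover & display" attributes and are not Aperi's schema.
assets = [
    {"type": "switch", "vendor": "VendorA", "firmware": "1.2"},
    {"type": "array", "vendor": "VendorA", "firmware": "3.0"},
    {"type": "hba", "vendor": "VendorB", "firmware": "1.2"},
]


def count_by(records, attribute):
    """Count records grouped by the value of one collected attribute."""
    return Counter(record.get(attribute, "unknown") for record in records)


# "How much of this vendor's equipment do we have?"
by_vendor = count_by(assets, "vendor")
# "How many things do I have at firmware level XXX?"
by_firmware = count_by(assets, "firmware")
```

Because the helper takes the attribute name as a parameter, any attribute collected during discovery can be reported on in the same way.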

Configuration Management

  • View current configuration
    • Storage arrays, storage ports, cache, LUNs, etc.
    • Switches, zones, zone sets, zone aliases, ports, port bindings, wwn, pwwn, etc.
    • Routers, hubs, and ports
    • Servers, HBAs, S/W Initiators, Hosts, applications, mount points, files, etc.
    • Device and server cluster configuration
    • View resource Activity log information
  • Modify current configuration
    • Create/delete/modify/activate zones, zone sets, etc.
    • LUN Management (Volume create/delete, LUN create/delete, expand, LUN masking/mapping, export LUN to host, mount LUN on host)
    • Change device and server cluster configuration
    • Maintain resource Activity log information
    • Display “what if” impact of potential changes
  • Policy-based restriction on control operations
    • Example: “drain mode” as lease expiration approaches
    • Example: awareness of what group a user is in, and what group a device is in, and presenting a view of the device or limiting control operations on the device based on group membership
  • Policy-based provisioning (“200G of platinum storage”)
    • Enforce configuration rules, such as one zone per HBA or required dual paths for hosts
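
A configuration rule such as dual paths for hosts can be checked against discovered path data. The sketch below assumes that "dual paths" means at least two distinct HBAs per host; that interpretation, the tuple layout, and the function name are assumptions made for illustration, not something the use case list specifies.

```python
# Hypothetical discovered paths as (host, hba, fabric) tuples.
paths = [
    ("host1", "hba0", "fabricA"),
    ("host1", "hba1", "fabricB"),
    ("host2", "hba0", "fabricA"),
]


def hosts_violating_dual_path(path_records):
    """Return hosts that reach storage through fewer than two distinct HBAs."""
    hbas_by_host = {}
    for host, hba, _fabric in path_records:
        hbas_by_host.setdefault(host, set()).add(hba)
    return sorted(host for host, hbas in hbas_by_host.items() if len(hbas) < 2)


violations = hosts_violating_dual_path(paths)
```

A stricter reading of the rule (for example, two distinct fabrics as well as two HBAs) would only change the grouping key.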

Capacity Management

  • Reporting
    • Storage Array
      • Raw capacity of the box (free/used)
      • Disks/sizes/spares/RAID configuration
      • Volumes/LUNs/Shares
      • LUNs mapped/unmapped/sizes
      • Cache
      • Any other internal capacity (e.g. snapshots, reserved space, overhead)
    • Host
      • Logical drive/mount point/files/size (total/free)
      • File level detail (size, age, owner/create date-time, last referenced date-time, name, etc.)
      • Processor, memory
    • Tape
      • Libraries, media changers, cartridges, slots
      • Utilization of the tapes
    • Switches
      • Port capacity
    • Other
      • File level: oldest, largest, owner data, use time/stamps
      • Storage trend capacity planning (at all levels of HSM hierarchy, and switches)
      • User information for charge back, billing based on resource usage
  • Policy-based capacity management
    • Auto-delete of files of certain type
    • Auto-HSM based on policy
    • Quota management
    • Threshold management (e.g. freespace) – alert user, kick off HSM, provision volume, choose another resource pool
    • ILM
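
Threshold management on free space can be pictured as a policy table that maps free-space levels to the actions listed above. This is only a sketch: the threshold values, their ordering, and the `actions_for` name are hypothetical and do not describe Aperi behavior.

```python
# Hypothetical policy: each entry is (free-space fraction, action).
# An action fires when the free fraction drops below its limit.
POLICY = [
    (0.20, "alert user"),
    (0.10, "kick off HSM"),
    (0.05, "provision volume"),
]


def actions_for(total_bytes, free_bytes, policy=POLICY):
    """Return the policy actions triggered by the current free space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_fraction = free_bytes / total_bytes
    return [action for limit, action in policy if free_fraction < limit]
```

With this table, a file system with 8% free space would trigger the first two actions, and one with 50% free space would trigger none.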

Performance Management

  • Performance capacity planning and problem diagnosis
    • Reporting, monitoring, and threshold-based alerting, end to end and with drill down
    • Storage Array
      • Monitor R/W I/O rates, service times, q-depth (i/o rate * service time), cache hits, read vs write, bytes transferred, etc. at the LUN and file level
      • Monitor other device internal performance metrics (internal pathways)
      • Time waiting for write destage, etc.
    • Tape
      • Drive usage, # mounts, drive throughput, drive wait time, ...
    • Host
      • Monitor R/W I/O rates, response times, bytes transferred by file, LUN, drive mount point, application transaction, etc.
      • Monitor memory usage, L2 cache usage, CPU usage
      • Determine poor performance windows and the cause
    • Switch (Fabric)
      • Monitor port stats, bytes transferred, CRC errors, buffer credits, ISL delays, etc.
    • Application
      • Transactions, response times, bytes transferred, etc.
  • Management
    • Monitor performance at user-specified periodicities (e.g., every 15 minutes)
      • Get statistics at same interval across devices for correlation
      • Keep continuous record/log of statistics
        • Policy-based aggregation of statistics over time
    • Policy-based provisioning based on performance SLA
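
The storage array item above defines q-depth as I/O rate multiplied by service time, which is the relationship known as Little's law. A minimal worked sketch follows; the function name and the sample figures are illustrative assumptions.

```python
def queue_depth(io_rate_per_sec, service_time_sec):
    """Average outstanding I/Os: I/O rate multiplied by service time."""
    if io_rate_per_sec < 0 or service_time_sec < 0:
        raise ValueError("rate and service time must be non-negative")
    return io_rate_per_sec * service_time_sec


# Example: 2000 I/Os per second at 5 ms each gives roughly 10 outstanding I/Os.
depth = queue_depth(2000, 0.005)
```

Both inputs must cover the same measurement interval for the product to be meaningful, which is one reason to collect statistics at the same interval across devices.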

Availability Management

  • Identify problems and potential problems
    • Monitor all discovered elements
    • Identify the problems that cause business outages, people time, etc. (identify root cause of problem)
    • Log all events, and generate reports, etc.
  • Automatically avoid (or fix) problems that occur
    • Define automation events based on discrete events, thresholds exceeded, etc. to avoid the failure before it happens
  • Alert responsible administrator/application owner about problems that occur
    • If a failure occurs, determine business impact, contact owner (based on business impact), and initiate recovery
  • Manage data protection
    • Scope: single filesystem, single SAN, multisite disaster recovery
    • Apply policies to provisioning based on required data protection levels (provision on continuously-replicated storage, snapshot storage, ...)
    • Apply policies to data protection mechanisms (frequency of snapshot or backup, lifetime)
    • Invoke backup/replication/snapshot mechanisms as required
    • Invoke recovery actions when failure occurs
    • Cross-reference to other configuration policies that affect availability, such as dual-path enforcement
    • Provide reports
      • Example: Are backups running OK?
      • Example: How much storage are my backups/snapshots taking? (cross-reference capacity planning)
      • Example: Vulnerability to failure (what’s not backed up)
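
The "what's not backed up" report can be sketched as a comparison between discovered file systems and the time of their last successful backup. The data layout, the one-day default, and the `unprotected` name are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta


def unprotected(filesystems, last_backup, now, max_age=timedelta(days=1)):
    """Return file systems with no backup, or none newer than max_age."""
    return sorted(
        fs
        for fs in filesystems
        if fs not in last_backup or now - last_backup[fs] > max_age
    )


# Hypothetical inventory and backup log.
now = datetime(2007, 3, 9, 12, 0)
filesystems = ["/data", "/home", "/scratch"]
last_backup = {
    "/data": datetime(2007, 3, 9, 2, 0),   # backed up ten hours ago
    "/home": datetime(2007, 3, 1, 2, 0),   # backup is eight days old
}
at_risk = unprotected(filesystems, last_backup, now)
```

Passing `now` explicitly keeps the report reproducible; a caller would normally supply the current time.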
