Introduction

Authors: John Tyrrell and Ted Slupesky

This is a living document intended to capture use cases that apply to the Aperi project. It lists the applicable use cases, arranged according to the traditional disciplines of storage resource management. Each use case is marked up to indicate the degree to which we believe Aperi satisfies it, as follows:

- bold text means Aperi currently satisfies the use case;
- bold italic text means Aperi currently partially satisfies the use case;
- italic text means Aperi does not currently satisfy the use case.

Discovery Management

Identify the storage network and applications
- Support filtering for discovery
- Support physical location of the resource
- Resource ownership/management information
- Multi-driven discovery cycle periodicities
- User-driven “timeout” specifications for resources

Display the topology (elements/connections)
- Supply different views
- Topology drill-down
- Visually display events for critical situations

Asset Management

Discover & display
- Resource Type
- Vendor, model, serial number, release level
- Software and firmware levels if appropriate
- Resource owner contact information
- Management owner contact information
- Vendor web site and support information
- Lease expiration date information
- Installation date/time/user information
- Last upgrade date/time/user information

Inquiry/reporting
- Example: How much of this vendor’s equipment do we have?
- Example: How many things do I have at firmware level XXX?
- Example: Are any systems left running Windows NT?
- Note: These all reduce to reporting on any of the attributes collected under “Discover & display” above
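
The inquiry examples above can be sketched as simple queries over the collected attributes. The following is a minimal illustration only: the Asset record, its field names, and the sample inventory are hypothetical and do not reflect Aperi's actual data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    # Hypothetical record; field names are illustrative, not Aperi's schema.
    name: str
    resource_type: str
    vendor: str
    firmware: str
    os: str = ""


def count_by(assets, attribute):
    """Count assets grouped by one collected attribute (e.g. 'vendor')."""
    return Counter(getattr(a, attribute) for a in assets)


def matching(assets, **criteria):
    """Return the assets whose attributes equal all of the given criteria."""
    return [
        a for a in assets
        if all(getattr(a, key) == value for key, value in criteria.items())
    ]


# Made-up sample data for the three example questions.
INVENTORY = [
    Asset("array-1", "storage array", "VendorA", "2.1"),
    Asset("switch-1", "switch", "VendorB", "5.0"),
    Asset("switch-2", "switch", "VendorB", "5.1"),
    Asset("host-1", "server", "VendorC", "1.0", os="Windows NT"),
]

vendor_counts = count_by(INVENTORY, "vendor")          # equipment per vendor
at_level = matching(INVENTORY, firmware="5.0")         # things at a firmware level
still_on_nt = matching(INVENTORY, os="Windows NT")     # systems left on Windows NT
```

Because every question is a filter or a group-by over the same records, adding a new report in this sketch only requires collecting the attribute during discovery.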

Enforcement of policies
- Example: HBA, drivers, at defined levels

Configuration Management

View current configuration
- Storage arrays, storage ports, cache, LUNs, etc.
- Switches, zones, zone sets, zone aliases, ports, port bindings, WWN, PWWN, etc.
- Routers, hubs, and ports
- Servers, HBAs, software initiators, hosts, applications, mount points, files, etc.
- Device and server cluster configuration
- View resource activity log information

Modify current configuration
- Create/delete/modify/activate zones, zone sets, etc.
- LUN Management (Volume create/delete, LUN create/delete, expand, LUN masking/mapping, export LUN to host, mount LUN on host)
- Change device and server cluster configuration
- Maintain resource activity log information
- Display “what if” impact of potential changes

Policy-based restriction on control operations
- Example: “drain mode” as lease expiration approaches
- Example: awareness of what group a user is in, and what group a device is in, and presenting a view of the device or limiting control operations on the device based on group membership

Policy-based provisioning (“200G of platinum storage”)
- Enforce configuration rules, such as one zone per HBA or required dual paths for hosts
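
As an illustration of the configuration rules mentioned above, the sketch below checks two of them against a snapshot of the configuration. It makes several assumptions: "one zone per HBA" is read as "each zone contains exactly one HBA port", "dual paths" is read as "each host reaches storage through at least two distinct HBAs", and the input dictionaries are hypothetical structures, not an Aperi API.

```python
def check_rules(zones, host_paths, hba_ports):
    """Return a list of human-readable rule violations.

    zones:      dict of zone name -> set of member port names
    host_paths: dict of host name -> set of (hba, storage_port) tuples
    hba_ports:  set of port names known to belong to HBAs
    (All three structures are assumptions made for this sketch.)
    """
    violations = []

    # Rule 1 (assumed reading): each zone holds exactly one HBA port.
    for zone, members in sorted(zones.items()):
        initiators = members & hba_ports
        if len(initiators) != 1:
            violations.append(
                f"zone {zone}: expected 1 HBA port, found {len(initiators)}"
            )

    # Rule 2 (assumed reading): each host has paths through >= 2 HBAs.
    for host, paths in sorted(host_paths.items()):
        hbas = {hba for hba, _storage_port in paths}
        if len(hbas) < 2:
            violations.append(
                f"host {host}: {len(hbas)} HBA path(s), dual paths required"
            )

    return violations


# Made-up example configuration.
example_violations = check_rules(
    zones={"z1": {"hba1", "stor1"}, "z2": {"hba1", "hba2", "stor1"}},
    host_paths={
        "hostA": {("hba1", "stor1"), ("hba2", "stor2")},
        "hostB": {("hba3", "stor1")},
    },
    hba_ports={"hba1", "hba2", "hba3"},
)
```

Returning a list of violations rather than raising on the first one lets the same check feed either a report or a guard that blocks a provisioning request.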

Capacity Management

Reporting
- Storage Array
  - Raw capacity of the box (free/used)
  - Disks/sizes/spares/RAID configuration
  - Volumes/LUNs/shares
  - LUNs mapped/unmapped/sizes
  - Cache
  - Any other internal capacity (e.g. snapshots, reserved space, overhead)
- Host
  - Logical drive/mount point/files/size (total/free)
  - File level detail (size, age, owner/create date-time, last referenced date-time, name, etc.)
  - Processor, memory
- Tape
  - Libraries, media changers, cartridges, slots
  - Utilization of the tapes
- Switches
  - Port capacity
- Other
  - File level: oldest, largest, owner data, usage timestamps
  - Storage trend capacity planning (at all levels of HSM hierarchy, and switches)
  - User information for chargeback and billing based on resource usage
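
The file-level items in the reporting list above (size, owner, timestamps, oldest, largest) can be sketched with standard-library calls. This is an illustrative scan of a local directory tree only; the function names and row layout are hypothetical, and a real agent would also need to handle permissions, symbolic links, and very large trees.

```python
import os
from pathlib import Path


def file_report(root):
    """Walk a directory tree and collect per-file attributes."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            rows.append({
                "path": str(path),
                "size": st.st_size,        # bytes
                "modified": st.st_mtime,   # seconds since the epoch
                "accessed": st.st_atime,   # last referenced time
                "uid": st.st_uid,          # numeric owner (0 on Windows)
            })
    return rows


def largest(rows, n=10):
    """Return the n largest files."""
    return sorted(rows, key=lambda r: r["size"], reverse=True)[:n]


def oldest(rows, n=10):
    """Return the n least recently modified files."""
    return sorted(rows, key=lambda r: r["modified"])[:n]
```

For example, `largest(file_report("/data"), 5)` would list the five largest files under a hypothetical `/data` mount point.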

Policy-based capacity management
- Auto-delete of files of certain type
- Auto-HSM based on policy
- Quota management
- Threshold management (e.g. free space): alert user, kick off HSM, provision volume, choose another resource pool
- ILM
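
The threshold-management item above can be sketched as a function that maps free space to an action. The thresholds, action names, and helper functions here are assumptions chosen for illustration; an actual policy could equally trigger HSM or select another resource pool.

```python
import shutil


def free_space_action(total_bytes, free_bytes, warn_pct=20.0, critical_pct=5.0):
    """Map a free-space reading to a policy action.

    Returns "ok", "alert" (notify the user), or "provision" (add capacity).
    The threshold values and action names are illustrative assumptions.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < critical_pct:
        return "provision"
    if free_pct < warn_pct:
        return "alert"
    return "ok"


def check_mount(path, **thresholds):
    """Apply the policy to a local mount point using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return free_space_action(usage.total, usage.free, **thresholds)
```

Keeping the decision in a pure function separate from the measurement makes the policy easy to test and to reuse for readings that come from a storage array rather than a local file system.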

Performance Management

Performance capacity planning and problem diagnosis
- Reporting, monitoring, and threshold-based alerting, end to end and with drill down
- Storage Array
  - Monitor R/W I/O rates, service times, queue depth (I/O rate × service time), cache hits, read vs. write ratio, bytes transferred, etc. at the LUN and file level
  - Monitor other device internal performance metrics (internal pathways)
  - Time waiting for write destage, etc.
- Tape
  - Drive usage, number of mounts, drive throughput, drive wait time, ...
- Host
  - Monitor R/W I/O rates, response times, bytes transferred by file, LUN, drive mount point, application transaction, etc.
  - Monitor memory usage, L2 cache usage, CPU usage
  - Determine poor performance windows and the cause
- Switch (Fabric)
  - Monitor port stats, bytes transferred, CRC errors, buffer credits, ISL delays, etc.
- Application
  - Transactions, response times, bytes transferred, etc.
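
The queue-depth relation given in the Storage Array item above (I/O rate multiplied by service time) corresponds to Little's Law, provided "service time" is read as the average time an I/O spends at the device, including any waiting. A minimal sketch of the arithmetic follows; the function name and the sample figures are made up.

```python
def average_queue_depth(io_per_second, service_time_ms):
    """Estimate average outstanding I/Os as rate x time (Little's Law).

    io_per_second:   average I/O arrival rate
    service_time_ms: average time per I/O in milliseconds
    """
    if io_per_second < 0 or service_time_ms < 0:
        raise ValueError("inputs must be non-negative")
    return io_per_second * (service_time_ms / 1000.0)


# 2000 I/Os per second at 5 ms each gives roughly 10 outstanding I/Os.
depth = average_queue_depth(2000, 5)
```

The relation only holds for averages over a stable measurement window, so it is better suited to trend reporting than to diagnosing a short burst.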

Management
- Monitor performance at user-specified periodicities (e.g., every 15 minutes)
  - Get statistics at same interval across devices for correlation
  - Keep continuous record/log of statistics
    - Policy-based aggregation of statistics over time
- Policy-based provisioning based on performance SLA
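
The idea of keeping a continuous record while aggregating older statistics by policy can be sketched as follows. The sample layout (timestamp in seconds, value), the use of the mean as the aggregate, and the function names are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def apply_retention(samples, now, keep_raw_seconds, bucket_seconds):
    """Keep recent samples as collected and aggregate everything older.

    keep_raw_seconds and bucket_seconds together form the (hypothetical)
    retention policy, e.g. raw 15-minute data for a day, hourly means after.
    """
    recent = [s for s in samples if now - s[0] <= keep_raw_seconds]
    older = [s for s in samples if now - s[0] > keep_raw_seconds]
    return aggregate(older, bucket_seconds) + recent


# Six samples taken every 15 minutes (900 s); values are made up.
raw = [(0, 10), (900, 20), (1800, 30), (2700, 40), (3600, 50), (4500, 60)]
compacted = apply_retention(raw, now=4500, keep_raw_seconds=900, bucket_seconds=3600)
```

Collecting at the same interval across devices, as the list above asks, is what makes the bucket boundaries line up so that aggregated values from different devices can be correlated.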

Availability Management

Identify problems and potential problems
- Monitor all discovered elements
- Identify the problems that cause business outages, people time, etc. (identify root cause of problem)
- Log all events, and generate reports, etc.

Automatically avoid (or fix) problems that occur
- Define automation events based on discrete events, thresholds exceeded, etc. to avoid the failure before it happens

Alert responsible administrator/application owner about problems that occur
- If a failure occurs, determine business impact, contact owner (based on business impact), and initiate recovery

Manage data protection
- Scope: single filesystem, single SAN, multisite disaster recovery
- Apply policies to provisioning based on required data protection levels (provision on continuously-replicated storage, snapshot storage, ...)
- Apply policies to data protection mechanisms (frequency of snapshot or backup, lifetime)
- Invoke backup/replication/snapshot mechanisms as required
- Invoke recovery actions when failure occurs
- Cross-reference to other configuration policies that affect availability, such as dual-path enforcement
- Provide reports
  - Example: Are backups running OK?
  - Example: How much storage are my backups/snapshots taking? (cross-reference capacity planning)
  - Example: Vulnerability to failure (what’s not backed up)

Notice: This Wiki is now read only and edits are no longer possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

Aperi Use Cases
