
Difference between revisions of "COSMOS Design 188390"

(Requirements)
 
(66 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
= Nagios Integration with COSMOS =
 
= Nagios Integration with COSMOS =
 +
This is the design document for [https://bugs.eclipse.org/bugs/show_bug.cgi?id=188390 bugzilla 188390].
  
 
== Change History ==
 
== Change History ==
Line 13: Line 14:
 
|Ali Mehregani
 
|Ali Mehregani
 
|11/27/2007
 
|11/27/2007
|<ul><li>Modified based on Mark Weitzel and Valentina Popescu's comments</li></ul>
+
|<ul><li>Modified based on Mark Weitzel and Valentina Popescu's comments</li><li>The document was re-written to incorporate industry standards</li></ul>
 +
|-
 +
|Ali Mehregani
 +
|11/30/2007
 +
|<ul><li>Modified based on Mark Weitzel's suggestions</li><li>The following sections were modified:  1.4.3, 1.5.1.2, and 1.7</li> </ul>
 
|}
 
|}
  
Line 26: Line 31:
 
|-
 
|-
 
| align="left" | Design
 
| align="left" | Design
| 15
+
| 4
 
|  
 
|  
 
|-
 
|-
 
| align="left" | Code
 
| align="left" | Code
| 25
+
| 14
 
|  
 
|  
 
|-
 
|-
 
| align="left" | Test
 
| align="left" | Test
| 15
+
| 4
 
|  
 
|  
 
|-
 
|-
 
| align="left" | Documentation
 
| align="left" | Documentation
| 2
+
| 1
 
|
 
|
 
|-
 
|-
 
| align="left" | Build and infrastructure
 
| align="left" | Build and infrastructure
| 1
+
| 0.5
 
|
 
|
 
|-
 
|-
 
| align="left" | Code review, etc.*
 
| align="left" | Code review, etc.*
| 2
+
| 0.5
 
|
 
|
 
|-
 
|-
 
! align="right" | TOTAL
 
! align="right" | TOTAL
| 60
+
| 24
 
|
 
|
 
|}
 
|}
Line 104: Line 109:
  
 
== Introduction ==
 
== Introduction ==
The COSMOS vision is entailed in the definition of what COSMOS is - "The world or universe regarded as an orderly, harmonious system".  The intention of the project is to apply the same concept to the world of system management.  Complementing standards such as CMDBf, SML, WS-Notification, and Web2.0 technologies are making this vision a reality.  The overall COSMOS vision is to provide an extensible framework, based on a set of acceptable standards, to simplify the task of building an ecosystem of existing system management tooling.
+
The COSMOS vision is entailed in the definition of what COSMOS is - "The world or universe regarded as an orderly, harmonious system".  The intention of the project is to apply the same principle to the world of system management.  Complementing standards such as SML, CMDBf, WSDM Event Format, WS-Notification, and Web2.0 technologies are making this vision a reality.  The overall COSMOS vision is to provide an extensible framework, based on a set of acceptable standards, to simplify the task of building an ecosystem of existing system management tooling.
  
In line with that vision, Nagios can help to illustrate how standards deliver value to an open source community and adopters that intend to provide higher level management capabilities on top of COSMOS.  Industry standards can help to integrate commercial solutions with well established monitoring infrastructures such as Nagios.  The end goal of this integration effort is to develop a framework around WS-Notification and the CMDBf specification using Nagios as an exemplary consumer.
+
In line with this vision is the ability to integrate systems management environments through loosely coupled services exposed via interfaces defined in open standards.  In many circumstances, management environments are already well established and configured within an enterprise.  These environments typically use a wide variety of heterogeneous management software that is pieced together to form a complete solution.  It is not uncommon to find software from different vendors or open source projects for a particular aspect of management, e.g. monitoring, configuration, etc.  The goal of this enhancement request is to provide a standards-based integration strategy, based on the CMDBf specification, for exposing configuration data contained within a Nagios server.
  
 
+
The next three sections provide a brief overview of Nagios, WS-Notification, and WSDM Event Format.
The next two sections give a brief overview of Nagios and WS-Notification.
+
  
 
=== What is Nagios? ===
 
=== What is Nagios? ===
Line 131: Line 135:
 
[[Image:nagios-architecture.png]]
 
[[Image:nagios-architecture.png]]
  
<br>There is also a web-based UI included that provides reporting and limited administration capabilities.  A screen shot of the Nagios web-based UI is included below.  The next section describes the scope and the value of this enhancement.<br>
+
<br>There is also a web-based UI included that provides reporting and limited administration capabilities.  A screen shot of the Nagios web-based UI is included below. <br>
  
  
Line 137: Line 141:
  
 
<br>See [http://nagios.sourceforge.net/docs/nagios-3.pdf Nagios user guide] to find out more about its capabilities.
 
<br>See [http://nagios.sourceforge.net/docs/nagios-3.pdf Nagios user guide] to find out more about its capabilities.
 
  
 
=== What is WS-Notification? ===
 
=== What is WS-Notification? ===
Line 149: Line 152:
 
The first specification is used to describe the basic interfaces and calls required by notification producers and consumers, the second specification describes a middle tier between a producer and a consumer, and finally the third specification describes the structure of topics for publishing and subscription.
 
The first specification is used to describe the basic interfaces and calls required by notification producers and consumers, the second specification describes a middle tier between a producer and a consumer, and finally the third specification describes the structure of topics for publishing and subscription.
  
COSMOS intends to provide a notification broker as part of its framework for publication and subscription of events.  The notification broker should not be confused with the broker that resides in the management domain.  They are separate components with different functionalities.  There is a separate enhancement under development for the notification broker and its implementation detail will not be included in this document.
+
COSMOS intends to provide a notification broker as part of its framework for publication and subscription of events.  The notification broker should not be confused with the broker that resides in the management domain.  They are separate entities with different functionalities.  There are separate enhancements to cover the implementation detail of the notification broker and incident manager.  The incident manager will be discussed later.
  
 
The following is a list of terminologies commonly used in the context of WS-Notification:
 
The following is a list of terminologies commonly used in the context of WS-Notification:
Line 171: Line 174:
 
|-
 
|-
 
|Topic
 
|Topic
|Topics are used to categorize the notification messages produced.  Topics can be defined in the form of hierarchies
+
|A hierarchical structure to categorize the notification messages produced.   
 
|-
 
|-
 
|Topic Space
 
|Topic Space
|A forest of topic trees (i.e. sets of topic trees)
+
|A forest of topic trees (i.e. a series of topic trees)
 
|}
 
|}
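
The relationship between topics and a topic space can be illustrated with a small data structure.  The following is only a sketch: the class names (<code>Topic</code>, <code>TopicSpace</code>), the slash-separated path syntax, and the sample topic names are assumptions made for illustration and are not taken from the WS-Topics specification or from COSMOS code.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One node in a topic tree; child topics form the hierarchy."""
    name: str
    children: dict = field(default_factory=dict)

    def add_path(self, path):
        """Add a slash-separated path such as 'host/down' below this topic."""
        node = self
        for part in path.split("/"):
            node = node.children.setdefault(part, Topic(part))
        return node

    def paths(self, prefix=""):
        """Return the full path of this topic and of every descendant."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        found = [here]
        for child in self.children.values():
            found.extend(child.paths(here))
        return found


class TopicSpace:
    """A forest: independent topic trees keyed by the name of their root."""

    def __init__(self):
        self.roots = {}

    def add(self, path):
        root_name, _, rest = path.partition("/")
        root = self.roots.setdefault(root_name, Topic(root_name))
        return root.add_path(rest) if rest else root

    def all_topics(self):
        return sorted(p for root in self.roots.values() for p in root.paths())


# Hypothetical topic names, used only to show two separate trees.
space = TopicSpace()
space.add("nagios/host/down")
space.add("nagios/service/critical")
space.add("provisioning/patch")
print(space.all_topics())
```

Here the topic space holds two trees (rooted at <code>nagios</code> and <code>provisioning</code>), and every intermediate node is itself a topic.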
 +
 +
WS-Notification falls short of defining a well structured event format as part of the notification message produced by a producer.  The structure of the message is left to the entity creating the message.  COSMOS will use WSDM Event Format (WEF) to report messages using a well defined structure.  The next section gives a brief overview of what WEF is.
 +
 +
=== What is WEF? ===
 +
 +
WEF or WSDM Event Format is a well-structured XML language used to represent management event information.  The format was established based on the submission of Common Base Event specification to OASIS by IBM and Cisco.  The base requirement of the event format is described in [http://www.oasis-open.org/committees/download.php/20576/wsdm-muws1-1.1-spec-os-01.pdf part 1 of WSDM:MUWS] and an extension is developed in [http://www.oasis-open.org/committees/download.php/20575/wsdm-muws2-1.1-spec-os-01.pdf part 2 of WSDM:MUWS].  COSMOS will leverage the situation element described in part 2.  The pseudo-schema of the event format as described in part 1 is shown below:
 +
 +
<code>
 +
<pre>
 +
<muws1:ManagementEvent ...
 +
  muws1:ReportTime="xs:dateTime"?>
 +
 +
    <muws1:EventId>xs:anyURI</muws1:EventId>
 +
 +
    <muws1:SourceComponent ...>
 +
    <muws1:ResourceId>xs:anyURI</muws1:ResourceId> ?
 +
    <muws1:ComponentAddress>{any}</muws1:ComponentAddress> *
 +
    {any}*
 +
    </muws1:SourceComponent>
 +
 +
    <muws1:ReporterComponent ...>
 +
    <muws1:ResourceId>xs:anyURI</muws1:ResourceId> ?
 +
    <muws1:ComponentAddress>{any}</muws1:ComponentAddress> *
 +
    {any}*
 +
    </muws1:ReporterComponent> ?
 +
    {any}*
 +
</muws1:ManagementEvent>
 +
</pre>
 +
</code>
 +
 +
The pseudo-schema of the situation element as described in part 2 of the specification is shown below:
 +
 +
<code>
 +
<pre>
 +
<muws2:Situation>
 +
    <muws2:SituationCategory>
 +
    muws2:SituationCategoryType
 +
    </muws2:SituationCategory>
 +
   
 +
    <muws2:SuccessDisposition>
 +
    (Successful|Unsuccessful)
 +
    </muws2:SuccessDisposition> ?
 +
   
 +
    <muws2:SituationTime>xs:dateTime</muws2:SituationTime> ?
 +
    <muws2:Priority>xs:short</muws2:Priority> ?
 +
    <muws2:Severity>xs:short</muws2:Severity> ?
 +
    <muws2:Message>muws:LangString</muws2:Message> ?
 +
 
 +
    <muws2:SubstitutableMsg MsgId="xs:string" MsgIdType="xs:anyURI">
 +
    <muws2:Value>xs:anySimpleType</muws2:Value>*
 +
    </muws2:SubstitutableMsg> ?
 +
</muws2:Situation>
 +
</pre>
 +
</code>
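
To make the two pseudo-schemas more concrete, the sketch below assembles a management event that carries one situation element, using only the Python standard library.  Several things here are assumptions rather than facts from this document: the namespace URIs (believed to be the MUWS 1.1 schema namespaces and worth checking against the specification), the use of a child element such as <code>AvailabilitySituation</code> inside <code>SituationCategory</code>, the sample identifiers, and the severity value.  The placement of the <code>ReportTime</code> attribute follows the pseudo-schema as shown above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the WSDM:MUWS specification.
MUWS1 = "http://docs.oasis-open.org/wsdm/muws1-2.xsd"
MUWS2 = "http://docs.oasis-open.org/wsdm/muws2-2.xsd"
ET.register_namespace("muws1", MUWS1)
ET.register_namespace("muws2", MUWS2)


def q(namespace, tag):
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return "{%s}%s" % (namespace, tag)


def build_event(event_id, resource_id, report_time, category, severity, message):
    """Assemble a ManagementEvent that carries a single Situation element."""
    event = ET.Element(q(MUWS1, "ManagementEvent"),
                       {q(MUWS1, "ReportTime"): report_time})
    ET.SubElement(event, q(MUWS1, "EventId")).text = event_id
    source = ET.SubElement(event, q(MUWS1, "SourceComponent"))
    ET.SubElement(source, q(MUWS1, "ResourceId")).text = resource_id

    # Child order follows the situation pseudo-schema shown above.
    situation = ET.SubElement(event, q(MUWS2, "Situation"))
    holder = ET.SubElement(situation, q(MUWS2, "SituationCategory"))
    ET.SubElement(holder, q(MUWS2, category))
    ET.SubElement(situation, q(MUWS2, "SituationTime")).text = report_time
    ET.SubElement(situation, q(MUWS2, "Severity")).text = str(severity)
    ET.SubElement(situation, q(MUWS2, "Message")).text = message
    return event


# Hypothetical identifiers and values, for illustration only.
event = build_event(
    event_id="urn:example:event:1",
    resource_id="urn:example:host:web01",
    report_time="2007-11-30T12:00:00Z",
    category="AvailabilitySituation",
    severity=5,
    message="Host web01 is DOWN",
)
xml_text = ET.tostring(event, encoding="unicode")
print(xml_text)
```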
  
 
== Purpose ==
 
== Purpose ==
The purpose of this document is to describe how COSMOS, and by extension commercial vendors, can leverage an existing installation of Nagios via industry standard interfaces.
+
The purpose of this document is to describe how COSMOS, and by extension commercial vendors, can integrate with an existing Nagios server via industry standard interfaces.
  
 
=== Scope ===
 
=== Scope ===
There are a number of areas where COSMOS can add value to Nagios. The areas can be summarized into three categories:
+
There are three areas where the standards supported and applied in the COSMOS project can help integrate existing management infrastructures.
  
# Standardized Query Capability
+
# Standardized query interfaces for access to management data
# Publication and Subscription of Nagios Events
+
# Integration through publication and subscription of events via standards based APIs in a standardized format
# Reporting on WS-Notifications
+
# Reporting and visualizations based on standard event format
 +
 
 +
 
 +
==== Standardized Query Interfaces ====
 +
The contribution of a CMDBf query service on top of a Nagios server will provide a standardized mechanism for querying the configuration items managed by Nagios. A CMDBf query service will also allow Nagios to participate in a federating CMDB environment. It will also make it easier to integrate multiple Nagios servers and/or commercial-based solutions under one infrastructure.
  
==== Standardized Query Capability ====
 
The contribution of a CMDBf query service on top of a Nagios server will provide a standardized mechanism for querying the configuration items managed by Nagios.  A CMDBf query service will also allow Nagios to participate in a federating CMDB environment.  A well-known query service will make it easier to integrate multiple Nagios servers and/or commercial-based solutions under one infrastructure.
 
  
 
There are 10 different object types defined in Nagios:  
 
There are 10 different object types defined in Nagios:  
Line 205: Line 264:
 
The first 6 object types are examples of configuration items that can be exposed via a CMDBf query service.  Operational data such as the status of a host/service will not be exposed via the query service.  This information will instead be published to a notification broker described in the next section.
 
The first 6 object types are examples of configuration items that can be exposed via a CMDBf query service.  Operational data such as the status of a host/service will not be exposed via the query service.  This information will instead be published to a notification broker described in the next section.
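
As a rough illustration of how configuration items could be read from a Nagios server before being exposed through a query service, the sketch below parses Nagios object definitions (the <code>define &lt;type&gt; { ... }</code> blocks found in Nagios object configuration files) and keeps only the object types selected for exposure.  The function names, the sample definitions, and the <code>exposed_types</code> parameter are hypothetical; the actual set of exposed types is the one given by the list above, and nothing here reflects the COSMOS implementation.

```python
import re

# Sample object definitions, invented for this example.
SAMPLE = """
define host {
    host_name   web01
    address     192.0.2.10   ; inline comment
}

define service {
    host_name            web01
    service_description  HTTP
}
"""

DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return (object_type, directives) tuples for each object definition."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # directive name, then its value
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def configuration_items(objects, exposed_types):
    """Keep only the object types chosen for exposure via the query service."""
    return [{"type": t, **d} for t, d in objects if t in exposed_types]


objects = parse_objects(SAMPLE)
items = configuration_items(objects, exposed_types={"host"})
print(items)
```

Operational data, such as the current status of a host, is deliberately absent from this structure, in keeping with the separation described above.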
  
==== Publication and Subscription of Nagios Events ====
+
==== Publication and Subscription of Nagios Events in a Standard Format ====
Assuming the existence of a notification broker in COSMOS, Nagios can publish a set of topics to indicate the status of the monitored hosts and services.  The notification broker can disseminate messages to any client that subscribes to the published topics.  This mechanism will provide the ability for clients to process events generated from multiple monitoring solutions.  Consider the following example:
+
An existing Nagios infrastructure is typically set up to raise a set of events to the server.  Each management product is forced into a pairwise integration if it is to leverage information surfaced through an existing Nagios environment.  As part of this enhancement, a mechanism and a set of best practices will be provided that enable existing Nagios implementations to surface events in a standardized format using standardized APIs.  In addition to loosely coupled integration, by adopting these standard WS-based interfaces and management topics, commercial vendors may also provide value-added event management systems.
  
Assume the existence of three data managers:
+
Using this mechanism, Nagios can leverage WS-Notification to publish events on a set of topics to indicate the status of the monitored hosts and services.  These events will be delivered to ''any'' client that subscribes to the published topics.  Using standards, COSMOS can provide a framework to allow the publication and subscription of events in the context of web services.  This provides a mechanism for integration with higher-level management capabilities offered by commercial products.  See [[#Use_Cases|Use Cases]] for a concrete example.
* A provisioning solution capable of deploying software to multiple nodes
+
* A Nagios server monitoring a cluster of nodes
+
* A second Nagios server monitoring a different cluster of nodes
+
  
Also assume that the two Nagios servers are used to monitor the current patch level on a set of Windows nodes.  The Nagios servers can publish a topic to the notification broker to describe the patch level on each Windows-based node.  The provisioning solution can then subscribe to this topic and deploy any necessary update when available.  The figure below depicts the example.  The notification broker is assumed to reside in the management domain and the events are assumed to be persisted by a notification manager:
+
==== Reporting and visualizations based on standard event format ====
 
+
COSMOS can generate BIRT reports based on events in the standard format. These reports will be generated from events reported to the COSMOS data managers.  In addition, adopters can provide a custom report template that generates a report tailored to produce the information they need. This will help facilitate the growth of an ecosystem of reports that can be consumed by any management application that supports the standard event format.
[[Image:nagios-cosmos-example.png]]
+
 
+
==== Reporting on WS-Notifications ====
+
A number of reports are available under the Nagios web-based UI: trends, availability, alert histograms/history/summary, notifications, and event logs.  The views are generated based on events per Nagios server.  There is no mechanism available to aggregate results from multiple Nagios servers.
+
 
+
COSMOS, on the other hand, can generate reports based on aggregated data from multiple notification producers. The reports will be generated from messages reported to the notification broker.  Any Nagios server instance or commercial-based solution producing notification messages will be able to use a set of general reports that provide an overview of the events that have been reported.  Adopters can alternatively provide a custom report template that generates reports based on a subset of messages reported to the notification manager.
+
 
+
This assumes the existence of a data manager that subscribes to all topics published to the notification broker.  The data manager can persist all events reported by the notification broker.  This component is depicted as the notification manager in the figure shown in the previous section.
+
  
 
== Requirements ==
 
== Requirements ==
Line 233: Line 281:
  
 
== Use Cases ==
 
== Use Cases ==
The following use cases outline some of the typical tasks that COSMOS users will perform to accomplish an objective.
+
The following use cases outline some of the typical tasks that COSMOS adopters/end-users will perform to accomplish an objective.  
  
=== Use Case 1: Adding a machine to the asset database ===
+
=== Use Case 1: Leveraging Federating CMDB with Nagios MDRs ===
  
Assumption: The COSMOS framework is successfully installed with the asset repository MDR
+
A federating CMDB is not in the scope of the COSMOS project but a commercial vendor can register one with the COSMOS framework.  This use case explains how multiple instances of Nagios servers (equipped with COSMOS MDR/plug-in code) can participate in the presence of a commercial-based federating CMDB.
  
# User opens a browser and points to the URL of the COSMOS client
+
# A federating CMDB is registered with the COSMOS framework
# User right clicks the asset database and selects 'Define Object'.  A form is displayed under the details pane for the user to populate the required fields.  The form can contain multiple pages that cleanly break down the flow of user actions.
+
# The federating CMDB discovers the EPR of Nagios MDR registered with the COSMOS framework
# The fields are populated and the 'Finish' button is pressed.
+
# Using the pull-mode, the federating CMDB submits a CMDBf query to retrieve the configuration items managed by the Nagios servers
# Client will indicate that it is writing the data to the database.  The user is either prompted with an error message or a confirmation message to indicate success.  In case of user error, the form is returned to be corrected.
+
# The data retrieved from all servers are federated
  
=== Use Case 2: Monitoring of a host in Nagios ===
+
This capability enables adopters to create aggregated reports or views on resources managed by multiple Nagios MDRs.  The same is possible in a configuration that combines Nagios with a commercial management solution.  The only requirement is the registration of the commercial solution as an MDR.
  
Assumption: The COSMOS framework is successfully installed with the asset repository MDR and the Nagios data collector.
+
The figure below depicts a concrete example of Nagios instances participating in a federating CMDB environment.  The two servers monitor clusters of nodes that overlap (see highlighted section).  The federating CMDB can consolidate the data between the two servers to provide an aggregated view.
  
# User opens a browser and points to the URL of the COSMOS client
 
# User right clicks the Nagios item displayed under the data manager navigator and selects 'Define Object'.  A form is displayed under the details pane for the user to populate the required fields.  The form can contain multiple pages that cleanly break down the flow of user actions.
 
# User selects the type of object that is to be defined, which in this case happens to be a host.  The form is populated by all items from registered data managers that are candidates of the object type selected.  In this case all resources that are candidates of being monitored are queried from the asset database and displayed in the details pane.
 
# Where possible, items of corresponding data stores are displayed to make it easier for a field to be populated.  For example, the items of an employee database can be displayed for the contact/contact group fields of a host definition.
 
# The user either has the option of selecting a discovered host or populating the fields manually.  In the case where a discovered host is selected, the fields of the form are populated based on the selected host.
 
# User clicks 'Finish' to finalize the process of initiating the monitoring process of the host
 
  
=== Use Case 3: Monitoring of a service defined with Nagios ===
+
[[Image:nagios-cosmos-example3.png]]
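
The consolidation step in this use case can be sketched as a merge of per-server query results keyed by item identity.  This is a deliberately simplified illustration: the server names, the <code>id</code> field, and the <code>federate</code> function are hypothetical, and a real federating CMDB would apply the identity reconciliation rules of the CMDBf specification rather than a plain key comparison.

```python
def federate(results_by_server):
    """Merge per-server item lists into one view keyed by item id.

    Each merged entry records which servers reported the item, so overlapping
    nodes appear once with several sources.
    """
    merged = {}
    for server, items in results_by_server.items():
        for item in items:
            entry = merged.setdefault(item["id"], {"id": item["id"], "sources": []})
            entry["sources"].append(server)
    return merged


# Hypothetical query results from two Nagios MDRs with one shared node.
results = {
    "nagios-a": [{"id": "node1"}, {"id": "node2"}, {"id": "node3"}],
    "nagios-b": [{"id": "node3"}, {"id": "node4"}],
}
view = federate(results)
overlap = sorted(i for i, entry in view.items() if len(entry["sources"]) > 1)
print(sorted(view))
print(overlap)
```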
 +
 
 +
=== Use Case 2: Retrieving the Configuration Items of a Nagios MDR ===
  
Assumption: The COSMOS framework is successfully installed with the asset repository MDR and the Nagios data collector.
+
This use case explains the steps required by an end-user to visualize the configuration items/manageable resources that are being monitored by a Nagios server.
  
 
# User opens a browser and points to the URL of the COSMOS client
 
# User opens a browser and points to the URL of the COSMOS client
# User right clicks the Nagios item displayed under the data manager navigator and selects 'Define Object'.  A form is displayed under the details pane for the user to populate the required fields.  The form can contain multiple pages that cleanly break down the flow of user actions.
+
# User right clicks the Nagios item displayed under the data manager navigator and selects 'Submit CMDBf Query'
# User selects the type of object that is to be defined, which in this case happens to be a service.  The form is populated by all available services on the associated host.  For example, a configuration database can be queried by the client to retrieve all available services on an associated host.
+
# A CMDBf query is submitted
# Where possible items of corresponding data stores are displayed to make it easier for a field to be populated.  For example, the items of an employee database can be displayed for the contact/contact group fields of a host definition.
+
# The generic XML viewer is opened with the response to the query
# The user either has the option of selecting a discovered service or populate the fields manually.  In the case where a discovered service is selected, the fields of the form are populated based on the selected service.
+
# User clicks 'Finish' to finalize the process of initiating the monitoring process of the service
+
  
=== Use Case 4: Viewing the status of hosts/services being monitored ===
+
A view will be implemented to better visualize configuration items that conform to an SML model.  Similar to the CMDBf query action, this option will be available through Nagios' context menu.  Adopters can also leverage this view by conforming to the SML-based model defined in COSMOS.
  
Assumption: The COSMOS framework is successfully installed with the Nagios data collector.  It's assumed that one or more host/service is configured for monitoring.
+
=== Use Case 3: Subscription to Nagios Events ===
  
# User opens a browser and points to the URL of the COSMOS client
+
This use case is relevant to adopters who intend to use Nagios events to provide higher level management capabilities in COSMOS.  The client subscribing to events can be a web service, a data manager, or simply a standalone application that is capable of communicating with the COSMOS framework.  The steps below refer to the Nagios notification consumer as simply the client:
# User right clicks the Nagios item displayed under the data manager navigator and selects 'Display Monitoring Resources'.  A tree of hosts and services is displayed in the details pane with corresponding icons that indicate the last status check of a host/service.  See use case 5 for details about finding more information about a failed host/service.
+
  
=== Use Case 5: Determining the problem associating with a host/service ===
+
# Client contacts the management domain
 +
# Client retrieves the broker(s) of the management domain
 +
# Client retrieves the EPR of a desired Nagios server
 +
# Client retrieves the notification broker using the management domain's API
 +
# Using the EPR of the Nagios server, client retrieves the topic space published by the Nagios server
 +
# Client subscribes to a set of topics from the topic space using the notification broker APIs
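
The six steps above can be sketched with in-memory stand-ins for the components involved.  Everything in this example is hypothetical: the class names, the EPR string, and the topic names are invented for illustration, and the dictionary lookup stands in for what would be web service calls in a real deployment.  It is not the COSMOS or WS-Notification API.

```python
class NotificationBroker:
    """Stand-in broker that delivers messages to subscribers of a topic."""

    def __init__(self):
        self.subscriptions = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscriptions.setdefault(topic, []).append(callback)

    def notify(self, topic, message):
        for callback in self.subscriptions.get(topic, []):
            callback(topic, message)


class Broker:
    """Stand-in management domain broker: maps data manager names to EPRs."""

    def __init__(self, eprs):
        self.eprs = dict(eprs)

    def lookup(self, name):
        return self.eprs[name]


class ManagementDomain:
    def __init__(self, brokers, notification_broker):
        self.brokers = list(brokers)
        self.notification_broker = notification_broker


class NagiosServer:
    def __init__(self, topic_space):
        self.topic_space = list(topic_space)


# Hypothetical wiring; the EPR is a placeholder address.
epr_a = "http://example.invalid/nagios-a"
endpoints = {epr_a: NagiosServer(["host/down", "service/critical"])}
domain = ManagementDomain([Broker({"nagios-a": epr_a})], NotificationBroker())

received = []
brokers = domain.brokers                    # steps 1 and 2
epr = brokers[0].lookup("nagios-a")         # step 3
nb = domain.notification_broker             # step 4
topic_space = endpoints[epr].topic_space    # step 5
for topic in topic_space:                   # step 6: subscribe to host topics only
    if topic.startswith("host/"):
        nb.subscribe(topic, lambda t, m: received.append((t, m)))

# The Nagios side publishes two situations; only the subscribed one arrives.
nb.notify("host/down", "web01 is DOWN")
nb.notify("service/critical", "HTTP on web01 is CRITICAL")
print(received)
```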
  
Assumption: The COSMOS framework is successfully installed with the Nagios data collector.  It's assumed that one or more host/service is configured for monitoring and at least one host/service is down.  The context of the use case is the status navigation tree.
+
After the subscription, the client will be notified of any situations that correspond to the topics published by the Nagios server.  The diagram below shows a concrete example of how notification events from Nagios servers can be consumed by a commercial offering.
  
# User right clicks a host/service that is indicated to be down and selects 'Display Information'
+
Assume the existence of three data managers:
# A BIRT report is generated to display the events of the selected host/service that led to its downtime.
+
* A commercial-based provisioning solution capable of deploying software to multiple nodes
 +
* A Nagios server monitoring a cluster of nodes
 +
* A second Nagios server monitoring a different cluster of nodes
  
=== Use Case 6: Generating reports based on host availability ===
+
The two Nagios servers are used to monitor the current patch level on a set of Windows nodes.  The Nagios servers can publish a topic to the notification broker to describe the patch level.  The provisioning solution can then subscribe to this topic and deploy any necessary update when available.  The notification broker is assumed to reside in the management domain and the events are assumed to be persisted by an incident manager:
  
Assumption: The COSMOS framework is successfully installed with the Nagios data collector.
+
[[Image:nagios-cosmos-example.png]]
  
# User opens a browser and points to the URL of the COSMOS client
 
# User right clicks the Nagios item displayed under the data manager navigator and selects 'Generate Report > Availability'.  A BIRT report is generated and displayed to show the general availability of hosts and services being monitored.
 
  
=== Use Case 7: Removing a host being monitored ===
+
=== Use Case 4: Generating Reports Based on WS-Notification Messages ===
  
Assumption: The COSMOS framework is successfully installed with the Nagios data collector.  It's assumed that one or more host/service is configured for monitoring.
+
As discussed in the section "Publication and Subscription of Nagios Events", it's assumed that a data manager called "Incident Manager" persists all notification messages reported to the notification broker.  Just like any data manager, a set of associated reports can be used to visualize the events generated by multiple notification producers (e.g. Nagios servers).  This use case explains the steps required in generating an availability report on WS-Notifications.
  
 
# User opens a browser and points to the URL of the COSMOS client
 
# User opens a browser and points to the URL of the COSMOS client
# User right clicks the Nagios item displayed under the data manager navigator and selects 'Display Monitoring Resources'. A tree of hosts and services are displayed in the details pane.
+
# User right clicks the "Incident Manager" item displayed under the data manager navigator and selects 'Generate Report > Availability'.
# User right clicks a host and selects 'Remove'.  The user is prompted with a message that indicates the implication of removing a host.
+
# A report is generated and displayed based on notification messages persisted by the incident manager
# The user clicks OK to proceed.  The host is removed from Nagios and the status navigation tree is updated to reflect the user action.
+
 
 +
This loosely coupled architecture provides the ability to generate reports on messages produced by completely different management solutions.  An adopter desiring to use the COSMOS notification reporting facility will only need to register as a data manager and a notification producer. The adopter will be able to reuse the same reports as long as the messages conform to the WSDM Event Format (WEF).
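
As an illustration of the kind of calculation an availability report could perform over persisted notification messages, the sketch below computes the fraction of a reporting window that a single host spent in the UP state.  The function name, the event representation (a time-ordered list of timestamp and state pairs), and the assumption that the host is UP before its first recorded event are all choices made for this example; the actual report templates are not described in this document.

```python
from datetime import datetime


def availability(events, start, end):
    """Return the fraction of [start, end] that one host spent UP.

    events: time-ordered (datetime, state) pairs within the window.
    Assumption: the host is UP from `start` until its first event.
    """
    up_seconds = 0.0
    current_state, current_time = "UP", start
    for when, state in events:
        if current_state == "UP":
            up_seconds += (when - current_time).total_seconds()
        current_state, current_time = state, when
    if current_state == "UP":
        up_seconds += (end - current_time).total_seconds()
    return up_seconds / (end - start).total_seconds()


# Hypothetical persisted events: one hour of downtime in a ten hour window.
window_start = datetime(2007, 11, 30, 0, 0)
window_end = datetime(2007, 11, 30, 10, 0)
host_events = [
    (datetime(2007, 11, 30, 2, 0), "DOWN"),
    (datetime(2007, 11, 30, 3, 0), "UP"),
]
ratio = availability(host_events, window_start, window_end)
print(f"{ratio:.0%}")
```

Because the calculation depends only on the event timestamps and states, the same logic would apply to events from any notification producer whose messages carry that information.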
  
 
== Implementation Detail ==
 
== Implementation Detail ==
  
Integrating Nagios will span features across three sub-projects: Data Collection, Resource Modeling, and Data Visualization.  The features can be categorized into three different areas:
+
Integrating Nagios will span features across two sub-projects: Management Enablement and Data Visualization.  The integration can be separated into three enhancements.  Each enhancement indicates the subproject that it will reside in:
  
<b>Nagios Data Manager Package
+
# Registering Nagios as an MDR (Management Enablement)
Resource Modeling MDRs
+
# Making Nagios a Notification Producer (Management Enablement)
Nagios Client</b>
+
# Reporting on WS-Notification Messages (Data Visualization)
  
The diagram below displays the interaction between the COSMOS and Nagios components:
+
This integration depends on an implementation of a notification broker and an incident manager.  The notification broker is expected to conform to the WS-BrokeredNotification standard.  The incident manager is expected to persist messages disseminated by the notification broker.
[[Image:nagios-cosmos.png]]
+
  
 +
=== Registering Nagios as an MDR (Management Enablement) ===
 +
This enhancement will be concerned with the following tasks:
  
=== Nagios Data Manager Package ===
+
# Providing a CMDBf query for retrieving configuration items of a Nagios server
A package will need to be included to register the Nagios server as a data manager with COSMOS framework.  The package will need to:
+
# Mapping the configuration items to an SML model
 +
# Registering the Nagios server as an MDR with the COSMOS framework
 +
# Providing a hierarchical view to display the manageable resources of the Nagios server
 +
# Registering the view with the Data Visualization framework
  
# Discover the management domain and register itself as a data manager with the advertised brokers
+
An effective implementation of this enhancement will allow a user to:
# Discover the CBE data manager to determine its end point.  This may be code that needs to run periodically until the CBE repository registers itself with the management domain.  Keep in mind that there is no ordering of how data managers are registered.  There is always the possibility of the CBE data manager registering itself after the Nagios data manager.
+
# After discovering the CBE data manager, activate a Nagios plug-in that will redirect all events to the CBE repository.  This step will require Nagios to be reconfigured and restarted.  The user will need to be prompted before restarting the Nagios process.
+
  
The Nagios packaging code base is expected to be checked into the Data Collection subproject.
+
# View a configured Nagios server under the data manager navigator
 +
# Retrieve and view the manageable resources monitored by the Nagios server
 +
# Submit CMDBf queries to Nagios servers
  
=== Resource Modeling MDRs ===
+
=== Making Nagios a Notification Producer (Management Enablement) ===
As part of illustrating the seamless integration of multiple MDRs with a system monitoring application, there will be two additional MDRs added to the Resource Modeling subproject.  The asset repository will also need to be modified to ensure a smooth integration.  The two new MDRs will be:
+
This enhancement will be concerned with the following tasks:
  
# Configuration MDR - contains configuration detail about what is stored and running on a host
+
# Providing a Nagios plug-in to capture all notification messages
# Employee MDR - contains information about staff members
+
# Define a mapping from the notification message to a WSDM event format
 +
# Publish a topic space related to the notification messages that can be produced
 +
# Notify the notification broker of any situations that occur
 +
# Provide a mechanism for adopters to extend the topic space published by Nagios servers
  
The first database will be used to discover services that can be monitored on a specific host and the second database will be used to display a list of employees that can be included in the contact list of a host or service definition.  Both MDRs are expected to be implemented on top of the SML repository which already includes a CMDBf query capability.  The SML repository code will need to be refactored to extract out any code that is specific to the asset repository.
+
An effective implementation of this enhancement will allow an adopter to:
  
=== Nagios Client ===
+
# Discover the Nagios topic space published to the notification broker
The data visualization subproject is expected to contribute the following functionalities:
+
# Subscribe to the Nagios topics
 +
# Receive notification on Nagios topics
  
# The ability to define hosts and services
+
=== Reporting on WS-Notification Messages (Data Visualization) ===
# The ability to generate reports on Nagios events, notifications, alerts, and etc...
+
This enhancement will be concerned with the following task:
# The ability to write objects to the asset, configuration, and employee MDRs
+
# Nagios specific views to visualize the status of the monitored objects
+
  
 +
# Define a set of report templates that can be associated with the incident manager
  
 +
An effective implementation of this enhancement will allow the user to:
  
== Task Breakdown ==
+
# Generate reports based on events reported by Nagios server(s)
The following section breaks down each individual task based on subproject.  Symbols are used to indicate the enhancement that each work item falls under.
+
  
=== Resource Modeling ===
+
== Open Issues/Questions ==
&#x3A6; Refactor any code necessary to provide write capability to the asset repository <br/>
+
All reviewer feedback should go in the [[Talk:Nagios_Integration_with_COSMOS|talk page]] for Nagios integration with COSMOS.
&#x3A6; Refactor the data center SML model to make it fit better with the resource model that Nagios uses <br/>
+
&#x3A6; Refactor the CMDBf query code for the asset based repository to provide any additional queries that the client will need <br/>
+
&#x3A6; Provide a model mapping from the asset model to the Nagios model <br/>
+
&#x3A8; Refactor the SML repository code to provide a common plug-in that multiple SML based repositories can use <br/>
+
&#x3A8; Provide an employee based model using SML <br/>
+
&#x3A8; Extend the SML repository code to provide an employee database <br/>
+
&#x3A8; Provide a CMDBf query implementation for the employee database <br/>
+
&#x3A8; Provide a model mapping from the employee model to the Nagios model <br/>
+
&#x3A9; Provide a configuration based model using SML  <br/>
+
&#x3A9; Extend the SML repository code to provide a configuration database <br/>
+
&#x3A9; Provide a CMDBf query implementation for the configuration database <br/>
+
&#x3A9; Provide a model mapping from the configuration model to the Nagios model <br/>
+
&#x3B2; Use the programming model to plug-in the employee MDR into COSMOS framework <br/>
+
&#x3B2; Use the programming model to plug-in the configuration MDR into COSMOS framework <br/>
+
  
  
<b>Enhancements:</b> <br/>
 
&#x3A6; [Nagios]Generalize the asset repository and the data center model <br/>
 
&#x3A8; [Nagios]Provide an employee MDR based on the SML repository <br/>
 
&#x3A9; [Nagios]Provide a configuration MDR based on the SML repository <br/>
 
&#x3B2; [Nagios]Add the employee and configuration MDRs to the COSMOS framework <br/>
 
  
=== Data Collection ===
+
----
&#x3A6; Define a mapping between Nagios events and CBE events <br/>
+
[[Category:COSMOS_Bugzilla_Designs]]
&#x3A6; Provide a Nagios plug-in to forward events to the CBE data manager <br/>
+
&#x3B2; Provide a mechanism to register a Nagios server as a data manager <br/>
+
&#x3B2; Provide administrative capabilities that the client can invoke <br/>
+
 
+
 
+
<b>Enhancements:</b> <br/>
+
&#x3A6; [Nagios]Provide a Nagios plug-in to log events as CBEs to the CBE data manager <br/>
+
&#x3B2; [Nagios]Register a Nagios monitoring server as a data manager <br/>
+
 
+
=== Data Visualization ===
+
&#x3A6; Provide actions to write to the asset MDR <br/>
+
&#x3A6; Provide the forms necessary to write data to the asset MDR <br/>
+
&#x3A6; Provide actions to write to the employee MDR <br/>
+
&#x3A6; Provide the forms necessary to write data to the employee MDR <br/>
+
&#x3A6; Provide actions to write to the configuration MDR <br/>
+
&#x3A6; Provide the forms necessary to write data to the configuration MDR <br/>
+
&#x3A8; Provide actions to define objects on the Nagios data manager <br/>
+
&#x3A8; Provide the forms necessary to define the Nagios objects <br/>
+
&#x3A8; Provide actions to perform administrative tasks on the Nagios data manager <br/>
+
&#x3A8; Define a framework that allows for an MDR to be replaced/added as part of defining objects for Nagios <br/>
+
&#x3A9; Provide a navigator that displays the status of hosts and services monitored on Nagios <br/>
+
&#x3B2; Provide reporting capabilities for viewing host/service events <br/>
+
&#x3B2; Provide two general reporting capabilities on Nagios events (e.g. availability and alert history) <br/>
+
 
+
 
+
 
+
<b>Enhancements:</b> <br/>
+
&#x3A6; [Nagios]Provide write actions and forms for the asset, configuration, and employee MDR <br/>
+
&#x3A8; [Nagios]Provide actions and forms for defining Nagios objects <br/>
+
&#x3A9; [Nagios] Provide a status navigator for the Nagios data manager <br/>
+
&#x3B2; [Nagios] Provide reporting capabilities for the Nagios events <br/>
+
 
+
== Open Issues/Questions ==
+
All reviewer feedback should go in the [[Talk:Nagios_Integration_with_COSMOS|talk page]] for Nagios integration with COSMOS.
+

Latest revision as of 18:14, 5 December 2007

Nagios Integration with COSMOS

This is the design document for bugzilla 188390.

Change History

Name: Date: Revised Sections:
Ali Mehregani 11/19/2007
  • Initial version
Ali Mehregani 11/27/2007
  • Modified based on Mark Weitzel and Valentina Popescu's comments
  • The document was re-written to incorporate industry standards
Ali Mehregani 11/30/2007
  • Modified based on Mark Weitzel's suggestions
  • The following sections were modified: 1.4.3, 1.5.1.2, and 1.7

Workload Estimation

Rough workload estimate in ONE person week
Process Sizing Names of people doing the work
Design 4
Code 14
Test 4
Documentation 1
Build and infrastructure 0.5
Code review, etc.* 0.5
TOTAL 24

Terminologies/Acronyms

The terminologies/acronyms below are commonly used throughout this document. The list below defines each term regarding how it is used in this document:

Term Definition
MDR Management Data Repository
CMDBf Specification for a CMDB that federates between multiple MDRs
CMDB Configuration Management Database
CBE Common Base Event - A standard that defines a common format for logging
SML Service Modeling Language - An XML based language used for modeling
SML Model A set of SML compliant resources
SML Repository The SML Repository describes any SML model together with a set of COSMOS API used to add new SML resources to the SML model and to query the SML model.
CMDBf query MDRs make data available via a query service defined in the CMDBf specification. The input and output of a CMDBf query is a structured XML document described in the specification.
Host A host in Nagios terms is any entity on a network that can be monitored (e.g. desktop, router, printer, etc...)
Host check A host check in Nagios corresponds to running a command that will indicate the status of a host
Service There are two types of services that can be monitored by Nagios on a host: public and private. Examples of public services are HTTP, FTP, POP3, SSH, etc... and examples of private services are CPU utilization, memory consumption, disk space, power consumption, etc...
Service check Analogous to host check, a service check involves running a command that will check the status of a service
Command A Nagios command is either a shell executable or a Perl script that performs a specific task (e.g. host/service check)

Introduction

The COSMOS vision is entailed in the definition of what COSMOS is - "The world or universe regarded as an orderly, harmonious system". The intention of the project is to apply the same principle to the world of system management. Complementing standards such as SML, CMDBf, WSDM Event Format, WS-Notification, and Web2.0 technologies are making this vision a reality. The overall COSMOS vision is to provide an extensible framework, based on a set of acceptable standards, to simplify the task of building an ecosystem of existing system management tooling.

In line with this vision is the ability to integrate systems management environments through loosely coupled services exposed via interfaces defined in open standards. In many circumstances, management environments are already well established and configured within an enterprise. These environments typically use a wide variety of heterogeneous management software that is pieced together to form a complete solution. It is not uncommon to find software from different vendors or open source projects for a particular aspect of management, e.g. monitoring, configuration, etc. The goal of this enhancement request is to provide a standards-based integration strategy, based on the CMDBf specification, for exposing configuration data contained within a Nagios server.

The next three sections provide a brief overview of Nagios, WS-Notification, and WSDM Event Format.

What is Nagios?

Nagios is a system and network monitoring application that is capable of detecting abnormal behavior and notifying the appropriate contacts. What is monitored, and how, is defined by administrators using a set of flat-file configurations. There are three primary atomic entities in Nagios:

Host A physical device on a network that is intended to be monitored (e.g. a desktop, printer, router, switch, hub, etc...).
Service Indicates the specific component of a host that should be monitored (e.g. CPU utilization, memory consumption, HTTP, etc...)
Command A utility that allows for a host/service check, notification handling, alerts, etc.... For example, check_CPU can be a command used to monitor the CPU utilization on a particular host.


An administrator is required to define hosts, services, and commands to effectively monitor a set of resources. The actual monitoring of a host/service is not done by Nagios. It is instead done by add-on plug-ins that are defined as individual commands. This architecture makes it possible to monitor virtually any aspect of a system that can be automated. There are already many available plug-ins for monitoring common hosts/services in a typical networking environment. Where the existing plug-ins fall short, administrators can write their own plug-in to monitor an uncommon host/service. The data collected from plug-ins is logged to flat files. Nagios itself doesn't persist events to a database, but plug-ins are available to direct events to an RDBMS such as MySQL.
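The plug-in contract described above can be illustrated with a minimal sketch. It assumes the conventional Nagios plug-in exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function names and thresholds are hypothetical, and Python is used only for illustration since any executable can act as a command.

```python
import shutil

# Exit codes that Nagios plug-ins conventionally return.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a (status code, status text) pair.

    The thresholds are illustrative defaults, not values taken from Nagios.
    """
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def check_disk(path="/"):
    """Measure usage of the file system holding `path` and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself failed, so the state of the disk is unknown.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate_disk(100.0 * usage.used / usage.total)
```

A command wrapper would print the status text and exit with the status code, which is how the result of a host/service check is handed back to the Nagios process.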

The Nagios service runs on Linux but it is capable of monitoring desktops running Windows via its plug-in architecture. As part of its monitoring solution, Nagios also provides an alerting mechanism that broadcasts a problem to sets of contacts or contact groups. A notification handler can also be registered to take certain actions based on incoming events (e.g. storing status information in an RDBMS). The diagram below, extracted from Nagios documentation, pictorially depicts the components:

Nagios-architecture.png


There is also a web-based UI included that provides reporting and limited administration capabilities. A screen shot of the Nagios web-based UI is included below.


Nagios.png


See Nagios user guide to find out more about its capabilities.

What is WS-Notification?

WS-Notification is an umbrella for a set of specifications that describe the publishing and subscription of events in the context of Web services. There are three specifications that fall under WS-Notification:

  1. WS-BaseNotification
  2. WS-BrokeredNotification
  3. WS-Topics

The first specification is used to describe the basic interfaces and calls required by notification producers and consumers, the second specification describes a middle tier between a producer and a consumer, and finally the third specification describes the structure of topics for publishing and subscription.

COSMOS intends to provide a notification broker as part of its framework for publication and subscription of events. The notification broker should not be confused with the broker that resides in the management domain. They are separate entities with different functionalities. There are separate enhancements to cover the implementation detail of the notification broker and incident manager. The incident manager will be discussed later.

The following is a list of terminologies commonly used in the context of WS-Notification:

Term Definition
Notification Producer An entity that creates a notification message
Notification Consumer An entity receiving a notification message
Subscription The act of advertising the interest for listening on a set of topics
Publishing The act of advertising the interest for producing notification on a set of topics
Topic A hierarchical structure to categorize the notification messages produced.
Topic Space A forest of topic trees (i.e. a series of topic trees)
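The relationship between topics and a topic space can be sketched as a small data structure. This is only an illustration of the "forest of topic trees" idea; the topic names and the slash-separated path notation are assumptions and are not taken from WS-Topics or from Nagios.

```python
class Topic:
    """A node in a topic tree; children are keyed by their name."""

    def __init__(self, name):
        self.name = name
        self.children = {}


def add_path(topic_space, path):
    """Insert a slash-separated topic path into the forest and return its leaf."""
    root, *rest = path.split("/")
    node = topic_space.setdefault(root, Topic(root))
    for name in rest:
        node = node.children.setdefault(name, Topic(name))
    return node


def list_paths(topic_space):
    """Return every topic in the forest as a full path, in sorted order."""
    paths = []

    def walk(node, prefix):
        full = f"{prefix}/{node.name}" if prefix else node.name
        paths.append(full)
        for name in sorted(node.children):
            walk(node.children[name], full)

    for name in sorted(topic_space):
        walk(topic_space[name], "")
    return paths


# A topic space is a dict of root topics, i.e. a forest of trees.
topic_space = {}
add_path(topic_space, "host/state/down")
add_path(topic_space, "host/state/up")
add_path(topic_space, "service/state/critical")
print(list_paths(topic_space))
```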

WS-Notification falls short of defining a well structured event format as part of the notification message produced by a producer. The structure of the message is left to the entity creating the message. COSMOS will use WSDM Event Format (WEF) to report messages using a well defined structure. The next section gives a brief overview of what WEF is.

What is WEF?

WEF or WSDM Event Format is a well-structured XML language used to represent management event information. The format was established based on the submission of Common Base Event specification to OASIS by IBM and Cisco. The base requirement of the event format is described in part 1 of WSDM:MUWS and an extension is developed in part 2 of WSDM:MUWS. COSMOS will leverage the situation element described in part 2. The pseudo-schema of the event format as described in part 1 is shown below:

 <muws1:ManagementEvent ...
  muws1:ReportTime="xs:dateTime"?>

    <muws1:EventId>xs:anyURI</muws1:EventId>

    <muws1:SourceComponent ...>
    <muws1:ResourceId>xs:anyURI</muws1:ResourceId> ?
    <muws1:ComponentAddress>{any}</muws1:ComponentAddress> *
    {any}*
    </muws1:SourceComponent>

    <muws1:ReporterComponent ...>
    <muws1:ResourceId>xs:anyURI</muws1:ResourceId> ?
    <muws1:ComponentAddress>{any}</muws1:ComponentAddress> *
    {any}*
    </muws1:ReporterComponent> ?
    {any}*
 </muws1:ManagementEvent>

The pseudo-schema of the situation element as described in part 2 of the specification is shown below:

 <muws2:Situation>
    <muws2:SituationCategory>
    muws2:SituationCategoryType
    </muws2:SituationCategory>
    
    <muws2:SuccessDisposition>
    (Successful|Unsuccessful)
    </muws2:SuccessDisposition> ?
    
    <muws2:SituationTime>xs:dateTime</muws2:SituationTime> ?
    <muws2:Priority>xs:short</muws2:Priority> ?
    <muws2:Severity>xs:short</muws2:Severity> ?
    <muws2:Message>muws:LangString</muws2:Message> ?
   
    <muws2:SubstitutableMsg MsgId="xs:string" MsgIdType="xs:anyURI">
    <muws2:Value>xs:anySimpleType</muws2:Value>*
    </muws2:SubstitutableMsg> ?
 </muws2:Situation>
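As a rough illustration of the two pseudo-schemas, the sketch below builds a management event carrying a situation element with Python's standard XML library. The element and attribute names follow the pseudo-schemas above. The namespace URIs, the `AvailabilitySituation` category element, and the sample identifiers are assumptions that should be checked against the WSDM:MUWS specifications before use.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumed namespace URIs for MUWS part 1 and part 2.
MUWS1 = "http://docs.oasis-open.org/wsdm/muws1-2.xsd"
MUWS2 = "http://docs.oasis-open.org/wsdm/muws2-2.xsd"
ET.register_namespace("muws1", MUWS1)
ET.register_namespace("muws2", MUWS2)


def qname(namespace, local_name):
    """Return the {namespace}local form that ElementTree expects."""
    return f"{{{namespace}}}{local_name}"


def build_event(event_id, resource_id, category, message, severity=None):
    """Build a ManagementEvent element that contains one Situation."""
    now = datetime.now(timezone.utc).isoformat()

    event = ET.Element(qname(MUWS1, "ManagementEvent"))
    event.set(qname(MUWS1, "ReportTime"), now)
    ET.SubElement(event, qname(MUWS1, "EventId")).text = event_id

    source = ET.SubElement(event, qname(MUWS1, "SourceComponent"))
    ET.SubElement(source, qname(MUWS1, "ResourceId")).text = resource_id

    # The situation travels in the extensible {any}* part of the event.
    situation = ET.SubElement(event, qname(MUWS2, "Situation"))
    category_element = ET.SubElement(situation, qname(MUWS2, "SituationCategory"))
    ET.SubElement(category_element, qname(MUWS2, category))
    ET.SubElement(situation, qname(MUWS2, "SituationTime")).text = now
    if severity is not None:
        ET.SubElement(situation, qname(MUWS2, "Severity")).text = str(severity)
    ET.SubElement(situation, qname(MUWS2, "Message")).text = message
    return event


event = build_event(
    "urn:uuid:example-event-1",        # hypothetical event identifier
    "urn:nagios:host:web01",           # hypothetical resource identifier
    "AvailabilitySituation",
    "Host web01 is DOWN",
    severity=6,
)
print(ET.tostring(event, encoding="unicode"))
```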

Purpose

The purpose of this document is to describe how COSMOS, and by extension commercial vendors, can integrate with an existing Nagios server via industry standard interfaces.

Scope

There are three areas where the standards supported and applied in the COSMOS project can help integrate existing management infrastructures.

  1. Standardized query interfaces for access to management data
  2. Integration through publication and subscription of events via standards based APIs in a standardized format
  3. Reporting and visualizations based on standard event format


Standardized Query Interfaces

The contribution of a CMDBf query service on top of a Nagios server will provide a standardized mechanism for querying the configuration items managed by Nagios. A CMDBf query service will also allow Nagios to participate in a federating CMDB environment. It will also make it easier to integrate multiple Nagios servers and/or commercial-based solutions under one infrastructure.


There are 10 different object types defined in Nagios:

1. Hosts
2. Host Groups
3. Services
4. Service Groups
5. Contacts
6. Contact Groups
7. Commands
8. Time Periods
9. Notification Escalations
10. Notification and Execution Dependencies

The first 6 object types are examples of configuration items that can be exposed via a CMDBf query service. Operational data such as the status of a host/service will not be exposed via the query service. This information will instead be published to a notification broker described in the next section.
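To make the distinction concrete, the sketch below reads Nagios flat-file object definitions and keeps only the six object types that are candidates for exposure as configuration items. It is a simplified reader that assumes the usual `define <type>{ ... }` layout; it ignores templates and inheritance, and the sample definitions are hypothetical.

```python
import re

# The first six object types, using the names found in Nagios object definitions.
EXPOSED_TYPES = {
    "host", "hostgroup", "service", "servicegroup", "contact", "contactgroup",
}

DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Parse object definitions into a list of {type, directives} dicts."""
    objects = []
    for object_type, body in DEFINE_RE.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()   # drop inline comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)            # directive name, then value
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        objects.append({"type": object_type, "directives": directives})
    return objects


def configuration_items(objects):
    """Keep only the object types that would be exposed as configuration items."""
    return [obj for obj in objects if obj["type"] in EXPOSED_TYPES]


sample = """
define host{
    host_name   web01
    address     192.168.1.10   ; primary web server
    }

define command{
    command_name    check_cpu
    command_line    /usr/lib/nagios/check_cpu
    }
"""

items = configuration_items(parse_objects(sample))
print(items)
```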

Publication and Subscription of Nagios Events in a Standard Format

An existing Nagios infrastructure is typically set up to raise a set of events to the server. Each management product is forced into a pairwise integration if it is to leverage information surfaced through an existing Nagios environment. As part of this enhancement, a mechanism and a set of best practices will be provided that enable existing Nagios implementations to surface events in a standardized format using standardized APIs. In addition to loosely coupled integration, by adopting these standard WS-based interfaces and management topics, commercial vendors may also provide value-added event management systems.

Using this mechanism, Nagios can leverage WS-Notification to publish events on a set of topics to indicate the status of the monitored hosts and services. These events will be delivered to any client that subscribes to the published topics. Using standards, COSMOS can provide a framework to allow the publication and subscription of events in the context of web services. This provides a mechanism for integration with higher-level management capabilities using commercial offerings. See Use Cases for a concrete example.

Reporting and visualizations based on standard event format

COSMOS can generate BIRT reports based on events in the standard format. These reports will be generated from events reported to the COSMOS data managers. In addition, adopters can provide a custom report template that generates a report tailored to produce the information they need. This will help facilitate the growth of an ecosystem of reports that can be consumed by any management application that supports the standard event format.

Requirements

The following is a list of requirements that fall within the scope of the Nagios/COSMOS integration:

  1. Provide the capability of querying the configuration items of a Nagios server using the CMDBf query APIs
  2. Publish a topic space to the notification broker based on Nagios events being monitored
  3. Notify the notification broker when a situation related to a topic is reached
  4. Provide a set of effective reports for analyzing notification messages created by Nagios servers

Use Cases

The following use cases outline some of the typical tasks that COSMOS adopters/end-users will perform to accomplish an objective.

Use Case 1: Leveraging Federating CMDB with Nagios MDRs

A federating CMDB is not in the scope of the COSMOS project but a commercial vendor can register one with the COSMOS framework. This use case explains how multiple instances of Nagios servers (equipped with COSMOS MDR/plug-in code) can participate in the presence of a commercial-based federating CMDB.

  1. A federating CMDB is registered with the COSMOS framework
  2. The federating CMDB discovers the EPR of Nagios MDR registered with the COSMOS framework
  3. Using the pull-mode, the federating CMDB submits a CMDBf query to retrieve the configuration items managed by the Nagios servers
  4. The data retrieved from all servers are federated

This capability enables adopters to create aggregated reports or views on resources managed by multiple Nagios MDRs. The same is possible with a configuration that combines Nagios and a commercial management solution. The only requirement is the registration of the commercial solution as an MDR.

The figure below depicts a concrete example of Nagios instances participating in a federating CMDB environment. The two servers monitor clusters of nodes that overlap (see highlighted section). The federating CMDB can consolidate the data between the two servers to provide an aggregated view.


Nagios-cosmos-example3.png
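The consolidation step in this example can be sketched as a simple merge. This is not how a federating CMDB reconciles data in practice; the sketch assumes that two items are the same resource when their type and name match, and the MDR and node names are hypothetical.

```python
def federate(results_by_mdr):
    """Merge per-MDR item lists into one view that records each item's sources.

    Identity is assumed to be the (type, name) pair, which is a simplification.
    """
    merged = {}
    for mdr, items in results_by_mdr.items():
        for item in items:
            key = (item["type"], item["name"])
            entry = merged.setdefault(
                key, {"type": item["type"], "name": item["name"], "sources": []}
            )
            entry["sources"].append(mdr)
    return sorted(merged.values(), key=lambda e: (e["type"], e["name"]))


# Two Nagios MDRs whose monitored clusters overlap on node-b.
query_results = {
    "nagios-1": [{"type": "host", "name": "node-a"}, {"type": "host", "name": "node-b"}],
    "nagios-2": [{"type": "host", "name": "node-b"}, {"type": "host", "name": "node-c"}],
}

for entry in federate(query_results):
    print(entry["name"], entry["sources"])
```

Items reported by more than one MDR, such as the overlapping node here, are the ones a federating CMDB would need to reconcile into a single record.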

Use Case 2: Retrieving the Configuration Items of a Nagios MDR

This use case explains the steps required by an end-user to visualize the configuration items/manageable resources that are being monitored by a Nagios server.

  1. User opens a browser and points to the URL of the COSMOS client
  2. User right clicks the Nagios item displayed under the data manager navigator and selects 'Submit CMDBf Query'
  3. A CMDBf query is submitted
  4. The generic XML viewer is opened with the response to the query

A view will be implemented to better visualize configuration items that conform to an SML model. Similar to the CMDBf query action, this option will be available through Nagios' context menu. Adopters can also leverage this view by conforming to the SML-based model defined in COSMOS.

Use Case 3: Subscription to Nagios Events

This use case is relevant to adopters who intend to use Nagios events to provide higher-level management capabilities in COSMOS. The client subscribing to events can be a web service, a data manager, or simply a standalone application that is capable of communicating with the COSMOS framework. The steps below refer to the Nagios notification consumer as simply the client:

  1. Client contacts the management domain
  2. Client retrieves the broker(s) of the management domain
  3. Client retrieves the EPR of a desired Nagios server
  4. Client retrieves the notification broker using the management domain's API
  5. Using the EPR of the Nagios server, client retrieves the topic space published by the Nagios server
  6. Client subscribes to a set of topics from the topic space using the notification broker APIs

After the subscription, the client will be notified of any situations that correspond to the topics published by the Nagios server. The diagram below pictorially describes a concrete example of how notification events from Nagios servers can be consumed by a commercial offering.

Assume the existence of three data managers:

  • A commercial-based provisioning solution capable of deploying software to multiple nodes
  • A Nagios server monitoring a cluster of nodes
  • A second Nagios server monitoring a different cluster of nodes

The two Nagios servers are used to monitor the current patch level on a set of Windows nodes. The Nagios servers can publish a topic to the notification broker to describe the patch level. The provisioning solution can then subscribe to this topic and deploy any necessary update when available. The notification broker is assumed to reside in the management domain and the events are assumed to be persisted by an incident manager:

Nagios-cosmos-example.png
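The patch-level scenario can be sketched with an in-memory stand-in for the notification broker. The class, its methods, and the topic names are hypothetical and only mirror the publish, subscribe, and notify roles described in this use case; they are not the WS-Notification or COSMOS APIs.

```python
class NotificationBroker:
    """A toy broker that keeps published topics and their subscribers in memory."""

    def __init__(self):
        self._subscribers = {}   # topic -> list of callbacks

    def publish_topics(self, topics):
        """Called by a producer to advertise the topics it can notify on."""
        for topic in topics:
            self._subscribers.setdefault(topic, [])

    def topic_space(self):
        """Let a client discover the published topics."""
        return sorted(self._subscribers)

    def subscribe(self, topic, callback):
        if topic not in self._subscribers:
            raise KeyError(f"unknown topic: {topic}")
        self._subscribers[topic].append(callback)

    def notify(self, topic, message):
        """Called by a producer when a situation occurs on a topic."""
        for callback in self._subscribers.get(topic, []):
            callback(topic, message)


broker = NotificationBroker()

# The Nagios servers publish the topics that describe the patch level.
broker.publish_topics(["patch-level/outdated", "patch-level/current"])

# The provisioning solution subscribes and queues nodes that need an update.
pending_updates = []
broker.subscribe(
    "patch-level/outdated",
    lambda topic, message: pending_updates.append(message["node"]),
)

# Each Nagios server notifies the broker about the nodes it monitors.
broker.notify("patch-level/outdated", {"node": "win-node-07", "server": "nagios-1"})
broker.notify("patch-level/current", {"node": "win-node-02", "server": "nagios-2"})
print(pending_updates)
```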


Use Case 4: Generating Reports Based on WS-Notification Messages

As discussed in the section "Publication and Subscription of Nagios Events in a Standard Format", it is assumed that a data manager called the "Incident Manager" persists all notification messages reported to the notification broker. Just like any data manager, it can have a set of associated reports that visualize the events generated by multiple notification producers (e.g. Nagios servers). This use case explains the steps required to generate an availability report on WS-Notification messages.

  1. User opens a browser and points to the URL of the COSMOS client
  2. User right clicks the "Incident Manager" item displayed under the data manager navigator and selects 'Generate Report > Availability'.
  3. A report is generated and displayed based on notification messages persisted by the incident manager

This loosely coupled architecture provides the ability to generate reports on messages produced by completely different management solutions. An adopter wishing to use the COSMOS notification reporting facility will only need to register as a data manager and a notification producer. The adopter will be able to reuse the same reports as long as the messages conform to the WSDM Event Format (WEF).
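The computation behind an availability report can be sketched as follows. This only illustrates the arithmetic on persisted state-change events; it is not a BIRT report template, and the event representation (timestamp in seconds paired with a state name) is an assumption.

```python
def availability(events, window_start, window_end, up_state="UP"):
    """Return the percentage of observed time spent in `up_state`.

    `events` is a list of (timestamp_seconds, state) pairs sorted by timestamp.
    Each state is assumed to hold until the next event. Time before the first
    event is unobserved and ignored. Returns None if nothing was observed.
    """
    up_time = 0.0
    observed = 0.0
    # Pair every event with the timestamp at which its state ends.
    boundaries = events[1:] + [(window_end, None)]
    for (start, state), (end, _) in zip(events, boundaries):
        start = max(start, window_start)
        end = min(end, window_end)
        if end <= start:
            continue
        observed += end - start
        if state == up_state:
            up_time += end - start
    return 100.0 * up_time / observed if observed else None


# A host that is up for 60 s, down for 30 s, then up again until the window ends.
host_events = [(0, "UP"), (60, "DOWN"), (90, "UP")]
print(availability(host_events, window_start=0, window_end=120))
```

Because the calculation depends only on the event time and state, the same logic applies to any producer whose messages carry that information, which is the point of reporting on a standard event format.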

Implementation Detail

Integrating Nagios will span features across two sub-projects: Management Enablement and Data Visualization. The integration can be separated into three enhancements. Each enhancement indicates the subproject that it will reside in:

  1. Registering Nagios as an MDR (Management Enablement)
  2. Making Nagios a Notification Producer (Management Enablement)
  3. Reporting on WS-Notification Messages (Data Visualization)

This integration depends on an implementation of a notification broker and an incident manager. The notification broker is expected to conform to the WS-BrokeredNotification specification. The incident manager is expected to persist messages disseminated by the notification broker.

Registering Nagios as an MDR (Management Enablement)

This enhancement will be concerned with the following tasks:

  1. Providing a CMDBf query for retrieving configuration items of a Nagios server
  2. Mapping the configuration items to an SML model
  3. Registering the Nagios server as an MDR with the COSMOS framework
  4. Providing a hierarchical view to display the manageable resources of the Nagios server
  5. Registering the view with the Data Visualization framework

An effective implementation of this enhancement will allow a user to:

  1. View a configured Nagios server under the data manager navigator
  2. Retrieve and view the manageable resources monitored by the Nagios server
  3. Submit CMDBf queries to Nagios servers

Making Nagios a Notification Producer (Management Enablement)

This enhancement will be concerned with the following tasks:

  1. Provide a Nagios plug-in to capture all notification messages
  2. Define a mapping from the notification message to a WSDM event format
  3. Publish a topic space related to the notification messages that can be produced
  4. Notify the notification broker of any situations that occur
  5. Provide a mechanism for adopters to extend the topic space published by Nagios servers

An effective implementation of this enhancement will allow an adopter to:

  1. Discover the Nagios topic space published to the notification broker
  2. Subscribe to the Nagios topics
  3. Receive notification on Nagios topics

Reporting on WS-Notification Messages (Data Visualization)

This enhancement will be concerned with the following task:

  1. Define a set of report templates that can be associated with the incident manager

An effective implementation of this enhancement will allow the user to:

  1. Generate reports based on events reported by Nagios server(s)

Open Issues/Questions

All reviewer feedback should go in the talk page for Nagios integration with COSMOS.


