= Nagios Integration with COSMOS =
This page has been moved to: [[COSMOS Design 188390]]

== Change History ==
{|{{BMTableStyle}}
!align="left"|Name:
!align="left"|Date:
!align="left"|Revised Sections:
|-
|Ali Mehregani
|11/19/2007
|<ul><li>Initial version</li></ul>
|-
|Ali Mehregani
|11/27/2007
|<ul><li>Modified based on Mark Weitzel and Valentina Popescu's comments</li></ul>
|}

== Workload Estimation ==

{|{{BMTableStyle}}
|+{{BMTableCaptionStyle}}|Rough workload estimate in person weeks
|-{{BMTHStyle}}
! Process
! Sizing
! Names of people doing the work
|-
| align="left" | Design
| 15
|
|-
| align="left" | Code
| 25
|
|-
| align="left" | Test
| 15
|
|-
| align="left" | Documentation
| 2
|
|-
| align="left" | Build and infrastructure
| 1
|
|-
| align="left" | Code review, etc.*
| 2
|
|-
! align="right" | TOTAL
| 60
|
|}

== Terminologies/Acronyms ==

The following terms and acronyms are used throughout this document. The table below defines each term as it is used here:

{|{{BMTableStyle}}
|-{{BMTHStyle}}
! Term
! Definition
|-
|MDR
|Management Data Repository
|-
|CMDBf
|Specification for a CMDB that federates data from multiple MDRs
|-
|CMDB
|Configuration Management Database
|-
|CBE
|Common Base Event - A standard that defines a common format for logging events
|-
|SML
|Service Modeling Language - An XML-based language used for modeling
|-
|SML Model
|A set of SML-compliant resources
|-
|SML Repository
|Any SML model together with a set of COSMOS APIs used to add new SML resources to the model and to query it
|-
|CMDBf query
|MDRs make data available via a query service defined in the CMDBf specification. The input and output of a CMDBf query are structured XML documents described in the specification.
|-
|Host
|In Nagios terms, any entity on a network that can be monitored (e.g. a desktop, router, or printer)
|-
|Host check
|In Nagios, running a command that indicates the status of a host
|-
|Service
|A monitorable aspect of a host. Nagios can monitor two types of services on a host: public and private. Examples of public services are HTTP, FTP, POP3, and SSH; examples of private services are CPU utilization, memory consumption, disk space, and power consumption.
|-
|Service check
|Analogous to a host check, running a command that checks the status of a service
|-
|Command
|An executable, such as a shell script or a Perl script, that performs a specific task in Nagios (e.g. a host/service check)
|}

== Introduction ==
The COSMOS vision is captured in the definition of the word cosmos: "The world or universe regarded as an orderly, harmonious system". The intention of the project is to apply the same concept to the world of system management. Complementary standards such as CMDBf, SML, and WS-Notification, together with Web 2.0 technologies, are making this vision a reality. The overall COSMOS vision is to provide an extensible framework, based on a set of accepted standards, that simplifies the task of building an ecosystem of existing system management tooling.

In line with that vision, Nagios can help illustrate how standards deliver value to the open source community and to adopters that intend to provide higher-level management capabilities on top of COSMOS. Industry standards can help integrate commercial solutions with well-established monitoring infrastructures such as Nagios. The end goal of this integration effort is to develop a framework around WS-Notification and the CMDBf specification, using Nagios as an exemplary consumer.

The next two sections give a brief overview of Nagios and WS-Notification.

=== What is Nagios? ===
Nagios is a system and network monitoring application that can detect abnormal behavior and send notifications about it. Administrators configure monitoring through a set of flat configuration files, which indicate <b>what</b> should be monitored and <b>how</b>. There are three primary atomic entities in Nagios:<br><br>

{|{{BMTableStyle}}
|'''Host'''
|A physical device on a network that is intended to be monitored (e.g. a desktop, printer, router, switch, or hub).
|-
|'''Service'''
|A specific component of a host that should be monitored (e.g. CPU utilization, memory consumption, or HTTP).
|-
|'''Command'''
|A utility that performs a host/service check, handles notifications, raises alerts, and so on. For example, check_CPU could be a command used to monitor the CPU utilization on a particular host.
|}

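As a rough illustration of the flat configuration files mentioned above, the sketch below parses Nagios-style `define <type>{ ... }` object definitions into Python dictionaries. It is a minimal sketch and not Nagios's own parser. It assumes one directive per line and treats `;` as the start of an inline comment. It ignores template inheritance (`use` is kept as a plain directive), escaped semicolons, and `cfg_file`/`cfg_dir` includes. The function name `parse_objects` and the sample host are hypothetical.

```python
import re

# Matches "define <type> { ... }" blocks; the body runs up to the first closing brace.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Parse Nagios-style object definitions into a list of (type, directives) pairs."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline "; comment" text
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # directive name, then the rest as its value
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


sample = """
# Hosts
define host{
    use                 generic-host   ; inherit defaults from a template
    host_name           web01
    address             192.168.1.10
    }
define service{
    host_name           web01
    service_description HTTP
    check_command       check_http
    }
"""
for obj_type, directives in parse_objects(sample):
    print(obj_type, directives)
```

Output of this shape is what a CMDBf query service or the COSMOS client would need in order to list the hosts and services an administrator has defined.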
<br>An administrator must define hosts, services, and commands to monitor a set of resources effectively. The actual monitoring of a host/service is not done by Nagios itself but by add-on plug-ins, each defined as an individual command. This architecture makes it possible to monitor virtually any aspect of a system that can be checked automatically. Many plug-ins are already available for monitoring common hosts/services in a typical networking environment, and where none exists, administrators can write their own plug-in to monitor an uncommon host/service. The data collected by plug-ins is logged to flat files. Nagios itself does not persist events to a database, but plug-ins are available to direct events to an RDBMS such as MySQL.

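The plug-in contract is what lets Nagios monitor almost anything. Nagios runs the command, reads the first line of its output, and interprets its exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. The disk-usage check below is a hypothetical minimal plug-in written in Python. Its 80% and 90% thresholds are illustrative. The text after `|` is optional performance data in the `label=value[UOM];warn;crit;min;max` form.

```python
import shutil

# Standard Nagios plug-in return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn_pct, crit_pct):
    """Map a measured value onto a Nagios state using warning/critical thresholds."""
    if used_pct >= crit_pct:
        state = CRITICAL
    elif used_pct >= warn_pct:
        state = WARNING
    else:
        state = OK
    # Human-readable status first, then '|' and the performance data.
    text = (f"DISK {LABELS[state]} - {used_pct:.1f}% used"
            f" | used={used_pct:.1f}%;{warn_pct};{crit_pct};0;100")
    return state, text


def main(path="/", warn_pct=80, crit_pct=90):
    """Print one status line and return the state that should become the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    state, text = evaluate(100.0 * usage.used / usage.total, warn_pct, crit_pct)
    print(text)
    return state
```

To run as an actual plug-in, the script's entry point would call `sys.exit(main())` so that Nagios sees the state as the process exit code. It would then be referenced from a command definition such as `command_line $USER1$/check_disk_usage.py`, where the script name is hypothetical.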
The Nagios daemon runs on Linux, but it can monitor Windows desktops through its plug-in architecture. As part of its monitoring solution, Nagios also provides an alerting mechanism that notifies contacts or contact groups of a problem. A notification handler can also be registered to take certain actions based on incoming events (e.g. storing status information in an RDBMS). The diagram below, taken from the [http://nagios.sourceforge.net/docs/nagios-3.pdf Nagios documentation], depicts the components:

[[Image:nagios-architecture.png]]

<br>Nagios also includes a web-based UI that provides reporting and limited administration capabilities. A screenshot of the UI is included below. The [[#Scope|Scope]] section describes the scope and the value of this enhancement.<br>

[[Image:nagios.png]]

<br>See the [http://nagios.sourceforge.net/docs/nagios-3.pdf Nagios user guide] to find out more about its capabilities.

=== What is WS-Notification? ===

WS-Notification is an umbrella for a family of OASIS specifications that describe publishing and subscribing to events in the context of Web services. Three specifications fall under WS-Notification:

# [http://docs.oasis-open.org/wsn/wsn-ws_base_notification-1.3-spec-os.pdf WS-BaseNotification]
# [http://docs.oasis-open.org/wsn/wsn-ws_brokered_notification-1.3-spec-os.pdf WS-BrokeredNotification]
# [http://docs.oasis-open.org/wsn/wsn-ws_topics-1.3-spec-os.pdf WS-Topics]

WS-BaseNotification describes the basic interfaces and operations required by notification producers and consumers. WS-BrokeredNotification describes a broker that acts as a middle tier between producers and consumers. WS-Topics describes how the topics used for publishing and subscribing are structured.

COSMOS intends to provide a notification broker as part of its framework for publishing and subscribing to events. This notification broker should not be confused with the broker that resides in the management domain; they are separate components with different functionality. The notification broker is being developed under a separate enhancement, and its implementation details are not covered in this document.

The following terms are commonly used in the context of WS-Notification:

{|{{BMTableStyle}}
|-{{BMTHStyle}}
! Term
! Definition
|-
|Notification Producer
|An entity that creates a notification message
|-
|Notification Consumer
|An entity that receives a notification message
|-
|Subscription
|The act of registering interest in receiving notifications on a set of topics
|-
|Publishing
|The act of advertising the intent to produce notifications on a set of topics
|-
|Topic
|A category used to organize the notification messages that are produced. Topics can be arranged in hierarchies.
|-
|Topic Space
|A forest of topic trees (i.e. a set of topic trees)
|}

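The topic and topic space definitions above can be made concrete with a small model. The sketch below stores a topic space as a forest of `/`-separated topic paths. It matches subscription expressions in which `*` stands for exactly one path segment. This is a deliberate simplification loosely inspired by the WS-Topics topic expression dialects, not an implementation of them, and the topic names are hypothetical.

```python
class TopicSpace:
    """A forest of topic trees; each topic is identified by a '/'-separated path."""

    def __init__(self):
        self.topics = set()

    def add(self, path):
        # Registering a topic also registers its ancestors, so every prefix is a tree node.
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            self.topics.add("/".join(parts[:i]))

    def roots(self):
        # Root topics are the tops of the individual trees in the forest.
        return sorted(t for t in self.topics if "/" not in t)

    def match(self, expression):
        """Return the topics matched by an expression where '*' matches one path segment."""
        pattern = expression.split("/")

        def matches(topic):
            parts = topic.split("/")
            return len(parts) == len(pattern) and all(
                p == "*" or p == t for p, t in zip(pattern, parts)
            )

        return sorted(t for t in self.topics if matches(t))


space = TopicSpace()
space.add("nagios/host/status")
space.add("nagios/service/status")
space.add("provisioning/deploy")
print(space.roots())                   # ['nagios', 'provisioning']
print(space.match("nagios/*/status"))  # ['nagios/host/status', 'nagios/service/status']
```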
== Purpose ==
The purpose of this document is to describe how COSMOS, and by extension commercial vendors, can leverage an existing installation of Nagios via industry-standard interfaces.

=== Scope ===
COSMOS can add value to Nagios in a number of areas, which can be summarized into three categories:

# Standardized Query Capability
# Publication and Subscription of Nagios Events
# Reporting on WS-Notifications

==== Standardized Query Capability ====
Contributing a CMDBf query service on top of a Nagios server will provide a standardized mechanism for querying the configuration items managed by Nagios. A CMDBf query service will also allow Nagios to participate in a federated CMDB environment. A well-known query service will make it easier to integrate multiple Nagios servers and/or commercial solutions under one infrastructure.

Nagios defines 10 object types:

# Hosts
# Host Groups
# Services
# Service Groups
# Contacts
# Contact Groups
# Commands
# Time Periods
# Notification Escalations
# Notification and Execution Dependencies

The first 6 object types are examples of configuration items that can be exposed via a CMDBf query service. Operational data, such as the status of a host/service, will not be exposed via the query service; this information will instead be published to the notification broker described in the next section.

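To make the standardized query capability more concrete, the sketch below builds a CMDBf query that asks for Nagios items of one configuration-item type, optionally filtered on a property value. It rejects object types outside the six proposed above. The element names (`query`, `itemTemplate`, `recordConstraint`, `recordType`, `propertyValue`, `equal`) and the CMDBf namespace URI reflect our reading of the CMDBf 1.0 specification and should be checked against it. The Nagios record-type namespace `http://example.org/cosmos/nagios` and the function `build_query` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: CMDBf 1.0 query namespace; verify against the specification.
CMDBF_NS = "http://cmdbf.org/schema/1-0-0/datamodel"
# Hypothetical namespace for the Nagios record types exposed by the query service.
NAGIOS_NS = "http://example.org/cosmos/nagios"
# The six Nagios object types proposed for exposure as configuration items.
EXPOSED_TYPES = {"host", "hostgroup", "service", "servicegroup", "contact", "contactgroup"}

ET.register_namespace("cmdbf", CMDBF_NS)


def q(tag):
    # Qualify a tag name with the CMDBf namespace in ElementTree's {uri}local form.
    return f"{{{CMDBF_NS}}}{tag}"


def build_query(record_type, property_name=None, value=None):
    """Build a CMDBf query with one item template for a single Nagios record type."""
    if record_type not in EXPOSED_TYPES:
        raise ValueError(f"{record_type!r} is not exposed through the query service")
    query = ET.Element(q("query"))
    template = ET.SubElement(query, q("itemTemplate"), id="items")
    constraint = ET.SubElement(template, q("recordConstraint"))
    ET.SubElement(constraint, q("recordType"), namespace=NAGIOS_NS, localName=record_type)
    if property_name is not None:
        # Optional equality filter on one property of the matching records.
        prop = ET.SubElement(constraint, q("propertyValue"),
                             namespace=NAGIOS_NS, localName=property_name)
        ET.SubElement(prop, q("equal")).text = value
    return ET.tostring(query, encoding="unicode")


print(build_query("host", "host_name", "web01"))
```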
==== Publication and Subscription of Nagios Events ====
Assuming COSMOS provides a notification broker, Nagios can publish a set of topics that indicate the status of the monitored hosts and services. The notification broker can then disseminate messages to any client that subscribes to the published topics. This mechanism gives clients the ability to process events generated by multiple monitoring solutions. Consider the following example:

Assume the existence of three data managers:
* A provisioning solution capable of deploying software to multiple nodes
* A Nagios server monitoring a cluster of nodes
* A second Nagios server monitoring a different cluster of nodes

Also assume that the two Nagios servers are used to monitor the current patch level on a set of Windows nodes. The Nagios servers can publish a topic to the notification broker that describes the patch level of each Windows node. The provisioning solution can then subscribe to this topic and deploy any necessary updates to a Windows node. The figure below depicts the example:

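The patch-level example can be sketched with a toy, in-process broker that models only the message flow. It stands in for the COSMOS notification broker and does not implement the WS-BrokeredNotification operations or SOAP messages. It uses a hypothetical topic name `nagios/windows/patchLevel` and represents a patch level as an integer, which is also an assumption.

```python
from collections import defaultdict


class NotificationBroker:
    """In-process stand-in for a notification broker (no WS-Notification wire protocol)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, consumer):
        # A consumer registers interest in one concrete topic.
        self._subscribers[topic].append(consumer)

    def publish(self, topic, message):
        # Deliver the message to every consumer of the topic; return the delivery count.
        consumers = self._subscribers.get(topic, [])
        for consumer in consumers:
            consumer(message)
        return len(consumers)


PATCH_TOPIC = "nagios/windows/patchLevel"  # hypothetical topic name
REQUIRED_LEVEL = 3                         # hypothetical minimum patch level
deployments = []


def provisioning_consumer(message):
    # The provisioning solution schedules an update for any node below the required level.
    if message["patch_level"] < REQUIRED_LEVEL:
        deployments.append(message["node"])


broker = NotificationBroker()
broker.subscribe(PATCH_TOPIC, provisioning_consumer)

# Each Nagios server publishes the patch level of the nodes in its own cluster.
broker.publish(PATCH_TOPIC, {"source": "nagios-a", "node": "win-01", "patch_level": 3})
broker.publish(PATCH_TOPIC, {"source": "nagios-b", "node": "win-07", "patch_level": 1})
print(deployments)  # ['win-07']
```

Neither Nagios server knows about the provisioning solution. Both depend only on the topic, which is the decoupling the broker is meant to provide.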
==== Reporting ====
The Nagios web-based UI is dated, and it provides only a few basic reporting capabilities:

* Trends
* Availability
* Alert Histogram
* Alert History
* Alert Summary
* Notifications
* Event Log

A BIRT integration with Nagios under the COSMOS framework can add significant value by providing an extensible mechanism for generating customized reports. Consumers can benefit by tailoring reports to a specific audience. The reports contributed by COSMOS will use the CBE database as their data source, but this does not prevent anyone from contributing reports that use Nagios logging information as their data source. Consumers can also redirect Nagios events to a proprietary database, registered as a data manager, and build customized reports on top of it.

COSMOS will need to provide a set of report templates as exemplars of the reporting capability that can be added on top of Nagios-based data. See [[Nagios Integration with COSMOS#Implementation Detail|implementation detail]] for more information.

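An availability report reduces to simple arithmetic over state changes: the fraction of the reporting window during which a host was UP. A minimal sketch follows. It assumes the events have already been read (for example, from the CBE database) as time-ordered `(timestamp, state)` pairs, which is a hypothetical shape. It also assumes any state other than `UP` counts as unavailable.

```python
from datetime import datetime, timedelta


def availability(events, start, end):
    """Percentage of the window [start, end) during which a host was UP.

    `events` is a time-ordered list of (timestamp, state) state changes. The host is
    assumed UP at `start` unless an event at or before `start` says otherwise; any
    state other than "UP" (e.g. DOWN, UNREACHABLE) counts as unavailable.
    """
    state = "UP"
    up_time = timedelta(0)
    cursor = start
    for timestamp, new_state in events:
        if timestamp <= start:
            state = new_state  # establishes the state at the window start
            continue
        if timestamp >= end:
            break
        if state == "UP":
            up_time += timestamp - cursor
        cursor, state = timestamp, new_state
    if state == "UP":
        up_time += end - cursor  # time from the last change to the window end
    return 100.0 * (up_time / (end - start))


day_start = datetime(2007, 11, 1)
day_end = day_start + timedelta(days=1)
events = [(datetime(2007, 11, 1, 6), "DOWN"), (datetime(2007, 11, 1, 9), "UP")]
print(availability(events, day_start, day_end))  # 87.5
```

In this example the host is down from 06:00 to 09:00, so it is available for 21 of 24 hours, i.e. 87.5%.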
== Requirements ==
The following requirements fall within the scope of the Nagios/COSMOS integration:

# Writing resource information to the SML repository via the client
# Viewing Nagios as a data manager in the COSMOS framework
# Initiating and controlling the monitoring of resources via the client
# Generating reports on Nagios-based events
# Viewing Nagios resources and their status
# Storing Nagios events in a CBE database
# Registering an employee database as a data manager
# Registering a configuration database as a data manager

== Use Cases ==
The following use cases outline some of the typical tasks that COSMOS users will perform to accomplish an objective.

=== Use Case 1: Adding a machine to the asset database ===

Assumption: The COSMOS framework is successfully installed with the asset repository MDR.

# User opens a browser and navigates to the URL of the COSMOS client.
# User right-clicks the asset database and selects 'Define Object'. A form is displayed in the details pane for the user to populate the required fields. The form can contain multiple pages that cleanly break down the flow of user actions.
# The fields are populated and the 'Finish' button is pressed.
# The client indicates that it is writing the data to the database. The user is then shown either an error message or a confirmation message indicating success. In case of user error, the form is returned to the user for correction.

=== Use Case 2: Monitoring of a host in Nagios ===

Assumption: The COSMOS framework is successfully installed with the asset repository MDR and the Nagios data collector.

# User opens a browser and navigates to the URL of the COSMOS client.
# User right-clicks the Nagios item displayed under the data manager navigator and selects 'Define Object'. A form is displayed in the details pane for the user to populate the required fields. The form can contain multiple pages that cleanly break down the flow of user actions.
# User selects the type of object to define, which in this case is a host. The form is populated with all items from registered data managers that are candidates for the selected object type. In this case, all resources that are candidates for monitoring are queried from the asset database and displayed in the details pane.
# Where possible, items of corresponding data stores are displayed to make it easier to populate a field. For example, the items of an employee database can be displayed for the contact/contact group fields of a host definition.
# The user has the option of either selecting a discovered host or populating the fields manually. If a discovered host is selected, the fields of the form are populated based on the selected host.
# User clicks 'Finish' to start monitoring the host.

=== Use Case 3: Monitoring of a service defined with Nagios ===

Assumption: The COSMOS framework is successfully installed with the asset repository MDR and the Nagios data collector.

# User opens a browser and navigates to the URL of the COSMOS client.
# User right-clicks the Nagios item displayed under the data manager navigator and selects 'Define Object'. A form is displayed in the details pane for the user to populate the required fields. The form can contain multiple pages that cleanly break down the flow of user actions.
# User selects the type of object to define, which in this case is a service. The form is populated with all available services on the associated host. For example, the client can query a configuration database to retrieve all available services on the associated host.
# Where possible, items of corresponding data stores are displayed to make it easier to populate a field. For example, the items of an employee database can be displayed for the contact/contact group fields of a service definition.
# The user has the option of either selecting a discovered service or populating the fields manually. If a discovered service is selected, the fields of the form are populated based on the selected service.
# User clicks 'Finish' to start monitoring the service.

=== Use Case 4: Viewing the status of hosts/services being monitored ===

Assumption: The COSMOS framework is successfully installed with the Nagios data collector. It is assumed that one or more hosts/services are configured for monitoring.

# User opens a browser and navigates to the URL of the COSMOS client.
# User right-clicks the Nagios item displayed under the data manager navigator and selects 'Display Monitoring Resources'. A tree of hosts and services is displayed in the details pane, with icons that indicate the result of the last status check of each host/service. See Use Case 5 for how to find more information about a failed host/service.

=== Use Case 5: Determining the problem associated with a host/service ===

Assumption: The COSMOS framework is successfully installed with the Nagios data collector. It is assumed that one or more hosts/services are configured for monitoring and that at least one host/service is down. The context of the use case is the status navigation tree.

# User right-clicks a host/service that is indicated to be down and selects 'Display Information'.
# A BIRT report is generated to display the events of the selected host/service that led to its downtime.

=== Use Case 6: Generating reports based on host availability ===

Assumption: The COSMOS framework is successfully installed with the Nagios data collector.

# User opens a browser and navigates to the URL of the COSMOS client.
# User right-clicks the Nagios item displayed under the data manager navigator and selects 'Generate Report > Availability'. A BIRT report is generated and displayed to show the general availability of the hosts and services being monitored.

=== Use Case 7: Removing a host being monitored ===

Assumption: The COSMOS framework is successfully installed with the Nagios data collector. It is assumed that one or more hosts/services are configured for monitoring.

# User opens a browser and navigates to the URL of the COSMOS client.
# User right-clicks the Nagios item displayed under the data manager navigator and selects 'Display Monitoring Resources'. A tree of hosts and services is displayed in the details pane.
# User right-clicks a host and selects 'Remove'. The user is prompted with a message that explains the implications of removing the host.
# The user clicks OK to proceed. The host is removed from Nagios and the status navigation tree is updated to reflect the user action.

== Implementation Detail ==

Integrating Nagios will span features across three subprojects: Data Collection, Resource Modeling, and Data Visualization. The features can be categorized into three areas:

* <b>Nagios Data Manager Package</b>
* <b>Resource Modeling MDRs</b>
* <b>Nagios Client</b>

The diagram below displays the interaction between the COSMOS and Nagios components:
[[Image:nagios-cosmos.png]]

=== Nagios Data Manager Package ===
A package will need to be included to register the Nagios server as a data manager with the COSMOS framework. The package will need to:

# Discover the management domain and register itself as a data manager with the advertised brokers.
# Discover the CBE data manager to determine its endpoint. This discovery may need to run periodically until the CBE repository registers itself with the management domain: data managers can register in any order, so the CBE data manager may register after the Nagios data manager.
# After discovering the CBE data manager, activate a Nagios plug-in that redirects all events to the CBE repository. This step requires Nagios to be reconfigured and restarted, and the user must be prompted before the Nagios process is restarted.

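Step 2 in the list above implies a retry loop, because the CBE data manager may register after the Nagios data manager. The sketch below shows one way to structure that loop. `lookup` stands in for a hypothetical call to the management domain's broker. The 30-second interval and 10-minute timeout are illustrative values, not COSMOS defaults. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time


def wait_for_cbe_endpoint(lookup, interval_s=30.0, timeout_s=600.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until the CBE data manager is registered in the management domain.

    `lookup` is a hypothetical callable that returns the CBE data manager's endpoint
    URL, or None if it has not registered yet. Returns the endpoint, or None if the
    next poll would fall past the timeout.
    """
    deadline = clock() + timeout_s
    while True:
        endpoint = lookup()
        if endpoint is not None:
            return endpoint
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

A `None` result need not be fatal. The package could keep retrying in the background, because the CBE data manager may still register later. Only once an endpoint is known would step 3 prompt the user before restarting Nagios.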
 
The Nagios packaging code base is expected to be checked into the Data Collection subproject.

=== Resource Modeling MDRs ===
To illustrate the seamless integration of multiple MDRs with a system monitoring application, two additional MDRs will be added to the Resource Modeling subproject. The asset repository will also need to be modified to ensure a smooth integration. The two new MDRs will be:

# Configuration MDR - contains configuration details about what is stored and running on a host
# Employee MDR - contains information about staff members

The configuration MDR will be used to discover services that can be monitored on a specific host, and the employee MDR will be used to display a list of employees that can be included in the contact list of a host or service definition. Both MDRs are expected to be implemented on top of the SML repository, which already includes a CMDBf query capability. The SML repository code will need to be refactored to factor out any code that is specific to the asset repository.

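Mapping an employee MDR record into Nagios can be as simple as rendering a Nagios `define contact` block from the record. In this sketch, several names are assumptions:

* the employee field names (`id`, `first_name`, `last_name`, `email`)
* the `24x7` time period
* the notification command names

The directive names themselves are standard Nagios 3 contact directives.

```python
def to_nagios_contact(employee,
                      host_command="notify-host-by-email",
                      service_command="notify-service-by-email",
                      period="24x7"):
    """Render a Nagios `define contact` block from a (hypothetical) employee MDR record."""
    directives = [
        ("contact_name", employee["id"]),
        ("alias", f"{employee['first_name']} {employee['last_name']}"),
        ("email", employee["email"]),
        ("host_notifications_enabled", "1"),
        ("service_notifications_enabled", "1"),
        ("host_notification_period", period),
        ("service_notification_period", period),
        ("host_notification_options", "d,u,r"),       # down, unreachable, recovery
        ("service_notification_options", "w,u,c,r"),  # warning, unknown, critical, recovery
        ("host_notification_commands", host_command),
        ("service_notification_commands", service_command),
    ]
    # Align values in a column, as in the sample Nagios configuration files.
    lines = [f"    {name:<32}{value}" for name, value in directives]
    return "define contact{\n" + "\n".join(lines) + "\n    }"


print(to_nagios_contact({"id": "jdoe", "first_name": "Jane",
                         "last_name": "Doe", "email": "jdoe@example.com"}))
```

The generated text can be appended to a Nagios object configuration file. The contact name can then be referenced from the `contacts` directive of a host or service definition.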
=== Nagios Client ===
The Data Visualization subproject is expected to contribute the following functionality:

# The ability to define hosts and services
# The ability to generate reports on Nagios events, notifications, alerts, etc.
# The ability to write objects to the asset, configuration, and employee MDRs
# Nagios-specific views to visualize the status of the monitored objects

== Task Breakdown ==
The following section breaks down the individual tasks by subproject. A symbol in front of each work item indicates the enhancement it falls under.

=== Resource Modeling ===
&#x3A6; Refactor any code necessary to provide write capability to the asset repository <br/>
&#x3A6; Refactor the data center SML model so that it fits better with the resource model that Nagios uses <br/>
&#x3A6; Refactor the CMDBf query code for the asset-based repository to provide any additional queries that the client will need <br/>
&#x3A6; Provide a model mapping from the asset model to the Nagios model <br/>
&#x3A8; Refactor the SML repository code to provide a common plug-in that multiple SML-based repositories can use <br/>
&#x3A8; Provide an employee-based model using SML <br/>
&#x3A8; Extend the SML repository code to provide an employee database <br/>
&#x3A8; Provide a CMDBf query implementation for the employee database <br/>
&#x3A8; Provide a model mapping from the employee model to the Nagios model <br/>
&#x3A9; Provide a configuration-based model using SML <br/>
&#x3A9; Extend the SML repository code to provide a configuration database <br/>
&#x3A9; Provide a CMDBf query implementation for the configuration database <br/>
&#x3A9; Provide a model mapping from the configuration model to the Nagios model <br/>
&#x3B2; Use the programming model to plug the employee MDR into the COSMOS framework <br/>
&#x3B2; Use the programming model to plug the configuration MDR into the COSMOS framework <br/>

<b>Enhancements:</b> <br/>
&#x3A6; [Nagios] Generalize the asset repository and the data center model <br/>
&#x3A8; [Nagios] Provide an employee MDR based on the SML repository <br/>
&#x3A9; [Nagios] Provide a configuration MDR based on the SML repository <br/>
&#x3B2; [Nagios] Add the employee and configuration MDRs to the COSMOS framework <br/>

=== Data Collection ===
&#x3A6; Define a mapping between Nagios events and CBE events <br/>
&#x3A6; Provide a Nagios plug-in to forward events to the CBE data manager <br/>
&#x3B2; Provide a mechanism to register a Nagios server as a data manager <br/>
&#x3B2; Provide administrative capabilities that the client can invoke <br/>

<b>Enhancements:</b> <br/>
&#x3A6; [Nagios] Provide a Nagios plug-in to log events as CBEs to the CBE data manager <br/>
&#x3B2; [Nagios] Register a Nagios monitoring server as a data manager <br/>

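The first Data Collection task, defining a mapping between Nagios events and CBE events, might start from something like the sketch below. It produces a dictionary holding a subset of CBE fields for one check result. The severity numbers come from the CBE 1.0.1 scale: 0 unknown, 10 information, 20 harmless, 30 warning, 40 minor, 50 critical, 60 fatal. The choice of which Nagios state maps to which value is an assumption. So are the `sourceComponentId` values and the function name `nagios_to_cbe`. Serializing the result to CBE XML is out of scope here.

```python
from datetime import datetime, timezone

# Assumed mapping of Nagios states onto the CBE 1.0.1 severity scale.
SERVICE_STATE_TO_SEVERITY = {"OK": 10, "WARNING": 30, "CRITICAL": 50, "UNKNOWN": 0}
HOST_STATE_TO_SEVERITY = {"UP": 10, "UNREACHABLE": 30, "DOWN": 50}


def nagios_to_cbe(host, service, state, output, when):
    """Map one Nagios check result to a dictionary holding a subset of CBE fields.

    `service` is None for a host check; `when` must be a timezone-aware datetime.
    """
    table = HOST_STATE_TO_SEVERITY if service is None else SERVICE_STATE_TO_SEVERITY
    return {
        "creationTime": when.astimezone(timezone.utc).isoformat(),
        "severity": table.get(state, 0),  # unrecognized states map to "unknown"
        "msg": output,                    # the plug-in's status line
        "sourceComponentId": {
            "component": "Nagios",
            "subComponent": service or "host-check",  # hypothetical value for host checks
            "location": host,
            "locationType": "Hostname",
        },
    }


event = nagios_to_cbe("web01", "HTTP", "CRITICAL", "Connection refused",
                      datetime(2007, 11, 27, 17, 13, tzinfo=timezone.utc))
print(event)
```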
=== Data Visualization ===
&#x3A6; Provide actions to write to the asset MDR <br/>
&#x3A6; Provide the forms necessary to write data to the asset MDR <br/>
&#x3A6; Provide actions to write to the employee MDR <br/>
&#x3A6; Provide the forms necessary to write data to the employee MDR <br/>
&#x3A6; Provide actions to write to the configuration MDR <br/>
&#x3A6; Provide the forms necessary to write data to the configuration MDR <br/>
&#x3A8; Provide actions to define objects on the Nagios data manager <br/>
&#x3A8; Provide the forms necessary to define the Nagios objects <br/>
&#x3A8; Provide actions to perform administrative tasks on the Nagios data manager <br/>
&#x3A8; Define a framework that allows an MDR to be replaced/added as part of defining objects for Nagios <br/>
&#x3A9; Provide a navigator that displays the status of hosts and services monitored by Nagios <br/>
&#x3B2; Provide reporting capabilities for viewing host/service events <br/>
&#x3B2; Provide two general reporting capabilities on Nagios events (e.g. availability and alert history) <br/>

<b>Enhancements:</b> <br/>
&#x3A6; [Nagios] Provide write actions and forms for the asset, configuration, and employee MDRs <br/>
&#x3A8; [Nagios] Provide actions and forms for defining Nagios objects <br/>
&#x3A9; [Nagios] Provide a status navigator for the Nagios data manager <br/>
&#x3B2; [Nagios] Provide reporting capabilities for the Nagios events <br/>

== Open Issues/Questions ==
All reviewer feedback should go in the [[Talk:Nagios_Integration_with_COSMOS|talk page]] for Nagios integration with COSMOS.

Latest revision as of 17:13, 27 November 2007
