Difference between revisions of "Nagios Integration with COSMOS"

(Use Case 1: Adding a machine to the asset database)
(Replacing page with 'This page has been moved to: COSMOS Design 188390')
 
(60 intermediate revisions by the same user not shown)
Line 1: Line 1:
= Nagios Integration with COSMOS =
+
This page has been moved to: [[COSMOS Design 188390]]
 
+
== Change History ==
+
{|{{BMTableStyle}}
+
!align="left"|Name:
+
!align="left"|Date:
+
!align="left"|Revised Sections:
+
|-
+
|Ali Mehregani
+
|11/19/2007
+
|<ul><li>Initial version</li></ul>
+
|}
+
 
+
== Workload Estimation ==
+
 
+
{|{{BMTableStyle}}
+
|+{{BMTableCaptionStyle}}|Rough workload estimate in person weeks
+
|-{{BMTHStyle}}
+
! Process
+
! Sizing
+
! Names of people doing the work
+
|-
+
| align="left" | Design
+
|
+
|
+
|-
+
| align="left" | Code
+
|
+
|
+
|-
+
| align="left" | Test
+
|
+
|
+
|-
+
| align="left" | Documentation
+
|
+
|
+
|-
+
| align="left" | Build and infrastructure
+
|
+
|
+
|-
+
| align="left" | Code review, etc.*
+
|
+
|
+
|-
+
! align="right" | TOTAL
+
|
+
|
+
|}
+
 
+
== Terminologies/Acronyms ==
+
 
+
The terms and acronyms below are used throughout this document.  Each is defined as it is used here:
+
 
+
== Introduction ==
+
The COSMOS vision is captured in the definition of the word cosmos - "The world or universe regarded as an orderly, harmonious system".  The intention of the project is to bring orderly and harmonious behavior to the world of system management.  Complementary standards and technologies such as CMDBf, SML, and Web 2.0 are making this vision a reality.  The overall COSMOS vision is to provide an extensible framework, based on a set of accepted standards, that simplifies the task of building an ecosystem out of existing system management tooling.
+
 
+
In line with that vision, Nagios can not only help mature the COSMOS framework but also provide out-of-the-box value to COSMOS users.  This twofold advantage has many positive implications:
+
 
+
# Takes a step toward evolving an open source code base into a framework that is usable in a production environment
+
# Makes the COSMOS project an attractive solution that provides value on its own
+
# Simplifies integration of proprietary solutions with Nagios, and
+
# Demonstrates a working example of a well-established system management application in the COSMOS framework
+
 
+
The next section provides more detail about Nagios.
+
 
+
=== What is Nagios? ===
+
Nagios is a system and network monitoring application that detects abnormal behavior and sends notifications about it.  Administrators define the monitoring behavior in a set of flat-file configurations.  The files indicate <b>what</b> should be monitored and <b>how</b>.  There are three primary atomic entities in Nagios:
+
 
+
<b>Host</b> - A physical device on a network that is intended to be monitored (e.g. a desktop, printer, router, switch, hub, etc.).
+
<b>Service</b> - The specific component of a host that should be monitored (e.g. CPU utilization, memory consumption, HTTP, etc.).
+
<b>Command</b> - A utility that performs a service check.  For example, check_CPU can be a command used to monitor the CPU utilization on a particular host.
+
 
+
An administrator is required to define hosts, services, and commands to effectively monitor a set of resources.  The actual monitoring of a host/service is not done by Nagios itself.  It is done by add-on plug-ins that are defined as individual commands.  This architecture makes it possible to monitor virtually any aspect of a system that can be automated.  Many plug-ins are already available for monitoring common hosts/services in a typical networking environment.  Where no suitable plug-in exists, administrators can write their own to monitor an uncommon host/service.
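To make the plug-in idea concrete, the following is a minimal sketch of a custom check, assuming a Python interpreter is available on the monitoring host.  It relies on the Nagios plug-in convention of reporting status through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and a single line of output.  The disk-usage check, the function names, and the 80%/90% thresholds are illustrative assumptions and are not part of COSMOS or Nagios.

```python
import shutil

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a (status code, status line) pair.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if not 0.0 <= used_percent <= 100.0:
        return UNKNOWN, "DISK UNKNOWN - invalid reading"
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"DISK {LABELS[code]} - {used_percent:.1f}% used"


def main(path="/"):
    """Check one file system and return the exit code Nagios expects."""
    usage = shutil.disk_usage(path)
    code, line = evaluate(100.0 * usage.used / usage.total)
    print(line)  # Nagios uses the first line of output as the status text
    return code
```

A script built on this sketch would end with <code>sys.exit(main())</code> so that the status code reaches Nagios, and would be registered as a command in the Nagios configuration.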
+
 
+
Nagios itself runs on Linux, but it can monitor desktops running Windows through its plug-in architecture.  As part of its monitoring solution, Nagios also provides an alerting mechanism that notifies contacts or contact groups of a problem.  A notification handler can also be registered to take certain actions based on incoming events (e.g. storing status information in an RDBMS).  The diagram below, extracted from the Nagios documentation&#185;, depicts the components of Nagios:
+
 
+
 
+
[[Image:nagios-architecture.png]]
+
 
+
 
+
A web-based UI is also included that provides reporting and limited administration capabilities.  A screenshot of the Nagios web-based UI is included below.  The next section describes the scope and the value of this enhancement.
+
 
+
 
+
[[Image:nagios.png]]
+
 
+
== Purpose ==
+
The purpose of this document is to outline the initial effort in bringing Nagios closer to COSMOS.  The integration points and their related value to the Nagios and COSMOS user base will be covered by subsequent sections.
+
 
+
=== Scope ===
+
There are a number of areas where COSMOS can add value to Nagios.  The areas can be summarized into three categories:
+
 
+
#Data Manager Integration
+
#Administration Capabilities
+
#Reporting
+
 
+
==== Data Manager Integration ====
+
Defining the required objects in Nagios is cumbersome, time-consuming, and error-prone.  The information required to define objects is usually already stored in other data stores.  For example, a subset of the configuration items stored in a CMDB can typically serve as the hosts that an administrator wants to monitor.  Host information may also be stored in an asset database.
+
 
+
COSMOS can significantly ease the task of defining objects by providing integration points between data managers and Nagios.  Defining objects can be as simple as dragging and dropping a set of queried items from a data manager into the Nagios data collector.  Nagios defines 10 different object types:
+
 
+
1. Hosts
+
2. Host Groups
+
3. Services
+
4. Service Groups
+
5. Contacts
+
6. Contact Groups
+
7. Commands
+
8. Time Periods
+
9. Notification Escalations
+
10. Notification and Execution Dependencies
+
 
+
Hosts, services, and contacts are often defined in other data stores.  COSMOS can use CMDBf to define a seamless integration between where the objects are stored and the Nagios monitoring framework.  See [http://wiki.eclipse.org/Nagios_Integration_with_COSMOS#Use_Cases use cases] and [http://wiki.eclipse.org/Nagios_Integration_with_COSMOS#Implementation_Detail implementation detail] for more information.
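As a rough illustration of this integration point, the sketch below turns records that have already been queried from a data store into Nagios host definitions.  The record keys (<code>name</code>, <code>ip</code>, <code>description</code>) and the function names are hypothetical, and the <code>generic-host</code> template is assumed to exist in the target Nagios configuration.  The CMDBf query itself is out of scope here.

```python
def render_host(record, template="generic-host"):
    """Render one Nagios host definition from a data-store record.

    `record` is a dict with hypothetical keys: 'name', 'ip' and,
    optionally, 'description'.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {record['name']}",
        f"    alias      {record.get('description', record['name'])}",
        f"    address    {record['ip']}",
        "}",
    ]
    return "\n".join(lines)


def render_hosts(records, template="generic-host"):
    """Render several records, separated by blank lines."""
    return "\n\n".join(render_host(r, template) for r in records)
```

The resulting text could be written to an object configuration file that Nagios reads; Nagios has to be restarted or reloaded before the new hosts are monitored.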
+
 
+
 
+
Another area where Nagios can be integrated with COSMOS in the context of data manager integration is the Common Base Event (CBE) database.  To illustrate the framework, COSMOS includes a CBE data manager that stores logging events in the form of CBEs.  Nagios keeps track of host/service status, alerts, and notifications by logging events.  The events are logged in flat files that are rotated.  COSMOS can contribute a Nagios plug-in that logs events to the CBE data manager for persistent record keeping.  This integration has several advantages:
+
 
+
# Events are logged in a common format
+
# Existing reporting capabilities can be reused
+
# Other monitoring tools that have similar characteristics can also log events to a CBE data manager
+
# Events are persisted in a database instead of a flat file structure
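The first step of such an integration can be sketched as parsing a Nagios log entry into a structured record.  The example below assumes the standard alert line layout of the Nagios log file, in which an epoch timestamp in square brackets is followed by <code>HOST ALERT:</code> or <code>SERVICE ALERT:</code> and semicolon-separated fields.  The dictionary keys are illustrative only; they are not the CBE schema, and the mapping to an actual CBE is left open.

```python
import re
from datetime import datetime, timezone

# "[epoch] HOST ALERT: ..." or "[epoch] SERVICE ALERT: ..."
ALERT = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: (.*)$")


def parse_alert(line):
    """Parse one alert line into a dict, or return None if it is not an alert."""
    match = ALERT.match(line.strip())
    if match is None:
        return None
    epoch, kind, payload = match.groups()
    if kind == "SERVICE":
        fields = payload.split(";", 5)  # host;service;state;type;attempt;output
        if len(fields) != 6:
            return None
        host, service, state, state_type, attempt, output = fields
    else:
        fields = payload.split(";", 4)  # host;state;type;attempt;output
        if len(fields) != 5:
            return None
        host, state, state_type, attempt, output = fields
        service = None
    return {
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "message": output,
    }
```

Each parsed record could then be converted to a CBE and handed to the CBE data manager.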
+
 
+
==== Administration Capabilities ====
+
This area of integration ties in closely with the previous one.  Once the user selects a set of hosts, services, and/or contacts from other data stores, the Nagios configuration needs to be changed to reflect the new resources being monitored.  A set of administration tasks must be provided for a system administrator to work effectively with the Nagios framework.  At a minimum, the following administration tasks will be provided:
+
 
+
* Defining objects (hosts, services, contacts, time periods, etc.).  The definition of objects may originate from a data store.
+
* Enabling/disabling host checks
+
* Enabling/disabling service checks
+
* Restarting the Nagios process
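One possible way to drive these tasks is the Nagios external command file, which accepts lines of the form <code>[timestamp] COMMAND;arguments</code>.  The sketch below formats such lines for the documented <code>ENABLE_HOST_CHECK</code>, <code>DISABLE_HOST_CHECK</code>, <code>ENABLE_SVC_CHECK</code>, <code>DISABLE_SVC_CHECK</code>, and <code>RESTART_PROGRAM</code> commands.  The helper names are hypothetical, the location of the command file depends on the installation, and this document does not state that COSMOS will use this mechanism.

```python
import time


def external_command(name, *args, now=None):
    """Format one line for the Nagios external command file."""
    timestamp = int(time.time() if now is None else now)
    parts = (name,) + tuple(str(arg) for arg in args)
    return f"[{timestamp}] " + ";".join(parts) + "\n"


def set_host_check(host, enabled, now=None):
    """Enable or disable active checks of a host."""
    name = "ENABLE_HOST_CHECK" if enabled else "DISABLE_HOST_CHECK"
    return external_command(name, host, now=now)


def set_service_check(host, service, enabled, now=None):
    """Enable or disable active checks of one service on a host."""
    name = "ENABLE_SVC_CHECK" if enabled else "DISABLE_SVC_CHECK"
    return external_command(name, host, service, now=now)


def restart(now=None):
    """Ask the running Nagios process to restart itself."""
    return external_command("RESTART_PROGRAM", now=now)


def submit(command_line, command_file):
    """Write a formatted command to the command file (a named pipe).

    `command_file` is installation specific and must be supplied by the caller.
    """
    with open(command_file, "w") as pipe:
        pipe.write(command_line)
```

Keeping the formatting separate from the write makes the command lines easy to inspect and test without a running Nagios instance.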
+
 
+
Future releases of COSMOS may consider the following administration tasks:
+
 
+
* Automatic deployment of agents to machines
+
* Service discovery on selected hosts
+
* Controlling notifications
+
 
+
Where possible, common administration tasks need to be reusable by other monitoring tools.
+
 
+
==== Reporting ====
+
The Nagios web-based UI is far from sleek or modern.  It provides a few primitive reporting capabilities:
+
 
+
* Trends
+
* Availability
+
* Alert Histogram
+
* Alert History
+
* Alert Summary
+
* Notifications
+
* Event Log
+
 
+
A BIRT integration with Nagios under the COSMOS framework can add significant value by providing an extensible mechanism for generating customized reports.  Consumers can benefit by tuning reports to target a specific audience.  The reports contributed by COSMOS will use the CBE database as their source, but this does not prevent anyone from contributing reports that use Nagios logging information as their data source.  Consumers can also redirect Nagios events to a proprietary database that is registered as a data manager and has customized reporting.
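The kind of aggregation behind a report such as an alert summary can be sketched in a few lines.  The example below is not BIRT and does not read the CBE database; it assumes the events are already available as dictionaries with hypothetical <code>host</code>, <code>state</code>, and <code>state_type</code> keys, and it counts alerts per host and state.

```python
from collections import Counter


def alert_summary(events, hard_only=True):
    """Count alerts per (host, state) pair.

    By default only events whose state_type is "HARD" are counted, on the
    assumption that "SOFT" states are still being retried.
    """
    counts = Counter()
    for event in events:
        if hard_only and event.get("state_type") != "HARD":
            continue
        counts[(event["host"], event["state"])] += 1
    return counts
```

A report template would apply the same grouping in its data set query and present the counts as a table or chart.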
+
 
+
COSMOS will need to provide a set of report templates as exemplars of the reporting capability that can be added on top of Nagios-based data.  The details of the report types will be covered in [http://wiki.eclipse.org/index.php?title=Nagios_Integration_with_COSMOS#Implementation_Detail implementation detail].
+
 
+
== Requirements ==
+
The following requirements fall within the scope of the Nagios/COSMOS integration:
+
 
+
# The ability to write resource information to the SML repository via the client
+
# Viewing Nagios as a data manager in the COSMOS framework
+
# Initiate and control the monitoring of resources via the client
+
# Generating reports on Nagios-based events
+
# Viewing Nagios resources and their status
+
# An employee database registered as a data manager
+
 
+
== Use Cases ==
+
The following use cases outline some of the typical tasks that COSMOS users will perform to accomplish an objective.
+
 
+
=== Use Case 1: Adding a machine to the asset database ===
+
 
+
# User opens a browser and points to the URL of the COSMOS client
+
# User right-clicks the asset database and selects 'Define Object'.  A form is displayed in the details pane for the user to populate the required fields.  The form can contain multiple pages that cleanly break down the flow of user actions.
+
# The fields are populated and the 'Finish' button is pressed.
+
# Client indicates that it is writing the data to the database.  The user is shown either an error message or a confirmation message indicating success.  In case of user error, the form is returned so that the input can be corrected.
+
 
+
=== Use Case 2: Monitoring of a host stored in the asset database ===
+
 
+
=== Use Case 3: Monitoring of a service defined with Nagios ===
+
 
+
=== Use Case 4: Viewing the status of hosts being monitored ===
+
 
+
=== Use Case 5: Generating reports based on Nagios events ===
+
 
+
=== Use Case 6: Removing a host being monitored ===
+
 
+
=== Use Case 7: Determining the problem of a host ===
+
 
+
== Implementation Detail ==
+
 
+
 
+
== Test Coverage ==
+
 
+
 
+
== Task Breakdown ==
+
 
+
== Open Issues/Questions ==
+

Latest revision as of 17:13, 27 November 2007

This page has been moved to: COSMOS Design 188390
