Revision as of 14:51, 20 November 2007

Nagios Integration with COSMOS

Change History

Name: Ali Mehregani
Date: 11/19/2007
Revised Sections: Initial version

Workload Estimation

Rough workload estimate in person weeks
Process | Sizing | Names of people doing the work
Design
Code
Test
Documentation
Build and infrastructure
Code review, etc.*
TOTAL

Terminologies/Acronyms

The terms and acronyms below are used throughout this document. Each is defined as it is used here:

Introduction

The COSMOS vision is captured in the definition of the word cosmos: "The world or universe regarded as an orderly, harmonious system". The intention of the project is to bring orderly and harmonious behavior to the world of system management. Complementary standards and technologies such as CMDBf, SML, and Web 2.0 are making this vision a reality. The overall COSMOS vision is to provide an extensible framework, based on a set of accepted standards, that simplifies the task of building an ecosystem out of existing system management tooling.

In line with that vision, Nagios can not only help mature the COSMOS framework but also provide out-of-the-box value to COSMOS users. This twofold advantage has several positive implications:

  1. It is a step toward evolving an open source code base into a framework that is usable in a production environment
  2. It makes the COSMOS project an attractive solution that provides value on its own
  3. It simplifies integration of proprietary solutions with Nagios, and
  4. It demonstrates a working example of a well-established system management application in the COSMOS framework

The next section provides more detail about Nagios.

What is Nagios?

Nagios is a system and network monitoring application that detects abnormal behavior and notifies the responsible contacts. Administrators define what is monitored, and how, in a set of flat configuration files. There are three primary atomic entities in Nagios:

Host - A physical device on a network that is to be monitored (e.g. a desktop, printer, router, switch, or hub).
Service - A specific component or attribute of a host that should be monitored (e.g. CPU utilization, memory consumption, or HTTP).
Command - A utility that performs a service check. For example, check_CPU can be a command used to monitor the CPU utilization on a particular host.
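The three entities above correspond to object definitions in the Nagios configuration files. As a rough illustration, the Python sketch below renders such definitions from plain dictionaries. The helper render_object, the host name, the address, and the arguments passed to check_CPU are all hypothetical; only the "define" syntax and the directive names follow Nagios conventions, and a real configuration typically needs further directives (often inherited from templates).

```python
def render_object(object_type, directives):
    """Render one Nagios object definition from a dict of directives."""
    width = max(len(name) for name in directives)
    lines = [f"define {object_type} {{"]
    for name, value in directives.items():
        # Pad directive names so the values line up in a column.
        lines.append(f"    {name.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example values, chosen only for illustration.
host = render_object("host", {
    "host_name": "web01",
    "address": "192.168.1.10",
})
service = render_object("service", {
    "host_name": "web01",
    "service_description": "CPU utilization",
    "check_command": "check_CPU",
})
command = render_object("command", {
    "command_name": "check_CPU",
    # Assumed thresholds; the real arguments depend on the plug-in used.
    "command_line": "$USER1$/check_CPU -w 80 -c 95",
})

print("\n\n".join([host, service, command]))
```

The service refers to the host by host_name and to the command by command_name, which is how the three entities are tied together.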

An administrator must define hosts, services, and commands to monitor a set of resources effectively. Nagios itself does not perform the actual monitoring of a host or service. That work is done by add-on plug-ins, each of which is defined as an individual command. This architecture makes it possible to monitor virtually any aspect of a system that can be checked automatically. Many plug-ins are already available for monitoring the common hosts and services in a typical networking environment. Where no suitable plug-in exists, administrators can write their own to monitor an uncommon host or service.
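To make the plug-in idea concrete, here is a minimal sketch of a custom check written in Python. It relies on the standard Nagios plug-in convention of reporting state through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and printing a single status line. The function names, the thresholds, and the way the CPU value is obtained are assumptions made for illustration.

```python
import sys

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a Nagios state using two thresholds."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(read_value, warn=80.0, crit=95.0):
    """Run one check; read_value is any callable returning a percentage."""
    try:
        value = read_value()
    except Exception:
        value = None  # A failed reading is reported as UNKNOWN.
    state = evaluate(value, warn, crit)
    detail = "no reading available" if value is None else f"usage is {value:.1f}%"
    return state, f"CPU {STATE_NAMES[state]} - {detail}"


def main():
    """Entry point when the file is installed as a plug-in."""
    # Hypothetical fixed reading; a real plug-in would measure the host.
    state, line = run_check(lambda: 42.0)
    print(line)
    sys.exit(state)
```

Keeping the threshold logic in evaluate, separate from the exit call in main, makes the check easy to test without terminating the interpreter.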

Nagios itself runs on Linux, but it can monitor desktops running Windows through its plug-in architecture. As part of its monitoring solution, Nagios also provides an alerting mechanism that sends problem notifications to contacts or contact groups. A notification handler can also be registered to take certain actions based on incoming events (e.g. storing status information in an RDBMS). The diagram below, extracted from the Nagios documentation¹, depicts the components of Nagios:


Nagios-architecture.png


Nagios also includes a web-based UI that provides reporting and limited administration capabilities. A screenshot of the Nagios web-based UI is included below. The next section describes the scope and the value of this enhancement.


Nagios.png

Purpose

The purpose of this document is to outline the initial effort in bringing Nagios closer to COSMOS. The integration points and their related value to the Nagios and COSMOS user base will be covered by subsequent sections.

Scope

There are a number of areas where COSMOS can add value to Nagios. The areas can be summarized into three categories:

  1. Data Manager Integration
  2. Administration Capabilities
  3. Reporting

Data Manager Integration

Defining the required objects in Nagios is cumbersome, time-consuming, and error-prone. The information required to define objects is usually already stored in other data stores. For example, a subset of the configuration items stored in a CMDB can typically serve as the hosts that an administrator may want to monitor. Host information may also be stored in an asset database.

COSMOS can significantly ease the task of defining objects by providing integration points between data managers and Nagios. Defining objects can be as simple as dragging and dropping a set of queried items from a data manager into the Nagios data collector. There are 10 different object types defined in Nagios:

1. Hosts
2. Host Groups
3. Services
4. Service Groups
5. Contacts
6. Contact Groups
7. Commands
8. Time Periods
9. Notification Escalations
10. Notification and Execution Dependencies

Hosts, services, and contacts are often defined in other data stores. COSMOS can use CMDBf to provide seamless integration between the stores where the objects are kept and the Nagios monitoring framework. See use cases and implementation detail for more information.
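The mapping from queried items to Nagios hosts can be sketched as follows. This is only an illustration under stated assumptions: the list of dictionaries stands in for the result of a CMDBf query, and its keys ("name", "type", "ip", "description") as well as the function names are hypothetical. The sketch does not use any actual COSMOS or CMDBf API.

```python
def items_to_hosts(items, monitored_types=("server", "router")):
    """Select monitorable configuration items and map them to host directives."""
    hosts = []
    for item in items:
        if item.get("type") not in monitored_types:
            continue  # Skip items that are not meant to be monitored.
        hosts.append({
            "host_name": item["name"],
            "alias": item.get("description", item["name"]),
            "address": item["ip"],
        })
    return hosts


def to_config(hosts):
    """Render the selected hosts as Nagios host definitions."""
    blocks = []
    for host in hosts:
        body = "\n".join(f"    {key}  {value}" for key, value in host.items())
        blocks.append("define host {\n" + body + "\n}")
    return "\n\n".join(blocks)


# Hypothetical query result from a data manager.
items = [
    {"name": "web01", "type": "server", "ip": "192.168.1.10", "description": "Web server"},
    {"name": "license-42", "type": "software-license", "ip": ""},
]
print(to_config(items_to_hosts(items)))
```

Separating selection from rendering means the same selection step could feed other monitoring tools with a different renderer.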


Nagios can also be integrated with COSMOS, in the context of data manager integration, through the Common Base Event (CBE) database. To illustrate the framework, COSMOS includes a CBE data manager that stores logging events in the form of CBEs. Nagios keeps track of host/service status, alerts, and notifications by logging events to flat files that are rotated periodically. COSMOS can contribute a Nagios plug-in that logs events to the CBE data manager for persistent record keeping. This integration offers several advantages:

  1. Events are logged in a common format
  2. Existing reporting capabilities can be reused
  3. Other monitoring tools that have similar characteristics can also log events to a CBE data manager
  4. Events are persisted in a database instead of a flat file structure
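As a rough sketch of what such a conversion could look like, the Python below parses HOST ALERT and SERVICE ALERT lines as they appear in a Nagios log file and turns them into a CBE-like record. The output field names and the state-to-severity mapping are simplifying assumptions loosely modeled on CBE, not the actual schema of the COSMOS CBE data manager, and the sample line is invented.

```python
import re
from datetime import datetime, timezone

# Matches lines such as: [1195570260] SERVICE ALERT: host;service;STATE;TYPE;N;output
ALERT_PATTERN = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: (.*)$")

# Assumed mapping from Nagios states to CBE-style severity values.
SEVERITY = {"OK": 10, "UP": 10, "WARNING": 30, "CRITICAL": 50,
            "DOWN": 50, "UNREACHABLE": 50, "UNKNOWN": 0}


def alert_to_event(line):
    """Convert one Nagios alert log line to a CBE-like dict, or None."""
    match = ALERT_PATTERN.match(line.strip())
    if match is None:
        return None  # Not an alert line (e.g. a notification or info line).
    timestamp, kind, payload = match.groups()
    if kind == "SERVICE":
        # maxsplit keeps semicolons inside the plug-in output intact.
        host, service, state, state_type, attempt, output = payload.split(";", 5)
    else:
        host, state, state_type, attempt, output = payload.split(";", 4)
        service = None
    created = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return {
        "creationTime": created.isoformat(),
        "severity": SEVERITY.get(state, 0),
        "msg": output,
        "sourceComponent": host if service is None else f"{host}/{service}",
        "extendedData": {"state": state, "stateType": state_type,
                         "attempt": int(attempt)},
    }


sample = "[1195570260] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;Connection refused"
print(alert_to_event(sample))
```

A real plug-in would hand the resulting record to the data manager instead of printing it, and would need to decide how to treat malformed alert lines, which raise ValueError in this sketch.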

Administration Capabilities

This area of integration is closely tied to the previous one. Once the user selects a set of hosts, services, and/or contacts from other data stores, the Nagios configuration needs to be changed to reflect the new resources being monitored. A set of administration tasks must be provided for a system administrator to work effectively with the Nagios framework. At a minimum, the following administration tasks will be provided:

  • Defining objects (hosts, services, contacts, time periods, etc.). The definition of objects may originate from a data store.
  • Enable/disable host checks
  • Enable/disable service checks
  • Restarting the Nagios process
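Several of these tasks can be driven through the Nagios external command file, which accepts lines of the form "[timestamp] COMMAND;arguments". The sketch below formats a few such commands in Python. The command names (ENABLE_HOST_CHECK, DISABLE_HOST_CHECK, DISABLE_SVC_CHECK, RESTART_PROGRAM) are Nagios external commands; the helper names and the location of the command file are assumptions, and this is not meant as the COSMOS implementation.

```python
import time


def external_command(name, *args, now=None):
    """Format one line for the Nagios external command file."""
    timestamp = int(time.time() if now is None else now)
    fields = (name,) + tuple(str(arg) for arg in args)
    return f"[{timestamp}] " + ";".join(fields) + "\n"


def disable_host_check(host, now=None):
    return external_command("DISABLE_HOST_CHECK", host, now=now)


def enable_host_check(host, now=None):
    return external_command("ENABLE_HOST_CHECK", host, now=now)


def disable_service_check(host, service, now=None):
    return external_command("DISABLE_SVC_CHECK", host, service, now=now)


def restart_nagios(now=None):
    return external_command("RESTART_PROGRAM", now=now)


def submit(command_line, command_file):
    """Write a command to the command file (a named pipe on a real install)."""
    # The path is supplied by the caller because it varies between installs.
    with open(command_file, "w") as handle:
        handle.write(command_line)


print(disable_service_check("web01", "HTTP", now=1195570260), end="")
```

Because formatting is separated from submission, the same command strings could be sent over another transport if the administration task is reused by a different tool.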

Future releases of COSMOS may consider the following administration tasks:

  • Automatic deployment of agents to machines
  • Service discovery on selected hosts
  • Controlling notifications

Where possible, common administration tasks need to be reusable by other monitoring tools.


Reporting

The Nagios web-based UI is far from sleek or modern. It provides a few primitive reporting capabilities:

  • Trends
  • Availability
  • Alert Histogram
  • Alert History
  • Alert Summary
  • Notifications
  • Event Log

A BIRT integration with Nagios under the COSMOS framework can add significant value by providing an extensible mechanism for generating customized reports. Consumers of Nagios can benefit by tuning reports to target a specific audience. The reports contributed by COSMOS will use the CBE database as their source, but this does not prevent anyone from contributing reports that use Nagios logging information as their data source. Consumers can also redirect Nagios events to a proprietary database that is registered as a data manager and has its own customized reporting.

COSMOS will need to provide a set of report templates as exemplars of the reporting capability that can be added on top of Nagios-based data. The details of the report types will be covered in the implementation detail section.
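To give a feel for the kind of data set a report template would consume, the sketch below computes a simple alert summary (alert counts per host and state) in Python. It stands in for the query behind a report rather than for BIRT itself, and the event records and function names are hypothetical.

```python
from collections import Counter


def alert_summary(events):
    """Count alerts per (host, state) pair, sorted by host and state."""
    counts = Counter((event["host"], event["state"]) for event in events)
    return sorted(counts.items())


def format_report(summary):
    """Render the summary as a fixed-width plain-text table."""
    lines = [f"{'Host':<12}{'State':<10}{'Count':>5}"]
    for (host, state), count in summary:
        lines.append(f"{host:<12}{state:<10}{count:>5}")
    return "\n".join(lines)


# Hypothetical events, e.g. as read from an event data store.
events = [
    {"host": "web01", "state": "CRITICAL"},
    {"host": "web01", "state": "CRITICAL"},
    {"host": "db01", "state": "WARNING"},
]
print(format_report(alert_summary(events)))
```

The same aggregation could be grouped by service or time period to target a different audience.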

Requirements

Use Cases

Implementation Detail

Test Coverage

Task Breakdown

Open Issues/Questions
