Nagios Integration with COSMOS

Revision as of 12:01, 20 November 2007 by Unnamed Poltroon (Talk) (Data Manager Integration)

Change History

Name: Ali Mehregani
Date: 11/19/2007
Revised Sections: Initial version

Workload Estimation

Rough workload estimate in person weeks
Process Sizing Names of people doing the work
Design
Code
Test
Documentation
Build and infrastructure
Code review, etc.*
TOTAL

Terminologies/Acronyms

The terms and acronyms below are used throughout this document. Each is defined as it is used here:

Introduction

The COSMOS vision is captured in the definition of the word cosmos: "The world or universe regarded as an orderly, harmonious system". The intention of the project is to bring orderly and harmonious behavior to the world of system management. Complementary standards and technologies such as CMDBf, SML, and Web 2.0 are making this vision a reality. The overall COSMOS vision is to provide an extensible framework, based on a set of accepted standards, that simplifies the task of building an ecosystem out of existing system management tooling.

In line with that vision, Nagios can not only help mature the COSMOS framework but also provide out-of-the-box value to COSMOS users. This twofold advantage has several positive implications:

  1. It is a step toward evolving an open source code base into a framework that is usable in a production environment
  2. It makes the COSMOS project an attractive solution that provides value on its own
  3. It simplifies integration of proprietary solutions with Nagios, and
  4. It demonstrates a working example of a well-established system management application in the COSMOS framework

The next section provides more detail about Nagios.

What is Nagios?

Nagios is a system and network monitoring application that detects abnormal behavior and notifies administrators about it. Administrators define what is monitored, and how, in a set of flat-file configurations. There are three primary atomic entities in Nagios:

Host - A physical device on a network that is intended to be monitored (e.g. a desktop, printer, router, switch, or hub).
Service - A specific component of a host that should be monitored (e.g. CPU utilization, memory consumption, or HTTP).
Command - A utility that performs a service check. For example, check_CPU can be a command used to monitor the CPU utilization on a particular host.

An administrator must define hosts, services, and commands to effectively monitor a set of resources. Nagios itself does not perform the actual monitoring of a host or service; that is done by add-on plug-ins that are defined as individual commands. This architecture makes it possible to monitor virtually any aspect of a system whose check can be automated. Many plug-ins are already available for monitoring common hosts and services in a typical networking environment. Where these fall short, administrators can write their own plug-in to monitor an uncommon host or service.
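To illustrate how small a custom plug-in can be, the following is a minimal sketch in Python. It relies on the standard Nagios plug-in convention: the plug-in prints one status line and signals the state through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The function names, the choice of the 1-minute load average as the metric, and the threshold values are assumptions made for this example; they are not taken from an existing plug-in.

```python
import os

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load, warn, crit):
    """Map a load value to an (exit_code, status_line) pair.

    Hypothetical helper: thresholds are inclusive lower bounds.
    """
    if load >= crit:
        return CRITICAL, f"LOAD CRITICAL - load average: {load:.2f}"
    if load >= warn:
        return WARNING, f"LOAD WARNING - load average: {load:.2f}"
    return OK, f"LOAD OK - load average: {load:.2f}"


def main(warn=4.0, crit=8.0):
    """Run the check once, print the status line, return the exit code."""
    try:
        load = os.getloadavg()[0]  # 1-minute load average (Unix only)
    except (OSError, AttributeError):
        print("LOAD UNKNOWN - could not read load average")
        return UNKNOWN
    code, line = evaluate_load(load, warn, crit)
    print(line)
    return code

# When installed as a plug-in, the script would end with:
#     import sys; sys.exit(main())
```

Keeping the threshold logic in a separate function from the measurement makes the check easy to test without touching the monitored system.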

Nagios itself runs on Linux, but it can monitor desktops running Windows via its plug-in architecture. As part of its monitoring solution, Nagios also provides an alerting mechanism that reports a problem to contacts or contact groups. A notification handler can also be registered to take certain actions based on incoming events (e.g. storing status information in an RDBMS). The diagram below, extracted from the Nagios documentation¹, depicts the components of Nagios:


Nagios-architecture.png


A web-based UI is also included that provides reporting and limited administration capabilities. A screenshot of the Nagios web-based UI is included below. The next section describes the scope and the value of this enhancement.


Nagios.png

Purpose

The purpose of this document is to outline the initial effort in bringing Nagios closer to COSMOS. The integration points and their related value to the Nagios and COSMOS user base will be covered by subsequent sections.

Scope

There are a number of areas where COSMOS can add value to Nagios. The areas can be summarized into three categories:

  1. Data Manager Integration
  2. Administration Capabilities
  3. Reporting

Data Manager Integration

Defining the required objects in Nagios is cumbersome, time-consuming, and error-prone. The information required to define objects is usually already stored in other data stores. For example, a subset of the configuration items stored in a CMDB can typically serve as the hosts that an administrator wants to monitor. Host information may also be stored in an asset database.

COSMOS can significantly ease the task of defining objects by providing integration points between data managers and Nagios. Defining objects can be as simple as dragging and dropping a set of queried items from a data manager into the Nagios data collector. There are 10 different object types defined in Nagios:

1. Hosts
2. Host Groups
3. Services
4. Service Groups
5. Contacts
6. Contact Groups
7. Commands
8. Time Periods
9. Notification Escalations
10. Notification and Execution Dependencies

Hosts, services, and contacts are often defined in other data stores. COSMOS can use CMDBf to define a seamless integration between where the objects are stored and the Nagios monitoring framework. See use cases and implementation detail for more information.
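As a rough illustration of what such an integration could produce, the sketch below turns a list of queried items into Nagios host object definitions. The "define host { ... }" syntax and the use, host_name, alias, and address directives are standard Nagios configuration. Everything else is an assumption made for this example: the dictionary shape of a queried item is hypothetical and is not the CMDBf query response format, the function names are invented, and "generic-host" is assumed to be a host template that already exists in the target Nagios configuration and supplies the remaining required directives.

```python
def render_host(item, template="generic-host"):
    """Render one queried item as a Nagios host object definition.

    `item` is a hypothetical dict with "name" and "address" keys and an
    optional "alias" key; `template` is assumed to exist in Nagios.
    """
    name = item["name"]
    return "\n".join([
        "define host {",
        f"    use        {template}",
        f"    host_name  {name}",
        f"    alias      {item.get('alias', name)}",
        f"    address    {item['address']}",
        "}",
    ])


def render_hosts(items):
    """Join the definitions for all items into one config file body."""
    return "\n\n".join(render_host(item) for item in items) + "\n"


# Example input standing in for items returned by a data manager query.
queried_items = [
    {"name": "web01", "alias": "Web server", "address": "192.0.2.10"},
    {"name": "printer-3f", "address": "192.0.2.42"},
]
config_text = render_hosts(queried_items)
print(config_text)
```

The resulting text could be written to a file that the main Nagios configuration includes. Services and contacts could be generated the same way from their own queried items.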

Administration Capabilities

Reporting

Requirements

Use Cases

Implementation Detail

Test Coverage

Task Breakdown

Open Issues/Questions
