
SMILA/Documentation/Management


Latest revision as of 02:12, 5 July 2012

Note: This is deprecated as of SMILA 1.0. The JMX management framework is still functional, but it is planned to be replaced by HTTP REST APIs for management and monitoring.


SMILA is a framework with a lot of functionality. Most of it is invoked automatically by internal operations. Nevertheless, the user has to configure and start an initial operation. All functions a user can execute are accessible from the JMX Management Agent. On the following pages you will learn how to use SMILA with the aid of Java's built-in jconsole, and how to handle the JMX Client, which provides access to SMILA commands via batch files.

Management with the aid of jconsole

jconsole is a small tool for monitoring Java applications that ships with the JDK. It connects to an application over JMX and displays the application's management data in a Swing UI. If you start the SMILA engine and then open jconsole, you can connect jconsole to SMILA immediately.


After connecting, you can find the SMILA operations on the MBeans tab, in the tree on the left side.
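The MBeans tree can also be browsed programmatically. The following Java sketch uses the standard javax.management remote API to connect to SMILA and print the name of every MBean registered in the "SMILA" domain. It makes two assumptions. First, SMILA exposes the JDK's default RMI connector, so the `service:jmx:rmi:///jndi/rmi://host:port/jmxrmi` URL form is assumed. Second, it uses port 9004, the port from the JMX Client's connection configuration. Adjust both to your setup.

```java
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListSmilaMBeans {

    // Assumption: SMILA uses the JDK's standard RMI connector (the URL form jconsole uses for host:port).
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Pattern matching every MBean registered in the "SMILA" domain.
    static ObjectName smilaPattern() {
        try {
            return new ObjectName("SMILA:*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9004;
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // A null query means "no additional filter"; TreeSet sorts the names for readability.
            Set<ObjectName> names = new TreeSet<>(conn.queryNames(smilaPattern(), null));
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}
```

The names printed are the same entries jconsole shows under the SMILA node of the MBeans tab.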

SMILA manageable components

Most components are now controlled via the HTTP REST API. The remaining JMX-controlled components are:

  • SesameOntologyManager
  • Solr

Also, some third-party components embedded in SMILA (e.g. Zookeeper) offer monitoring tools via JMX.

PerformanceCounter

A PerformanceCounter monitors the activity of a component. SMILA currently provides two kinds of PerformanceCounters: one for Crawlers and one for processing within the Data Flow Process. With jconsole you can inspect the relevant counters of SMILA. Many views are available that give you information about different situations.

Processing performance counters

As soon as the Router puts records into the message queue (MQ), the Listener pushes them into the Data Flow Process. At this point a new section with the following hierarchy appears in the MBeans tree. This is only an example, because the PerformanceCounters vary according to how you use SMILA:

  • Pipeline: lists all invoked pipelines.
    • AddPipeline
    • DeletePipeline
  • Processing Service: lists all processing services which were invoked, sorted by pipelines
    • AddPipeline
      • SimpleMimeTypeIdentifier
    • DeletePipeline
      • SolrIndexPipelet
  • Simple Pipelet: lists all pipelets which were used, sorted by pipelines
    • AddPipeline
      • HtmlToTextPipelet
      • SolrIndexPipelet

JMX Client

The JMX Client is a lightweight, easy-to-use command-line component that gives access to most JMX management operations. It works without jconsole and provides only a few commands. If you want full control over the SMILA framework, you have to use jconsole as described in the chapter above. You can also extend the functionality of the JMX Client, which is highly configurable through a single configuration file.

Pre-defined commands (batch-files)

  • clearOntology: remove all statements from the "native" ontology.
  • importRDF: import an RDF file into the "native" ontology. The first argument is the path to the RDF file. Different formats are supported if the file suffix is correct; see SMILA/Documentation/SesameOntologyManager#JMX Management Agent. The second argument is the base URI for all "relative" resources defined in the file. Its value is irrelevant if the file contains only "absolute" URIs.
  • exportRDF: export all statements from the "native" ontology to the file "export.rdf" in RDF/XML format.

Usage

If you open a command window in the folder SMILA/jmxclient and execute run.bat, you get detailed usage help.

JMX Client

The JMX Client simplifies JMX management by providing batch files for the most important functions. Beyond that, it lets you operate SMILA entirely from the console. You can also write your own batch files that, for example, invoke one method after another. The client works with commands, which are all managed in a single configuration file. In addition to the pre-defined commands, you can create your own commands. You only need to know the fully qualified class name and the method name of the function you want to invoke. To execute a command, use this pattern: run.bat commandName commandParameters. The JMX Client can execute any JMX operation and read any JMX attribute, and it can combine these steps in one batch, reusing previous results.

Configuration

The schema of the configuration file is located at org.eclipse.smila.management.jmx.client/schemas/jmxclient.xsd (source) and jmxclient/schemas/jmxclient.xsd (build). The default configuration file can be found at org.eclipse.smila.management.jmx.client/config.xml (source) and jmxclient/config.xml (build).


Configuration explanation

To use commands that interact with JMX, a connection to the JMX port of SMILA is needed:
<connection id="local" host="localhost" port="9004"/>
Existing commands

The JMX client commands for SMILA are defined in the file config.xml of the package org.eclipse.smila.management.jmx.client. The schema for the commands is defined in the folder schemas of the same package.

To create your own commands, use the cmd element as defined by the schema in the folder mentioned above:
  • cmd:
    • id: the name of the command.
    • echo: information to display on the console when the command is executed.
    • operation:
      • domain: the JMX property root (domain). If not defined, it defaults to "SMILA".
      • key: the class containing the method.
      • name: the name of the method to invoke.
      • echo: information to display on the console when the method is invoked.
      • parameter: one tag for each parameter.
        • echo: description of the parameter.
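Conceptually, a cmd with an operation element boils down to a JMX invoke call, and an attribute element boils down to a getAttribute call. The Java sketch below illustrates that mapping; it is not the JMX Client's actual code. It assumes the MBean's object name is formed as domain:key=<key>, with the domain defaulting to "SMILA". It also assumes every parameter is passed as java.lang.String. The class and method names are hypothetical.

```java
import java.util.Arrays;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxCommandSketch {

    static final String DEFAULT_DOMAIN = "SMILA";

    // Assumption: <operation domain=".." key=".."> maps to the object name "domain:key=<key>".
    static ObjectName objectName(String domain, String key) {
        String d = (domain == null || domain.isEmpty()) ? DEFAULT_DOMAIN : domain;
        try {
            return new ObjectName(d, "key", key);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid MBean name for key " + key, e);
        }
    }

    // All parameters are typed as java.lang.String, so the signature repeats that class name.
    static String[] stringSignature(int parameterCount) {
        String[] signature = new String[parameterCount];
        Arrays.fill(signature, String.class.getName());
        return signature;
    }

    // Equivalent of <operation key=".." name=".."> with one <parameter> per argument.
    static Object invoke(MBeanServerConnection conn, String domain, String key,
                         String operation, String... params) throws Exception {
        return conn.invoke(objectName(domain, key), operation, params, stringSignature(params.length));
    }

    // Equivalent of <attribute key=".." name="..">.
    static Object readAttribute(MBeanServerConnection conn, String domain, String key,
                                String attribute) throws Exception {
        return conn.getAttribute(objectName(domain, key), attribute);
    }
}
```

Under these assumptions, STEP 1 of the crawlW example corresponds to `invoke(conn, null, "CrawlerController", "startCrawling", "file", "indexUpdateJob")`, where `conn` is an MBeanServerConnection obtained from a JMXConnector.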
To keep the console open and inform you about the current status, you can use the wait tag. The crawlW command shows how:
  • STEP 1:
<cmd id="crawlW" echo="Starting crawler by datasource id and wait for finished">
  <operation
    key="CrawlerController"
    name="startCrawling"
    echo="Starting crawl [%1]">
    <parameter echo="data source id"/>
    <parameter echo="job name"/>
  </operation>
  ...
For the MBean of domain "SMILA" with key "CrawlerController", the operation "startCrawling" is executed with two input parameters (of type String, the default). JMX returns a result to the client, e.g. "Crawler with the dataSourceId 'file' pushing to job 'indexUpdateJob' successfully started! (import run id: 595826)"
  • STEP 2:
We need to extract the hash code (which is the import run id) from the crawler's feedback to track its activities. This can be done with the following regexp tag, which returns "595826" for the above example:
<regexp pattern="^.*\(\D*(\d+)\).*$" group="1" echo="Extracting crawler hash code"/>
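The regular expression works in three parts. It skips everything up to the opening parenthesis, then skips the non-digit text inside it. Finally it captures the digits before the closing parenthesis as group 1. The short Java check below applies the same pattern with java.util.regex to the sample response from STEP 1. That the JMX Client evaluates the pattern with java.util.regex semantics is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportRunIdExtractor {

    // Same pattern as in the <regexp> tag, with backslashes escaped for a Java string literal.
    static final Pattern RUN_ID = Pattern.compile("^.*\\(\\D*(\\d+)\\).*$");

    // Returns capture group 1 (the import run id), or null if the response does not match.
    static String extractRunId(String response) {
        Matcher m = RUN_ID.matcher(response);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String response = "Crawler with the dataSourceId 'file' pushing to job 'indexUpdateJob'"
                + " successfully started! (import run id: 595826)";
        System.out.println(extractRunId(response)); // prints 595826
    }
}
```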
  • STEP 3 is an unconditional, simple wait task. We have to wait for the JMX counters to be created before we can access them in the next step.
<wait echo="Waiting for jmx counters" pause="1000" />
  • STEP 4 is the most complex task: a wait task that waits until the crawl is finished. This wait tag is defined by two subnodes:
<wait echo="Waiting while crawl ends" pause="1000">
  <in>
    <cmd id="-" echo="Getting crawler status by datasource id">
      <operation
        key="CrawlerController"
        name="getStatus"
        echo="Crawl [%1] status">
        <!--  value="%1" -->
        <parameter echo="data source id"/>
      </operation>
    </cmd>
    <const value="Finished" echo="Crawling finished status"/>
    <const value="Stopped" echo="Crawling stopped status"/>
    <const value="Aborted" echo="Crawling aborted status"/>
  </in>
  <cmd id="-" echo="Reading crawler performance counters">
    <attribute
      key="Crawlers/%3/Total"
      name="Records"
      echo="Total records"/>
      ...
  </cmd>
</wait>
The first subnode (here the logical in) is a condition defining when to exit from the wait task. The second subnode is a command to execute in each iteration of the wait loop.
If the condition does not evaluate to true, the wait task pauses for the given number of milliseconds before entering the next iteration. So every 1000 ms the following happens:
  • the three performance counters defined in the cmd with id="-" (for example the total number of records) are printed;
  • the crawler's status is requested and checked against "Finished", "Stopped" and "Aborted";
  • if it matches one of them, the crawl has finished and the loop exits; otherwise the next iteration of the loop starts.
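The behaviour of the wait tag can be summarised as a polling loop. The following Java sketch models it with two stand-ins. A status supplier replaces the getStatus call, and a Runnable replaces the per-iteration cmd. The class and method names are hypothetical. The order in which the client evaluates the condition and runs the command within one iteration is also an assumption.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class WaitLoopSketch {

    // Exit statuses taken from the <const> values of the <in> condition.
    static final Set<String> EXIT_STATUSES = Set.of("Finished", "Stopped", "Aborted");

    /**
     * Hypothetical model of <wait pause=".."><in>..</in><cmd>..</cmd></wait>.
     * Returns the status that ended the loop, or null if the thread was interrupted.
     */
    static String waitUntil(Supplier<String> status, Set<String> exitStatuses,
                            Runnable perIteration, long pauseMillis) {
        while (true) {
            String current = status.get();                            // e.g. CrawlerController.getStatus
            if (current != null && exitStatuses.contains(current)) {  // condition true: leave the loop
                return current;
            }
            perIteration.run();                                       // e.g. print the performance counters
            try {
                Thread.sleep(pauseMillis);                            // pause before the next iteration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    public static void main(String[] args) {
        // "Running" is a hypothetical in-progress status used only for this demo.
        Iterator<String> fakeStatuses = List.of("Running", "Running", "Finished").iterator();
        String end = waitUntil(fakeStatuses::next, EXIT_STATUSES,
                () -> System.out.println("reading performance counters"), 1000);
        System.out.println("Crawl ended with status " + end);
    }
}
```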

JMX Client in OSGi console

The JMX client is also available in the Equinox OSGi console as a command provider, so you can invoke the same configured actions from the OSGi console without opening a separate window. The command name is smila, followed by the same arguments used with the run script in SMILA/jmxclient. Use help to get a description of the supported commands. Usually much more help output for the standard Equinox commands follows, so you may need to scroll back quite far to find the description of the smila command. The commands are exactly the same as with the run script; only the command name is smila instead of run.
