
SMILA/Documentation/Management


Revision as of 02:43, 20 March 2009

SMILA is a framework with a lot of functionality. Most of it is invoked automatically by internal operations; nevertheless, the user has to configure and start an initial operation. All functions a user can execute are accessible through the JMX Management Agent. The following pages explain how to manage SMILA with Java's built-in JConsole and how to use the JMXClient, which gives access to SMILA commands via batch files.

Management with the aid of jconsole

JConsole is a small tool for monitoring Java applications that ships with the JDK. It connects to an application over JMX and displays the application's management interface in a Swing UI. If you start the SMILA engine and then open JConsole, you can connect JConsole to SMILA immediately.

jconsole

After connecting, you can find the SMILA operations on the MBeans tab, in the tree on the left side.
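JConsole is only one JMX client; any Java program can open the same kind of connection through the standard javax.management.remote API and list the MBeans that JConsole shows in its tree. The following is a minimal sketch, not part of SMILA. The service URL (host, port, connector type) is an assumption, because this page does not document how SMILA's JMX agent is exposed; when no URL is passed, the sketch browses the local JVM's own platform MBean server instead.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxBrowse {
    // Opens a remote JMX connection, the same kind of connection JConsole uses.
    // The URL format SMILA expects is an assumption; see main() for an example.
    static JMXConnector connect(String url) throws Exception {
        return JMXConnectorFactory.connect(new JMXServiceURL(url));
    }

    // Returns the sorted canonical names of all MBeans matching the ObjectName pattern,
    // i.e. the entries JConsole lists on its MBeans tab.
    static List<String> listMBeans(MBeanServerConnection conn, String pattern) throws Exception {
        List<String> names = new ArrayList<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            names.add(name.getCanonicalName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // e.g. service:jmx:rmi:///jndi/rmi://localhost:<port>/jmxrmi (port is hypothetical)
            try (JMXConnector connector = connect(args[0])) {
                listMBeans(connector.getMBeanServerConnection(), "*:*").forEach(System.out::println);
            }
        } else {
            // No URL given: list the java.lang MBeans of this JVM instead.
            listMBeans(ManagementFactory.getPlatformMBeanServer(), "java.lang:*").forEach(System.out::println);
        }
    }
}
```

The pattern argument accepts any ObjectName pattern, so a client can narrow the listing to one domain instead of browsing the whole tree.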

SMILA manageable components

There are four SMILA components that you can access through JConsole.

CrawlerController

Here you can manage the crawling jobs. The following commands are available:

  • startCrawl(String dataSourceID): starts a crawling job with the given dataSourceID, for example file or web.
  • stopCrawl(String dataSourceID): stops the crawling job for the given dataSourceID. Note: the crawler is only signaled to stop and may do so at its own discretion. In other words: depending on the implementation it might take a while until it actually stops crawling. It thus gives the crawler the chance to clean up all open resources and finish whatever business it needs to.
  • getActiveCrawls(): opens a dialog showing a list of the dataSourceIDs of all active crawl jobs. If no job is running, the dialog shows null.
  • getActiveCrawlsStatus(): opens a dialog telling you how many crawl jobs are active at the moment.
  • getStatus(String dataSourceID): opens a dialog showing the status of the crawling job for the given dataSourceID. Possible states are RUNNING, FINISHED, STOPPED and ABORTED.
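The CrawlerController operations can also be called from code with MBeanServerConnection.invoke, which is what JConsole does behind its operation buttons. The sketch below assumes a hypothetical ObjectName (SMILA:type=CrawlerController) and assumes that getStatus returns a String; it registers a toy stand-in MBean in a private MBean server so that it runs on its own. Because stopCrawl only signals the crawler, the helper awaitEnd polls getStatus until the job is no longer RUNNING.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class CrawlJmxDemo {
    // Hypothetical stand-in for SMILA's CrawlerController MBean: only the operation names
    // and parameter types come from the documentation; the return types are assumptions.
    public interface CrawlerControllerMBean {
        void startCrawl(String dataSourceID);
        void stopCrawl(String dataSourceID);
        String getStatus(String dataSourceID);
    }

    public static class CrawlerController implements CrawlerControllerMBean {
        private final Map<String, String> states = new ConcurrentHashMap<>();

        public void startCrawl(String dataSourceID) { states.put(dataSourceID, "RUNNING"); }

        // The toy stops at once; a real crawler is only signaled and may keep running a while.
        public void stopCrawl(String dataSourceID) { states.replace(dataSourceID, "RUNNING", "STOPPED"); }

        public String getStatus(String dataSourceID) { return states.get(dataSourceID); }
    }

    private static final String[] STRING_SIGNATURE = { String.class.getName() };

    // Invokes an operation that takes a single String argument.
    static Object call(MBeanServerConnection conn, ObjectName name, String op, String arg) throws Exception {
        return conn.invoke(name, op, new Object[] { arg }, STRING_SIGNATURE);
    }

    // Polls getStatus every 100 ms until the job leaves RUNNING or the timeout expires.
    static String awaitEnd(MBeanServerConnection conn, ObjectName name, String dataSourceID, long timeoutMillis)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String status = (String) call(conn, name, "getStatus", dataSourceID);
        while ("RUNNING".equals(status) && System.currentTimeMillis() < deadline) {
            Thread.sleep(100);
            status = (String) call(conn, name, "getStatus", dataSourceID);
        }
        return status;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("SMILA:type=CrawlerController"); // hypothetical ObjectName
        mbs.registerMBean(new CrawlerController(), name);

        call(mbs, name, "startCrawl", "file");
        System.out.println(call(mbs, name, "getStatus", "file"));
        call(mbs, name, "stopCrawl", "file");
        System.out.println(awaitEnd(mbs, name, "file", 5_000));
    }
}
```

With the toy implementation, main prints RUNNING and then STOPPED. Against a live SMILA instance, the MBeanServerConnection would come from a remote JMXConnector and the ObjectName from the MBeans tab in JConsole.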

RecordRecycler

The RecordRecycler lets you push already crawled records into the Data Flow Process again. This is useful, for example, if you want to modify the records in the index with another pipeline. The following operations are available to control the RecordRecycler:

  • startRecycling(String configurationID, String dataSourceId): fires a recycling event with the given configurationID (which must match the name of a configuration file located at configuration/org.eclipse.smila.connectivity.framework.queue.worker/recyclers) and dataSourceID (records with this dataSourceID are read from the RecordStorage). See the QueueWorker documentation for further details on the Recycler.
  • stopRecycling(String dataSourceID): stops the recycling event for the given dataSourceID.
  • getRecordsRecycled(String dataSourceID): opens a dialog showing how many records have been recycled.
  • getConfigurations(String dataSourceID): shows a list of all available recycler configuration files.
  • getStatus(String dataSourceID): opens a dialog showing the status of the recycling event for the given dataSourceID. Possible states are STARTED, IN_PROCESS, STOPPING, STOPPED and FINISHED.
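Instead of passing operation names as strings, a client can obtain a typed proxy with javax.management.JMX.newMBeanProxy and call the operations as ordinary Java methods. The sketch below applies that to the RecordRecycler operations. The interface and its return types, the ObjectName SMILA:type=RecordRecycler and the configuration name myRecycler are all hypothetical, and the toy implementation finishes recycling synchronously, which a real recycler need not do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class RecyclerProxyDemo {
    // Hypothetical interface mirroring the documented RecordRecycler operations.
    public interface RecordRecyclerMBean {
        void startRecycling(String configurationID, String dataSourceID);
        void stopRecycling(String dataSourceID);
        long getRecordsRecycled(String dataSourceID);
        String getStatus(String dataSourceID);
    }

    // Toy implementation that "recycles" a fixed batch of three records synchronously.
    public static class RecordRecycler implements RecordRecyclerMBean {
        private final Map<String, Long> recycled = new ConcurrentHashMap<>();
        private final Map<String, String> status = new ConcurrentHashMap<>();

        public void startRecycling(String configurationID, String dataSourceID) {
            status.put(dataSourceID, "IN_PROCESS");
            recycled.merge(dataSourceID, 3L, Long::sum);
            status.put(dataSourceID, "FINISHED");
        }

        public void stopRecycling(String dataSourceID) { status.put(dataSourceID, "STOPPED"); }

        public long getRecordsRecycled(String dataSourceID) { return recycled.getOrDefault(dataSourceID, 0L); }

        public String getStatus(String dataSourceID) { return status.get(dataSourceID); }
    }

    // Registers the stub and returns a typed proxy. Against a real SMILA instance you would
    // pass a remote MBeanServerConnection and SMILA's actual ObjectName instead.
    static RecordRecyclerMBean proxy() throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("SMILA:type=RecordRecycler"); // hypothetical ObjectName
        mbs.registerMBean(new RecordRecycler(), name);
        return JMX.newMBeanProxy(mbs, name, RecordRecyclerMBean.class);
    }

    public static void main(String[] args) throws Exception {
        RecordRecyclerMBean recycler = proxy();
        recycler.startRecycling("myRecycler", "file"); // "myRecycler" is a hypothetical configuration name
        System.out.println(recycler.getStatus("file") + ", " + recycler.getRecordsRecycled("file"));
    }
}
```

With the toy implementation, main prints FINISHED, 3. The proxy approach trades the flexibility of invoke for compile-time checking of operation names and argument types.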

DeltaIndexingManager

The DeltaIndexingManager stores a hash value for each record. It is part of the Connectivity Framework and tells a crawler whether a given record has changed since the last crawl. See the DeltaIndexing documentation. In JConsole you can use the following commands:

  • clearAll(): clears all hashes, so that all records are processed again
  • unlockAll(): unlocks all data sources
  • clear(String dataSourceID): same as clearAll but limited to one data source
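The idea behind delta indexing can be illustrated with a few lines of plain Java. This is a minimal sketch of hash-based change detection, not SMILA's DeltaIndexingManager: it keeps one SHA-256 hash per record, grouped by data source, and reports a record as changed when its current hash differs from the stored one. The class and method names are hypothetical, and the locking handled by unlockAll() is not modeled.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class DeltaIndexSketch {
    // dataSourceID -> (recordId -> content hash seen during the previous crawl)
    private final Map<String, Map<String, String>> hashes = new HashMap<>();

    private static String sha256(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    // Returns true if the record is new or its content changed since the last check,
    // and stores the new hash either way.
    boolean hasChanged(String dataSourceID, String recordId, String content) {
        String hash = sha256(content);
        String previous = hashes.computeIfAbsent(dataSourceID, k -> new HashMap<>()).put(recordId, hash);
        return !hash.equals(previous);
    }

    // Counterpart of clear(dataSourceID): every record of that source counts as changed again.
    void clear(String dataSourceID) {
        hashes.remove(dataSourceID);
    }

    // Counterpart of clearAll(): forget every stored hash.
    void clearAll() {
        hashes.clear();
    }
}
```

Calling hasChanged twice with the same content returns true and then false. After clear(dataSourceID) or clearAll(), the same record is reported as changed again, which is why clearing the hashes forces the records to be reprocessed.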

Lucene

The Lucene component lets you invoke several operations on the index. The following operations are available:

  • deleteIndex(String indexName): removes the index with the given name if available. Otherwise an error dialog is shown.
  • indexExists(String indexName): asks the framework whether the given index exists. Returns true or false.
  • createIndex(String indexName): creates an index with the given name.
  • reorganizeIndex(String indexName): reorganizes the index with the given name. This cleans up the index: deleted entries are physically removed, resulting in a smaller index.
  • renameIndex(String currentIndexName, String newIndexName): renames the index currentIndexName to newIndexName.
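The Lucene operations can be combined into small maintenance routines. The sketch below uses a typed JMX proxy to create an index if indexExists reports it missing and to reorganize it otherwise. The interface, the ObjectName SMILA:type=Lucene, the routine ensureCompact and the in-memory stand-in (which only tracks index names) are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class LuceneAdminDemo {
    // Hypothetical interface mirroring the documented Lucene operations.
    public interface LuceneIndexMBean {
        boolean indexExists(String indexName);
        void createIndex(String indexName);
        void deleteIndex(String indexName);
        void reorganizeIndex(String indexName);
        void renameIndex(String currentIndexName, String newIndexName);
    }

    // Toy in-memory implementation that only tracks index names.
    public static class LuceneIndex implements LuceneIndexMBean {
        private final Set<String> indexes = ConcurrentHashMap.newKeySet();

        public boolean indexExists(String indexName) { return indexes.contains(indexName); }

        public void createIndex(String indexName) { indexes.add(indexName); }

        public void deleteIndex(String indexName) {
            if (!indexes.remove(indexName)) throw new IllegalArgumentException("no such index: " + indexName);
        }

        public void reorganizeIndex(String indexName) {
            // A real index would physically remove deleted entries here.
        }

        public void renameIndex(String currentIndexName, String newIndexName) {
            if (!indexes.remove(currentIndexName)) {
                throw new IllegalArgumentException("no such index: " + currentIndexName);
            }
            indexes.add(newIndexName);
        }
    }

    // Creates the index if it is missing, otherwise reorganizes it to shrink it.
    static String ensureCompact(LuceneIndexMBean lucene, String indexName) {
        if (!lucene.indexExists(indexName)) {
            lucene.createIndex(indexName);
            return "created";
        }
        lucene.reorganizeIndex(indexName);
        return "reorganized";
    }

    // Registers the stand-in and returns a typed proxy to it.
    static LuceneIndexMBean connectToStub() throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("SMILA:type=Lucene"); // hypothetical ObjectName
        mbs.registerMBean(new LuceneIndex(), name);
        return JMX.newMBeanProxy(mbs, name, LuceneIndexMBean.class);
    }

    public static void main(String[] args) throws Exception {
        LuceneIndexMBean lucene = connectToStub();
        System.out.println(ensureCompact(lucene, "test_index"));
        System.out.println(ensureCompact(lucene, "test_index"));
    }
}
```

With the stand-in, main prints created and then reorganized, because the first call creates the index that the second call finds.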

PerformanceCounter

A PerformanceCounter monitors the activity of a component. SMILA currently provides two kinds of PerformanceCounters: one for crawlers and one for processing within the Data Flow Process. JConsole lets you inspect these counters, and many views are available, each giving information about a different situation.

Crawlers performance counters

As soon as you start a crawl job, a new branch with the following nodes and values appears in the MBeans tree:

Crawler counters tree
  • Crawlers
    • FileSystem - Crawler type
      • Launches
        • file - Data source Id
          • 19786841 - Crawler instance, one node for every crawl job
      • Total - Crawler type sub-total
    • Web - Crawler type
      • Launches
        • web - Data source Id
          • 2611152 - Crawler instance, one node for every crawl job
      • Total - Crawler type sub-total
    • Total

The nodes contain a subset collection of these possible counters:

  • Error: contains a collection of all errors that occurred. The Operations tab provides a method that shows all errors in a dialog box.
  • Delta-indices: number of delta indices created in LuceneIndex.
  • Exceptions(critical): number of critical exceptions.
  • Exceptions(non-critical): number of non-critical exceptions.
  • Exceptions(producer): number of producer exceptions.
  • Files: number of files crawled (FileSystemCrawler only).
  • Folder: number of folders traversed (FileSystemCrawler only).
  • Records: number of records created.
  • Bytes: number of bytes downloaded.
  • http-fetch-time: average HTTP fetch time, i.e. how long it takes to download a web page.
  • Pages: number of pages visited.
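Counter values can also be read outside JConsole with MBeanServerConnection.getAttribute. The sketch below sums one numeric attribute over every MBean that matches an ObjectName pattern, similar in spirit to the Total nodes of the tree. The naming scheme (SMILA.Crawlers:type=...,dataSource=...,instance=...) and the attribute names Records and Bytes are assumptions: the actual ObjectNames of SMILA's counters are not listed on this page, and the labels shown in the tree need not match the underlying attribute names.

```java
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

class CounterReader {
    // Hypothetical counter MBean exposing two read-only attributes, "Records" and "Bytes".
    public interface CrawlerCounterMBean {
        long getRecords();
        long getBytes();
    }

    public static class CrawlerCounter implements CrawlerCounterMBean {
        private final long records;
        private final long bytes;

        public CrawlerCounter(long records, long bytes) {
            this.records = records;
            this.bytes = bytes;
        }

        public long getRecords() { return records; }

        public long getBytes() { return bytes; }
    }

    // Sums a numeric attribute over all MBeans matching the pattern.
    static long sum(MBeanServerConnection conn, String pattern, String attribute) throws Exception {
        long total = 0;
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            total += ((Number) conn.getAttribute(name, attribute)).longValue();
        }
        return total;
    }

    // Registers two hypothetical crawl-job counters and totals their Records attribute.
    static long demo() throws Exception {
        MBeanServer mbs = MBeanServerFactory.newMBeanServer();
        mbs.registerMBean(new CrawlerCounter(120, 5_000),
                new ObjectName("SMILA.Crawlers:type=FileSystem,dataSource=file,instance=1"));
        mbs.registerMBean(new CrawlerCounter(30, 1_000),
                new ObjectName("SMILA.Crawlers:type=FileSystem,dataSource=file,instance=2"));
        return sum(mbs, "SMILA.Crawlers:type=FileSystem,*", "Records");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Records over all FileSystem crawl jobs: " + demo());
    }
}
```

With the two hypothetical counters registered here, main reports 150 records. The trailing ",*" in the pattern matches any further key properties, so the same call covers every data source and crawl instance of one crawler type.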

Processing performance counters

As soon as the Router puts records into the message queue (MQ), the Listener pushes them into the Data Flow Process. A new section then appears in the MBeans tree with the following hierarchy (this is only an example, because the PerformanceCounters vary with how you use SMILA):
