
SMILA/Documentation/Indexing

Revision as of 03:19, 13 April 2015 by Andreas.weber.empolis.com


Add data to a Solr search index

SMILA comes with predefined workflows and jobs to add data to and delete data from a Solr search index.

As described in the 5 Minutes tutorial, there are separate predefined jobs for importing data (crawl jobs) and for indexing it.

The standard predefined job for indexing is "indexUpdate". It uses the ScriptProcessorWorker, which executes JavaScript to insert records into (add.js) and delete records from (delete.js) the predefined Solr search index ("collection1").

The scripts can also be invoked directly through the REST API to add or delete records, for example:

Add a record to the Solr index:

POST http://localhost:8080/smila/script/add.process
{
  "_recordid": "id1",
  "Title": "Scripting rules!",
  "Content": "yet another SMILA document",
  "MimeType": "text/plain"
}
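The same requests can be sent from any HTTP client. The following Python sketch uses only the standard library to build the "add" request shown above. It assumes SMILA is reachable at http://localhost:8080 as in these examples and that the endpoint accepts a JSON body sent with the Content-Type application/json; the helper names build_script_request and call_script are hypothetical and not part of SMILA.

```python
import json
import urllib.request

# Assumption: SMILA runs locally on the port used in the examples above.
SMILA_BASE = "http://localhost:8080/smila"


def build_script_request(script: str, payload: dict, base: str = SMILA_BASE) -> urllib.request.Request:
    """Hypothetical helper: build a POST to <base>/script/<script>.process with a JSON body."""
    url = f"{base}/script/{script}.process"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed to be accepted by SMILA
        method="POST",
    )


def call_script(script: str, payload: dict, base: str = SMILA_BASE) -> str:
    """Hypothetical helper: send the request and return the raw response body."""
    with urllib.request.urlopen(build_script_request(script, payload, base)) as resp:
        return resp.read().decode("utf-8")


# The record from the example above.
record = {
    "_recordid": "id1",
    "Title": "Scripting rules!",
    "Content": "yet another SMILA document",
    "MimeType": "text/plain",
}
add_request = build_script_request("add", record)
# call_script("add", record)  # only against a running SMILA instance
```

Building the request separately from sending it keeps the payload easy to inspect; the same helper works for the delete script by passing "delete" as the script name.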

Delete a record from the Solr index:

POST http://localhost:8080/smila/script/delete.process
{
  "_recordid": "id1"
}

Delete by query (this example removes all records):

POST http://localhost:8080/smila/script/delete.process
{
  "_solr": {
    "update": {
      "operation": "DELETE_BY_QUERY",
      "deleteQuery": "_recordid:*"
    }
  }
}
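The nested "_solr" structure above can be generated for any Solr query. This Python sketch builds both the delete-by-query body and the plain delete-by-ID body and wraps each in a POST to delete.process. It assumes the default local SMILA URL and a JSON Content-Type; the function names delete_by_query_payload and delete_request are hypothetical.

```python
import json
import urllib.request


def delete_by_query_payload(query: str) -> dict:
    """Hypothetical helper: body asking the Solr update step to delete all matches of `query`."""
    return {
        "_solr": {
            "update": {
                "operation": "DELETE_BY_QUERY",
                "deleteQuery": query,
            }
        }
    }


def delete_request(payload: dict, base: str = "http://localhost:8080/smila") -> urllib.request.Request:
    """Hypothetical helper: POST the payload to the delete script (assumed local SMILA URL)."""
    return urllib.request.Request(
        f"{base}/script/delete.process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same body as the example above: remove every record that has a _recordid.
remove_all = delete_request(delete_by_query_payload("_recordid:*"))
# Delete a single record by ID, as in the earlier delete example.
remove_one = delete_request({"_recordid": "id1"})
# urllib.request.urlopen(remove_all)  # only against a running SMILA instance
```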


  • For more details about the "indexUpdate" workflow and job definitions, see workflows.json and jobs.json in SMILA/configuration/org.eclipse.smila.jobmanager.
  • For more information about job management in general, see the JobManager documentation.
  • For more information about script processing with JavaScript, see the Scripting documentation.

Latency vs. Throughput

The predefined add and delete scripts are tuned for low latency: the SolrUpdatePipelet's commitWithinMs parameter sets the Solr commit interval to 1 second (1000 ms), so added or deleted records are reflected in search results shortly after they are processed.

When processing large amounts of data, set commitWithinMs to a larger value. Fewer commits give better throughput, at the cost of records taking longer to become searchable.
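A rough way to reason about the trade-off: under a steady stream of updates, commitWithin triggers at most about one commit per interval, and a new record waits at most about one interval before it becomes searchable. The Python sketch below applies this simplified model (it ignores the time each commit itself takes); the function names are hypothetical and the 60000 ms value is only an illustrative choice, not a recommended setting.

```python
def max_commits_per_hour(commit_within_ms: int) -> int:
    """Rough upper bound on commitWithin-triggered commits per hour under steady load."""
    return 3_600_000 // commit_within_ms  # milliseconds per hour / interval


def worst_case_visibility_s(commit_within_ms: int) -> float:
    """Approximate longest wait, in seconds, before a new record becomes searchable."""
    return commit_within_ms / 1000


# Compare the predefined 1 s interval with an illustrative 60 s interval.
for ms in (1_000, 60_000):
    print(f"commitWithinMs={ms}: <= {max_commits_per_hour(ms)} commits/h, "
          f"visible after <= {worst_case_visibility_s(ms):.0f} s")
```

Raising the interval from 1 s to 60 s cuts the commit rate from at most 3600 to at most 60 per hour, while records may take up to a minute to appear in search results.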
