
Difference between revisions of "SMILA/Documentation/Indexing"


Latest revision as of 02:19, 13 April 2015

Add data to a Solr search index

SMILA comes with predefined workflows and jobs for adding data to and deleting data from a Solr search index.

As described in the 5 Minutes tutorial, there are separate predefined jobs for importing the data (crawl jobs) and for indexing the data.

The common predefined job for indexing is "indexUpdate". It uses the ScriptProcessorWorker, which executes JavaScript to insert records into (add.js) and delete records from (delete.js) the predefined Solr search index ("collection1").

The scripts can also be called directly through the REST API to add or delete records, for example:

Add a record to the Solr index:

POST http://localhost:8080/smila/script/add.process
{
  "_recordid": "id1",
  "Title": "Scripting rules!",
  "Content": "yet another SMILA document",
  "MimeType": "text/plain"
}

Delete a record from the Solr index:

POST http://localhost:8080/smila/script/delete.process
{
  "_recordid": "id1"
}

Delete by query (example to remove all records):

POST http://localhost:8080/smila/script/delete.process
{
  "_solr": {
    "update": {
      "operation": "DELETE_BY_QUERY",
      "deleteQuery": "_recordid:*"
    }
  }
}
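
The three REST calls above can also be prepared from a client program. The following is a minimal sketch in Python using only the standard library, not part of SMILA itself. The helper names build_script_request and send are hypothetical, and the Content-Type header is an assumption; the URL, script names, and request bodies are taken from the examples above.

```python
import json
import urllib.request

# Base URL as used in the examples above (local SMILA instance, port 8080).
BASE_URL = "http://localhost:8080/smila/script"


def build_script_request(script_function, record):
    """Build (but do not send) a POST request for a script function.

    script_function is the last path segment from the examples above,
    e.g. "add.process" or "delete.process". Hypothetical helper.
    """
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{script_function}",
        data=body,
        # Assumption: the JSON body is announced with this content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Add a record to the Solr index.
add_request = build_script_request("add.process", {
    "_recordid": "id1",
    "Title": "Scripting rules!",
    "Content": "yet another SMILA document",
    "MimeType": "text/plain",
})

# Delete a single record by its id.
delete_request = build_script_request("delete.process", {"_recordid": "id1"})

# Delete by query: this query matches every record, so it empties the index.
delete_all_request = build_script_request("delete.process", {
    "_solr": {
        "update": {
            "operation": "DELETE_BY_QUERY",
            "deleteQuery": "_recordid:*",
        }
    }
})
```

Nothing is sent until send is called, for example send(add_request), which requires a running SMILA instance at the URL above.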


  • For more details about the "indexUpdate" workflow and job definitions, see SMILA/configuration/org.eclipse.smila.jobmanager/workflows.json and jobs.json.
  • For more information about job management in general, see the JobManager documentation.
  • For more information about script processing with JavaScript, see the Scripting documentation.

Latency vs. Throughput

The predefined add/delete scripts are configured for low latency: the Solr commit interval is set to 1 second via the SolrUpdatePipelet's commitWithinMs parameter.

If you want to process a large amount of data, set commitWithinMs to a larger value. Solr then commits less often, which improves throughput, at the cost of new records taking longer to become searchable.
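
To get a feeling for the trade-off, the following back-of-the-envelope sketch in Python estimates how many commits a continuous import would trigger for different commitWithinMs values. It assumes roughly one commit per commitWithinMs window under steady load; the function name and the 60-second value are purely illustrative, and the real commit behaviour depends on the Solr configuration.

```python
import math


def estimated_commits(import_duration_ms, commit_within_ms):
    """Rough upper bound on commits during a continuous stream of updates.

    Assumption: under steady load there is about one commit per
    commitWithinMs window. Hypothetical helper for illustration only.
    """
    if commit_within_ms <= 0:
        raise ValueError("commit_within_ms must be positive")
    return math.ceil(import_duration_ms / commit_within_ms)


ten_minutes_ms = 10 * 60 * 1000

# Predefined low-latency setting: 1 second.
low_latency_commits = estimated_commits(ten_minutes_ms, 1_000)   # 600

# Illustrative bulk setting: 60 seconds.
bulk_commits = estimated_commits(ten_minutes_ms, 60_000)         # 10

print(low_latency_commits, bulk_commits)
```

Under these assumptions a ten-minute import triggers about 600 commits at the default setting but only about 10 at 60 seconds, while a newly added record may take up to 60 seconds to become searchable.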
