SMILA/Documentation/Scripting

Revision as of 13:19, 30 October 2014 by Nadine.auslaender.empolis.com (Talk | contribs) (Using Pipelets)

Scripting SMILA using JavaScript

Work In Progress

Service Description

  • Bundles: org.eclipse.smila.scripting(.test)
  • OSGi service interface: org.eclipse.smila.scripting.ScriptingEngine
  • Service implementation: org.eclipse.smila.scripting.internal.JavascriptEngine

The ScriptingEngine provides an alternative way to describe "synchronous workflows" by using JavaScript functions instead of BPEL processes. This approach is easier, more flexible, and more maintainable (e.g. debuggable), so the BPEL approach might one day be removed completely.

Script Basics

A JavaScript function for SMILA scripting takes one record (including attachments) as an argument and can return one record (other return types are supported, too, and are wrapped in a record automatically). For example, a file helloWorld.js (the suffix must be ".js") could look like this:

function greetings(record) {
  record.greetings = "Hello " + record.name + "!";
  return record;
}

Script files to execute are stored by default in SMILA/configuration/org.eclipse.smila.scripting/js. They are currently loaded "on-demand" and not kept in the service for reuse, so changes to the files take effect with the next execution.

A script is invoked using the ScriptingEngine.callScript() methods. The first argument of both methods is a scriptName string in format <file>.<function> where the <file> part is the name of the script file (without path and ".js" suffix) and the <function> part is the name of a function defined in this file.
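
To illustrate the scriptName format, the following sketch splits a name into its two parts. The helper parseScriptName is hypothetical and not part of the SMILA API; splitting at the last dot is an assumption made for this example, since the text above only states the <file>.<function> format.

```javascript
// Hypothetical helper, not part of SMILA: splits "<file>.<function>".
// Assumption: the function name is everything after the last dot.
function parseScriptName(scriptName) {
  var dot = scriptName.lastIndexOf(".");
  if (dot <= 0 || dot === scriptName.length - 1) {
    throw new Error("Expected <file>.<function>, got: " + scriptName);
  }
  return {
    file: scriptName.substring(0, dot) + ".js", // file name inside the script directory
    func: scriptName.substring(dot + 1)         // function defined in that file
  };
}

var parsed = parseScriptName("helloWorld.greetings");
// parsed.file is "helloWorld.js", parsed.func is "greetings"
```

A name without a dot, such as "helloWorld", is rejected by this sketch with an error.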

Exposing Script Functions

The script directory can contain "script catalog" files. They can be used to expose and describe available scripts in the ReST API so that a client can detect available scripts. Such a file must be named <prefix>ScriptCatalog.js, e.g. smilaScriptCatalog.js, and must have this format:

[
  {
    name: "helloWorld.greetings",
    description: "Get a Hello from SMILA!"
  },
  // ... more function descriptions
]

A catalog file does not define functions; it just produces an array of script function descriptions. A description object must contain a name property, and we recommend including a description property as well. Other properties can be added as you like (e.g. a structured description of expected parameters in the passed record).

The ScriptingEngine.listScripts() method merges the arrays produced by all catalog scripts into one array (elements that are not objects or do not have a name property are ignored) and sorts them by name.

The name property must be in format <file>.<function>, as described above for the scriptName parameter of the callScript() methods.
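
The merge behaviour described for listScripts() can be sketched on plain arrays as follows. mergeCatalogs is a hypothetical name chosen for this example; the actual implementation lives in the Java service and may differ in details.

```javascript
// Sketch only: mimics the documented merge behaviour on plain arrays.
// mergeCatalogs is a hypothetical name, not a SMILA function.
function mergeCatalogs(catalogs) {
  var merged = [];
  catalogs.forEach(function (catalog) {
    catalog.forEach(function (entry) {
      // Ignore elements that are not objects or have no name property.
      if (entry !== null && typeof entry === "object" && entry.name) {
        merged.push(entry);
      }
    });
  });
  // Sort the remaining descriptions by name.
  merged.sort(function (a, b) {
    return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
  });
  return merged;
}

var scripts = mergeCatalogs([
  [ { name: "search.process" }, "not an object" ],
  [ { name: "helloWorld.greetings", description: "Get a Hello from SMILA!" },
    { description: "no name" } ]
]);
// scripts holds helloWorld.greetings followed by search.process
```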

Configuration

The script directory can be changed on startup using a system property: SMILA -Dsmila.scripting.dir=/home/smila/js .... The system property can also be added to SMILA.ini, of course.

Scripting Features ("SDK")

See the Rhino Documentation for special JavaScript features available in Rhino; they should work in SMILA, too. In particular, the predefined functions available in the Rhino Shell should work in SMILA as well (if they are useful). For example, you can use print(...) to write something to the console:

  print("Hello World!");

(However, the quit() function will do nothing ;-)

Working with Records

The record passed to the script can be accessed just like a native JavaScript object. The record attributes are just treated as object properties:

  record.string = "a string";
  record["integer"] = 42;
  record.double = 3.14;
  record.boolean = true;
  record.map = {
    key : "value"
  };
  record.sequence = [ "Hello", record.string, record.integer, record.double ];

  delete record.name;

Iterating over maps and sequences is possible, too:

  for ( var key in record.map) {
    print("map " + key + " to " + record.map[key]);
  }

  for ( var index in record.sequence) {
    print("element " + index + ": " + record.sequence[index]);
  }  
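
As a sketch of how attribute access and iteration combine in a script function, the following describeRecord function (a made-up example, not one of SMILA's default scripts) summarizes a map and counts the elements of a sequence. Outside SMILA it can be tried with a plain JavaScript object standing in for the record, as done here.

```javascript
// Made-up example function; a plain object stands in for the SMILA record.
function describeRecord(record) {
  var pairs = [];
  for (var key in record.map) {
    pairs.push(key + "=" + record.map[key]);
  }
  var count = 0;
  for (var index in record.sequence) {
    count++;
  }
  record.mapSummary = pairs.join(",");
  record.sequenceLength = count;
  return record;
}

var described = describeRecord({
  map: { key: "value" },
  sequence: [ "Hello", "a string", 42, 3.14 ]
});
// described.mapSummary is "key=value", described.sequenceLength is 4
```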

The record object has the following three special properties whose names start with a dollar sign ($):

  • $id: The string value of the attribute _recordid. This is just a convenience property. It can be used to read and write the record ID:
  var recordId = record.$id;
  record.$id = "changed-id";
  • $metadata: In some cases it is necessary to use the actual AnyMap object containing the record metadata, for example if you want to call a Java method that defines a parameter of type Any or AnyMap:
  var writer = new org.eclipse.smila.datamodel.ipc.IpcAnyWriter(true);
  var recordAsJson = writer.writeJsonObject(record.$metadata);
  • $attachments: Contains an object that provides access to the record's attachments. Its properties correspond to attachment names and can be used to get and set attachment contents of the record.
    When reading an attachment, an actual org.eclipse.smila.datamodel.Attachment object is returned that can be accessed using its Java methods and passed to other Java objects:
  var attachment = record.$attachments.Content;
  var contentLength = attachment.size();
  var contentAsByteArray = attachment.getAsBytes();
  var contentAsStream = attachment.getAsStream();
  
  var contentAsString = new java.lang.String(contentAsByteArray, "utf-8");

To set an attachment, several types of objects are supported to provide the content:

  • Java byte Arrays, of course:
  record.$attachments.fromBytes = contentAsByteArray;
  • String (more exactly, java.lang.CharSequence) objects are converted to byte arrays using UTF-8 encoding:
  record.$attachments.fromString = "string attached";
  • java.io.InputStream objects are read into a byte array and set as an attachment. The stream will be closed after the operation:
  var stream = new java.io.FileInputStream(filename);
  record.$attachments.fromStream = stream;
  • An org.eclipse.smila.datamodel.Attachment can be used, too. If the names match, the actual Attachment object will just be attached to the record. Else the implementation will fetch the content from the source attachment and create a new Attachment object from it (with the current implementation of attachments in SMILA this will NOT result in copying the actual byte[]). If getting the content does not work, an error will be thrown (however, this cannot happen currently).
  record.$attachments.copyAttachment = record.$attachments.originalAttachment;

To delete an attachment, use the delete operator:

  delete record.$attachments.Content;

record.$attachments and record.$metadata cannot be used for write access themselves. The delete operator will not work on any of the special properties.

Accessing OSGi services

Any active OSGi services in the SMILA VM can be easily accessed from within a script. Just use the globally registered services object. For example:

  • Use LanguageIdentifier service:
  var languageId = services.find("org.eclipse.smila.common.language.LanguageIdentifyService");
  record.language = languageId.identify(record.Content).getIsoLanguage();
  • Write record to ObjectStore:
  var objectstore = services.find("org.eclipse.smila.objectstore.ObjectStoreService");
  objectstore.ensureStore("store-created-by-script");

  var bonWriter = new org.eclipse.smila.datamodel.ipc.IpcAnyWriter(true);
  var bonObject = bonWriter.writeBinaryObject(record.$metadata);
  
  objectstore.putObject("store-created-by-script", "bon-object", bonObject);

See the documentation of the respective services for details on how to use them.

Using Pipelets

It is also possible to use pipelets. You must first create a pipelet instance using the global pipelets.create function and a configuration object; then you can invoke the created instance using its process function:

function processTika(record) {
  var tikaConfig = {
    "inputType" : "ATTRIBUTE",
    "outputType" : "ATTRIBUTE",
    "inputName" : "Content",
    "outputName" : "PlainContent",
    "contentTypeAttribute" : "MimeType",
    "exportAsHtml" : false,
    "maxLength" : "-1",
    "extractProperties" : [ {
      "metadataName" : "title",
      "targetAttribute" : "Title",
      "singleResult" : true
    } ]
  };
  var tika = pipelets.create("org.eclipse.smila.tika.TikaPipelet", tikaConfig);
  tika.process(record);
  return record;
}

The process() function accepts single records and arrays of records as well as single or arrays of JavaScript objects that can be converted to AnyMap objects. Arrays of records or objects will be processed in a single pipelet invocation.

The process function always returns an array of records, even if only one record was given as input. This is because some pipelets create new records or split the input record into multiple output records.

So the signatures of the process function look like this:

  Record[] process(Record)
  Record[] process(Record[])
  Record[] process(AnyMap)
  Record[] process(AnyMap[])
  Record[] process(<Javascript-Map>)
  Record[] process(<Javascript-Map>[])

The result of a pipelet invocation can be given to another pipelet for further processing or returned as the final function result.
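
The array contract of process() can be illustrated without a running SMILA instance. In the following sketch, createStubPipelet is a hypothetical stand-in for pipelets.create that only mimics the documented calling convention (single input or array in, array out); it says nothing about how real pipelets are implemented.

```javascript
// Hypothetical stand-in for pipelets.create(...); real pipelets are Java classes.
function createStubPipelet(modify) {
  return {
    process: function (input) {
      // Accept a single record/object or an array of them ...
      var records = Array.isArray(input) ? input : [ input ];
      records.forEach(modify);
      // ... and always return an array, as described above.
      return records;
    }
  };
}

var upperCaser = createStubPipelet(function (r) { r.Title = r.Title.toUpperCase(); });
var marker = createStubPipelet(function (r) { r.processed = true; });

// Chain two invocations: the array result of the first is the input of the second.
var chained = marker.process(upperCaser.process({ Title: "Scripting rules!" }));
// chained is an array with one record: Title "SCRIPTING RULES!", processed true
```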

Using Pipelets - best practice:

Normally, pipelets just work on (i.e. modify) the given input records and do not create new records. In this case, don't use the result of a pipelet for further script processing; just keep working with the input record. That way you don't have to care about the process function always returning an array.

Example-1: Best practice

function processTika(record) {
  ...
  my1stPipelet.process(record);
  record.greetings = "Hello world";
  my2ndPipelet.process(record);
  ...
  return record;
}

Example-2: When working with the pipelet result, you'll have to deal with arrays:

function processTika(record) {
  ...
  var result1 = my1stPipelet.process(record);
  result1[0].greetings = "Hello world";
  var result2 = my2ndPipelet.process(result1);
  ...
  return result2[0];
}

Using other scripts: require

This is basically an implementation of the CommonJS Module Specification, so you may want to refer to details there.

Scripts can use functions or objects from other scripts (aka "modules") using the global require function. The argument to require is the path to the imported script without the ".js" suffix, relative to the SMILA script directory.

The prerequisite for using an object from another script is that it has been made available via registration in the "exports" object. The result of require is this "exports" object, so the exported functions can be accessed in the importing script via this object.

Within one script execution, multiple require calls for one module (even from different scripts) cause the module to be loaded only once and return the same "exports" object. So the scope into which the module was loaded is shared by all importers: local variables in this context are the same regardless of where the module is called from.

Example: We call a function in script helloWorld.js which uses a function from script utils/myUtils.js:

helloWorld.js

// required scripts
var myUtils = require("utils/myUtils");

function greetings(record) {
  var normalizedName = myUtils.normalize(record.name);
  record.greetings = "Hello " + normalizedName + "!";
  return record;
}

utils/myUtils.js

// objects used in other scripts
exports.normalize = normalize;

function normalize(str) {
  return str.toUpperCase();
}
}
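
The "loaded only once" behaviour of require can be sketched with a small module cache. This is not SMILA's loader: createRequire and requireModule are hypothetical names, and an in-memory map of source strings replaces the script directory so that the snippet runs on its own.

```javascript
// Minimal sketch of a CommonJS-style loader with a module cache.
// The in-memory "sources" map replaces the script directory; this is not SMILA's loader.
function createRequire(sources) {
  var cache = {};
  return function (path) {
    if (!sources.hasOwnProperty(path)) {
      throw new Error("Unknown module: " + path);
    }
    if (!cache.hasOwnProperty(path)) {
      var exports = {};
      cache[path] = exports; // later require calls get this same object
      // Run the module source once, with its own "exports" object.
      new Function("exports", sources[path])(exports);
    }
    return cache[path];
  };
}

var requireModule = createRequire({
  "utils/myUtils":
    "var calls = 0;" +
    "exports.normalize = function (str) { calls++; return str.toUpperCase(); };" +
    "exports.callCount = function () { return calls; };"
});

var first = requireModule("utils/myUtils");
var second = requireModule("utils/myUtils");
first.normalize("smila");
// first and second are the same object, so second.callCount() sees the call made via first
```

Because both importers receive the same "exports" object, state kept in module-local variables (here the calls counter) is shared between them.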

Conventions:

  • Exported functions should be exported under their original function name.
  • The object created with require() should be named like the required script.
  • If a script contains requires and/or exports, they should be listed at the beginning of the script, starting with the requires.
// required scripts
var myUtils = require("utils/myUtils");
var myCommons = require("commons/myCommons");
...

// objects used in other scripts
exports.myFunction1 = myFunction1;
exports.myFunction2 = myFunction2;
...

function myFunction1() {
  ...
}

function myFunction2() {
  ...
}
...

Logging

SMILA provides two ways to output log messages from within a script. One way is to use the built-in default logger, which is accessible via the log object. This object is already available in the script's scope.

function logDemo() {
  log.info("This msg will be logged on level >>info<<");
  log.error("This msg will be logged on level >>error<<");
  log.warn("This msg will be logged on level >>warn<<");
}

The messages can then be found in the smila.log file. The available levels are trace, debug, info, warn, error and fatal. Depending on the levels set in SMILA's log4j.properties, your log message may be discarded.

Whether a certain level is enabled can be checked with:

  boolean log.isTraceEnabled();
  boolean log.isDebugEnabled();
  boolean log.isInfoEnabled();
  boolean log.isWarnEnabled();
  boolean log.isErrorEnabled();
  boolean log.isFatalEnabled();
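
A typical use of these checks is to skip building a costly message when its level is disabled. In the sketch below, the stub log object is defined only so that the snippet also runs outside SMILA (inside SMILA the global log object is already provided and should not be redefined); logRecordDetails is a made-up function name.

```javascript
// Stub for running outside SMILA only; inside SMILA the global "log" already exists.
var logged = [];
var log = {
  isDebugEnabled: function () { return false; },
  debug: function (msg) { logged.push("DEBUG " + msg); },
  info: function (msg) { logged.push("INFO " + msg); }
};

function logRecordDetails(record) {
  // Only assemble the detailed message if it would actually be written.
  if (log.isDebugEnabled()) {
    var parts = [];
    for (var key in record) {
      parts.push(key + "=" + record[key]);
    }
    log.debug("record: " + parts.join(", "));
  }
  log.info("processed record " + record._recordid);
  return record;
}

logRecordDetails({ _recordid: "id1", Title: "Scripting rules!" });
// logged contains only the info message, because debug is disabled in the stub
```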

The other way is to create a custom logger. To do so, the environment exposes a vanilla org.apache.commons.logging.LogFactory to the script context.

function logFactoryDemo() {
  var myLogger = LogFactory.getLog("myLogger");
  myLogger.info("This is a log message");
}

Debugging

Yes, it is possible. See SMILA/Documentation/Scripting/Debugging.

ScriptProcessorWorker

The ScriptProcessorWorker is a worker designed to process (synchronous) script calls inside an asynchronous workflow.

HTTP REST API

Scripts handled via the ReST API must be located at the top level of the configured script folder, i.e. scripts in subfolders cannot currently be called via the ReST API.

Manage Scripts

URL: http://<hostname>:8080/smila/script

Methods:

  • GET: list exposed scripts - show the result of ScriptingEngine.listScripts(). No parameters, no request body.

Response:

  • result of ScriptingEngine.listScripts(), wrapped as a JSON object with property "scripts" containing the array of script descriptions:
{
    "scripts": [
        {
            "name": "helloWorld.greetings",
            "description": "Get a Hello from SMILA!",
            "url": "http://localhost:8080/smila/script/helloWorld.greetings/"
        }
    ]
}


URL: http://<hostname>:8080/smila/script/<scriptfile>.<function>

Methods:

  • GET: show script description. No parameters, no request body.

Response:

  • description object from above list with the matching name.

Response-Codes

  • 200 OK: Success
  • 404 Not Found: Function is not exposed in any ScriptCatalog file.

Example: GET /smila/script/helloWorld.greetings yields:

{
    "name": "helloWorld.greetings",
    "description": "Get a Hello from SMILA!",
    "url": "http://localhost:8080/smila/script/helloWorld.greetings/"
}

Execute a script

URL: http://<hostname>:8080/smila/script/<script-file>.<function>

Methods

  • POST: execute script with record in request body. Attachments are supported, too.

Response:

  • Metadata part of result of ScriptingEngine.callScript("<script-file>.<function>", requestRecord). If the result contains attachments they are not returned via the ReST-API.

Response-Codes

  • 200 OK: Script executed successfully.
  • 400 Bad Request: Last URL part does not have <script-file>.<function> format, or error in Script execution
  • 404 Not Found: Script file does not exist or does not contain the function

Example request:

POST http://localhost:8080/smila/script/helloWorld.greetings
{
  "name": "Juergen"
}

Response:

{
    "name": "Juergen",
    "greetings": "Hello Juergen!"
}
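
From a JavaScript client, such a request could be prepared as sketched below. buildScriptCall is a hypothetical helper; the base URL is the one from the example above, and the Content-Type header is an assumption, since this page does not specify required headers. Actually sending the request is left out so that the snippet has no network dependency.

```javascript
// Hypothetical client-side helper: prepares a POST request for executing a script.
function buildScriptCall(baseUrl, scriptName, record) {
  return {
    url: baseUrl + "/smila/script/" + encodeURIComponent(scriptName),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumption, not stated on this page
      body: JSON.stringify(record)
    }
  };
}

var call = buildScriptCall("http://localhost:8080", "helloWorld.greetings", { name: "Juergen" });
// call.url is "http://localhost:8080/smila/script/helloWorld.greetings"
```

In an environment that provides fetch, the prepared request could be sent with fetch(call.url, call.options).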

Examples

Simple example using SMILA's default scripts for indexing and search.

Add some records to the index:

POST http://localhost:8080/smila/script/add.process
{
  "_recordid": "id1",
  "Title": "Scripting rules!",
  "Content": "yet another SMILA document",
  "MimeType": "text/plain"
}

Search the index:

POST http://localhost:8080/smila/script/search.process
{
  "query": "SMILA",
  "resultAttributes": ["Title", "Content"]
}

Delete the first record from the index:

POST http://localhost:8080/smila/script/delete.process
{
  "_recordid": "id1"
}
