Jump to: navigation, search

Difference between revisions of "SMILA/Documentation/JdbcLoggingPipelet"

(Configuration)
 
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
= Bundle: <tt>org.eclipse.smila.jdbc</tt>  =
 
= Bundle: <tt>org.eclipse.smila.jdbc</tt>  =
== Bundle: <tt>org.eclipse.smila.jdbc.JdbcLoggingPipelet</tt>  ==
+
== <tt>org.eclipse.smila.jdbc.JdbcLoggingPipelet</tt>  ==
  
 
=== Description  ===
 
=== Description  ===
Line 50: Line 50:
  
 
The ''valuePaths'' parameter is optional, so it's ok to leave it empty, or to leave it at all. However, this must fit to the given log statement, if no value paths are specified, the statement musn't have parameters.
 
The ''valuePaths'' parameter is optional, so it's ok to leave it empty, or to leave it at all. However, this must fit to the given log statement, if no value paths are specified, the statement musn't have parameters.
 +
 +
===== Processing =====
 +
 +
The pipelet works asynchronously: That means that it does not execute the SQL requests immediately, but only prepares them and adds them to a in-memory queue which is maintained by the <tt>JdbcWriterService</tt> in bundle <tt>org.eclipse.smila.jdbc</tt>. To prevent OutOfMemory errors this queue has a maximum capacity which is configurable in configuration file <tt>org.eclipse.smila.jdbc/jdbcwriterservice.properties</tt>. The default capacity per database connection is 100:
 +
 +
<source lang="text">
 +
# capacity of each database queue
 +
capacity=100
 +
</source>
 +
 +
If the queue is still full when a new request should be added by the pipelet, the request will be discarded and the pipelet will log a warning like this:
 +
 +
<source lang="text">
 +
2013-09-17 09:03:55,892 (MESZ) WARN  [ODEServerImpl-14                            ]  jdbc.JdbcLoggingPipelet                      - Error while processing record with id 'record-72-1522951520'.
 +
org.eclipse.smila.jdbc.JdbcWriterServiceException: Error queueing data for queue jdbc:oracle:thin:@host:1521:SID
 +
                at org.eclipse.smila.jdbc.internal.JdbcWriterServiceImpl.write(JdbcWriterServiceImpl.java:83)
 +
                at org.eclipse.smila.jdbc.JdbcLoggingPipelet.process(JdbcLoggingPipelet.java:54)
 +
...
 +
Caused by: java.lang.IllegalStateException: Queue full
 +
                at java.util.AbstractQueue.add(AbstractQueue.java:98)
 +
                at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:283)
 +
                at org.eclipse.smila.jdbc.internal.JdbcWriterServiceImpl.write(JdbcWriterServiceImpl.java:81)
 +
                ... 36 more
 +
</source>
  
 
=== Example  ===
 
=== Example  ===
Line 93: Line 117:
  
  
== Bundle: <tt>org.eclipse.smila.jdbc.JdbcFetcherPipelet</tt>  ==
+
== <tt>org.eclipse.smila.jdbc.JdbcFetcherPipelet</tt>  ==
  
 
=== Description  ===
 
=== Description  ===
Line 163: Line 187:
  
 
</pre>
 
</pre>
 +
 +
 +
== <tt>org.eclipse.smila.jdbc.JdbcSelectPipelet</tt>  ==
 +
 +
=== Description  ===
 +
 +
The '''JdbcSelectPipelet''' returns the results of a given statement string (= PreparedStatement) from a database via JDBC for each processed record. The PreparedStatement typically has parameters '?', these are filled with the values which are referenced by the ''valuePaths'' parameter. The valuePaths parameter contains a list of strings, each string contains a path to a (sub-)attribute in the currently processed record's metadata.
 +
 +
 +
 +
=== Configuration  ===
 +
 +
{| border="1"
 +
|-
 +
! Property
 +
! Type
 +
! Read Type
 +
! Required
 +
! Description
 +
|-
 +
| ''dbUrl''
 +
| String
 +
| runtime
 +
| yes
 +
| The (JDBC driver) dependent URL which is used to connect to the database.
 +
|-
 +
| ''dbProps''
 +
| Map
 +
| runtime
 +
| yes
 +
| Database connection properties, e.g. 'user' and 'password'.
 +
|-
 +
| ''stmt''
 +
| String
 +
| runtime
 +
| yes
 +
| The (Prepared)Statement with the SELECT statement to fetch data from the database, may have parameters.
 +
|-
 +
| ''valuePaths''
 +
| String (multi)
 +
| runtime
 +
| no
 +
| List of paths which point to the record's metadata (sub)attributes that are used as parameter values in the SELECT statement. A path is separated by '/'.
 +
|-
 +
| ''result''
 +
| String ('single' / 'multi')
 +
| runtime
 +
| no
 +
| whether resulting rows selected from the database should be returned in a new 'records' section of the input record ('single'), or as multiple records - i.e. one record per row ('multi').
 +
|}
 +
 +
 +
===== Configuring value paths =====
 +
 +
Same way as for JdbcFetcherPipelet (see above).
 +
 +
=== Examples ===
 +
 +
The following example shows sample pipelet results:
 +
 +
Single input record:
 +
<pre>
 +
{
 +
  "recordid": "inputId",
 +
  "att1": "value"
 +
}
 +
</pre>
 +
 +
Result ('single') when selecting two rows:
 +
<pre>
 +
{
 +
  "recordid": "inputId",
 +
  "att1": "value",
 +
  "records": [  {"DB-COLUMN-1":"value1", "DB-COLUMN-2": "value2"},
 +
                {"DB-COLUMN-1":"value3", "DB-COLUMN-2": "value4"} ]
 +
}
 +
</pre>
 +
 +
Result ('multi') when selecting two rows:
 +
<pre>
 +
{
 +
  "recordid": "inputId#1",     
 +
  "DB-COLUMN-1":"value1",
 +
  "DB-COLUMN-2": "value2"
 +
}
 +
{
 +
  "recordid": "inputId#2", 
 +
  "DB-COLUMN-1":"value3",
 +
  "DB-COLUMN-2": "value4"
 +
}
 +
</pre>
 +
 +
 +
Multiple input records:
 +
<pre>
 +
{
 +
  "recordid": "inputId1", 
 +
  "att1": "value1",
 +
  "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v1'" }
 +
}
 +
{
 +
  "recordid": "inputId2", 
 +
  "att1": "value2",
 +
  "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v2'" }
 +
}
 +
</pre>
 +
 +
Sample Result ('single'):
 +
<pre>
 +
{
 +
  "recordid": "inputId1",
 +
  "att1": "value1",
 +
  "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v1'" },
 +
  "records": [  {"DB-COLUMN-1":"v1", "DB-COLUMN-2": "v2"},
 +
                {"DB-COLUMN-1":"v1", "DB-COLUMN-2": "v3"} ]
 +
}
 +
{
 +
  "recordid": "inputId2",
 +
  "att1": "value2",
 +
  "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v2'" },
 +
  "records": [  {"DB-COLUMN-1":"v2", "DB-COLUMN-2": "v4"} ]
 +
}
 +
</pre>
 +
 +
Sample Result ('multi'):
 +
<pre>
 +
{
 +
  "recordid": "inputId1#1",     
 +
  "DB-COLUMN-1":"v1",
 +
  "DB-COLUMN-2": "v2"
 +
}
 +
{
 +
  "recordid": "inputId1#2",     
 +
  "DB-COLUMN-1":"v1",
 +
  "DB-COLUMN-2": "v3"
 +
}
 +
{
 +
  "recordid": "inputId2#1", 
 +
  "DB-COLUMN-1":"v2",
 +
  "DB-COLUMN-2": "v4"
 +
}
 +
</pre>
 +
 +
== Adding JDBC Drivers ==
 +
 +
By default SMILA includes only JDBC drivers for Derby. If you want to access other databases then you have to provide according JDBC drivers.
 +
Have a look [[SMILA/Documentation/Adding_JDBC_Drivers|here]] to learn how to add JDBC drivers to SMILA.
 +
 +
 +
 
[[Category:SMILA]] [[Category:SMILA/Pipelet]]
 
[[Category:SMILA]] [[Category:SMILA/Pipelet]]

Latest revision as of 10:20, 12 August 2014

Bundle: org.eclipse.smila.jdbc

org.eclipse.smila.jdbc.JdbcLoggingPipelet

Description

The JdbcLoggingPipelet logs a given statement string (= PreparedStatement) into a database via JDBC for each processed record. The PreparedStatement typically has parameters '?', these are filled with the values which are referenced by the valuePaths parameter. The valuePaths parameter contains a list of strings, each string contains a path to a (sub-)attribute in the currently processed record's metadata.

Configuration

Property Type Read Type Required Description
dbUrl String runtime yes The (JDBC driver) dependent URL which is used to connect to the database.
dbProps Map runtime yes Database connection properties, e.g. 'user' and 'password'.
stmt String runtime yes The (Prepared)Statement which is logged to the database, may have parameters.
valuePaths String (multi) runtime no List of paths which point to the record's metadata (sub)attributes that are used as parameter values in the logged statement. A path is separated by '/'.
Configuring value paths

The following should be taken into account when specifying a value path:

  • If a value path references a single value, this is used for the PreparedStatement.
  • If a value path references a sequence of values, (only) the first value of the sequence is used.
  • In any other case, the value is set 'null'. (Keep in mind that a 'null' value is valid for a PreparedStatement)

The valuePaths parameter is optional, so it's ok to leave it empty, or to leave it at all. However, this must fit to the given log statement, if no value paths are specified, the statement musn't have parameters.

Processing

The pipelet works asynchronously: That means that it does not execute the SQL requests immediately, but only prepares them and adds them to a in-memory queue which is maintained by the JdbcWriterService in bundle org.eclipse.smila.jdbc. To prevent OutOfMemory errors this queue has a maximum capacity which is configurable in configuration file org.eclipse.smila.jdbc/jdbcwriterservice.properties. The default capacity per database connection is 100:

# capacity of each database queue
capacity=100

If the queue is still full when a new request should be added by the pipelet, the request will be discarded and the pipelet will log a warning like this:

2013-09-17 09:03:55,892 (MESZ) WARN  [ODEServerImpl-14                             ]  jdbc.JdbcLoggingPipelet                       - Error while processing record with id 'record-72-1522951520'. 
org.eclipse.smila.jdbc.JdbcWriterServiceException: Error queueing data for queue jdbc:oracle:thin:@host:1521:SID
                at org.eclipse.smila.jdbc.internal.JdbcWriterServiceImpl.write(JdbcWriterServiceImpl.java:83)
                at org.eclipse.smila.jdbc.JdbcLoggingPipelet.process(JdbcLoggingPipelet.java:54)
...
Caused by: java.lang.IllegalStateException: Queue full
                at java.util.AbstractQueue.add(AbstractQueue.java:98)
                at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:283)
                at org.eclipse.smila.jdbc.internal.JdbcWriterServiceImpl.write(JdbcWriterServiceImpl.java:81)
                ... 36 more

Example

The following example shows a sample pipelet configuration and the resulting log statement when logging the given record.

Pipelet configuration:

   <proc:configuration>
     <rec:Val key="dbUrl">jdbc:postgresql://localhost/mydb</rec:Val>
     <rec:Val key="stmt">INSERT INTO myTable VALUES (?, ?, 100, ?, ?)</rec:Val>
     <rec:Map key="dbProps">                    
        <rec:Val key="user">Andreas</rec:Val>
        <rec:Val key="password">top_secret</rec:Val>                                      
     </rec:Map>
     <rec:Seq key="valuePaths">
          <rec:Val>_recordid</rec:Val>
          <rec:Val>_parameters/session/id</rec:Val>
          <rec:Val>Authors</rec:Val>
          <rec:Val>Size</rec:Val>
     </rec:Seq>
   </proc:configuration>   

Sample record and resulting logged SQL statement:

 {
   "_recordid":"web:http://example.org",   
   "_parameters": {
      "maxCount": 100,
      "session": {
         "timestamp": "2012-10-12T14:00:00",
         "id": 4711
      }
   }
   "Path": "http://example.org/index.html",
   "Authors": ["Andreas Weber", "Jürgen Schumacher", "Andreas Schank"]   
 }

-> INSERT INTO myTable VALUES ('web:http://example.org', 4711, 100, 'Andreas Weber', null)


org.eclipse.smila.jdbc.JdbcFetcherPipelet

Description

The JdbcFetcherPipelet fetches the results of a given statement string (= PreparedStatement) from a database via JDBC for each processed record. The PreparedStatement typically has parameters '?', these are filled with the values which are referenced by the valuePaths parameter. The valuePaths parameter contains a list of strings, each string contains a path to a (sub-)attribute in the currently processed record's metadata.

Configuration

Property Type Read Type Required Description
dbUrl String runtime yes The (JDBC driver) dependent URL which is used to connect to the database.
dbProps Map runtime yes Database connection properties, e.g. 'user' and 'password'.
stmt String runtime yes The (Prepared)Statement with the SELECT statement to fetch data from the database, may have parameters.
valuePaths String (multi) runtime no List of paths which point to the record's metadata (sub)attributes that are used as parameter values in the SELECT statement. A path is separated by '/'.
Configuring value paths

The following should be taken into account when specifying a value path:

  • If a value path references a single value, this is used for the PreparedStatement.
  • If a value path references a sequence of values, (only) the first value of the sequence is used.
  • In any other case, the value is set 'null'. (Keep in mind that a 'null' value is valid for a PreparedStatement)

The valuePaths parameter is optional, so it's ok to leave it empty, or to leave it at all. However, this must fit to the given SELECT statement, if no value paths are specified, the statement musn't have parameters.

Example

The following example shows a sample pipelet configuration:

Pipelet configuration:

   <proc:configuration>
     <rec:Val key="dbUrl">jdbc:postgresql://localhost/mydb</rec:Val>
     <rec:Val key="stmt">SELECT COUNT(*) FROM myTable WHERE action=?</rec:Val>
     <rec:Map key="dbProps">                    
        <rec:Val key="user">Andreas</rec:Val>
        <rec:Val key="password">top_secret</rec:Val>                                      
     </rec:Map>
     <rec:Seq key="valuePaths">
          <rec:Val>Action</rec:Val>
     </rec:Seq>
   </proc:configuration>   


org.eclipse.smila.jdbc.JdbcSelectPipelet

Description

The JdbcSelectPipelet returns the results of a given statement string (= PreparedStatement) from a database via JDBC for each processed record. The PreparedStatement typically has parameters '?', these are filled with the values which are referenced by the valuePaths parameter. The valuePaths parameter contains a list of strings, each string contains a path to a (sub-)attribute in the currently processed record's metadata.


Configuration

Property Type Read Type Required Description
dbUrl String runtime yes The (JDBC driver) dependent URL which is used to connect to the database.
dbProps Map runtime yes Database connection properties, e.g. 'user' and 'password'.
stmt String runtime yes The (Prepared)Statement with the SELECT statement to fetch data from the database, may have parameters.
valuePaths String (multi) runtime no List of paths which point to the record's metadata (sub)attributes that are used as parameter values in the SELECT statement. A path is separated by '/'.
result String ('single' / 'multi') runtime no whether resulting rows selected from the database should be returned in a new 'records' section of the input record ('single'), or as multiple records - i.e. one record per row ('multi').


Configuring value paths

Same way as for JdbcFetcherPipelet (see above).

Examples

The following example shows sample pipelet results:

Single input record:

 { 
   "recordid": "inputId",
   "att1": "value"
 }

Result ('single') when selecting two rows:

 { 
   "recordid": "inputId",
   "att1": "value",
   "records": [  {"DB-COLUMN-1":"value1", "DB-COLUMN-2": "value2"},
                 {"DB-COLUMN-1":"value3", "DB-COLUMN-2": "value4"} ]
 }

Result ('multi') when selecting two rows:

 { 
   "recordid": "inputId#1",      
   "DB-COLUMN-1":"value1", 
   "DB-COLUMN-2": "value2"
 }
 { 
   "recordid": "inputId#2",   
   "DB-COLUMN-1":"value3", 
   "DB-COLUMN-2": "value4"
 }


Multiple input records:

 { 
   "recordid": "inputId1",  
   "att1": "value1",
   "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v1'" }
 }
 { 
   "recordid": "inputId2",  
   "att1": "value2",
   "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v2'" }
 }

Sample Result ('single'):

 { 
   "recordid": "inputId1",
   "att1": "value1",
   "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v1'" },
   "records": [  {"DB-COLUMN-1":"v1", "DB-COLUMN-2": "v2"},
                 {"DB-COLUMN-1":"v1", "DB-COLUMN-2": "v3"} ]
 }
 { 
   "recordid": "inputId2",
   "att1": "value2",
   "_parameters": { "stmt": "SELECT * FROM t WHERE DB-COLUMN-1 = 'v2'" },
   "records": [  {"DB-COLUMN-1":"v2", "DB-COLUMN-2": "v4"} ]
 }

Sample Result ('multi'):

 { 
   "recordid": "inputId1#1",      
   "DB-COLUMN-1":"v1", 
   "DB-COLUMN-2": "v2"
 }
 { 
   "recordid": "inputId1#2",      
   "DB-COLUMN-1":"v1", 
   "DB-COLUMN-2": "v3"
 }
 { 
   "recordid": "inputId2#1",   
   "DB-COLUMN-1":"v2", 
   "DB-COLUMN-2": "v4"
 }

Adding JDBC Drivers

By default SMILA includes only JDBC drivers for Derby. If you want to access other databases then you have to provide according JDBC drivers. Have a look here to learn how to add JDBC drivers to SMILA.