https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&feed=atom&action=history
SMILA/Documentation/Default configuration workflow overview - Revision history
2024-03-29T13:54:20Z
Revision history for this page on the wiki
MediaWiki 1.26.4
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=286143&oldid=prev
Juergen.schumacher.attensity.com: SMILA/Default configuration workflow overview moved to SMILA/Documentation/Default configuration workflow overview
2012-01-24T13:34:33Z
<p><a href="/SMILA/Default_configuration_workflow_overview" class="mw-redirect" title="SMILA/Default configuration workflow overview">SMILA/Default configuration workflow overview</a> moved to <a href="/SMILA/Documentation/Default_configuration_workflow_overview" title="SMILA/Documentation/Default configuration workflow overview">SMILA/Documentation/Default configuration workflow overview</a></p>
<table class='diff diff-contentalign-left'>
<tr style='vertical-align: top;' lang='en'>
<td colspan='1' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='1' style="background-color: white; color:black; text-align: center;">Revision as of 13:34, 24 January 2012</td>
</tr><tr><td colspan='2' style='text-align: center;' lang='en'><div class="mw-diff-empty">(No difference)</div>
</td></tr></table>
Juergen.schumacher.attensity.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=286033&oldid=prev
Juergen.schumacher.attensity.com at 11:48, 24 January 2012
2012-01-24T11:48:42Z
<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:48, 24 January 2012</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[<del class="diffchange diffchange-inline">Image:DefaultConfigurationWorkflow.png</del>]]</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">This page gives a short explanation of what happens behind the scenes when executing the </ins>[[<ins class="diffchange diffchange-inline">SMILA/Documentation_for_5_Minutes_to_Success|SMILA in 5 Minutes</ins>]] <ins class="diffchange diffchange-inline">example.</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">(original slides can be found here</del>: [[Media:DefaultConfigurationWorkflow.<del class="diffchange diffchange-inline">zip|DefaultConfigurationWorkflow</del>.zip]])</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">[[Image</ins>:<ins class="diffchange diffchange-inline">DefaultConfigurationWorkflow-1.0.png]]</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&lt;font size="-1"&gt;</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">(download </ins>[[Media:DefaultConfigurationWorkflow<ins class="diffchange diffchange-inline">-1</ins>.<ins class="diffchange diffchange-inline">0</ins>.zip<ins class="diffchange diffchange-inline">|this archive</ins>]] <ins class="diffchange diffchange-inline">to get the original PowerPoint file of this diagram</ins>)</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">&lt;/font&gt;</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">When crawling a web site with SMILA the following happens:</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># The user starts a job with workflow ''updateIndex''. Nothing else happens yet, the job waits for input to process.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># The user starts a job with workflow ''updateIndex''. Nothing else happens yet, the job waits for input to process.</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l12" >Line 12:</td>
<td colspan="2" class="diff-lineno">Line 18:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># The PipelineProcessor worker picks up those record bulks and puts each record (in manageable numbers) on the blackboard ...</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># The PipelineProcessor worker picks up those record bulks and puts each record (in manageable numbers) on the blackboard ...</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># ... and invokes a configured pipeline for either adding/updating or deleting records.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># ... and invokes a configured pipeline for either adding/updating or deleting records.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div># The pipelets in the pipelines take the record data from the blackboard</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div># The pipelets in the pipelines take the record data from the blackboard<ins class="diffchange diffchange-inline">, transform the data, extract further metadata and plain text ...</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># ... and manipulate the SolrIndex accordingly. The index can now be searched using yet another pipeline (not shown here).</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># ... and manipulate the SolrIndex accordingly. The index can now be searched using yet another pipeline (not shown here).</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># Finally (and not yet implemented), when the crawl workflow is done, the DeltaService can be asked for all records that have not been crawled in this run, so that ''delete'' records can be sent to the indexing workflow to remove these resources from the index.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div># Finally (and not yet implemented), when the crawl workflow is done, the DeltaService can be asked for all records that have not been crawled in this run, so that ''delete'' records can be sent to the indexing workflow to remove these resources from the index.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">All records produced in this process are stored in the ObjectStore while being passed from one worker to the next. The Job/TaskManagement uses Apache ZooKeeper to coordinate the work when multiple SMILA nodes are used to parallelize it.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">Crawling a filesystem works similarly: the "fileCrawling" workflow just replaces the "WebCrawler" and "WebFetcher" workers with "FileCrawler" and "FileFetcher" workers.</ins></div></td></tr>
</table>
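The delta-checking steps described in this revision (filter out unchanged resources, save the current state after pushing, and later ask for resources that were not crawled in this run) can be sketched as follows. This is only an illustrative in-memory model: the class `DeltaService`, the helper `_digest`, and the function `crawl_run` are hypothetical names chosen for this sketch and are not SMILA's actual API. It also assumes that a content hash is what identifies a changed resource, which the page does not state.

```python
import hashlib


def _digest(content):
    """Hash the content so that two crawl runs can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class DeltaService:
    """Hypothetical in-memory stand-in for a delta service (not SMILA's API)."""

    def __init__(self):
        self._state = {}       # resource id -> content hash saved by an earlier run
        self._visited = set()  # resource ids seen during the current run

    def is_changed(self, record_id, content):
        """Mark the resource as visited; report whether it is new or changed."""
        self._visited.add(record_id)
        return self._state.get(record_id) != _digest(content)

    def save(self, record_id, content):
        """Remember the current state of a resource after it was pushed."""
        self._state[record_id] = _digest(content)

    def finish_run(self):
        """Return ids that were not crawled in this run and forget them."""
        stale = sorted(set(self._state) - self._visited)
        for record_id in stale:
            del self._state[record_id]
        self._visited = set()
        return stale


def crawl_run(delta, crawled):
    """Simulate one crawl run over a dict of resource id -> content.

    Returns (added, deleted): ids that would be sent to the indexing job as
    'added' records, and ids for which 'delete' records would be sent.
    """
    added = []
    for record_id, content in crawled.items():
        if delta.is_changed(record_id, content):  # delta check
            added.append(record_id)               # push as 'added' record
            delta.save(record_id, content)        # save state after pushing
    deleted = delta.finish_run()
    return added, deleted


delta = DeltaService()
run1 = crawl_run(delta, {"a": "1", "b": "2"})  # first run: everything is new
run2 = crawl_run(delta, {"a": "1", "b": "3"})  # only "b" changed
run3 = crawl_run(delta, {"a": "1"})            # "b" disappeared from the source
```

In this model the first run reports both resources as added, the second run reports only the changed one, and the third run reports the missing one as deleted. The page itself notes that this last delete step was not yet implemented at the time of this revision.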
Juergen.schumacher.attensity.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=286031&oldid=prev
Juergen.schumacher.attensity.com: /* The diagram description */
2012-01-24T11:39:32Z
<p><span dir="auto"><span class="autocomment">The diagram description</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 11:39, 24 January 2012</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3" >Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>(original slides can be found here: [[Media:DefaultConfigurationWorkflow.zip|DefaultConfigurationWorkflow.zip]])</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>(original slides can be found here: [[Media:DefaultConfigurationWorkflow.zip|DefaultConfigurationWorkflow.zip]])</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">== </del>The <del class="diffchange diffchange-inline">diagram description==</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># </ins>The <ins class="diffchange diffchange-inline">user starts </ins>a job <ins class="diffchange diffchange-inline">with workflow ''updateIndex''. Nothing else happens yet, </ins>the <ins class="diffchange diffchange-inline">job waits for input to process</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># The user starts a job with workflow ''webCrawling'' in &lt;tt&gt;runOnce&lt;</ins>/<ins class="diffchange diffchange-inline">tt&gt; mode</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 1. Data is imported via [[SMILA/Documentation/Crawler|Crawler]] (or [[SMILA/Documentation/Agent|Agent]]) by configuring </del>a <del class="diffchange diffchange-inline">data source and a [[SMILA/Glossary#J|</del>job<del class="diffchange diffchange-inline">]] name via </del>the <del class="diffchange diffchange-inline">[[SMILA/Documentation/CrawlerController|Crawler Controller]] (resp</del>. <del class="diffchange diffchange-inline">[[SMILA</del>/<del class="diffchange diffchange-inline">Documentation/AgentController|Agent Controller]]) JMX API</del>.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># </ins>The <ins class="diffchange diffchange-inline">WebCrawler worker initiates </ins>the <ins class="diffchange diffchange-inline">crawl process </ins>by <ins class="diffchange diffchange-inline">reading </ins>the <ins class="diffchange diffchange-inline">configured start URL</ins>. <ins class="diffchange diffchange-inline">It extracts links and feeds them back to itself, and produces records with metadata and content. Additionally it marks links as visited so that other crawler worker instances will not produce duplicates.</ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 2. </del>The <del class="diffchange diffchange-inline">[[SMILA/Documentation/CrawlerController|Crawler Controller]] initializes </del>the <del class="diffchange diffchange-inline">[[SMILA/Documentation/Crawler|Crawler]] </del>by <del class="diffchange diffchange-inline">assigning a data source and starting </del>the <del class="diffchange diffchange-inline">import</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># </ins>The <ins class="diffchange diffchange-inline">DeltaChecker worker reads </ins>the <ins class="diffchange diffchange-inline">records produced by the crawler </ins>and <ins class="diffchange diffchange-inline">checks in the DeltaService if the crawled resources have changed since a previous crawl run. Unchanged resources are filtered out, only changed and new resources are sent </ins>to the <ins class="diffchange diffchange-inline">next worker</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 3</del>. The <del class="diffchange diffchange-inline">[[SMILA/Documentation/Crawler|Crawler]] retrieves data references from </del>the <del class="diffchange diffchange-inline">'''Data Source''' </del>and <del class="diffchange diffchange-inline">returns them </del>to the <del class="diffchange diffchange-inline">[[SMILA/Documentation/CrawlerController|Crawler Controller]]</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># </ins>The <ins class="diffchange diffchange-inline">WebFetcher worker fetches content of resources that do not have content yet. In </ins>this <ins class="diffchange diffchange-inline">case this would be non-HTML resources because their content </ins>was <ins class="diffchange diffchange-inline">not needed </ins>by the <ins class="diffchange diffchange-inline">crawler worker for link extraction</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 4. </del>The <del class="diffchange diffchange-inline">[[SMILA/Documentation/CrawlerController|Crawler Controller]] determines whether </del>this <del class="diffchange diffchange-inline">particular data is new/modified or </del>was <del class="diffchange diffchange-inline">already indexed </del>by <del class="diffchange diffchange-inline">querying </del>the <del class="diffchange diffchange-inline">[[SMILA/Documentation/DeltaIndexingManager|Delta Indexing Service]]</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># At the end of </ins>the <ins class="diffchange diffchange-inline">crawl workflow</ins>, the <ins class="diffchange diffchange-inline">UpdatePusher worker sends </ins>the <ins class="diffchange diffchange-inline">crawled records with their content </ins>to the <ins class="diffchange diffchange-inline">indexing job </ins>as ''<ins class="diffchange diffchange-inline">added</ins>'' <ins class="diffchange diffchange-inline">records </ins>and <ins class="diffchange diffchange-inline">saves their current state in the delta service</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 5. If </del>the <del class="diffchange diffchange-inline">data was not previously indexed</del>, the <del class="diffchange diffchange-inline">[[SMILA/Documentation/CrawlerController|Crawler Controller]] instructs </del>the <del class="diffchange diffchange-inline">[[SMILA/Documentation/Crawler|Crawler]] </del>to <del class="diffchange diffchange-inline">retrieve </del>the <del class="diffchange diffchange-inline">full data plus content </del>as <del class="diffchange diffchange-inline">[[SMILA/Glossary#R|Record]] (metadata + attachment). </del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># Now </ins>the <ins class="diffchange diffchange-inline">indexing job starts </ins>to <ins class="diffchange diffchange-inline">work: </ins>The <ins class="diffchange diffchange-inline">Bulkbuilder writes </ins>the records to <ins class="diffchange diffchange-inline">index to bulks, depending on if they are to be added to or updated in </ins>the <ins class="diffchange diffchange-inline">index, or if they are </ins>to <ins class="diffchange diffchange-inline">be deleted (which does not happen at this point)</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 6. The [[SMILA/Documentation/Crawler|Crawler]] fetches the complete record from the </del>'''<del class="diffchange diffchange-inline">Data Source</del>'<del class="diffchange diffchange-inline">''. Each record has an ID </del>and <del class="diffchange diffchange-inline">can contain metadata and attachments (binary content)</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># </ins>The <ins class="diffchange diffchange-inline">PipelineProcessor worker picks up those </ins>record <ins class="diffchange diffchange-inline">bulks and puts each record (</ins>in <ins class="diffchange diffchange-inline">manageable numbers) on </ins>the <ins class="diffchange diffchange-inline">blackboard </ins>...</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 7. The [[SMILA/Documentation/CrawlerController|Crawler Controller]] sends </del>the <del class="diffchange diffchange-inline">complete retrieved records </del>to <del class="diffchange diffchange-inline">the [[SMILA/Documentation/ConnectivityManager|Connectivity Manager]].</del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># </ins>.<ins class="diffchange diffchange-inline">.. and invokes a configured pipeline </ins>for <ins class="diffchange diffchange-inline">either adding</ins>/<ins class="diffchange diffchange-inline">updating or deleting records</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 8. </del>The <del class="diffchange diffchange-inline">[[SMILA/Documentation/ConnectivityManager|Connectivity Manager]] routes </del>the records to the <del class="diffchange diffchange-inline">configured job by pushing them </del>to <del class="diffchange diffchange-inline">the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]]</del>.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div># <ins class="diffchange diffchange-inline">The pipelets in the pipelines take </ins>the record <ins class="diffchange diffchange-inline">data from the blackboard</ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 9. </del>The <del class="diffchange diffchange-inline">[[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] persists the </del>record<del class="diffchange diffchange-inline">'s attachment content via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] </del>in the <del class="diffchange diffchange-inline">[[SMILA/Documentation/Binary_Storage|Binary Storage]]</del>. <del class="diffchange diffchange-inline">Only attachment references remain in the records</del>. <del class="diffchange diffchange-inline">Should any subsequent processes require the record’s full content, they can access it via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]]</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># .</ins>.<ins class="diffchange diffchange-inline">. and manipulate </ins>the <ins class="diffchange diffchange-inline">SolrIndex accordingly</ins>. <ins class="diffchange diffchange-inline">The index can now be searched using yet another pipeline (not shown here)</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 10</del>. <del class="diffchange diffchange-inline">Records are cumulated in [[SMILA/Glossary#B|bulks]] </del>for <del class="diffchange diffchange-inline">asynchronous workflow processing. [[SMILA</del>/<del class="diffchange diffchange-inline">Glossary#R|Record bulks]] are stored in '''ObjectStore'''</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"># Finally </ins>(<ins class="diffchange diffchange-inline">and not yet implemented</ins>)<ins class="diffchange diffchange-inline">, when </ins>the <ins class="diffchange diffchange-inline">crawl </ins>workflow <ins class="diffchange diffchange-inline">is done, the DeltaService can be asked for all </ins>records <ins class="diffchange diffchange-inline">that have not been crawled in this run, so that </ins>''<ins class="diffchange diffchange-inline">delete</ins>'' records <ins class="diffchange diffchange-inline">can be sent </ins>to the <ins class="diffchange diffchange-inline">indexing workflow </ins>to <ins class="diffchange diffchange-inline">remove these resources </ins>from the <ins class="diffchange diffchange-inline">index</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 11. An [[SMILA/Glossary</del>#<del class="diffchange diffchange-inline">W|asynchronous workflows]] is executed triggered by </del>the <del class="diffchange diffchange-inline">[[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] generated </del>record <del class="diffchange diffchange-inline">bulk</del>. <del class="diffchange diffchange-inline">This is managed by </del>the <del class="diffchange diffchange-inline">[[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components</del>. <del class="diffchange diffchange-inline">Runtime/Synchronization data is stored in '''Zookeeper''', persistent data is stored in '''ObjectStore'''</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 12. Predefined asynchronous workflow ''indexUpdate'' contains [[SMILA/Documentation/Worker/PipelineProcessorWorker|BPEL worker]] for embedding </del>(<del class="diffchange diffchange-inline">resp. executing</del>) <del class="diffchange diffchange-inline">synchronous BPEL pipelines in </del>the <del class="diffchange diffchange-inline">asynchronous </del>workflow<del class="diffchange diffchange-inline">. Added </del>records <del class="diffchange diffchange-inline">are passed to the predefined BPEL pipeline </del>''<del class="diffchange diffchange-inline">AddPipeline</del>''<del class="diffchange diffchange-inline">, deleted </del>records to the <del class="diffchange diffchange-inline">''DeletePipeline''. </del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 13. A BPEL pipeline uses a set of ''Pipelets'' </del>to <del class="diffchange diffchange-inline">process a record's data (e.g. extracting text </del>from <del class="diffchange diffchange-inline">various document or image file types). After processing </del>the <del class="diffchange diffchange-inline">records the pipelets can store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service. </del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 14. The Add- and DeletePipeline contain a [[SMILA/Documentation/Solr#How_to_use_Solr_with_SMILA|SolrIndexPipelet]] which is finally invoked to update the '''Solr/Lucene Index'''</del>.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div></div></td></tr>
</table>
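The revision above describes a final step that is not yet implemented: after a crawl run, the DeltaService is asked for all records that were not crawled in that run, so that ''delete'' records can be sent to the indexing workflow. A minimal sketch of that bookkeeping follows. The class `DeltaIndex` and its methods are hypothetical names chosen for illustration and are not SMILA's API; the content hash is likewise an assumption about how modification could be detected.

```python
import hashlib


class DeltaIndex:
    """Hypothetical in-memory stand-in for a delta indexing service."""

    def __init__(self):
        self._hashes = {}      # record id -> content hash from the last visit
        self._visited = set()  # record ids seen in the current crawl run

    def begin_run(self):
        """Start a new crawl run with an empty visited set."""
        self._visited.clear()

    def check(self, record_id, content):
        """Classify a record as 'new', 'modified' or 'unchanged' and mark it visited."""
        digest = hashlib.sha256(content).hexdigest()
        self._visited.add(record_id)
        previous = self._hashes.get(record_id)
        self._hashes[record_id] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "modified"

    def finish_run(self):
        """Return ids known from earlier runs but not visited in this one, and forget them."""
        obsolete = sorted(set(self._hashes) - self._visited)
        for record_id in obsolete:
            del self._hashes[record_id]
        return obsolete


delta = DeltaIndex()

# First run: both resources are unknown, nothing is obsolete.
delta.begin_run()
first = [delta.check("a.txt", b"alpha"), delta.check("b.txt", b"beta")]
first_obsolete = delta.finish_run()

# Second run: a.txt changed and b.txt is no longer found by the crawl.
delta.begin_run()
second = [delta.check("a.txt", b"alpha v2")]
second_obsolete = delta.finish_run()
```

In this sketch the ids returned by `finish_run` are the ones for which a ''delete'' record would be sent to the indexing workflow.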
Juergen.schumacher.attensity.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=284642&oldid=prev
Daniel.stucky.attensity.com at 09:18, 16 January 2012
2012-01-16T09:18:56Z
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 09:18, 16 January 2012</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l18" >Line 18:</td>
<td colspan="2" class="diff-lineno">Line 18:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 12. Predefined asynchronous workflow ''indexUpdate'' contains [[SMILA/Documentation/Worker/PipelineProcessorWorker|BPEL worker]] for embedding (resp. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 12. Predefined asynchronous workflow ''indexUpdate'' contains [[SMILA/Documentation/Worker/PipelineProcessorWorker|BPEL worker]] for embedding (resp. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. A BPEL pipeline uses a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types). After processing the records the pipelets can store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. A BPEL pipeline uses a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types). After processing the records the pipelets can store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline contain a [[SMILA/Documentation/<del class="diffchange diffchange-inline">LuceneIndexPipelet</del>|<del class="diffchange diffchange-inline">LuceneIndexPipelet</del>]] which is finally invoked to update the '''Lucene Index'''.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline contain a [[SMILA/Documentation/<ins class="diffchange diffchange-inline">Solr#How_to_use_Solr_with_SMILA</ins>|<ins class="diffchange diffchange-inline">SolrIndexPipelet</ins>]] which is finally invoked to update the '''<ins class="diffchange diffchange-inline">Solr/</ins>Lucene Index'''.</div></td></tr>
</table>
Daniel.stucky.attensity.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=269342&oldid=prev
Drazen.cindric.attensity.com at 09:26, 21 September 2011
2011-09-21T09:26:54Z
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 09:26, 21 September 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l16" >Line 16:</td>
<td colspan="2" class="diff-lineno">Line 16:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated in [[SMILA/Glossary#B|bulks]] for asynchronous workflow processing. [[SMILA/Glossary#R|Record bulks]] are stored in '''ObjectStore'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated in [[SMILA/Glossary#B|bulks]] for asynchronous workflow processing. [[SMILA/Glossary#R|Record bulks]] are stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflows]] is executed triggered by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] generated record bulk. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/Synchronization data is stored in '''Zookeeper''', persistent data is stored in '''ObjectStore'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflows]] is executed triggered by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] generated record bulk. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/Synchronization data is stored in '''Zookeeper''', persistent data is stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* 12. Predefined asynchronous workflow ''indexUpdate'' contains [[SMILA/Documentation/Worker/<del class="diffchange diffchange-inline">PipelineProcessingWorker</del>|BPEL worker]] for embedding (resp. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* 12. Predefined asynchronous workflow ''indexUpdate'' contains [[SMILA/Documentation/Worker/<ins class="diffchange diffchange-inline">PipelineProcessorWorker</ins>|BPEL worker]] for embedding (resp. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. A BPEL pipeline uses a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types). After processing the records the pipelets can store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. A BPEL pipeline uses a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types). After processing the records the pipelets can store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline contain a [[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] which is finally invoked to update the '''Lucene Index'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline contain a [[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] which is finally invoked to update the '''Lucene Index'''.</div></td></tr>
</table>
Drazen.cindric.attensity.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=267033&oldid=prev
Andreas.Weber.empolis.com at 13:39, 5 September 2011
2011-09-05T13:39:17Z
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 13:39, 5 September 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[Image:DefaultConfigurationWorkflow.<del class="diffchange diffchange-inline">jpg</del>]]</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Image:DefaultConfigurationWorkflow.<ins class="diffchange diffchange-inline">png</ins>]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>(original slides can be found here: [[Media:DefaultConfigurationWorkflow.zip|DefaultConfigurationWorkflow.zip]])</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>(original slides can be found here: [[Media:DefaultConfigurationWorkflow.zip|DefaultConfigurationWorkflow.zip]])</div></td></tr>
</table>
Andreas.Weber.empolis.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=267031&oldid=prev
Andreas.Weber.empolis.com: /* The diagramme description */
2011-09-05T13:34:08Z
<p><span dir="auto"><span class="autocomment">The diagramme description</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 13:34, 5 September 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3" >Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>(original slides can be found here: [[Media:DefaultConfigurationWorkflow.zip|DefaultConfigurationWorkflow.zip]])</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>(original slides can be found here: [[Media:DefaultConfigurationWorkflow.zip|DefaultConfigurationWorkflow.zip]])</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>== The <del class="diffchange diffchange-inline">diagramme </del>description==</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>== The <ins class="diffchange diffchange-inline">diagram </ins>description==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 1. Data is imported via [[SMILA/Documentation/Crawler|Crawler]] (or [[SMILA/Documentation/Agent|Agent]]) by configuring a data source and a [[SMILA/Glossary#J|job]] name via the [[SMILA/Documentation/CrawlerController|Crawler Controller]] (resp. [[SMILA/Documentation/AgentController|Agent Controller]]) JMX API.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 1. Data is imported via [[SMILA/Documentation/Crawler|Crawler]] (or [[SMILA/Documentation/Agent|Agent]]) by configuring a data source and a [[SMILA/Glossary#J|job]] name via the [[SMILA/Documentation/CrawlerController|Crawler Controller]] (resp. [[SMILA/Documentation/AgentController|Agent Controller]]) JMX API.  </div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l16" >Line 16:</td>
<td colspan="2" class="diff-lineno">Line 16:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated in [[SMILA/Glossary#B|bulks]] for asynchronous workflow processing. [[SMILA/Glossary#R|Record bulks]] are stored in '''ObjectStore'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated in [[SMILA/Glossary#B|bulks]] for asynchronous workflow processing. [[SMILA/Glossary#R|Record bulks]] are stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflows]] is executed triggered by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] generated record bulk. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/Synchronization data is stored in '''Zookeeper''', persistent data is stored in '''ObjectStore'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflows]] is executed triggered by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] generated record bulk. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/Synchronization data is stored in '''Zookeeper''', persistent data is stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* 12. Predefined asynchronous workflow ''indexUpdate'' contains [[SMILA/Documentation/Worker/PipelineProcessingWorker|BPEL worker]] for embedding (resp. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''. A BPEL pipeline <del class="diffchange diffchange-inline">is a process using </del>a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types)<del class="diffchange diffchange-inline">. </del></div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* 12. Predefined asynchronous workflow ''indexUpdate'' contains [[SMILA/Documentation/Worker/PipelineProcessingWorker|BPEL worker]] for embedding (resp. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''<ins class="diffchange diffchange-inline">. </ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* 13</del>. After processing the records the pipelets store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* 13</ins>. A BPEL pipeline <ins class="diffchange diffchange-inline">uses </ins>a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types). After processing the records the pipelets <ins class="diffchange diffchange-inline">can </ins>store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline <del class="diffchange diffchange-inline">finally invoke the </del>[[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] to update the '''Lucene Index'''.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline <ins class="diffchange diffchange-inline">contain a </ins>[[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] <ins class="diffchange diffchange-inline">which is finally invoked </ins>to update the '''Lucene Index'''.</div></td></tr>
</table>
Andreas.Weber.empolis.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=267028&oldid=prev
Andreas.Weber.empolis.com at 13:28, 5 September 2011
2011-09-05T13:28:52Z
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 13:28, 5 September 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:DefaultConfigurationWorkflow.jpg]]</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>[[Image:DefaultConfigurationWorkflow.jpg]]</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">(original slides can be found here: [[Media:DefaultConfigurationWorkflow.zip|DefaultConfigurationWorkflow.zip]])</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== The diagramme description==</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== The diagramme description==</div></td></tr>
</table>
Andreas.Weber.empolis.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=267026&oldid=prev
Andreas.Weber.empolis.com: /* The diagramme description */
2011-09-05T13:20:51Z
<p><span dir="auto"><span class="autocomment">The diagramme description</span></span></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 13:20, 5 September 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l7" >Line 7:</td>
<td colspan="2" class="diff-lineno">Line 7:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 3. The [[SMILA/Documentation/Crawler|Crawler]] retrieves data references from the '''Data Source''' and returns them to the [[SMILA/Documentation/CrawlerController|Crawler Controller]].</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 3. The [[SMILA/Documentation/Crawler|Crawler]] retrieves data references from the '''Data Source''' and returns them to the [[SMILA/Documentation/CrawlerController|Crawler Controller]].</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 4. The [[SMILA/Documentation/CrawlerController|Crawler Controller]] determines whether this particular data is new/modified or was already indexed by querying the [[SMILA/Documentation/DeltaIndexingManager|Delta Indexing Service]].</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 4. The [[SMILA/Documentation/CrawlerController|Crawler Controller]] determines whether this particular data is new/modified or was already indexed by querying the [[SMILA/Documentation/DeltaIndexingManager|Delta Indexing Service]].</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* 5. If the data was not previously indexed, the [[SMILA/Documentation/CrawlerController|Crawler Controller]] instructs the [[SMILA/Documentation/Crawler|Crawler]] to retrieve the full data plus content as <del class="diffchange diffchange-inline">record </del>(metadata + attachment).  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* 5. If the data was not previously indexed, the [[SMILA/Documentation/CrawlerController|Crawler Controller]] instructs the [[SMILA/Documentation/Crawler|Crawler]] to retrieve the full data plus content as <ins class="diffchange diffchange-inline">[[SMILA/Glossary#R|Record]] </ins>(metadata + attachment).  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 6. The [[SMILA/Documentation/Crawler|Crawler]] fetches the complete record from the '''Data Source'''. Each record has an ID and can contain metadata and attachments (binary content).</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 6. The [[SMILA/Documentation/Crawler|Crawler]] fetches the complete record from the '''Data Source'''. Each record has an ID and can contain metadata and attachments (binary content).</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 7. The [[SMILA/Documentation/CrawlerController|Crawler Controller]] sends the complete retrieved records to the [[SMILA/Documentation/ConnectivityManager|Connectivity Manager]].</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 7. The [[SMILA/Documentation/CrawlerController|Crawler Controller]] sends the complete retrieved records to the [[SMILA/Documentation/ConnectivityManager|Connectivity Manager]].</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 8. The [[SMILA/Documentation/ConnectivityManager|Connectivity Manager]] routes the records to the configured job by pushing them to the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]].  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 8. The [[SMILA/Documentation/ConnectivityManager|Connectivity Manager]] routes the records to the configured job by pushing them to the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]].  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 9. The [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] persists the record's attachment content via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] in the [[SMILA/Documentation/Binary_Storage|Binary Storage]]. Only attachment references remain in the records. Should any subsequent processes require the record’s full content, they can access it via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]].</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 9. The [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]] persists the record's attachment content via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] in the [[SMILA/Documentation/Binary_Storage|Binary Storage]]. Only attachment references remain in the records. Should any subsequent processes require the record’s full content, they can access it via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]].</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated <del class="diffchange diffchange-inline">as </del>bulks for asynchronous workflow processing. Record bulks are stored in '''ObjectStore'''.</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated <ins class="diffchange diffchange-inline">in [[SMILA/Glossary#B|</ins>bulks<ins class="diffchange diffchange-inline">]] </ins>for asynchronous workflow processing. <ins class="diffchange diffchange-inline">[[SMILA/Glossary#R|</ins>Record bulks<ins class="diffchange diffchange-inline">]] </ins>are stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflow]] is executed, triggered by the record bulk generated by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]]. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/synchronization data is stored in '''Zookeeper'''; persistent data is stored in '''ObjectStore'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflow]] is executed, triggered by the record bulk generated by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]]. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/synchronization data is stored in '''Zookeeper'''; persistent data is stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 12. The predefined asynchronous workflow ''indexUpdate'' contains a [[SMILA/Documentation/Worker/PipelineProcessingWorker|BPEL worker]] for embedding (i.e. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''. A BPEL pipeline is a process using a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types).  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 12. The predefined asynchronous workflow ''indexUpdate'' contains a [[SMILA/Documentation/Worker/PipelineProcessingWorker|BPEL worker]] for embedding (i.e. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''. A BPEL pipeline is a process using a set of ''Pipelets'' to process a record's data (e.g. extracting text from various document or image file types).  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. After processing the records the pipelets store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. After processing the records the pipelets store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline finally invoke the [[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] to update the '''Lucene Index'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline finally invoke the [[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] to update the '''Lucene Index'''.</div></td></tr>
</table>
Andreas.Weber.empolis.com
https://wiki.eclipse.org/index.php?title=SMILA/Documentation/Default_configuration_workflow_overview&diff=267024&oldid=prev
Andreas.Weber.empolis.com at 12:56, 5 September 2011
2011-09-05T12:56:06Z
<p></p>
<table class='diff diff-contentalign-left'>
<col class='diff-marker' />
<col class='diff-content' />
<col class='diff-marker' />
<col class='diff-content' />
<tr style='vertical-align: top;' lang='en'>
<td colspan='2' style="background-color: white; color:black; text-align: center;">← Older revision</td>
<td colspan='2' style="background-color: white; color:black; text-align: center;">Revision as of 12:56, 5 September 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[Image:<del class="diffchange diffchange-inline">Schema_</del>.jpg]]</div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Image:<ins class="diffchange diffchange-inline">DefaultConfigurationWorkflow</ins>.jpg]]</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== The diagramme description==</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>== The diagramme description==</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l14" >Line 14:</td>
<td colspan="2" class="diff-lineno">Line 14:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated as bulks for asynchronous workflow processing. Record bulks are stored in '''ObjectStore'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 10. Records are cumulated as bulks for asynchronous workflow processing. Record bulks are stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflow]] is executed, triggered by the record bulk generated by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]]. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/synchronization data is stored in '''Zookeeper'''; persistent data is stored in '''ObjectStore'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 11. An [[SMILA/Glossary#W|asynchronous workflow]] is executed, triggered by the record bulk generated by the [[SMILA/Documentation/Bulkbuilder|Bulkbuilder]]. This is managed by the [[SMILA/Documentation/JobManager|Jobmanager]] and [[SMILA/Documentation/TaskManager|Taskmanager]] components. Runtime/synchronization data is stored in '''Zookeeper'''; persistent data is stored in '''ObjectStore'''.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* 12. The predefined asynchronous workflow ''indexUpdate'' contains a [[SMILA/Documentation/Worker/PipelineProcessingWorker|BPEL worker]] for embedding (i.e. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''. A BPEL pipeline is a process using a set of ''<del class="diffchange diffchange-inline">pipelets</del>'' to process a record's data (e.g. extracting text from various document or image file types).  </div></td><td class='diff-marker'>+</td><td style="color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* 12. The predefined asynchronous workflow ''indexUpdate'' contains a [[SMILA/Documentation/Worker/PipelineProcessingWorker|BPEL worker]] for embedding (i.e. executing) synchronous BPEL pipelines in the asynchronous workflow. Added records are passed to the predefined BPEL pipeline ''AddPipeline'', deleted records to the ''DeletePipeline''. A BPEL pipeline is a process using a set of ''<ins class="diffchange diffchange-inline">Pipelets</ins>'' to process a record's data (e.g. extracting text from various document or image file types).  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. After processing the records the pipelets store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 13. After processing the records the pipelets store the gathered additional data via the [[SMILA/Documentation/Usage_of_Blackboard_Service|Blackboard]] service.  </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline finally invoke the [[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] to update the '''Lucene Index'''.</div></td><td class='diff-marker'> </td><td style="background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;"><div>* 14. The Add- and DeletePipeline finally invoke the [[SMILA/Documentation/LuceneIndexPipelet|LuceneIndexPipelet]] to update the '''Lucene Index'''.</div></td></tr>
</table>
Andreas.Weber.empolis.com