SMILA/Project Concepts/Core Indexing Process (global view)

Description

This concept describes the general system architecture and references core or general concepts. It also covers good and bad practices for keeping the system maintainable.

Key parts of the architecture can be found in the following concepts:

SMILA/Project_Concepts/Components and Modules

Discussion

Technical proposal

{info} Note: This section may only be edited by the assigned developer(s). They are also responsible for reflecting any agreed changes or details in the discussion section. {info}

SMILA Core Indexing Process (Global View)

Process overview

Create/Delete Record

Compounds
Optimized queue access: can obsolete requests be avoided? An example of an obsolete request is the indexing of a document while a delete operation for it is already present in the queue. It must be guaranteed that the client process is not required to read all messages in the queue. Open question: are queues available that cover such issues?
Parallel processing could lead to difficulties when create/delete operations are performed concurrently
Updating individual fields of a document in the storage must be possible (e.g. the user rights field)
Delete by Query (several objects, XQuery, Source, ...)
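The obsolete-request point above can be illustrated with a small in-memory sketch. It assumes that requests are keyed by record id and that the most recent request for a record supersedes any pending one; `CoalescingQueue` and the operation names are hypothetical and not part of SMILA. Whether real queue products offer comparable behavior remains the open question stated above.

```python
from collections import OrderedDict


class CoalescingQueue:
    """Hypothetical queue that keeps at most one pending operation per record id."""

    def __init__(self):
        # record id -> latest requested operation, in arrival order
        self._pending = OrderedDict()

    def put(self, record_id, operation):
        # A newer request replaces the pending one for the same record.
        # The lookup is by key, so no other message has to be read.
        self._pending.pop(record_id, None)
        self._pending[record_id] = operation

    def get(self):
        # Return the oldest remaining (record_id, operation) pair.
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)


queue = CoalescingQueue()
queue.put("doc-1", "add")
queue.put("doc-2", "add")
queue.put("doc-1", "delete")  # supersedes the pending "add" for doc-1

drained = [queue.get() for _ in range(len(queue))]
```

In this sketch the "add" for doc-1 is never processed, because the later "delete" replaced it before a consumer picked it up.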

Delta-Indexing

Source or subset (a part of information in storage) based
Compounds
Status storage behind an interface (status for delta discovery at the IRM; probably Lucene or an indexer as storage), e.g. hashes, data, URLs, modifications of user rights
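A minimal sketch of hash-based delta discovery behind a status storage interface follows. All names (`DeltaStatusStore`, `InMemoryStatusStore`, `fingerprint`, `discover_delta`) are assumptions made for illustration; a real implementation might back the interface with Lucene or another store, as suggested above. The fingerprint includes the user rights so that a change in rights alone is detected as a modification.

```python
import hashlib
from abc import ABC, abstractmethod


class DeltaStatusStore(ABC):
    """Hypothetical interface for the delta indexing status storage."""

    @abstractmethod
    def get_hash(self, source, record_id): ...

    @abstractmethod
    def set_hash(self, source, record_id, digest): ...

    @abstractmethod
    def known_ids(self, source): ...

    @abstractmethod
    def remove(self, source, record_id): ...


class InMemoryStatusStore(DeltaStatusStore):
    def __init__(self):
        self._hashes = {}  # (source, record_id) -> digest

    def get_hash(self, source, record_id):
        return self._hashes.get((source, record_id))

    def set_hash(self, source, record_id, digest):
        self._hashes[(source, record_id)] = digest

    def known_ids(self, source):
        return {rid for (src, rid) in self._hashes if src == source}

    def remove(self, source, record_id):
        self._hashes.pop((source, record_id), None)


def fingerprint(content, user_rights):
    # Hash the content together with the sorted user rights.
    h = hashlib.sha256()
    h.update(content)
    h.update(b"\0")
    h.update(",".join(sorted(user_rights)).encode("utf-8"))
    return h.hexdigest()


def discover_delta(store, source, records):
    """records maps record id -> (content bytes, user rights list)."""
    new, changed = [], []
    for record_id, (content, rights) in records.items():
        digest = fingerprint(content, rights)
        previous = store.get_hash(source, record_id)
        if previous is None:
            new.append(record_id)
        elif previous != digest:
            changed.append(record_id)
        store.set_hash(source, record_id, digest)
    # Anything known from earlier runs but missing now counts as deleted.
    deleted = sorted(store.known_ids(source) - set(records))
    for record_id in deleted:
        store.remove(source, record_id)
    return new, changed, deleted


store = InMemoryStatusStore()
first = discover_delta(store, "fileshare",
                       {"a": (b"x", ["alice"]), "b": (b"y", ["bob"])})
second = discover_delta(store, "fileshare",
                        {"a": (b"x", ["alice", "carol"]), "c": (b"z", [])})
```

The status is kept per source, which matches the source-based variant listed above; a subset-based variant would need a finer-grained key.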

Index creation

Pre/post actions of an index process (e.g. starting of services, invoke a functionality of another external system)

Because queues are used, there is no real "end of an indexing process"; how do we solve this?

Initial index creation
Delta indexing
Continue indexing (start at point XY)
Naive append of information (from any origin/source)
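One possible answer to the "end of an indexing process" question, sketched under assumptions: the run counts messages that are queued but not yet processed, and the post actions fire once the producer (e.g. a crawler) has reported completion and the counter reaches zero. `IndexingRun` and its method names are hypothetical; this illustrates the idea and is not a description of how SMILA solves it.

```python
import threading


class IndexingRun:
    """Hypothetical tracker for one indexing run with pre/post actions."""

    def __init__(self, pre_actions=(), post_actions=()):
        self._pre = list(pre_actions)
        self._post = list(post_actions)
        self._lock = threading.Lock()
        self._outstanding = 0        # queued but not yet processed messages
        self._producer_done = False  # set when no further messages will come
        self._finished = False

    def start(self):
        for action in self._pre:
            action()

    def message_queued(self):
        with self._lock:
            self._outstanding += 1

    def message_processed(self):
        with self._lock:
            self._outstanding -= 1
        self._maybe_finish()

    def producer_finished(self):
        with self._lock:
            self._producer_done = True
        self._maybe_finish()

    def _maybe_finish(self):
        with self._lock:
            if self._finished or not self._producer_done or self._outstanding > 0:
                return
            self._finished = True
        # Run post actions outside the lock, exactly once.
        for action in self._post:
            action()


log = []
run = IndexingRun(
    pre_actions=[lambda: log.append("start services")],
    post_actions=[lambda: log.append("notify external system")],
)
run.start()
run.message_queued()
run.message_queued()
run.producer_finished()   # producer is done, but two messages are pending
run.message_processed()
run.message_processed()   # last pending message: post actions fire here
```

The sketch assumes that every queued message is eventually reported as processed; failed or lost messages would need a timeout or a retry policy on top.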

Compound Management

Processing via BPEL or via a single pipelet (which approach is better?)
How do we cover filters? Is it possible to design a relationship between IRM and filter configuration? [P2 for this remark]
Warning: Large streams
Recursion
Delta indexing
Extensibility of compound management (e.g. using extension points)
Ability to debug
Project templates for covering best practices
Inheritance of data to child records (e.g. user rights)
MIME Type detection
Across the different variants of the installations
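The recursion, inheritance and MIME type points above can be sketched for ZIP compounds using only the Python standard library. `expand_compound`, `make_zip` and the record field names are hypothetical. The sketch reads each element fully into memory, so it does not address the large-streams warning; the `max_depth` guard is an assumed safeguard against deeply nested archives.

```python
import io
import mimetypes
import zipfile


def expand_compound(name, data, inherited, depth=0, max_depth=5):
    """Yield one record per leaf element of a (possibly nested) ZIP compound."""
    is_zip = zipfile.is_zipfile(io.BytesIO(data))
    if not is_zip or depth >= max_depth:
        # Leaf element: guess the MIME type from the file name only.
        mime_type, _ = mimetypes.guess_type(name)
        # Child records inherit attributes such as user rights from the parent.
        yield {"id": name, "mime_type": mime_type, "size": len(data), **inherited}
        return
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            child = archive.read(info)
            yield from expand_compound(
                f"{name}/{info.filename}", child, inherited, depth + 1, max_depth
            )


def make_zip(files):
    # Helper for the usage example: build a ZIP archive in memory.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename, content in files.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


inner = make_zip({"b.txt": b"inner text"})
outer = make_zip({"a.txt": b"outer text", "inner.zip": inner})
records = list(expand_compound("mail.zip", outer, {"user_rights": ["alice"]}))
```

Guessing by file name is the weakest form of MIME type detection; content-based detection would be needed for elements without a meaningful extension.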

Maintenance Operations

CRUD (e.g. collections, indexes)
Backup/Restore/Reset (remove all process-related data; empty temp storage for delta indexing; empty collection XY)
Backup/Restore/Reset (? Maintenance concept; probably hosted at eccenca)
Migration of software versions (including the data)
Reorganization, save/security points, training (e.g. for search, what's related)
Adding nodes (indices, SMILA, ...)
Creation of reports or statistics

Notice: This Wiki is now read only and edits are no longer possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
