
SMILA/Project Concepts/Core Indexing Process (global view)

Revision as of 03:18, 15 August 2008 by Daniel.stucky.empolis.com


Description

This concept describes the general system architecture. It also references core and general concepts and covers good and bad practices for keeping the system maintainable.

Key parts of the architecture can be found in the following concepts:

Discussion

Technical proposal

Note: This section may only be edited by the assigned developer(s), who are also responsible for reflecting any agreed changes/details in the discussion section.


SMILA Core Indexing Process (Global View)

Process overview

Create/Delete Record

  • Compounds
  • Optimized queue access: can obsolete requests be avoided? An example of an obsolete request is indexing a document while a delete operation for it is already in the queue. It must be guaranteed that the client process does not have to read all messages in the queue. The open question is whether queue implementations exist that cover such issues.
  • Parallel processing can cause problems when create and delete operations for the same record are performed concurrently
  • It must be possible to update individual fields of a document in the storage (e.g. the user rights field)
  • Delete by Query (several objects, XQuery, Source, ...)
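One way to avoid obsolete requests without reading the whole queue is to keep at most one pending request per record ID, so that a newer request replaces an older one. The sketch below is an in-memory stand-in for a message queue, not an existing queue product; `Request` and `CoalescingQueue` are hypothetical names. Only whole-record add and delete operations are modelled: a partial field update (e.g. new user rights) would have to be merged into a pending add instead of replacing it.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical queue message: one operation on one record."""
    record_id: str
    operation: str  # "add" or "delete"


class CoalescingQueue:
    """FIFO queue holding at most one pending request per record ID.

    A newer request for the same record replaces the older one, so an
    "add" followed by a "delete" is never processed. Finding the older
    request is a dict lookup, so the queue is never scanned.
    """

    def __init__(self) -> None:
        self._pending: "OrderedDict[str, Request]" = OrderedDict()

    def put(self, request: Request) -> None:
        # Drop any older request for the same record; the newest one wins
        # and moves to the back of the queue.
        self._pending.pop(request.record_id, None)
        self._pending[request.record_id] = request

    def get(self) -> Optional[Request]:
        # Return the oldest pending request, or None if the queue is empty.
        if not self._pending:
            return None
        _, request = self._pending.popitem(last=False)
        return request

    def __len__(self) -> int:
        return len(self._pending)
```

Moving a replaced request to the back keeps it ordered after any requests for other records that arrived in between.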

Delta-Indexing

  • Source or subset (a part of information in storage) based
  • Compounds
  • Status storage behind an interface (status for delta discovery at the IRM; probably Lucene or an indexer as storage, e.g. for hashes, data, URLs, modifications in user rights)
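A minimal sketch of a source-based status storage behind an interface, assuming delta discovery compares a content hash per record. `DeltaStatusStore`, `InMemoryStatusStore` and `delta` are hypothetical names; the dict-backed store only stands in for a real backend such as Lucene or an indexer. Hashing content that includes the user rights field makes a rights change count as a modification. Detecting deleted records assumes the crawl of the source is complete.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class DeltaStatusStore(ABC):
    """Hypothetical interface for the delta-indexing status storage."""

    @abstractmethod
    def get_hash(self, source: str, record_id: str) -> Optional[str]: ...

    @abstractmethod
    def put_hash(self, source: str, record_id: str, digest: str) -> None: ...

    @abstractmethod
    def record_ids(self, source: str) -> List[str]: ...

    @abstractmethod
    def remove(self, source: str, record_id: str) -> None: ...


class InMemoryStatusStore(DeltaStatusStore):
    """Dict-backed stand-in for a persistent status storage."""

    def __init__(self) -> None:
        self._hashes: Dict[Tuple[str, str], str] = {}

    def get_hash(self, source, record_id):
        return self._hashes.get((source, record_id))

    def put_hash(self, source, record_id, digest):
        self._hashes[(source, record_id)] = digest

    def record_ids(self, source):
        return [rid for (src, rid) in self._hashes if src == source]

    def remove(self, source, record_id):
        self._hashes.pop((source, record_id), None)


def delta(store: DeltaStatusStore, source: str,
          records: Dict[str, bytes]) -> Tuple[List[str], List[str]]:
    """Return (changed_ids, deleted_ids) for one complete crawl of `source`."""
    changed = []
    for record_id, content in records.items():
        digest = hashlib.sha256(content).hexdigest()
        if store.get_hash(source, record_id) != digest:
            changed.append(record_id)
            store.put_hash(source, record_id, digest)
    # Records known for this source but not seen in the crawl were deleted.
    deleted = [rid for rid in store.record_ids(source) if rid not in records]
    for rid in deleted:
        store.remove(source, rid)
    return sorted(changed), sorted(deleted)
```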

Index creation

  • Pre/post actions of an index process (e.g. starting services, invoking functionality of another external system)
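Pre/post actions can be sketched as lists of callbacks around a run. This assumes `process` returns once the run is complete, which with queue-based processing is exactly the open question raised below; `IndexRun` and `Action` are hypothetical names.

```python
from typing import Callable, List

# An action receives the name of the index run it belongs to.
Action = Callable[[str], None]


class IndexRun:
    """Wraps an index process with hypothetical pre/post actions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pre_actions: List[Action] = []
        self.post_actions: List[Action] = []

    def execute(self, process: Callable[[], None]) -> None:
        # Pre actions, e.g. start services the pipeline depends on.
        for action in self.pre_actions:
            action(self.name)
        try:
            process()
        finally:
            # Post actions run even if processing failed, e.g. to stop
            # services or notify an external system.
            for action in self.post_actions:
                action(self.name)
```

Running post actions in `finally` is a design choice; a project may instead prefer to skip external notifications when the run failed.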

Because queues are used, there is no real end of an indexing process; how do we solve this?

  • Initial index creation
  • Delta indexing
  • Continue indexing (start at point XY)
  • Plain append of information (from any origin/source)
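Each of these run types needs a defined end, for instance so that delta indexing can treat records not seen in the crawl as deleted. One possible approach, assuming the crawler knows when it has enumerated the whole source and every worker reports each processed message: count outstanding messages and declare the run finished once the crawl is complete and the count is zero. `RunTracker` and its methods are hypothetical names, not an existing SMILA API; `finish_crawl()` must be called after the last `enqueued()`.

```python
import threading
from typing import Optional


class RunTracker:
    """Detects the end of a queue-based indexing run (hypothetical design).

    The crawler calls enqueued() for every message it puts on the queue and
    finish_crawl() once the whole source has been crawled; workers call
    done() after processing a message.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._outstanding = 0
        self._crawl_finished = False

    def _over(self) -> bool:
        return self._crawl_finished and self._outstanding == 0

    def enqueued(self) -> None:
        with self._cond:
            self._outstanding += 1

    def done(self) -> None:
        with self._cond:
            self._outstanding -= 1
            self._cond.notify_all()

    def finish_crawl(self) -> None:
        with self._cond:
            self._crawl_finished = True
            self._cond.notify_all()

    def is_finished(self) -> bool:
        with self._cond:
            return self._over()

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block until the run is over; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(self._over, timeout=timeout)
```

The crawl-finished flag matters because the outstanding count can drop to zero in the middle of a crawl whenever the workers are faster than the crawler.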

Compound Management

  • Processing via BPEL or via a single pipelet (which approach is better?)
  • How do we cover filters? (Is it possible to design a relationship between IRM and filter configuration? [P2 for this remark])
  • Warning: Large streams
  • Recursion
  • Delta indexing
  • Extensibility of compound management (e.g. using extension points)
  • Debugging support
  • Project templates for covering best practices
  • Inheritance of data to child records (e.g. user rights)
  • MIME Type detection
  • In the different variants of the installations
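A minimal sketch of recursive compound expansion, assuming a compound is represented as a tree of records. Children inherit the `user_rights` attribute unless they define their own, a missing MIME type is guessed from the file name with Python's `mimetypes` module, and recursion stops at a configurable depth (e.g. archives nested in archives). `Record`, `INHERITED`, `expand` and the attribute names are hypothetical; content streams are not modelled, so the large-stream concern stays open.

```python
import mimetypes
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed list of attributes that child records inherit from their parent.
INHERITED = ("user_rights",)


@dataclass
class Record:
    """Hypothetical record: an ID, attributes, and optional child records."""
    record_id: str
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["Record"] = field(default_factory=list)


def expand(compound: Record, max_depth: int = 5, _depth: int = 0,
           _parent: Optional[Dict[str, object]] = None) -> List[Record]:
    """Flatten a compound into a depth-first list of records."""
    attrs = dict(compound.attributes)
    if _parent is not None:
        # Inherit from the (already resolved) parent attributes, so values
        # propagate through every level of nesting.
        for name in INHERITED:
            if name not in attrs and name in _parent:
                attrs[name] = _parent[name]
    if "mime_type" not in attrs:
        guessed, _ = mimetypes.guess_type(compound.record_id)
        attrs["mime_type"] = guessed or "application/octet-stream"
    result = [Record(compound.record_id, attrs)]
    if _depth < max_depth:
        for child in compound.children:
            result.extend(expand(child, max_depth, _depth + 1, attrs))
    return result
```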

Maintenance Operations

  • CRUD (e.g. collections, indexes)
  • Backup/Restore/Reset (remove all process-related data; empty temp storage for delta indexing; empty collection XY)

  • Backup/Restore/Reset (? maintenance concept; probably hosted at eccenca)
  • Migration of software versions (including the data)
  • Reorganization, save/security points, training (e.g. for search, what's related)
  • Adding nodes (indices, SMILA, ...)
  • Creation of reports or statistics
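The reset semantics listed above (remove process-related data, empty the delta-indexing temp storage, empty a collection), together with collection CRUD and a simple statistic, can be sketched as a small maintenance facade. Everything here is hypothetical: `IndexAdmin` and its methods are illustrative names, collections are in-memory sets of record IDs, and the delta state is a dict of content hashes standing in for whatever storage an installation uses.

```python
from typing import Dict, Optional, Set


class IndexAdmin:
    """Hypothetical maintenance facade over collections and delta state."""

    def __init__(self) -> None:
        self.collections: Dict[str, Set[str]] = {}  # collection -> record IDs
        self.delta_state: Dict[str, str] = {}       # record ID -> content hash

    def create_collection(self, name: str) -> None:
        if name in self.collections:
            raise ValueError(f"collection {name!r} already exists")
        self.collections[name] = set()

    def delete_collection(self, name: str) -> None:
        # Dropping a collection also forgets the delta state of its records.
        for record_id in self.collections.pop(name):
            self.delta_state.pop(record_id, None)

    def reset(self, collection: Optional[str] = None) -> None:
        """Empty one collection (and its delta state), or everything."""
        if collection is None:
            for ids in self.collections.values():
                ids.clear()
            self.delta_state.clear()
            return
        for record_id in self.collections[collection]:
            self.delta_state.pop(record_id, None)
        self.collections[collection].clear()

    def report(self) -> Dict[str, int]:
        """Minimal statistics: number of records per collection."""
        return {name: len(ids) for name, ids in self.collections.items()}
```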