Difference between revisions of "SMILA/Project Concepts/Binary Storage"

Revision as of 10:31, 22 October 2008

Overview

Design a service that makes it easy to store and access binary data documents.

Description

Client components will access the Binary Storage Service to persist binary data (attachments) in the binary storage. Each piece of binary data shall be identified by a unique key (identifier) of type String. Client components shall have no direct access to the persistence storage; it will be accessible only through the Binary Storage Service API, which provides the needed CRUD operations.

The backend mechanism of Binary Storage shall be completely transparent to the client, although the user shall be able to set up a basic configuration of the service. Binary Storage shall be able to determine and use a default (optimistic) configuration if none is specified by the user.

Storage Mechanism Internal Structure.

The appropriate storage mechanism depends on the amount of data Binary Storage needs to persist and manage. The persistence storage of the service shall therefore be able to deal with the following persistence structures and techniques, depending on the service configuration:

  • A. File System
    • I. Local hard drive
      • 1. Flat structure
      • 2. Hierarchical structure
    • II. Distributed file system (SFTP, FTP)
      • 1. Flat structure
      • 2. Hierarchical structure
  • B. RDBMS
  • C. Object DataBase (ODBMS)

Exactly one of the persistence options will be used by the Binary Storage Service at run time. Internally, the DAO and DAO Factory concept provides the configured persistence implementation to the Binary Storage Service, which stays independent of the concrete option. The user shall be able to configure the persistence option that satisfies his/her needs.
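The DAO and DAO Factory idea can be sketched roughly as follows. This is only an illustration under assumptions: the names BinaryStorageDao, BinaryStorageDaoFactory and PersistenceOption are hypothetical and do not come from the SMILA code base, and the in-memory backend is a stand-in that exists only to keep the example runnable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical DAO contract; the names are illustrative, not the SMILA API. */
interface BinaryStorageDao {
    void store(String id, byte[] blob);
    byte[] fetch(String id);
    void remove(String id);
}

/** In-memory stand-in for a real backend (file system, RDBMS, ODBMS). */
final class InMemoryDao implements BinaryStorageDao {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override public void store(String id, byte[] blob) { data.put(id, blob.clone()); }

    @Override public byte[] fetch(String id) {
        byte[] blob = data.get(id);
        return blob == null ? null : blob.clone();
    }

    @Override public void remove(String id) { data.remove(id); }
}

/** Options from this concept plus IN_MEMORY, which is an assumption for the demo. */
enum PersistenceOption { FILE_SYSTEM_FLAT, FILE_SYSTEM_HIERARCHICAL, RDBMS, ODBMS, IN_MEMORY }

final class BinaryStorageDaoFactory {
    private BinaryStorageDaoFactory() { }

    /** Returns the DAO for the configured option; only the demo backend is sketched. */
    static BinaryStorageDao create(PersistenceOption option) {
        switch (option) {
            case IN_MEMORY:
                return new InMemoryDao();
            default:
                throw new UnsupportedOperationException("Backend not sketched: " + option);
        }
    }
}
```

The service would ask the factory once, at start-up, for the DAO matching its configuration and would then work only against the DAO interface.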

SMILA-BinaryStorage-HighLevel.jpg

A. File System

Binary Storage Service saves the binary data directly in the file system.

I. Local hard drive

The service saves data on the local drive under a predefined persistence storage location, binary.storage.root.path. Under this root path Binary Storage will create its file system structure, flat or hierarchical, depending on the configuration.

1. Flat structure

The flat file system configuration shall be used only for small amounts of data, since all attachments are saved in the same path location. For huge amounts of data the system becomes very slow and response times increase significantly.

This option shall only be used for debugging purposes, since it offers an easy way of locating a specific persisted attachment.

If the user provides no initial configuration, the flat file system option shall not be used as the default.
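A minimal sketch of the flat layout, assuming that the attachment id is used directly as the file name under the root path (this page does not fix the naming scheme, so that is an assumption, as is the class name FlatFileStore):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Flat layout sketch: every attachment is one file directly under the root path. */
final class FlatFileStore {
    private final Path root; // value of binary.storage.root.path

    FlatFileStore(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Maps an id to root/<id>; ids that would leave the root folder are rejected. */
    Path resolve(String id) {
        Path target = root.resolve(id).normalize();
        if (!root.equals(target.getParent())) {
            throw new IllegalArgumentException("Id is not usable as a file name: " + id);
        }
        return target;
    }

    void store(String id, byte[] blob) {
        try {
            Files.createDirectories(root);
            Files.write(resolve(id), blob);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] fetch(String id) {
        try {
            return Files.readAllBytes(resolve(id));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the path is derived from the id alone, a persisted attachment is easy to locate by hand, which is what makes this layout convenient for debugging.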

SMILA-BinaryStorage-Flat.jpg

2. Hierarchical structure

With hierarchical file system persistence, the Binary Storage Service manages a configurable, hierarchical internal structure under the configured persistence storage root path. This is considered the default configuration. The following parameters are available for configuring the hierarchical structure:

  • q - Maximum number of subfolders per folder
  • r - Maximum number of persisted attachments per folder

The hierarchical (tree) structure is created while new data is being stored, because the number of binary data items that will be persisted is not known to the Binary Storage Service in advance.

Hierarchical structure nomenclature. Test scenario

The following picture outlines the hierarchy, the file system nomenclature and the distribution (persistence) of binary data inside the tree structure. In the illustrated sample, our test scenario stores a total of 360 attachments. The configuration values are:

q = 3 (maximum 3 subfolders per folder in the hierarchy)
r = 10 (maximum 10 files stored in a folder inside of the hierarchic structure)
t = 360 (total number of attachments-files to be stored)

As outlined in the picture, the number of folders per level of the hierarchy forms a geometric progression (also known as a geometric sequence): each term (the number of folders on one level of the hierarchy) after the first is found by multiplying the previous one by a fixed non-zero number called the common ratio. The common ratio is the maximum number of subfolders per folder, q.

For an optimistic hierarchy, the following parameters can be determined in order to configure the hierarchy as the user expects:

  • total number of folders in the structure (t/r) - 36
  • number of folders on each level of the structure - Bn formula (b1 = 1 folder on the first level; b2 = 3 folders on the second level; b3 = 9 folders on the third level; b4 = 27 folders on a full fourth level, of which only 23 are needed here)
  • total number of folders in the hierarchy (in full mode) - Sn formula (Sn = 40 for n = 4)
  • number of folders actually needed on the deepest level: 23 folders, Bn' = Bn - (Sn - t/r) (Bn = 27; Sn = 40; t/r = 36)
  • hierarchy level - n = 4 (logarithmic function)
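The formulas themselves appear only in the picture. Assuming they are the standard geometric progression formulas with b1 = 1 (the root folder) and common ratio q, the values of the test scenario can be worked out as follows; the closed form for n is derived here from the condition Sn >= t/r and is not quoted from the source.

```latex
% b_1 = 1 (the root folder), common ratio q > 1
b_n = b_1 \, q^{\,n-1}, \qquad
S_n = \sum_{k=1}^{n} b_k = b_1 \, \frac{q^{n} - 1}{q - 1}

% smallest n whose full hierarchy holds TF = t / r folders (with b_1 = 1)
n = \left\lceil \log_q \bigl( TF \, (q - 1) + 1 \bigr) \right\rceil

% test scenario: q = 3, t = 360, r = 10, TF = 36
b_4 = 3^{3} = 27, \qquad
S_4 = \frac{3^{4} - 1}{3 - 1} = 40, \qquad
n = \lceil \log_3 73 \rceil = 4, \qquad
B_n' = B_n - (S_n - TF) = 27 - (40 - 36) = 23
```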

SMILA-BinaryStorage-Hierarchical.jpg

The Binary Storage Service has to map each attachment identifier to the path where the binary data is stored. The mappings need to be persisted so that they can be reused when the system is restarted.
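One simple way to keep such mappings across restarts is a properties file. The sketch below is an assumption made for illustration only: the class name PathMappings and the file format are hypothetical, and rewriting the whole file on every change would probably not scale to the data volumes discussed here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Persists "attachment id -> relative path" pairs in a properties file. */
final class PathMappings {
    private final Path file;
    private final Properties mappings = new Properties();

    /** Loads existing mappings, if the file is already there (e.g. after a restart). */
    PathMappings(Path file) {
        this.file = file;
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                mappings.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Records a mapping and writes the whole file back to disk. */
    synchronized void put(String id, String relativePath) {
        mappings.setProperty(id, relativePath);
        try (OutputStream out = Files.newOutputStream(file)) {
            mappings.store(out, "attachment id -> relative path");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored path, or null if the id is unknown. */
    synchronized String get(String id) {
        return mappings.getProperty(id);
    }
}
```

A second PathMappings instance created on the same file sees the entries written by the first one, which is the restart behaviour the paragraph above asks for.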

External manipulation of the persistence storage structure (such as deleting data from inside it) will break the mappings. This is considered an exceptional case and is outside the scope of the Binary Storage Service.

The following table provides an overview of the hierarchical structure based on the configured parameters:

t (number of attachments) | q (max number of subfolders in folder) | r (max number of att. in folder) | TF (number of folders) | n (hierarchical level) | Sn (full mode hierarchy) | M (missing subfolders in deepest level)
360 | 3 | 10 | 36 | 4 | 40 | 4
1000000 | 25 | 200 | 5000 | 4 | 16276 | 11276
1000000 | 50 | 200 | 5000 | 4 | 127551 | 122551
1000000 | 75 | 300 | 3333 | 3 | 5701 | 2368
1000000 | 100 | 300 | 3333 | 3 | 10101 | 6768
1000000 | 150 | 300 | 3333 | 3 | 22651 | 19318
1000000 | 200 | 350 | 2857 | 3 | 40201 | 37344
1000000 | 250 | 500 | 2000 | 3 | 62751 | 60751
1000000 | 300 | 750 | 1333 | 3 | 90301 | 88968
1000000 | 100 | 400 | 2500 | 3 | 10101 | 7601
1000000 | 200 | 500 | 2000 | 3 | 40201 | 38201
1000000 | 250 | 600 | 1666 | 3 | 62751 | 61085
1000000 | 300 | 750 | 1333 | 3 | 90301 | 88968
1000000 | 300 | 1000 | 1000 | 3 | 90301 | 89301
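The derived columns can be recomputed from t, q and r. The sketch below assumes b1 = 1 and takes n as the smallest level count whose full hierarchy holds TF folders; the class name HierarchyStats is hypothetical. TF is rounded down, as in the table, although a real implementation would presumably round up so that every attachment finds a folder.

```java
/** Recomputes TF, n, Sn and M from t, q and r using integer arithmetic. */
final class HierarchyStats {
    final long folders;  // TF: t / r, rounded down as in the table
    final int levels;    // n: smallest depth whose full hierarchy holds TF folders
    final long fullSize; // Sn: number of folders in a full hierarchy of depth n
    final long missing;  // M: Sn - TF

    HierarchyStats(long t, int q, int r) {
        if (t < 1 || q < 2 || r < 1) {
            throw new IllegalArgumentException("Expected t >= 1, q >= 2, r >= 1");
        }
        folders = Math.max(1, t / r);
        long levelSize = 1; // b1 = 1, the root folder
        long sum = 1;       // S1
        int n = 1;
        while (sum < folders) {
            levelSize *= q; // b(n+1) = b(n) * q
            sum += levelSize;
            n++;
        }
        levels = n;
        fullSize = sum;
        missing = sum - folders;
    }
}
```

For example, new HierarchyStats(360, 3, 10) yields TF = 36, n = 4, Sn = 40 and M = 4, matching the first row.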

II. Distributed file system (SFTP, FTP)

The purpose of the distributed file system option is to allow binary data to be stored on and accessed from network storage devices transparently through SFTP or FTP. The user needs to provide the configuration data for the distributed system (host, user, password). It is assumed that the user has write access to the distributed system at the configured persistence location path.

1. Flat structure

It has the same characteristics as the local flat configuration, but it is applied to the persistence location on the distributed system.

2. Hierarchical structure

It has the same characteristics as the local hierarchical configuration, but it is applied to the persistence location on the distributed system.

B. RDBMS

The Binary Storage Service shall also be able to store the records (BLOBs) in an RDBMS. The connection string (URL) shall be configurable (driver, host, database name, port, user, password).

C. Object DataBase

The Binary Storage Service will use an existing open source object database engine to store the binary data. It is not the responsibility of the Binary Storage Service to access the stored data directly; the service will store and fetch the data through the API exposed by the engine.

Open source database engines:

  • Oracle Berkeley DB
  • Berkeley DB Java Edition

Cluster configuration

The Binary Storage Service must offer the following cluster configurations:

  • Client components clustering (like the blackboard service) - the client services run in a cluster and all cluster nodes need to share the same data (which means that all nodes have to share the same binary persistence storage). This is only possible by configuring the Binary Storage Service to use the distributed file system option or the object database option (provided the object database engine can be accessed remotely, i.e. it is not an embedded database that does not allow direct remote connections);
  • Persistence storage clustering - the binary data is stored on more than one node.

Either cluster option may be configured on its own, both may be configured at the same time, or neither.

Both options must be supported by the frameworks / RDBMS that Binary Storage Service will use.

Concurrent access

The Binary Storage Service shall allow multiple clients to access (read/write) the storage concurrently. It is the responsibility of the Binary Storage Service to synchronize the operations and to avoid deadlocks.
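A minimal sketch of one possible synchronization scheme, assuming a single read/write lock per store: readers run in parallel, writers are exclusive, and because only one lock exists and it is always released in a finally block, no lock ordering problem (and therefore no deadlock) can arise inside the store. The class name SynchronizedStore and the in-memory map are assumptions for the demo.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Store guarded by one read/write lock: many readers or one writer at a time. */
final class SynchronizedStore {
    private final Map<String, byte[]> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void store(String id, byte[] blob) {
        lock.writeLock().lock();
        try {
            data.put(id, blob.clone());
        } finally {
            lock.writeLock().unlock();
        }
    }

    byte[] fetch(String id) {
        lock.readLock().lock();
        try {
            byte[] blob = data.get(id);
            return blob == null ? null : blob.clone();
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return data.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A finer-grained scheme, such as one lock per attachment id, would allow more parallelism at the price of more bookkeeping; which one fits depends on the backend.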

Technical aspects for designing the Binary Storage Service

  • Binary data compression and encryption shall be available via configuration.
  • Binary storage shall internally manage its persistence hierarchy.
  • The binary service shall be designed as a single bundle / service.
  • The exception handling mechanism should handle all internal binary storage errors (logical and unexpected) and wrap the exceptions into a “binary storage exception” that makes sense for the Blackboard service.
  • Resource synchronization shall be done at the lowest possible level.
  • Binary Storage shall manage its configuration internally, decoupled from the blackboard service (highly coupled classes are difficult to maintain and hard to understand in isolation; they tend to introduce internal dependencies).
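The optional compression named in the first item could, for instance, be applied to a blob before it is handed to the backend. The sketch assumes GZIP from the Java standard library; the page does not name a compression format, and the class name BlobCompression is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** GZIP helpers that a store could apply when compression is switched on. */
final class BlobCompression {
    private BlobCompression() { }

    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```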

2.Sequence Diagram NewBinaryStorae.jpg

  • The Binary Storage Service API shall stay as simple as possible:
void store(String id, InputStream stream);
void store(String id, byte[] blob);
byte[] fetchAsByte(String id);
InputStream fetchAsStream(String id);
void remove(String id);
int fetchSize(String id);
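To show how the listed operations fit together, here is a sketch that puts them into an interface and backs it with an in-memory map. The method signatures are the ones listed above; the interface name BinaryStorageService, the unchecked BinaryStorageException and the in-memory implementation are assumptions for illustration and are not taken from the SMILA code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Wrapper for all storage failures; making it unchecked is an assumption. */
class BinaryStorageException extends RuntimeException {
    BinaryStorageException(String message, Throwable cause) { super(message, cause); }
}

/** The operations listed in this concept, collected in a hypothetical interface. */
interface BinaryStorageService {
    void store(String id, InputStream stream);
    void store(String id, byte[] blob);
    byte[] fetchAsByte(String id);
    InputStream fetchAsStream(String id);
    void remove(String id);
    int fetchSize(String id);
}

/** In-memory demo backend; a real service would delegate to the configured DAO. */
final class InMemoryBinaryStorage implements BinaryStorageService {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    @Override public void store(String id, InputStream stream) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int read;
            while ((read = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            // Low-level errors are wrapped, as the technical aspects above require.
            throw new BinaryStorageException("Could not read stream for id " + id, e);
        }
        store(id, buffer.toByteArray());
    }

    @Override public void store(String id, byte[] blob) { blobs.put(id, blob.clone()); }
    @Override public byte[] fetchAsByte(String id) { return require(id).clone(); }
    @Override public InputStream fetchAsStream(String id) { return new ByteArrayInputStream(require(id)); }
    @Override public void remove(String id) { blobs.remove(id); }
    @Override public int fetchSize(String id) { return require(id).length; }

    private byte[] require(String id) {
        byte[] blob = blobs.get(id);
        if (blob == null) {
            throw new BinaryStorageException("No binary data stored for id " + id, null);
        }
        return blob;
    }
}
```

A client such as the Blackboard service would only need to catch BinaryStorageException, regardless of which backend is configured.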

UML Class Diagram

SMILA-BinaryStorage-ClassDiagram.jpg
