TPTP data persistence layer

Overview

Scalability, scalability, and scalability with appropriate performance are the main themes for today's TPTP model/data persistence layer, with a stringent solution targeted for TPTP 4.4.

We should be limited only by the amount of disk space (for local and distributed cases) in our scenarios. Log analysis, trace analysis, symptom analysis, test execution results analysis and statistical data analysis are cases where the amount of data generated can become extremely large, so a scalable data store mechanism with appropriate performance becomes an important requirement.

During our first three TPTP releases we realized that the EMF file-based approach would not give us enough scalability, although it is very performant in those cases where the model fits in the available memory (RAM).

In the following sections we discuss the different approaches, from the current EMF-based approach to a service-oriented approach backed by a database data store.

By looking at the current TPTP use cases that involve the persistence layer, we will also derive the intended complexity of the first implementation.

The intention is to produce something that is also directly reusable in other projects. COSMOS is the first candidate, and it will also rely on other parts of TPTP.

We need to cover all scenarios currently available in TPTP 4.3 (this could happen incrementally, with initial emphasis on the main pain points) while keeping the UI changes required to support the new approach small. At the same time, the user experience should stay close to what we have in TPTP 4.3, but with much improved performance and scalability.

EMF based approach

This section is under construction. Here are some quick notes.

Today we use EMF for most of our data modeling/manipulation needs. EMF is a very popular modeling framework that provides a powerful and extensible modeling infrastructure, including a very performant runtime and a good integrated XML-based persistence layer.

The main problem we have been tackling in TPTP is tweaking our EMF-based implementation so that it scales appropriately. We worked around some problems and will continue to look at how we can still leverage EMF in TPTP, but at the same time we are looking to move toward an infrastructure with a more controllable memory footprint and simpler data manipulation, using specialized services instead of the complex and flexible approach that we have today. Less should be better in this case.
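To make the idea of a specialized service with a controllable memory footprint more concrete, here is a minimal sketch in Java. It is only an illustration under stated assumptions: the names LogRecord, LogRecordService, GeneratedLogRecordService and PagedAccessSketch are hypothetical and are not part of TPTP or EMF, and the backend simply generates records on demand where a real one might read from files or a database. The point is that the caller never holds more than one page of records in memory, whatever the total size of the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type for illustration; not part of the TPTP model.
final class LogRecord {
    final long id;
    final String message;

    LogRecord(long id, String message) {
        this.id = id;
        this.message = message;
    }
}

// Hypothetical service interface: callers request bounded pages
// instead of loading the whole model into memory.
interface LogRecordService {
    long count();

    List<LogRecord> fetchPage(long offset, int pageSize);
}

// Stand-in backend that generates records on demand.
// A real implementation might read from files or a database.
final class GeneratedLogRecordService implements LogRecordService {
    private final long total;

    GeneratedLogRecordService(long total) {
        this.total = total;
    }

    public long count() {
        return total;
    }

    public List<LogRecord> fetchPage(long offset, int pageSize) {
        List<LogRecord> page = new ArrayList<>();
        long end = Math.min(total, offset + pageSize);
        for (long i = Math.max(0, offset); i < end; i++) {
            page.add(new LogRecord(i, "record " + i));
        }
        return page;
    }
}

public class PagedAccessSketch {
    // Counts records whose message ends with the given suffix,
    // holding at most one page of records in memory at a time.
    static long countMatching(LogRecordService service, String suffix, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long matches = 0;
        long total = service.count();
        for (long offset = 0; offset < total; offset += pageSize) {
            for (LogRecord record : service.fetchPage(offset, pageSize)) {
                if (record.message.endsWith(suffix)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        LogRecordService service = new GeneratedLogRecordService(1_000);
        System.out.println(countMatching(service, "7", 100));
    }
}
```

With a design along these lines, the page size (rather than the size of the model) bounds the memory used by a query, at the cost of a narrower and less flexible API than direct model access.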

Most of the things we learnt using (and tweaking) EMF should help us define and approach a better infrastructure for TPTP 4.4 and later versions.

A presentation that shows some of the things we tried in order to improve this approach: TPTPModel-EMF-scalability.zip

Service based approach

This section is under construction. Here are some quick notes.

File vs database

This section is under construction. Here are some quick notes.

Simple vs complex queries/results structures

This section is under construction. Here are some quick notes.
