Revision as of 15:31, 27 March 2007

TPTP DMS

Requirements Overview

High Level Structure

This image shows a very high level rendering of the runtime components involved in the data flow through the data management services of TPTP.

Basic DMS structure

The following gives an overview of each component in this block architecture.

Event Source

An event source is nothing more than a data provider to the loader. In TPTP this is often thought to be an agent, but even though that may be where an event originated, the event source in the context of this discussion is the input stream to the loader.

There is an implicit contract between the event source and the loader that the data provided is consumable by the loader. However, since the transport layers available in TPTP provide nothing but a transport service, this contract is delegated to the data source pushing data into the transport.

TPTP maintains a set of specifications for event formats it can create, transport and consume. These events will continue to be extended in TPTP as the need arises. Prior to 4.4, events were small and incremental. In 4.4, bulk events will be introduced. These already exist in effect, in the form of a "large" XML stream of already known events wrapped in some sort of envelope. This is how the current trace file import works. In 4.4 we will make this more scalable from a transport and loader perspective.

The design is not complete, but may include options such as simply passing a URI to a batch resource of a known format. Also under consideration is embedding a CSV stream. This is not really an event source specific issue; however, it is mentioned here to prevent any assumption about the structure or format of the data that may arrive through the stream.

Event Parser

The event loader consists of a few simple processing steps. The first is the event parser, the second is the store specific loader. The contract between these two components is a strongly typed data object. In TPTP this object is a simple Java object. The binding between these two steps is directly controlled by the object created. Once the parser has populated the members of the object it will invoke the "add yourself" method, which will trigger the loader behaviour. This may migrate to an asynchronous event model in the future, but in 4.4, parsing and loading will be synchronous operations.

The role of the event parser is to instantiate the object(s) needed by the loader, and set the members as specified by the input stream. All default content must be provided by the pojo. In the current TPTP implementation, the specific event parser is selected based on the initial part of the event stream. At this time, events are formatted as XML, so the element name is the key used to select the parser. Alternate parsers can be easily registered for XML formatted events, and with some additional work the same could be done for other stream formats.

This framework allows for a complete decoupling of the input stream format from the loader by using the intermediate pojo.
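The parser selection keyed on the XML element name can be sketched as follows. This is a minimal, purely illustrative model of the mechanism described above, not TPTP API; all class and method names here are invented for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: parsers are registered against the root XML
// element name, mirroring how the current implementation selects an
// event parser from the initial part of the stream.
interface EventParser {
    Object parse(String xmlEvent); // returns the populated event bean
}

class ParserRegistry {
    private final Map<String, EventParser> parsers = new HashMap<>();

    void register(String rootElement, EventParser parser) {
        parsers.put(rootElement, parser);
    }

    // Peek at the root element name and dispatch to the matching parser.
    Object dispatch(String xmlEvent) {
        String trimmed = xmlEvent.trim();
        int start = trimmed.indexOf('<') + 1;
        int end = start;
        while (end < trimmed.length()
               && Character.isLetterOrDigit(trimmed.charAt(end))) {
            end++;
        }
        String root = trimmed.substring(start, end);
        EventParser parser = parsers.get(root);
        if (parser == null) {
            throw new IllegalArgumentException("No parser for <" + root + ">");
        }
        return parser.parse(xmlEvent);
    }
}
```

Registering an alternate parser for a new XML event type is then a single `register` call, with no change to the dispatch logic.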

Store specific loader

The store specific loader takes an input pojo and does whatever is needed to process and likely persist it. The historical TPTP implementation instantiated an EMF model and deferred persistence to the EMF resource serialization. Although this is still of interest for quick in-memory models, it is not completely scalable to large volumes of related data by default.

Going forward, the loader is clearly bound to the storage system, and should be optimized for the data store of choice. This implies that the loader needs to be registered independently of the event parser implementation, but still needs to be coupled to it at run time. These registrations will be configurable via plug-in extensions, a configuration file or dynamically. The "add yourself" implementation of the pojo will deal with this aspect of the configuration. This approach will maintain compatibility for current extenders of the loader infrastructure.
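One way to picture the "add yourself" decoupling described above is a configurable registry that binds bean types to store specific loaders at run time, instead of the bean being compiled against one store. This is a hedged sketch; the types below are invented for illustration and are not the TPTP loader API.

```java
import java.util.HashMap;
import java.util.Map;

// (Illustrative) a store specific loader processes a populated bean.
interface Loader<T> {
    void load(T bean);
}

// (Illustrative) stands in for plug-in extension / configuration file
// registration: bean type -> configured loader.
class LoaderConfiguration {
    private static final Map<Class<?>, Loader<?>> bindings = new HashMap<>();

    static <T> void bind(Class<T> beanType, Loader<T> loader) {
        bindings.put(beanType, loader);
    }

    @SuppressWarnings("unchecked")
    static <T> Loader<T> loaderFor(Class<T> beanType) {
        return (Loader<T>) bindings.get(beanType);
    }
}

class LogEventBean {
    String message;

    // "Add yourself": resolve the configured loader at run time rather
    // than being hard-wired to one store implementation.
    void addYourself() {
        Loader<LogEventBean> loader =
                LoaderConfiguration.loaderFor(LogEventBean.class);
        if (loader != null) {
            loader.load(this);
        }
    }
}
```

Swapping the EMF loader for an RDB loader then becomes a configuration change rather than a change to the bean itself.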

Data store

The data store is just that, a service that will persist and retrieve data on demand and manage a consistent state of that data. An example data store is EMF, which will manage an in-memory data representation and will also serialize the data into a zipped XML serialization in the case of TPTP. This format will in fact be very similar, if not identical, to the structures historically used in TPTP. A more scalable example will be a relational database. TPTP will provide a relational database implementation for its models. This includes a set of DDL and contracts of behavior with the store specific loaders and the store specific query system. This implementation is intended to be portable to various RDB implementations; however, as long as the contracts are maintained, alternate schema can be used that may be more optimal in certain configurations.

The loader contract includes a specific schema for storage as well as a consistent state or context for the loaders to operate in. In the case of EMF this is the object graph itself. In this case, due to the fact that the graph is considered to be all in memory, there is no support for commit scopes or data stability beyond basic synchronization locking done in the object graph itself.

In the case of the relational store, this is the RDB schema. Where possible, updatable views will be provided to isolate the actual tables for insert and update actions. Views will be provided for all supported read patterns. This will allow for some schema evolution and implementation optimizations. It is up to the loader to declare and use commit scopes.

The query and access contract is provided through views in the case of the relational implementation. In the case of EMF, the object graph itself is exposed.

Client

The client is not a part of the DMS but is included here for completeness and to facilitate various use case discussions.

In TPTP the primary client is the Eclipse user interface; however, this will not be supported any differently than a thick client perhaps implemented in SWT, or a web client that may or may not be an AJAX based implementation. TPTP will extend its current user interface to exploit the DMS, and this includes the Eclipse workbench, RCP applications and BIRT reports.

Client object model

The client object model is nothing more than the name implies, and is not directly affected by DMS. In use cases where DMS is involved, it is assumed that the client has a need to hold some data that has been retrieved via the DMS in memory.

The typical role of the client object model is to hold data of value to the client in a format that is optimal for that client. Since this is obviously not predictable by TPTP in advance, a flexible contract is needed. TPTP will provide an implementation of what are expected to be very common use cases and, based on that experience, will evolve an api for extension. This has been referred to as a registered formatting function, but requires further investigation before exposure. In the meantime TPTP will provide a simple pojo and EMF contract between the client and the access api. This means a specific object graph for simple tables, trees and graphs will be used to access returned results from the access apis. At a minimum this will be pojo based, and an EMF format may also be provided. Formatting to other object models or serializations such as XML is conceivable, but not currently in plan.

Note that it is not assumed that the client is user based. It is expected to be very normal for a data warehouse to be a consumer of data managed by the TPTP DMS. The contract to the result specific api and optionally the store specific query and access component is unchanged.
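The "simple tables, trees and graphs" pojo contract mentioned above could look something like the following. This is only a sketch of the kind of generic table object a result specific access api might bind query output to; the class and its methods are assumptions for illustration, not a TPTP type.

```java
import java.util.ArrayList;
import java.util.List;

// (Illustrative) a minimal pojo table: named columns, uniform rows.
class SimpleTable {
    private final List<String> columns;
    private final List<List<Object>> rows = new ArrayList<>();

    SimpleTable(List<String> columns) {
        this.columns = List.copyOf(columns);
    }

    void addRow(List<Object> row) {
        if (row.size() != columns.size()) {
            throw new IllegalArgumentException("row width mismatch");
        }
        rows.add(List.copyOf(row));
    }

    // Cell access by row index and column name, so the client never
    // needs to know how the data was stored or queried.
    Object cell(int row, String column) {
        return rows.get(row).get(columns.indexOf(column));
    }

    int rowCount() {
        return rows.size();
    }
}
```

A non-interactive consumer such as a data warehouse could iterate the same structure without any user interface involvement.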

Result specific access api

The role of the result specific api is to bind the client object model to the results of storage system specific queries and access. This will be done without exposing the storage specific api itself.

For example, the client may invoke a request for the most recent CPU utilization of a specific CPU, separated by processes. The result of such a request, when bound to the TPTP simple pojo client model, would return a simple table of process names and the related percentage utilization value. The signature may look like "Object retrieveCpuUtilization(OutputFormater, MachineFilter, CpuFilter, ProcessFilter)". An alternate signature may be in the context of a previous request that returned a collection of CPUs. For example, "Object retrieveCpuUtilization(OutputFormater, ProcessFilter)" may be an operation on a CPU object. The significance is that the client is only exposed to the concepts understood, but not the actual storage or query mechanism. Note the contract with the client is managed via the implementation of the mapping in the formatter.

The implementation of this example function will in turn use the storage specific data query and access layer. This function understands the concepts of the data being managed, and exploits the appropriate access api. Yet the most important role of this implementation is to bind the results of the access request to the formatter.
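A hedged sketch of this example, echoing the hypothetical signature above: the service runs a storage specific lookup (stubbed here as an in-memory map) and binds the raw result to the client model through the supplied formatter. The filter and formatter shapes are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// (Illustrative) result specific api for the CPU utilization example.
class CpuUtilizationService {
    // Stand-in for the store specific query layer: process -> % CPU.
    private final Map<String, Double> store;

    CpuUtilizationService(Map<String, Double> store) {
        this.store = store;
    }

    // The Function plays the role of the OutputFormater in the text:
    // it maps raw (name, value) pairs into the client's preferred
    // representation, without exposing the storage or query mechanism.
    <R> R retrieveCpuUtilization(Function<Map<String, Double>, R> formatter,
                                 String processFilter) {
        Map<String, Double> raw = new HashMap<>();
        for (Map.Entry<String, Double> e : store.entrySet()) {
            if (processFilter == null || e.getKey().contains(processFilter)) {
                raw.put(e.getKey(), e.getValue());
            }
        }
        return formatter.apply(raw);
    }
}
```

The same request could be bound to a pojo table, an EMF object, or any other client model simply by passing a different formatter.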

Store specific data query/access

The query and access apis are by definition storage system agnostic and are also storage content agnostic. Therefore the request will have the form of a generic query and will return primitive constructs in the initial implementations. This is similar to the popular usage of JDBC for example, however since the storage system may not have a direct or even possible JDBC mapping this api will be more general.

The implementation, however, will be optimized to the storage system. In the initial release we will likely provide a JDBC implementation to target a relational database, and perhaps a solution to support an EMF based storage system.

The results from this component will initially be constrained to simple tables and trees.
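The storage agnostic contract described above can be pictured as a generic query interface returning primitive rows, much like the popular usage of JDBC but without assuming the store can actually be reached through JDBC. The interface and the in-memory implementation below are illustrative assumptions, not TPTP API.

```java
import java.util.ArrayList;
import java.util.List;

// (Illustrative) storage system and storage content agnostic access:
// a generic query in, primitive rows out.
interface GenericQueryService {
    List<Object[]> execute(String query);
}

// (Illustrative) in-memory implementation; a real one would translate
// the query for its store, e.g. to SQL for an RDB or a model walk
// for an EMF based store.
class InMemoryQueryService implements GenericQueryService {
    private final List<Object[]> data = new ArrayList<>();

    void insert(Object... row) {
        data.add(row);
    }

    @Override
    public List<Object[]> execute(String query) {
        // No query language is modeled here; every row is returned.
        return new ArrayList<>(data);
    }
}
```

Constraining results to rows of primitives keeps the initial contract simple while leaving room for tables and trees later.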

Event and schema driven contracts

As can be seen from the previous information, the bulk of DMS is a framework for building domain specific solutions. The domains that TPTP has captured in the past are as follows:

  • Trace - stack and heap information captured as graphs and/or statistical counters
  • Test - test definition and behavior as well as related execution logs
  • Log - logs that have been transformed into the generic common base event format
  • Symptoms - pattern matching database used to analyse the log data
  • Statistics - generic hierarchy of snapshots of statistical data over time

Each of these domains has a related set of event specifications, loaders, and an EMF based model. The EMF model was common to the client model and the storage system. This simple structure works well, but does not enforce a complete separation of concerns or provide a way to scale to large volumes of data. However, much of this structure can and will be carried forward in spirit, if not in implementation.

If we take a simple log event through DMS we can highlight each of the domain specific contracts that will be in place.

  1. (Event producer) The XML based event can continue to be used.
  2. (Event parser) The event parser that exists can also continue more or less as it is today. However, the bean used to hold the event data must have an extensible "add yourself" implementation that will more dynamically resolve an implementation.
  3. (Store specific loader) The current "add yourself" implementation for the EMF model can continue to be used to populate an EMF store; however, it needs to be configured rather than being compiled in as part of the bean.
    1. An RDB implementation will need to be provided that leverages JDBC and other standard RDB infrastructure
  4. (Data store) The current EMF CBE model can be used for storage in zipped XMI resources
    1. An RDB will have to be put in place to support large scale persistence. Views will be provided for all read access
  5. (Store specific query/access api) The current EMF query infrastructure used in TPTP can provide the implementation of the EMF storage specific component
    1. A JDBC based implementation will be provided for the RDB based store
  6. (Client) The current TPTP Eclipse workbench can be reused as a client once adjusted to use DMS rather than the current direct leveraging of paging lists, etc. Note that this would, at least logically, be a separate instance of the EMF model from the storage model
  7. (Client Object Model) Although not the simplest implementation, the current EMF model can be used as the client object model
    1. A simple pojo model can be used for more direct application clients
  8. (Result specific access api) Logically, a new layer of mapping has to be introduced to separate the storage model from the client model. This would, for example, access a list of common base events and provide a formatter to the client types.
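The numbered contracts above can be condensed into one compact, purely illustrative walk of a single log event through the pipeline: parse the XML event into a value, let the loader persist it into a stub store, and read it back through an access api. None of this is TPTP API; every name is invented for the sketch, and the string-search "parsing" stands in for a real XML parser.

```java
import java.util.ArrayList;
import java.util.List;

// (Illustrative) one log event end to end through the DMS contracts.
class DmsWalkthrough {
    // (Data store) stub: a list standing in for XMI resources or an RDB.
    static final List<String> STORE = new ArrayList<>();

    // (Event parser) pull the msg attribute out of the XML event.
    static String parse(String xmlEvent) {
        int i = xmlEvent.indexOf("msg=\"") + 5;
        return xmlEvent.substring(i, xmlEvent.indexOf('"', i));
    }

    // (Store specific loader) persist the parsed value.
    static void load(String message) {
        STORE.add(message);
    }

    // (Store specific query/access api) read access over the store.
    static List<String> queryAll() {
        return new ArrayList<>(STORE);
    }

    // (Client) drive one event through the pipeline and read it back.
    static List<String> run(String xmlEvent) {
        load(parse(xmlEvent));
        return queryAll();
    }
}
```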

Control mechanisms

filler


API issues

local versus remote

pojo, JPO, JDO


Relationship to heritage TPTP models

hierarchy model

statistical model

log model

test model

trace model

symptom database model
