COSMOS DC Design Discussions

  • Binding service: The binding service instantiates and initializes components. The current design requires adopters of the framework to provide a binding service for their user-defined components. However, the information the binding service needs can be extracted from the assembly XML, so this step can be abstracted into the framework itself. SampleBindingService and GLABindingService, for example, follow almost the same pattern.
  • The coupling between components
    • The components should be loosely coupled.
    • Event handling: listeners must be added, and an event is raised for each object that passes through the assembly.
    • After each event is processed, the object is dispatched by calling the wire method of the next "component" in the assembly. This one-object-at-a-time dispatch is potentially inefficient and makes optimization difficult. For example, when I wrote the CBE data sink, I had to perform one insert per CBE. With access to the whole list of CBE objects, I could have done a single batch insert.
  • Wire method: The method name is not declared in the interface, probably because each component needs a different method name and takes a different parameter type. Instead, the framework assumes a method name, introspects the available methods, and compares their parameter types against a list of acceptable object types. I find this process quite complicated and potentially difficult to debug. An alternative is a generic method name such as process that takes an array of Objects, leaving it to each component implementation to decide how to handle the objects.
  • Separation of concerns: There are three crosscutting concerns in the data collection code:
    1. Assembly definition: defining the assembly XML files and writing code for an assembly component (e.g. a transformer).
    2. Metadata definition: mapping legacy data models to the generic metadata specification (an SDMX-like model).
    3. WSDM management features
  • Currently, the above three concerns are too intertwined, making the framework very difficult to use.
  • Query framework: I find the query framework still quite ad hoc; there is no standard query mechanism. A query assembly binds a query name to a class that implements an application-specific query interface, and that implementation can use any method it likes. I also find the distinction between "inbound" and "outbound" in a data assembly unnecessary. We could instead provide standard data sources that issue SQL queries against a relational database or XPath queries against XML documents. Such generic data sources or queries could be used for both inbound and outbound data collection.
  • Other things that can be done for data collection:
    • A "Hello world" example for using the DC framework
    • More useful data sources and sinks:
      • GLA data source: The current example uses an "embedded GLA source". The GLA source could instead be an external GLA instance.
      • Revisit CBE schema and decision regarding interoperability with TPTP.
    • A more user-friendly interface (command line or UI) for interacting with the DC framework.
    • Documentation
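For the binding-service point above, here is a minimal sketch of how a framework-level binding service might derive everything from the assembly XML, so adopters no longer write their own SampleBindingService or GLABindingService equivalents. It assumes a hypothetical assembly format (`component` elements with `id` and `class` attributes and nested `property` elements) and a hypothetical `Configurable` initialization interface. Neither is the real COSMOS schema or API; the point is only that reflection plus the XML is enough to instantiate and initialize components generically.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical generic binding service driven entirely by the assembly XML. */
class GenericBindingService {

    /** Hypothetical contract a component implements to receive its settings. */
    interface Configurable {
        void configure(Map<String, String> properties);
    }

    /** Example user-defined component, used only for the demonstration. */
    static class EchoTransformer implements Configurable {
        String prefix = "";

        @Override
        public void configure(Map<String, String> properties) {
            prefix = properties.getOrDefault("prefix", "");
        }
    }

    /** Parses the assembly XML and returns the bound components keyed by id. */
    static Map<String, Object> bind(String assemblyXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(assemblyXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Object> components = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("component");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                // The only per-component knowledge needed is the class name from the XML.
                Object instance = Class.forName(el.getAttribute("class"))
                        .getDeclaredConstructor().newInstance();
                // Collect <property name="..." value="..."/> children as initialization settings.
                Map<String, String> props = new LinkedHashMap<>();
                NodeList propNodes = el.getElementsByTagName("property");
                for (int j = 0; j < propNodes.getLength(); j++) {
                    Element p = (Element) propNodes.item(j);
                    props.put(p.getAttribute("name"), p.getAttribute("value"));
                }
                if (instance instanceof Configurable) {
                    ((Configurable) instance).configure(props);
                }
                components.put(el.getAttribute("id"), instance);
            }
            return components;
        } catch (Exception e) {
            // Parsing and reflection failures surface as a single unchecked error.
            throw new IllegalArgumentException("Cannot bind assembly", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<assembly>"
                + "<component id=\"t1\" class=\"GenericBindingService$EchoTransformer\">"
                + "<property name=\"prefix\" value=\"cbe:\"/>"
                + "</component>"
                + "</assembly>";
        Map<String, Object> bound = bind(xml);
        System.out.println(((EchoTransformer) bound.get("t1")).prefix); // cbe:
    }
}
```

With this shape, a new component only needs a no-argument constructor and, optionally, the initialization interface; no component-specific binding code is written.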
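The coupling and wire-method points above can be illustrated together: if every component exposes one generic `process(Object[])` method, the framework no longer needs introspection, and a whole batch can be handed to the next component at once, which lets a sink do one batch insert instead of one insert per CBE. The sketch below assumes hypothetical names (`Component`, `UpperCaseTransformer`, `BatchingSink`, `run`); the sink only records batches, standing in for a real JDBC batch (`addBatch`/`executeBatch`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pipeline where all components share one generic method. */
class GenericPipeline {

    interface Component {
        /** Processes a batch of objects and returns the batch for the next component. */
        Object[] process(Object[] input);
    }

    /** Hypothetical transformer: keeps only String objects and upper-cases them. */
    static class UpperCaseTransformer implements Component {
        @Override
        public Object[] process(Object[] input) {
            List<Object> out = new ArrayList<>();
            for (Object o : input) {
                // The component, not the framework, decides which object types it handles.
                if (o instanceof String) {
                    out.add(((String) o).toUpperCase());
                }
            }
            return out.toArray();
        }
    }

    /** Hypothetical CBE-style sink that records one "batch insert" per call. */
    static class BatchingSink implements Component {
        final List<List<Object>> batchInserts = new ArrayList<>();

        @Override
        public Object[] process(Object[] input) {
            // Stand-in for a JDBC batch insert; here the batch is only recorded.
            batchInserts.add(Arrays.asList(input));
            return new Object[0];
        }
    }

    /** Dispatches the whole batch through each component in assembly order. */
    static Object[] run(List<Component> assembly, Object[] input) {
        Object[] current = input;
        for (Component c : assembly) {
            current = c.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        BatchingSink sink = new BatchingSink();
        run(Arrays.asList(new UpperCaseTransformer(), sink), new Object[] {"a", 42, "b"});
        System.out.println(sink.batchInserts); // [[A, B]]
    }
}
```

The trade-off is that type checking moves from the framework into each component, which is exactly what the proposal suggests; in return, dispatch is a plain interface call that is easy to debug.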
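For the query-framework point above, a generic, direction-neutral data source might look like the following sketch. The `DataSource` interface and `XPathDataSource` class are hypothetical; only an XPath-backed implementation is shown because it runs on the standard `javax.xml.xpath` API, and a SQL-backed one would presumably wrap JDBC behind the same interface. Nothing in it is specific to inbound or outbound collection.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical generic query abstraction usable for inbound or outbound collection. */
class GenericQuery {

    interface DataSource {
        /** Runs a query in the source's native language and returns the matches as strings. */
        List<String> query(String expression);
    }

    /** Data source that answers XPath queries against an in-memory XML document. */
    static class XPathDataSource implements DataSource {
        private final String xml;

        XPathDataSource(String xml) {
            this.xml = xml;
        }

        @Override
        public List<String> query(String expression) {
            try {
                NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                        .evaluate(expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
                List<String> results = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    results.add(nodes.item(i).getTextContent());
                }
                return results;
            } catch (XPathExpressionException e) {
                // Covers both malformed expressions and unparseable XML.
                throw new IllegalArgumentException("Query failed: " + expression, e);
            }
        }
    }

    public static void main(String[] args) {
        DataSource source = new XPathDataSource(
                "<events><event severity=\"50\">disk full</event><event severity=\"10\">ok</event></events>");
        System.out.println(source.query("/events/event[@severity > 30]")); // [disk full]
    }
}
```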
