Talk:COSMOS Design 197867

Revision as of 14:52, 14 August 2007

[Joel's comments] (Hubert's comments are within the <Hubert's comment>...</Hubert's comment> markers)

First, some general comments. One of the nice things that SDMX does is make a distinction between the structure of a set of data and the source of the set of data. If you take a look at the SDMX DataFlow structure, you'll see that it breaks down into two distinct concerns. The first is the type of data (the keyset/dimension concern), and the second is the type of entity that created the data (the source concern). It makes tons of sense to me to see information related to the first concern being pushed up from DC into CMDB land, and the information related to the second concern being pushed down from CMDB into DC land. The nice thing about supporting this type of symmetry is that you can use the results of a CMDBf graph query as input to a DataBroker graph query.

<Hubert's comment> Joel, can you explain why the data structure specification flows from DC to CMDB? The data direction of the two concerns is not very intuitive to me. To me, all metadata information comes from data managers or MDRs. The first concern (data structure) is provided to the broker (federating CMDB) by data managers (MDRs) when they register.

<Joel's response>Sure. The Data Broker is the real 'owner' of the DC keyset information. If the DataBroker acts as an MDR for the purpose of supporting CMDB federation, it makes sense to me that the bit of metadata that's being federated out of the DataBroker MDR is this keyset information. The CMDB shouldn't have to know about all of the DataManagers - let the DataBroker act as the aggregator.</Joel's response> </Hubert's comment>
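
To illustrate the aggregation being described, here is a minimal, hypothetical sketch (not actual COSMOS code; the class and method names are assumptions) of a DataBroker that collects keyset metadata from registering DataManagers and federates only the aggregate keyset URIs up to the CMDB:

```java
import java.net.URI;
import java.util.*;

class DataBrokerSketch {
    // keyset URI -> the DataManager EPRs that can serve that keyset
    private final Map<URI, Set<URI>> managersByKeyset = new HashMap<>();

    // Called when a DataManager (MDR) registers with the broker.
    void register(URI dataManagerEpr, Collection<URI> keysets) {
        for (URI keyset : keysets) {
            managersByKeyset.computeIfAbsent(keyset, k -> new HashSet<>())
                            .add(dataManagerEpr);
        }
    }

    // What the broker, acting as an MDR, federates up to the CMDB:
    // just the aggregate keyset metadata, never the individual DataManagers.
    Set<URI> federatedKeysets() {
        return Collections.unmodifiableSet(managersByKeyset.keySet());
    }
}
```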


  • How does the query work? Need better example of how to query for data.
  • Need example of how to map existing data to SDMX structure. (A sketch of these definitions as plain Java types follows this list.)
    • SDMX DataSet definition - DataFlow URI+ DataSource URI+ Time period
    • SDMX DataFlow definition - DataSource URI + Keyset URI
    • SDMX DataSource definition - DataSourceURI + DataSourceTypeURI
    • SDMX DataSourceType definition - DataSourceTypeURI
    • SDMX Keyset definition - Keyset URI + a list of DimensionURIs
    • SDMX Dimension definition - DimensionURI + Type enumeration. SDMX restricts the type of information that a dimension may hold to enumerations and simple scalar types.
      • We've completely ignored the SDMX, um... 'Concept' concept, as it has a high degree of overlap with our intended use of SML/CML.
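
To make the composition of these definitions concrete, here is a minimal sketch of them as plain Java types. The names and shapes are assumptions for illustration only, not COSMOS or SDMX APIs:

```java
import java.net.URI;
import java.util.List;

// Type enumeration: SDMX restricts dimensions to enumerations and simple scalar types.
enum DimensionType { ENUMERATION, STRING, INTEGER, DECIMAL, BOOLEAN }

record Dimension(URI dimensionUri, DimensionType type) {}         // DimensionURI + type
record Keyset(URI keysetUri, List<URI> dimensionUris) {}          // Keyset URI + DimensionURIs
record DataSourceType(URI dataSourceTypeUri) {}
record DataSource(URI dataSourceUri, URI dataSourceTypeUri) {}
record DataFlow(URI dataSourceUri, URI keysetUri) {}
record DataSet(URI dataFlowUri, URI dataSourceUri, String timePeriod) {}
```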

<Hubert's comment> Regarding mapping existing data to SDMX, we need to take care of the use case where I have an existing database of system management data (stat data, log data, or hardware config data). The data is collected by existing applications, such as TPTP or IBM Tivoli products like ITM. In these cases, we need a generic way to map the data models of these existing databases to the SDMX data model. In many cases, the mapping is non-trivial. We will also need extension code written to process SDMX queries into native query languages, and transformers to convert native data structures to the SDMX data formats (e.g. from a JDBC result set to key family and key concepts). This extra mapping and processing logic will become a performance issue.

<Joel's Response>This is only an issue if we make it one. If the existing dataset just provides a description of its record format in 'SDMX' form, then there's nothing to map. Also, our assemblies support Transformers, so we can always deal with this issue at the component level. </Joel's Response> </Hubert's comment>
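
As a concrete illustration of the Transformer idea from the exchange above, here is a minimal sketch that adapts a native JDBC result set into SDMX-style observations, treating each column as a dimension. The observation shape (a map of dimension name to value) is an assumption for illustration:

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.*;

class JdbcToSdmxTransformer {
    // Each row becomes one observation: a map of dimension name -> value.
    List<Map<String, Object>> transform(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        List<Map<String, Object>> observations = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> obs = new LinkedHashMap<>();
            for (int col = 1; col <= md.getColumnCount(); col++) {
                // Column label stands in for the dimension; values stay native.
                obs.put(md.getColumnLabel(col), rs.getObject(col));
            }
            observations.add(obs);
        }
        return observations;
    }
}
```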

I think that most of our mapping issues can be solved by deciding what time periods we want to support (for a lot of cases it may be 'for all time' as a means of effectively saying that there is only one dataset), and at what level we want the URIs to actually point to more granular definitions. For example, if we say that a WEF event's URI is a Keyset URI instance, and that the WEF event's URI is terminating (in other words, doesn't have a corresponding set of Dimensions), then we can effectively map our type information into SDMX by truncation. All that's required is to relax the schema requirements on the higher level constructs in SDMX to be 'could have' instead of 'must have'.
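
A sketch of that truncation, reusing the assumed types from the earlier sketch: the set of Dimensions becomes optional ('could have'), so a WEF event URI can stand in as a terminating Keyset.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;

// Hypothetical relaxation of the Keyset construct: dimensions are optional,
// so higher-level constructs 'could have' rather than 'must have' them.
record TruncatedKeyset(URI keysetUri, Optional<List<URI>> dimensionUris) {
    // A terminating keyset: the WEF event's URI with no corresponding Dimensions.
    static TruncatedKeyset terminating(URI wefEventUri) {
        return new TruncatedKeyset(wefEventUri, Optional.empty());
    }
}
```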

<Hubert's comment> I don't like the idea of truncating the SDMX model at a point beyond which it can't be used efficiently. For example, representing WEF events up to the keyset, and not mapping the attributes of WEF events to keys. We are simply saying SDMX is not suitable for all data types. It won't be very meaningful to just map a WEF event to a keyset, because we can't form an SDMX query with the contents of the event attribute values, which are very important in forming the search criteria in a query. (e.g. get all events with high severity)

<Joel's Response> So map the bits of WEF that make sense into a keyset, and truncate the rest. There's no reason to take an extremist stance here. Of course, you could also (once you've determined that the data format is WEF) use the convenience API that would be surfaced as a MUSE capability on the DataManager endpoint to do your more granular query. Remember, we have the capability model to exploit - there's no reason to force fit anything here. </Joel's Response> </Hubert's comment>
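
The convenience API being suggested could look something like the following capability interface on the DataManager endpoint. This is purely a hypothetical sketch of the idea, not the actual Muse/WSDM API:

```java
import java.util.List;

// Hypothetical format-specific capability, surfaced on a DataManager endpoint
// once the client has determined that the underlying data format is WEF.
interface WefQueryCapability {
    // Granular, attribute-aware query that the truncated SDMX mapping can't
    // express, e.g. "get all events with high severity".
    List<String> findEventsBySeverity(String severity);
}
```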

  • Have we agreed that SDMX is the default model we support? Are we talking about query models? If so, then no - I don't think that SDMX is appropriate as our default; at the least, given the proposed mapping solution outlined above, we won't be able to support the full SDMX query interface. Of course, providing limited support for SDMX queries as a WSDM capability on the DataBroker should be feasible.
  • What about other models, e.g. SQL?
  • Does the user NEED to understand SDMX to use this? It would be helpful to understand the spirit of SDMX, at least enough to know what an appropriate mapping would be.

<Hubert's comment> I would suggest that we do not mention the term "SDMX" in any of the COSMOS documentation, even if we are borrowing some concepts from it. Here are the reasons:

  • The SDMX concepts can be hidden behind some easy-to-use APIs.
  • It's a partial implementation (in fact a very small subset). People who really know about SDMX will say we are not using SDMX.
  • Most people have never seen the term "SDMX" in their lives, and let's not tell them they need to learn something new to use COSMOS. It will hurt adoption.

</Hubert's comment>

  • How does the design support things other than SDMX? I'd vote for using the WSDM capability model, so you can always support additional requirements through composition.
  • How do we reconcile the CMDBf query structure? This is where we can start to do some interesting things. For example, CMDBf has the concept of a 'Graph Query', which allows you to do pattern-oriented queries against the CMDB structure. This is really handy, because what we have in DC (and SDMX, for that matter) is pretty much unaware of structural implications - particularly at the instance level. So imagine that we use a CMDBf query against a compliant CMDB to identify things that are "close to" the problem we're trying to track down (and we can imagine one or more definitions of 'close to' based on pluggable heuristics or norms). If we've done a proper job of mapping our SDMX-like URIs, then we are well set up to support a DataBroker query that can provide a set of EPRs that have information "close to" what's needed for problem resolution. So if we implement the DataBroker as an MDR (which means supporting the CMDBf Query API), we can use the implementation to return the set of DataManager EPRs as Target Items in the GraphQueryResponse structure. I think that's pretty cool!
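
The round trip described above might look like the following sketch. Both service interfaces are hypothetical stand-ins; the real interactions would go through the CMDBf Query API and WSDM endpoints:

```java
import java.net.URI;
import java.util.List;

interface CmdbfQueryService {
    // Runs a CMDBf graph query; returns the EPRs of the Target Items
    // in the GraphQueryResponse.
    List<URI> graphQuery(String graphQueryXml);
}

interface DataBrokerQueryService {
    // Returns the DataManager EPRs holding data about the given resources.
    List<URI> findDataManagers(List<URI> resourceEprs);
}

class ProblemResolutionFlow {
    // Find items "close to" the problem via the CMDB, then feed those EPRs
    // into a DataBroker query to locate the DataManagers with relevant data.
    List<URI> locateData(CmdbfQueryService cmdb, DataBrokerQueryService broker,
                         String closeToQueryXml) {
        List<URI> nearbyItems = cmdb.graphQuery(closeToQueryXml);
        return broker.findDataManagers(nearbyItems);
    }
}
```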
