Talk:COSMOS Design 197870

[Joel's comments] Client API

  • Provide an alternate implementation of get that takes an EndpointReference.
  • Why do many of the 'factory' methods return WsResourceClient? Why not an extension of WsResourceClient (like DataBrokerClient)?
  • Why are all of the instance-oriented methods static? Making these static and limiting ourselves to WsResourceClient makes it hard to expose session-oriented information later on. I'd vote for creating client classes that extend WsResourceClient, and having the static 'get' methods return instances of these classes. That way, if a static 'get' method wants to implement singleton semantics, it can easily do so without painting extenders of this client API into a corner.
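
A minimal sketch of the shape suggested above, under stated assumptions: `ResourceClientBase` is a hypothetical placeholder for WsResourceClient (the real class and its constructor are not reproduced here), a plain string address stands in for an EndpointReference, and `sessionId` is only an illustration of per-instance state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for WsResourceClient; not the real class.
class ResourceClientBase {
    private final String address; // stands in for an EndpointReference

    ResourceClientBase(String address) {
        this.address = address;
    }

    String getAddress() {
        return address;
    }
}

// A typed client: instance methods can carry session-oriented state later on.
class DataBrokerClient extends ResourceClientBase {
    // One cached client per address, so 'get' can offer singleton semantics.
    private static final Map<String, DataBrokerClient> CACHE = new ConcurrentHashMap<>();

    private String sessionId; // illustrative per-instance state

    private DataBrokerClient(String address) {
        super(address);
    }

    // Static factory that returns the subclass type rather than the base type.
    static DataBrokerClient get(String address) {
        return CACHE.computeIfAbsent(address, DataBrokerClient::new);
    }

    void setSessionId(String sessionId) {
        this.sessionId = sessionId;
    }

    String getSessionId() {
        return sessionId;
    }
}
```

Because `get` returns `DataBrokerClient`, callers keep access to broker-specific instance methods, and the caching policy can change later without touching the callers.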

[Hubert's comments]

  1. This design is about "client API". I propose to drop the word "Client" in the design topic. This design is about "Public API of the COSMOS framework". This is to avoid confusion with the "Client" component in the COSMOS architecture, which is the application that interacts with the COSMOS data collection framework. The APIs discussed are for the Data Broker and the Data Manager, not the Client.
  2. getDataBroker:
    • Which component does this interface belong to? It is a (factory) method for getting a reference to the broker, so it cannot be a method of the data broker API. It could belong to the "management domain", but this term is not mentioned anywhere in the design document. I assume the client can contact the broker through a "well known address". If we assume that the client knows where the data broker is, we may not need an API to get the data broker.
    • The method returns a "WsResourceClient" object. What is this object? Is it a remote proxy for the broker? The client and the broker are running in different processes.
  3. getDataManagers:
    • first parameter: dataBrokerClient - is it a reference to the client that is making the request? Why does the broker need to keep track of clients?
    • second parameter: classification - what is the definition of "classification"? Is it a keyword that data managers provide during registration to identify the nature of the data they can provide? for example: classification = "statistical data"
    • Do we need an API to query for the list of available classifications? The client will need this information to form the query for data managers.
    • return type is "Element". I assume it's the XML element object. Should the method return an array of EPRs of data managers or an array of objects that represents the data manager?
    • The API for retrieving the data managers should reflect how the broker indexes the data managers in the registry. The WS-RC specification may be relevant here as a means to provide classification and to describe how to access the catalog resources (data managers). The API for querying data managers will change depending on how the registry of data managers is designed. For example, if I'm interpreting the "classification" parameter properly, it describes the type of data that the data manager can provide. When a client asks for data managers for "statistical data", the broker will return all data managers that provide "statistical data". Will this type of categorization mechanism give us sufficient granularity to pinpoint the data source that the client is looking for? For example, if there are 10 data managers categorized under "statistical data", the client will need to access each of them and make the same query to each of them to get the data it is looking for. We can let data managers provide more information about themselves during registration to better qualify the resources and data sources that they are managing. The increased level of granularity will allow clients to use more specific queries when looking up data managers. For example, the client can ask for statistical data for a specific machine.
    • We intend to do a (partial) reference implementation of CMDBf in COSMOS. The design of data manager registration is analogous to the registration service of MDRs with the federating CMDB in the CMDBf specification. The CMDBf registration service lets MDRs register managed resources and their relationships with the federating CMDB. The federating CMDB will provide a federated view of multiple data sources. The API design of the data broker needs to take the CMDBf specification into account, either to allow the broker to evolve to support CMDBf registration or to implement some simpler functions of the federating CMDB.
    • The SDMX specification provides a way to specify metadata. If we decide to use SDMX in the context of the data broker, we should restrict ourselves to using SDMX as a means to specify metadata, and not data. In the end-to-end scenario, both metadata and data were stored in the same database. This has to change with the introduction of the broker component. The management and storage of data is the responsibility of the data managers. We can use the SDMX constructs (such as categories, key families, etc.) to specify the metadata of the data managers and the data sources that they are managing.
      • While increasing the metadata granularity in the broker registry has its benefits, we need to be careful not to overdo it. For example, an SDMX dataset represents data collected in a period of observation. The number of datasets can change during the lifetime of the data manager. If we decide to include the dataset concept in the registry, we will need to require the data managers to update the registry periodically, which will introduce undesirable complexity.
    • Here are some good links I found about SDMX:
    • We need to find out the roles of the standards (WS-RC, CMDBf and SDMX) in the COSMOS framework. The API of the broker will need to allow data managers to provide enough information to register themselves with the right level of granularity, and allow clients to browse or query the registry to find the data managers that hold the data they need.
  4. registerDataManager
    • What is the first parameter "dataBrokerClient"?
    • Please explain what each of the parameters means, and how it will be used by the broker.
    • What is the return object (Element)?
    • Does the registration process need to generate a unique identifier for the data manager? Or will the broker use multiple values (hostname, runtimeport, etc.) to uniquely identify data managers?
  5. deregisterDataManager
    • What is the first parameter "dataBrokerClient"?
    • Is name the unique identifier of the data manager?
    • Do we let anyone deregister a data manager? Is there any access control or authentication required for deregistration?
  6. getDataManager:
    • Why do we need this method? Does the getDataManagers method of the data broker provide enough information about how to contact each of the data managers?
    • Similar to my second point above, this method does not seem to belong to the data manager.
  7. getKeyFamilyNames, getKeyFamily, getKeyFamilyData:
    • I think these are the "query APIs" for the data manager.
    • These methods assume the data is described in the SDMX data model. If we decide to use SDMX as the common data model for ALL data sources, the design document needs to specify how arbitrary data models can be mapped to SDMX, and the process and steps to do so.
    • If SDMX is not meant to be used as a common data model for all data, we should not make these 3 methods part of the public API of the data manager. We can let data managers decide how to expose the data they manage, for example, by means of WSDM capabilities or custom convenience APIs. When a client has the EPR of a data manager, it can introspect the capabilities the data manager supports, and invoke those operations to get data. The end-to-end example for i4 took this approach.
    • The use of a common query API for all types of data is difficult but not impossible. It has the benefit of enabling data federation.
    • Data managers are equivalent to MDRs in the CMDBf specification. We should consider implementing the query service of the CMDBf specification as a uniform way to make queries on all kinds of data sources.
    • We can allow data managers to support both the CMDBf query service API and "COSMOS-style" queries.
    • The design documentation has not provided enough information to prove that the 3 getter methods above are sufficient as a generic query API.
  8. Please explain what is the utility API. Will the implementation be part of the broker or data manager?
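
The registry questions raised in items 3 to 5 (what a classification is, whether registration yields a unique identifier, what a lookup returns) can be made concrete with a small in-memory sketch. Everything in it is an assumption for illustration, not the COSMOS design: the class `DataManagerRegistry`, its method names, the use of free-form key/value metadata, and a plain string address standing in for an EPR. It does not address access control for deregistration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory registry; not the COSMOS data broker implementation.
class DataManagerRegistry {

    // One registered data manager: its address plus descriptive metadata.
    static final class Registration {
        final String id;
        final String address; // stands in for an EPR
        final Map<String, String> metadata;

        Registration(String id, String address, Map<String, String> metadata) {
            this.id = id;
            this.address = address;
            this.metadata = new HashMap<>(metadata);
        }
    }

    private final Map<String, Registration> registrations = new LinkedHashMap<>();

    // Registration generates the unique identifier instead of relying on host/port.
    String register(String address, Map<String, String> metadata) {
        String id = UUID.randomUUID().toString();
        registrations.put(id, new Registration(id, address, metadata));
        return id;
    }

    // Returns true if a registration with this id existed and was removed.
    boolean deregister(String id) {
        return registrations.remove(id) != null;
    }

    // Returns the addresses of all data managers whose metadata contains
    // every key/value pair in the query, in registration order.
    List<String> find(Map<String, String> query) {
        List<String> result = new ArrayList<>();
        for (Registration r : registrations.values()) {
            if (r.metadata.entrySet().containsAll(query.entrySet())) {
                result.add(r.address);
            }
        }
        return result;
    }

    // Lists the distinct values registered under one metadata key,
    // e.g. the classifications a client could query for.
    List<String> valuesOf(String key) {
        List<String> values = new ArrayList<>();
        for (Registration r : registrations.values()) {
            String value = r.metadata.get(key);
            if (value != null && !values.contains(value)) {
                values.add(value);
            }
        }
        return values;
    }
}
```

With metadata such as `classification` and `machine`, a query for both keys narrows ten "statistical data" managers down to the one for a specific machine, which is the granularity concern raised under getDataManagers.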

Jimmy's comments: What drives the overall interactions between the Broker, Management Domain, Data Managers

  • Is there a COSMOS DC component that plays the role of a "conductor"?
  • As a start, please review the COSMOS Client POV diagram.
  • It is apparent that the Client API "knows" about ALL the COSMOS components. Specifically, the client API communicates with the Management Domain, the Broker, and the Data Manager. The client API ALSO needs to be aware of the SEQUENCE in which it communicates with the various components. Does this mean that the client API is playing the role of a (perhaps) conductor / low end workflow manager?

  • ISSUE: We need to define if the "Client API" (or whatever we end up calling it) is the right place for this kind of logic.
  • Do bear in mind that since the Client API talks to the MD, Broker, and the DMs, either it OR something REAL close to it needs to understand the MD / Broker / DM inter-relationships.
  • The following (subset) of the management use cases drive this query:
    • What happens if the Management Domain disappears when the Client API is talking to the DM's? Who has the authority to restart the MD? What notifications should be done if this happens?
    • What happens if the Broker disappears when the Client API is talking to the DM's? Who has the authority to restart the Broker? What notifications should be done if this happens?
    • What happens if the DM disappears when the Client API is talking to it? Who has the authority to restart the DM? What notifications should be done if this happens?
    • Of course, there may be other permutations / combinations that you can come up with.
  • So now the BASIC question: Does a SINGLE component, i.e. the "Client API" understand the inter-relationships between the MD, Broker, and DM's?
  • OR should we have a NESTED management model, i.e. the MD controls the Broker, and the Broker controls the DMs? If we choose this, how would this work, given that the Client API talks to ALL three?
  • I do agree with Hubert's comment above, i.e. we DROP the word "Client" from the name. Even though this component is used primarily by a client, it is not a "client" itself and, in addition to being an API, it MAY have some management capabilities as well.

Martin's comments: What drives the overall interactions between the Broker, Management Domain, Data Managers

Let us imagine that we have defined all the different functions that the Client API can do, and let's imagine that we have generic code that implements each one of these functions.

Let's say that the functions can be specified in an xml file, and the order in which the functions are specified in the xml file (let's call it the conductor.xml file for now) is the 'desired' order to do things in. Let's say that this file is available to the Client (for now I don't know how that would be established). The client then knows that it must do things like 1. Call the Management Domain and get a Data Broker EPR. 2. Call the data broker and pass it a keyset name (and maybe DataSet Type name). You then end up having generic code to do the functions of a client, controlled externally by the conductor.xml file. Therefore the intelligence of the Client API is easier to maintain.

The intelligence of the Client API could then be limited to knowing where to pick up the conductor.xml file.

--Marty 03:32, 21 August 2007 (EDT)
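
A rough sketch of the conductor.xml idea above, under stated assumptions: the `<conductor>` and `<step function="..."/>` element and attribute names are invented for illustration (no file format has been defined), `Conductor` is a hypothetical class, and each client function is reduced to a `Runnable` so the ordering logic can be shown on its own. XML parsing uses the standard JAXP DOM API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical conductor: runs named client functions in the order the XML lists them.
class Conductor {
    private final Map<String, Runnable> functions = new HashMap<>();

    // Registers the generic code that implements one named function.
    void define(String name, Runnable function) {
        functions.put(name, function);
    }

    // Reads the function names from <step function="..."/> elements in document order.
    static List<String> readSteps(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("step");
            List<String> steps = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                steps.add(((Element) nodes.item(i)).getAttribute("function"));
            }
            return steps;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read conductor XML", e);
        }
    }

    // Executes every step in order; an undefined function name is an error.
    void run(String xml) {
        for (String step : readSteps(xml)) {
            Runnable function = functions.get(step);
            if (function == null) {
                throw new IllegalArgumentException("Unknown function: " + step);
            }
            function.run();
        }
    }
}
```

In this sketch the order of calls comes only from the XML, so reordering steps such as getDataBroker and getDataManagers would mean editing the file rather than the client code.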


COSMOS DC Architecture.jpg