Cosmos SDMX Notes

From Eclipsepedia

Some observations:

The entire SDMX spec is broad and pretty complex. The query syntax is limited with respect to ranges and formatting. Data sets are treated as first-class entities. COSMOS needs to determine its requirements with respect to registry services. The metadata facilities are powerful, but may be overkill and/or duplicate effort for COSMOS. Providing a fully SDMX-compliant data registry in COSMOS (and requiring adopters to create compliant data sources) may be too much of a burden for adoption.


<excerpted from the Implementer's guide pdf>

3.2.3 Data Set

Data sets are made up of a number of time series or sections (the cross-sectional organization of observations at a single point in time). In addition to the numeric observation (Observation) and the related date (TimePeriod), which are the core of the time series, there may be attributes (AttributeValue) indicating the status of the observation, e.g. whether the value is a normal or break value, etc. These attributes may be optional (or “conditional”), and may have coded or free text values. They may pertain to any part of the data set - each observation might have a different value for the attribute, or there might be only a single attribute value describing the entire data set, or for each time series, etc.

Each time series can be identified by the values of its dimensions. Time series data can be seen as n-dimensional. A given time series will have exactly one value (KeyValue), of the set of permissible values, for each of its dimensions (Dimension), and a set of observations (Observation): one value for each specific point in time (TimePeriod). A specific time series might have dimensions of "frequency", "topic", "stock or flow", "reporting country", etc., with a single corresponding value for each dimension. Taken together, this set of values uniquely identifies the time series within its data set, and is called the time series key (TimeSeriesKey).

Cross-sectional representations of the data may be derived from the same Data Structure Definition from which time-series representations are structured, so long as the needed additional structural metadata is provided. This functionality allows multiple measures to be declared in the Data Structure Definition, associated with the representational values of one dimension. When data is structured to represent a set of multiple observations at a single point in time, the “section” – one or more observations for each declared measure – replaces the series in the data structure. Each measure carries at least one dimension of the key (the “measure dimension”) at the observation level, while the time period is attached at a higher level in the data structure (the Group level – see below). The remainder of the key is found at the Section level (or above), similar to the way in which it is attached at the Series level for time series data structures.

Support for cross-sectional data representation is not as complete as that for representing time-series data. The intended functionality is to allow key families which are to be used to represent cross-sectional data to be created with this application in mind. Because time-series data representations are also possible for any Data Structure Definition which has time period as a concept, these data structures may also be derived from the Data Structure Definition. The functional result is that two complementary types of data structures may be provided: the needed cross-sectional view, and the time-series oriented view which may be useful to systems which may not be configured to process data in any other fashion. The Data Structure Definition created to support cross-sectional structuring of data will support the predictable (and thus, automatable) transformation of data from the cross-sectional structure into the time-series structure.

Data sets may be organized into “groups” of time series or sections (GroupKey); this is a particularly useful mechanism for attaching metadata to the data. One such group is called the “sibling group”, which shares dimension values for all but the frequency dimension (the frequency dimension is said to be “wildcarded”). In the Data Structure Definition, all legitimate groups are declared and named. All members of the group will share key values for a stated set of dimensions. Attributes may be attached at this level in the data formats, as are the shared key values for those formats where message size is an issue. In cross-sectional formats, time period (a period or point in time) is attached at the group level.

The data structure definition is a description of all the metadata needed to understand the data set structure. This includes identification of the dimensions (Dimension) according to standard statistical terminology, the key structure (KeyDescriptor), the attributes (MetadataAttribute) associated with the data set, the code-lists (CodeList) that enumerate valid values for each dimension and coded attribute (CodedAttribute), information about whether attributes are required or optional and coded or free text. Given the metadata in the data structure definition, all of the data in the data set becomes meaningful.

It is also possible to associate annotations (Annotation) with both the structures described in key families and the observations contained in the data set. These annotations are a slightly atypical form of documentation, in that they are used to describe both the data itself - like other attributes - but also may be used to describe other metadata. An example of this is methodological information about some particular dimension in a data structure definition structure, attached as an annotation to the description of that dimension. Regular “footnotes” attached to the data as documentation should be declared as attributes in the appropriate places in a data structure definition – annotations are irregular documentation which may need to be attached at many points in the data structure definition or data set.

The association of attributes with observed values is how SDMX deals with grouping. However, this grouping appears to be exogenous with respect to queries, as attributes are a property of the dataset.
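To make the key and group ideas from the excerpt more concrete, here is a minimal Python sketch. It is not an SDMX API: the function names are invented for illustration, the dimension IDs are borrowed from the compact schema further down this page, and the dimension values ("P", "MX", and so on) are hypothetical. It builds a time series key with exactly one value per dimension and derives a "sibling group" key by wildcarding the frequency dimension.

```python
from collections import defaultdict

# Illustrative dimension order; a real key family declares its own.
DIMENSIONS = ("FREQ", "JD_TYPE", "JD_CATEGORY", "VIS_CTY")


def series_key(values):
    """Return the time series key: exactly one value per dimension, in order."""
    missing = [d for d in DIMENSIONS if d not in values]
    if missing:
        raise ValueError(f"missing dimension values: {missing}")
    return tuple(values[d] for d in DIMENSIONS)


def sibling_group_key(key):
    """Wildcard the frequency dimension so that siblings share one group key."""
    freq_index = DIMENSIONS.index("FREQ")
    return tuple("*" if i == freq_index else v for i, v in enumerate(key))


def group_siblings(keys):
    """Map each sibling group key to the series keys that belong to it."""
    groups = defaultdict(list)
    for key in keys:
        groups[sibling_group_key(key)].append(key)
    return dict(groups)


# Hypothetical series: two differ only in frequency, one differs in category.
monthly = series_key({"FREQ": "M", "JD_TYPE": "P", "JD_CATEGORY": "A", "VIS_CTY": "MX"})
annual = series_key({"FREQ": "A", "JD_TYPE": "P", "JD_CATEGORY": "A", "VIS_CTY": "MX"})
other = series_key({"FREQ": "M", "JD_TYPE": "P", "JD_CATEGORY": "B", "VIS_CTY": "MX"})
groups = group_siblings([monthly, annual, other])
```

Here the monthly and annual series land in the same sibling group, which is where a group-level attribute would be attached. The grouping is fixed by the data structure rather than chosen by the query, which is the sense in which it looks exogenous to queries.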

<excerpted from the InformationalModel pdf>

3.2.3 Data Flow

The DataflowDefinition associates a KeyFamily with one or more Category (possibly from different CategorySchemes) via the parent class of DataflowDefinition - StructureUsage. This gives a system the ability to state which DataSets are to be reported/disseminated for a given Category, and which DataSets can be reported using the KeyFamily definition. The DataflowDefinition may also have additional metadata attached that defines qualitative information and constraints on the use of the KeyFamily such as the sub set of Codes used in a Dimension (this is covered later in this document – see “Data Constraints and Provisioning” section 9). Each DataflowDefinition must have one KeyFamily specified which defines the structure of any DataSets to be reported/disseminated.

Dataflow IDs act as a type for datasets. The dataflow concept could be useful in supporting our Dataset location requirement (locating an appropriate data service).
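A rough sketch of how a dataflow ID could serve as the lookup key for dataset location is shown below. Everything here is an assumption made for illustration: the `Registration` and `DataflowRegistry` classes, the dataflow ID `EXAMPLE_FLOW`, and the `example.org` URL are hypothetical and not part of SDMX or COSMOS. Only the dataset ID `JD014` is taken from the registration example on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    dataset_id: str      # the dataset identifier, e.g. the <ID> of a registration
    dataflow_id: str     # acts as the dataset's "type"
    datasource_url: str  # where a data service for this dataset can be reached


class DataflowRegistry:
    """In-memory stand-in for a registry that indexes datasets by dataflow."""

    def __init__(self):
        self._by_dataflow = {}

    def register(self, registration):
        self._by_dataflow.setdefault(registration.dataflow_id, []).append(registration)

    def locate(self, dataflow_id):
        """Return all registrations whose dataset is of the given dataflow type."""
        return list(self._by_dataflow.get(dataflow_id, []))


registry = DataflowRegistry()
registry.register(Registration("JD014", "EXAMPLE_FLOW", "http://example.org/data/JD014"))
matches = registry.locate("EXAMPLE_FLOW")
```

A client that knows only the type of data it wants would call `locate` with the dataflow ID and then contact whichever data source comes back.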

Dataset Registration example:

			<ID>JD014</ID> <<<<<<<<< identifies the dataset (ResourceID in WSDM-speak)
			<Name xml:lang="en">Trans46302</Name>
			<Sender id="BIS">
				<Name xml:lang="en">Bank for International Settlements</Name>
					<Name xml:lang="en">G.B. Smith</Name>
			<Receiver id="ECB">
				<Name xml:lang="en">European Central Bank</Name>
					<Name xml:lang="en">B.S. Featherstone</Name>
					<Department xml:lang="en">Statistics Division</Department>
<<<<<<<<<<<<<<< reference to the format of the data
   <<<<<<<<<<<<<<<< Identifies the format of the data


			<registry:StatusMessage status="Success"/>

Registration Query example

Can also query on data provider refs, data flow refs, etc.
<<<<<<<<<<<<< identifies the 'type' of data.

		<registry:QueryResult timeSeriesMatch="false">
 <<<<<<< Data source with matching types
Datasource specifies the properties of a data or metadata source. A SimpleDatasource requires only the URL of the data. A QueryableDatasource must be able to accept an SDMX-ML Query Message, and respond appropriately.


Dataset Query example:

				<query:Dimension id="JD_CATEGORY">A</query:Dimension>
				<query:Dimension id="FREQ">M</query:Dimension>
				<query:Dimension id="FREQ">A</query:Dimension>
<<<<<<<<<<<<<<< Identifies the target dataset.                 
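The wrapper elements around the dimension constraints above did not survive on this page, so how they combine is not shown. As one plausible reading, the Python sketch below assumes that repeated values for the same dimension are alternatives (OR) and that different dimensions must all match (AND). The `matches` function and the sample series are hypothetical; only the dimension IDs and the values A and M come from the fragment.

```python
def matches(series_dims, constraints):
    """Check a series against a list of (dimension id, value) constraints.

    Assumed semantics: values given for the same dimension are alternatives
    (OR); constraints on different dimensions must all hold (AND).
    """
    allowed = {}
    for dim, value in constraints:
        allowed.setdefault(dim, set()).add(value)
    return all(series_dims.get(dim) in values for dim, values in allowed.items())


# The three constraints from the query fragment above.
query = [("JD_CATEGORY", "A"), ("FREQ", "M"), ("FREQ", "A")]

# Hypothetical series, described only by their dimension values.
series = [
    {"JD_CATEGORY": "A", "FREQ": "M"},
    {"JD_CATEGORY": "A", "FREQ": "Q"},
    {"JD_CATEGORY": "B", "FREQ": "A"},
]
selected = [s for s in series if matches(s, query)]
```

Under these assumptions only exact code values can be matched, which is consistent with the observation at the top of this page that the query syntax is limited with respect to ranges.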



Compact Schema for response

	<xs:complexType name="SeriesType">
			<xs:extension base="compact:SeriesType">
					<xs:element ref="Obs" minOccurs="0" maxOccurs="unbounded"/>
					<xs:element name="Annotations" type="common:AnnotationsType" minOccurs="0"/>
				<xs:attribute name="COLLECTION" type="CL_COLLECTION" use="optional"/>
				<xs:attribute name="FREQ" type="CL_FREQ" use="optional"/>
				<xs:attribute name="JD_TYPE" type="CL_JD_TYPE" use="optional"/>
				<xs:attribute name="JD_CATEGORY" type="CL_JD_CATEGORY" use="optional"/>
				<xs:attribute name="VIS_CTY" type="CL_BIS_IF_REF_AREA" use="optional"/>
				<xs:attribute name="TIME_FORMAT" type="CL_TIME_FORMAT" use="required"/>
	<xs:element name="Obs" type="ObsType" substitutionGroup="compact:Obs"/>
	<xs:complexType name="ObsType">
			<xs:extension base="compact:ObsType">
				<xs:attribute name="TIME_PERIOD" type="common:TimePeriodType" use="optional"/>
				<xs:attribute name="OBS_VALUE" type="xs:double" use="optional"/>
				<xs:attribute name="OBS_CONF" type="CL_BIS_OBS_CONF" use="optional"/>
				<xs:attribute name="OBS_PRE_BREAK" type="xs:string" use="optional"/>
				<xs:attribute name="OBS_STATUS" type="xs:string" use="optional"/>

<<<<<<<<<<<<<< identifies the target types (key family)
<<<<<<<<<<<<<<<<<< actual xsd for response, defines Obs, etc.
	urn:sdmx:org.sdmx.infomodel.keyfamily.KeyFamily=BIS:EXT_DEBT:compact BIS_JOINT_DEBT_Compact.xsd

			<bisc:Obs TIME_PERIOD="2000-01" OBS_VALUE="3.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2001-02" OBS_VALUE="2.29" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-03" OBS_VALUE="3.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-04" OBS_VALUE="5.24" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-05" OBS_VALUE="3.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-06" OBS_VALUE="3.78" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-07" OBS_VALUE="3.65" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-08" OBS_VALUE="2.37" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-09" OBS_VALUE="3.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-10" OBS_VALUE="3.17" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-11" OBS_VALUE="3.34" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-12" OBS_VALUE="1.21" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-01" OBS_VALUE="3.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-01" OBS_VALUE="5.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2001-02" OBS_VALUE="3.29" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-03" OBS_VALUE="6.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-04" OBS_VALUE="2.24" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-05" OBS_VALUE="3.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-06" OBS_VALUE="7.78" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-07" OBS_VALUE="3.65" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-08" OBS_VALUE="5.37" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-09" OBS_VALUE="3.14" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-10" OBS_VALUE="1.17" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-11" OBS_VALUE="4.34" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-12" OBS_VALUE="1.21" OBS_STATUS="A"/>
			<bisc:Obs TIME_PERIOD="2000-01" OBS_VALUE="4.14" OBS_STATUS="A"/>
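For a sense of how a consumer might read such a compact response, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The namespace URI bound to the `bisc` prefix, the enclosing `Series` element, and its attribute values are assumptions for illustration (the real namespace comes from the key-family-specific schema). The three `Obs` rows are copied from the sample above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "bisc" prefix.
BISC_NS = "urn:example:bisc"

# A tiny well-formed stand-in for one series of a compact response.
SAMPLE = f"""
<bisc:Series xmlns:bisc="{BISC_NS}" FREQ="M" TIME_FORMAT="P1M">
  <bisc:Obs TIME_PERIOD="2000-01" OBS_VALUE="3.14" OBS_STATUS="A"/>
  <bisc:Obs TIME_PERIOD="2000-04" OBS_VALUE="5.24" OBS_STATUS="A"/>
  <bisc:Obs TIME_PERIOD="2000-12" OBS_VALUE="1.21" OBS_STATUS="A"/>
</bisc:Series>
"""


def read_observations(xml_text):
    """Return (time period, value, status) tuples for every Obs element."""
    root = ET.fromstring(xml_text.strip())
    observations = []
    for obs in root.iter(f"{{{BISC_NS}}}Obs"):
        raw_value = obs.get("OBS_VALUE")
        # OBS_VALUE is optional in the schema, so tolerate its absence.
        value = float(raw_value) if raw_value is not None else None
        observations.append((obs.get("TIME_PERIOD"), value, obs.get("OBS_STATUS")))
    return observations


observations = read_observations(SAMPLE)
```

Because the compact format carries dimensions and attributes as XML attributes, reading a series is mostly a matter of attribute lookups. The consumer still needs the key-family-specific schema to know which attributes to expect.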