# STEM Design Document


## Overview

Overview of the design of STEM.

### Modeling Framework

At its core, STEM is a discrete event simulation system. It begins with an initial "simulation state" and then proceeds step by step, computing the next state of the Simulation as a function of the current state and a parameter that specifies the current "time". It uses a "graph" to represent the state of the Simulation at each step.

A graph is a simple but powerful mathematical abstraction for representing "entities" (i.e., things in the world) and their relationships. More formally, a graph is a set of "nodes", "edges" and "labels". Nodes generally correspond to entities, while each edge links two nodes and represents some relationship between them. Labels are attached to either a node or an edge and represent some aspect of their host (such as the name of the entity or the name of the relationship). A node may have more than one label, but an edge has only one.

#### Canonical Graph

In STEM, nodes typically represent geographic regions, while edges represent relationships between those regions. There may be any number of edges between any two nodes, just as there may be any number of relationships between any two geographic locations. A label on a node might represent the physical area of the corresponding geographic location, the number of population members of a particular type, or the state of a disease at that location. An edge between two nodes might represent a relationship such as the sharing of a common border; in that case, the label for the edge might contain the length of the border between the two locations. A different edge between the same two nodes might represent a road that connects the two locations, and its label might indicate the type of road and how much traffic it carries. A completely different edge might represent part of the flight path of migratory birds. The graph that includes all of the nodes, edges and labels necessary to represent the state of the Simulation is called the canonical graph.
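The following Python sketch illustrates the structure just described: a node can carry several labels, each edge carries exactly one, and any number of edges may join the same pair of nodes. The class and method names (`Node`, `Edge`, `Label`, `Graph.connect`, `Graph.edges_between`) are hypothetical and are not STEM's EMF-generated API; the region identifiers and values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    # Hypothetical label: what it describes and its value
    name: str
    value: object


@dataclass
class Node:
    # A node (e.g., a geographic region) may carry any number of labels
    node_id: str
    labels: list[Label] = field(default_factory=list)


@dataclass
class Edge:
    # An edge links two nodes and carries exactly one label
    source: Node
    target: Node
    label: Label


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, a: str, b: str, label: Label) -> Edge:
        # Multiple edges between the same two nodes are allowed
        edge = Edge(self.nodes[a], self.nodes[b], label)
        self.edges.append(edge)
        return edge

    def edges_between(self, a: str, b: str) -> list[Edge]:
        # Edges are treated as undirected here (an assumption of this sketch)
        return [e for e in self.edges
                if {e.source.node_id, e.target.node_id} == {a, b}]


# Placeholder regions and values, for illustration only
g = Graph()
g.add_node(Node("region-A", [Label("area", 100.0), Label("population", 5000)]))
g.add_node(Node("region-B", [Label("area", 80.0)]))
g.connect("region-A", "region-B", Label("shared_border_length", 12.5))
g.connect("region-A", "region-B", Label("road_type", "highway"))
```

Here the border and the road are two distinct edges between the same pair of nodes, each with its own single label.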

A Simulation in STEM begins with an initialized canonical graph and a starting time. The first step in a Simulation is to determine the next point in time that will be used to update the state of the canonical graph. With this value determined, the internal STEM Simulation "engine" invokes, in a very specific order, a set of computations associated with the canonical graph. These computations take the time point as input and compute the "next" state of the graph as it will be at that time. When these computations are complete, the state of the entire graph is changed to the "next" value just computed. This process continues until the user stops it or, if one was specified, a predetermined end time is reached.

### Compositional Modeling Framework

One of the main features of STEM, and the one that makes it such a powerful modeling system, is its framework for specifying the features of a Simulation. This framework allows a Simulation to be composed from many different reusable components that combine to form the Simulation's canonical graph, computational elements and sequence of time points.

This approach is extremely flexible and powerful. The components can be from many different sources and can be exchanged among users. The components can also be combined to create larger structures which then can become reusable components themselves. For instance, it is possible to create a detailed model of a country and then reuse that model as a component in many different Simulations. Similarly, computational aspects of a Simulation such as a specialized disease model can be developed by an individual researcher and then used by many others in their models. The ability of STEM to combine components from different sources makes it possible to leverage the varied expertise of different model builders in a way that has never before been possible.

To foster reuse and collaboration, each component in the framework has a set of Dublin Core meta data associated with it. This meta data records important attributes of the component such as its title, descriptive text, the name of the original creator, any literature citations associated with it (e.g., a paper describing a particular disease model), and others such as important dates and spatial characteristics. This information lets modelers know exactly what they are using in their models.

### STEM Components

The STEM compositional modeling framework consists of five components that constitute the "building blocks" of any Simulation: <a href="#scenario">Scenario</a>, <a href="#sequencer">Sequencer</a>, <a href="#decorator">Decorators</a>, <a href="#model">Models</a> and <a href="#graph">Graphs</a>.

#### Scenario

The main component of a Simulation is a Scenario. When a Simulation is run in STEM, it is always created from a Scenario. A Scenario logically collects together three other types of components: a single <a href="#sequencer">Sequencer</a>, a set of <a href="#decorator"> Decorators</a>, and a single <a href="#model"> Model</a>. Together, these components can be used to create a Simulation.

#### Sequencer

A Sequencer is the component of a Simulation that determines the sequence of time values that will be used to compute the next state of the <a href="#canonicalgraph">canonical graph</a>. It may produce values that are at fixed intervals of time, or it may vary the duration of the intervals between points. The values it creates are in simulated "STEM Time", but there is no restriction preventing the values from reflecting "wall clock" time. This can be useful for Simulations that incorporate external "real-time" data values from databases or other sources, such as weather observations.
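As a minimal sketch, a Sequencer can be modeled as a generator of time points. `fixed_interval` yields points at a constant spacing up to an end time, and `variable_interval` lets the spacing change from step to step. Both functions are hypothetical illustrations, not STEM's Sequencer interface; a wall-clock variant could, for example, yield `datetime.now()` values instead.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator


def fixed_interval(start: datetime, end: datetime,
                   interval: timedelta) -> Iterator[datetime]:
    """Yield start+interval, start+2*interval, ... up to and including end."""
    t = start + interval
    while t <= end:
        yield t
        t += interval


def variable_interval(start: datetime,
                      intervals: Iterable[timedelta]) -> Iterator[datetime]:
    """Yield time points whose spacing is taken, in order, from intervals."""
    t = start
    for step in intervals:
        t += step
        yield t


daily = list(fixed_interval(datetime(2006, 1, 1), datetime(2006, 1, 4),
                            timedelta(days=1)))
uneven = list(variable_interval(datetime(2006, 1, 1),
                                [timedelta(hours=6), timedelta(days=1)]))
```

In this sketch, an exhausted generator plays the role of the Sequencer reporting that the sequence of cycles is complete.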

#### Decorator

A Decorator is the framework's computational component. Every Decorator can participate in the initialization of the <a href="#canonicalgraph">canonical graph</a>, for instance by setting the values of existing labels or adding additional ones (i.e., they "decorate" the graph). Also, at each Simulation cycle, they are responsible for determining the next state of the <a href="#canonicalgraph">canonical graph</a> by computing the values of labels as a function of the current Simulation time. In an epidemiological Simulation, a disease model would be implemented as a Decorator. There is no restriction on what a Decorator can do; it can, for instance, issue a query to a database or invoke a web service.
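A minimal Python sketch of the two responsibilities described above, assuming the graph is reduced to a dictionary that maps node identifiers to label dictionaries. `Decorator`, `decorate`, `update` and the toy `GrowthDecorator` are hypothetical names; the growth rule is a placeholder standing in for a real disease model, and it ignores `time`, which a real Decorator might use.

```python
from abc import ABC, abstractmethod


class Decorator(ABC):
    """Hypothetical Decorator interface: initialize once, then update each cycle."""

    @abstractmethod
    def decorate(self, graph: dict) -> None:
        """Set existing labels or add new ones when the graph is initialized."""

    @abstractmethod
    def update(self, graph: dict, next_graph: dict, time: float) -> None:
        """Write next label values, computed from current ones, into next_graph."""


class GrowthDecorator(Decorator):
    """Toy stand-in for a disease model: infections grow by a fixed rate per cycle."""

    def __init__(self, rate: float):
        self.rate = rate

    def decorate(self, graph: dict) -> None:
        # Add an "infected" label to every node that lacks one
        for labels in graph.values():
            labels.setdefault("infected", 0.0)

    def update(self, graph: dict, next_graph: dict, time: float) -> None:
        # Reads only current values, so the order of nodes does not matter
        for node_id, labels in graph.items():
            next_graph.setdefault(node_id, {})["infected"] = (
                labels["infected"] * (1 + self.rate))


graph = {"region-A": {}, "region-B": {"infected": 10.0}}
growth = GrowthDecorator(rate=0.5)
growth.decorate(graph)
next_graph: dict = {}
growth.update(graph, next_graph, time=1.0)
```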

#### Model

A Model is the component responsible for representing the contents of the <a href="#canonicalgraph">canonical graph</a> and for creating an instance of it when a Simulation is started from a <a href="#scenario">Scenario</a>. It combines with the final component of the framework called a <a href="#graph"> Graph</a> to form a tree. This tree is a hierarchical organization of the different contributions to the <a href="#canonicalgraph">canonical graph</a>. Model instances form the root and interior nodes of the tree while <a href="#graph"> Graph</a> instances form the leaves. The Model referenced by a <a href="#scenario">Scenario</a> is the root of such a tree.

Each Model contains three different collections. The first is a collection of "sub-Model" instances, each of which is essentially the root of a sub-tree. The second is a collection of <a href="#graph"> Graph</a> instances, and the third is another collection of <a href="#decorator"> Decorators</a>.
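The Model/Graph tree can be sketched as a recursive data structure. In this hypothetical Python version, `Model` holds the three collections named above, and `collect_graphs` performs a depth-first descent that gathers every Graph (leaf) contributed beneath a Model. The model and fragment names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class GraphFragment:
    # Leaf of the tree: contributes Nodes, Edges and/or Labels
    name: str


@dataclass
class Model:
    # Root or interior node of the tree, with its three collections
    name: str
    submodels: list["Model"] = field(default_factory=list)
    graphs: list[GraphFragment] = field(default_factory=list)
    decorators: list[str] = field(default_factory=list)

    def collect_graphs(self) -> list[GraphFragment]:
        """Depth-first walk returning every Graph in the tree rooted here."""
        found = list(self.graphs)
        for sub in self.submodels:
            found.extend(sub.collect_graphs())
        return found


country = Model(
    "country",
    submodels=[Model("population", graphs=[GraphFragment("population-labels")])],
    graphs=[GraphFragment("regions"), GraphFragment("borders")],
)
names = [g.name for g in country.collect_graphs()]
```

Because a sub-Model is itself a complete tree, the same `country` Model could be placed inside a larger Model without any change.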

#### Graph

Graph instances contain the actual components (Nodes, Edges, and Labels) that will eventually be contributed to a <a href="#canonicalgraph">canonical graph</a>. In the compositional framework, Graph instances are not true mathematical "graphs"; they are better described as "graph fragments", because they may contain unresolved sets of Edges or Labels (and possibly no Nodes at all). When the <a href="#canonicalgraph">canonical graph</a> is created, these fragments are combined and the connections among their contents are resolved (i.e., Edges and Labels are mated with their appropriate "missing" Nodes). The resulting <a href="#canonicalgraph">canonical graph</a> is a true mathematical graph.
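A sketch of how fragments might be combined, under the assumption that Edges and Labels refer to Nodes by identifier (for example, an ISO-style code). `Fragment` and `merge` are hypothetical; the function pools the Nodes of all fragments first, then mates each Edge and node Label with its Nodes, and reports anything that still cannot be resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    # Any of these collections may be empty; a fragment need not contain Nodes
    nodes: set[str] = field(default_factory=set)
    edges: list[tuple[str, str, str]] = field(default_factory=list)      # (from, to, label)
    labels: list[tuple[str, str, object]] = field(default_factory=list)  # (node, name, value)


def merge(fragments: list[Fragment]):
    """Combine fragments into one graph; return (nodes, edges, unresolved)."""
    # 1. Pool every Node contributed by any fragment
    nodes: dict[str, dict] = {}
    for f in fragments:
        for node_id in f.nodes:
            nodes.setdefault(node_id, {})
    edges, unresolved = [], []
    # 2. Mate Edges and Labels with the Nodes they refer to
    for f in fragments:
        for src, dst, label in f.edges:
            if src in nodes and dst in nodes:
                edges.append((src, dst, label))
            else:
                unresolved.append(("edge", src, dst, label))
        for node_id, name, value in f.labels:
            if node_id in nodes:
                nodes[node_id][name] = value
            else:
                unresolved.append(("label", node_id, name, value))
    return nodes, edges, unresolved


geography = Fragment(nodes={"region-A", "region-B"})
roads = Fragment(edges=[("region-A", "region-B", "road"),
                        ("region-A", "region-C", "road")])
population = Fragment(labels=[("region-A", "population", 5000)])
nodes, edges, unresolved = merge([geography, roads, population])
```

Pooling all Nodes before resolving anything means the order in which fragments are listed does not affect which Edges and Labels resolve.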

#### Label

Labels play a special role in the framework in that they can store two state values simultaneously. They have a "current" value which, collectively, records the current state of the graph. They can also have a "next" value which is used, collectively, to store the next state of the graph.
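The two-value Label can be sketched as a small double-buffered object. This hypothetical Python class shows why the design matters: when two labels depend on each other, writing into `next` while reading from `current` lets both updates see the same current state, and a single `switch` then advances them together. Updating values in place would instead let the second computation see the first one's result.

```python
class Label:
    """Hypothetical label storing a current value and a next value."""

    def __init__(self, value):
        self.current = value
        self.next = value

    def switch(self) -> None:
        # Exchange the two values; the old current becomes scratch space
        self.current, self.next = self.next, self.current


a, b = Label(1), Label(2)
# Each next value is computed only from current values
a.next = b.current
b.next = a.current
for label in (a, b):
    label.switch()
```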

#### Model Decorators

The collection of <a href="#decorator"> Decorators</a> that may exist in each <a href="#model"> Model</a> is similar to that contained in a <a href="#scenario">Scenario</a> instance. Its contents represent the computational component of the <a href="#model"> Model</a>. The difference is that the <a href="#modeldecorators"> Model Decorators</a> are only able to modify the parts of the <a href="#canonicalgraph">canonical graph</a> that are contributed by the tree rooted at the <a href="#model"> Model</a>. The <a href="#decorator"> Decorators</a> in the <a href="#scenario">Scenario</a>, being above the root of the tree, are able to access the entire <a href="#canonicalgraph">canonical graph</a>. <a href="#modeldecorators"> Model Decorators</a> also follow a strict execution order: those contributed lower in the tree are invoked before those contributed above them, and <a href="#scenario">Scenario</a> <a href="#decorator"> Decorators</a> are invoked last. The order of invocation for <a href="#decorator"> Decorators</a> at the same "level" is arbitrary.
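One way to realize this ordering, reading "level" as depth in the Model tree, is to group Model Decorators by depth, run the deepest level first, and append the Scenario Decorators at the end. `Model` and `invocation_order` are hypothetical names, and the order within a level simply follows traversal order here, which the design leaves arbitrary.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Model:
    decorators: list[str] = field(default_factory=list)
    submodels: list["Model"] = field(default_factory=list)


def invocation_order(root: Model, scenario_decorators: list[str]) -> list[str]:
    """Model Decorators deepest level first, then the Scenario Decorators."""
    by_depth: dict[int, list[str]] = defaultdict(list)

    def walk(model: Model, depth: int) -> None:
        by_depth[depth].extend(model.decorators)
        for sub in model.submodels:
            walk(sub, depth + 1)

    walk(root, 0)
    order: list[str] = []
    for depth in sorted(by_depth, reverse=True):
        order.extend(by_depth[depth])
    return order + list(scenario_decorators)


root = Model(["root-dec"], [
    Model(["disease"], [Model(["population"])]),
    Model(["weather"]),
])
order = invocation_order(root, ["outbreak-location"])
```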

A <a href="#modeldecorators"> Model Decorator</a> would typically be some computation that needs to be executed for each cycle of the Simulation, while a <a href="#scenario">Scenario</a> <a href="#decorator"> Decorator</a> would typically be used to modify the initial state of the <a href="#canonicalgraph">canonical graph</a> to customize it for a particular <a href="#scenario">Scenario</a>. For example, in an epidemiological Simulation, a disease model would be added to a Simulation as a <a href="#modeldecorators"> Model Decorator</a> while the exact location of an outbreak of a disease would be added to the Simulation by a <a href="#scenario">Scenario</a> <a href="#decorator"> Decorator</a>. Many different <a href="#scenario">Scenario</a> instances could refer to the same <a href="#model"> Model</a> (with its disease model), but provide different <a href="#decorator"> Decorators</a> to specify different starting locations.

<a href="#decorator"> Decorators</a> typically compute the next value for Labels in the <a href="#canonicalgraph">canonical graph</a> as a function of the current contents of the graph and the time. The Figure below illustrates how these components are combined to create a <a href="#model"> Model</a> that is used by two different <a href="#scenario"> Scenarios</a>. <img src="img/ScenarioComposition.jpg" />
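The sharing described above can be sketched as follows: one Model instance is referenced by two Scenarios, each of which adds its own hypothetical `OutbreakDecorator` to seed a different starting location. All class names are illustrative, the Sequencer is omitted for brevity, and the location codes follow the ISO-based convention described under Standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # Stand-in for a shared Model: it only knows which regions to create
    regions: tuple[str, ...]

    def create_graph(self) -> dict:
        return {region: {"infected": 0} for region in self.regions}


@dataclass(frozen=True)
class OutbreakDecorator:
    """Hypothetical Scenario Decorator that seeds infections at one location."""
    location: str
    infected: int

    def decorate(self, graph: dict) -> None:
        graph[self.location]["infected"] = self.infected


@dataclass(frozen=True)
class Scenario:
    model: Model
    decorators: tuple = ()

    def start(self) -> dict:
        # Build a fresh graph from the shared Model, then apply this Scenario's Decorators
        graph = self.model.create_graph()
        for decorator in self.decorators:
            decorator.decorate(graph)
        return graph


shared = Model(("US-CA-06085", "AF-KAB-G140001"))
scenario_1 = Scenario(shared, (OutbreakDecorator("US-CA-06085", 5),))
scenario_2 = Scenario(shared, (OutbreakDecorator("AF-KAB-G140001", 3),))
graph_1, graph_2 = scenario_1.start(), scenario_2.start()
```

Since each Scenario builds its own graph from the shared Model, seeding an outbreak in one Scenario leaves the other untouched.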

### Simulation Execution

When a Simulation is started, the first operation is to create the <a href="#canonicalgraph">canonical graph</a>. This is accomplished by recursively descending the tree rooted at the <a href="#model"> Model</a> referenced by the <a href="#scenario">Scenario</a>. As the <a href="#canonicalgraph">canonical graph</a> is constructed, connections between Labels and Edges in the graph fragments are resolved, and then each <a href="#model"> Model</a> <a href="#decorator"> Decorator</a> is invoked and given the opportunity to "decorate" the <a href="#canonicalgraph">canonical graph</a> as part of its initialization. When this is complete, the <a href="#scenario">Scenario</a> <a href="#decorator"> Decorators</a> are invoked to give them the opportunity to decorate the <a href="#canonicalgraph">canonical graph</a>.

When the <a href="#canonicalgraph">canonical graph</a> has been constructed and initialized by the <a href="#model"> Model</a> and <a href="#scenario">Scenario</a> <a href="#decorator"> Decorators</a>, the Simulation can begin its first cycle. The first step of each cycle is to determine whether the Simulation has completed its sequence of cycles. The answer to this question is provided by the <a href="#sequencer">Sequencer</a> that is referenced by the <a href="#scenario">Scenario</a> from which the Simulation was started. If the answer is "No" and the Simulation should continue, the <a href="#sequencer">Sequencer</a> provides a value that represents the "time" of the next cycle. The STEM Simulation engine then takes that value and invokes each of the <a href="#model"> Model</a> <a href="#decorator"> Decorators</a> (in the order described under <a href="#modeldecorators"> Model Decorators</a>), passing them the time value. They perform their computations, and then the <a href="#scenario">Scenario</a> <a href="#decorator"> Decorators</a> are invoked in the same manner. Frequently, the <a href="#scenario">Scenario</a> <a href="#decorator"> Decorators</a> will not have any computations to perform after their initialization, but nothing prevents them from doing so.

When the <a href="#decorator"> Decorators</a> are finished, the Simulation engine tells the <a href="#canonicalgraph">canonical graph</a> to switch to its "next" state. This means that all of the Labels exchange their "current" and "next" values. This completes the first cycle. The process begins the next cycle with the <a href="#sequencer">Sequencer</a> determining if the sequence of simulation cycles is complete.
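Putting these steps together, a Simulation cycle can be sketched as the loop below. This hypothetical Python version assumes that the Sequencer is an iterable of time points (the Simulation is complete when it is exhausted), that each Decorator is a function writing `Label.next` values, and that the Model Decorators are already sorted into their required order.

```python
from typing import Callable, Iterable


class Label:
    def __init__(self, value: float):
        self.current = value
        self.next = value

    def switch(self) -> None:
        self.current, self.next = self.next, self.current


# A Decorator here is any function taking (labels, time) and writing Label.next values
DecoratorFn = Callable[[dict, float], None]


def run_simulation(labels: dict, times: Iterable[float],
                   model_decorators: list[DecoratorFn],
                   scenario_decorators: list[DecoratorFn]) -> int:
    """Run one cycle per time point and return the number of cycles completed."""
    cycles = 0
    for time in times:  # the Sequencer decides whether another cycle follows
        for decorate in model_decorators + scenario_decorators:
            decorate(labels, time)
        # Switch the whole graph to its "next" state. Because values are
        # exchanged rather than copied, this sketch assumes every label's
        # next value was written during the cycle.
        for label in labels.values():
            label.switch()
        cycles += 1
    return cycles


def double(labels: dict, time: float) -> None:
    for label in labels.values():
        label.next = label.current * 2


labels = {"region-A": Label(1.0)}
cycles = run_simulation(labels, [1.0, 2.0, 3.0], [double], [])
```

The caveat in the comments is a direct consequence of exchanging values: a label that no Decorator writes in a cycle would revert to an older value after the switch.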

### Standards

#### ISO-3166

The ISO 3166-1 Standard defines names and short alphabetic codes for "countries" and "territories" that correspond to the geopolitical divisions of the Earth. At the time of writing there were 244 such entries, a larger number than the membership of the United Nations (191), which reflects the inclusion of territories in the mix. For instance, the standard includes the United States and Puerto Rico as separate entities. This level of representation is sometimes referred to as "United Nations Administration Level 0".

The standard defines two and three letter codes for each entry. A two letter code is referred to as an "alpha-2" code while a three letter code is an "alpha-3" code. Thus, the "ISO 3166-1 alpha-3" code for the United States is "USA" while the "ISO 3166-1 alpha-2" code is "US".

The ISO 3166-2 Standard defines names and alphabetic codes for subdivisions of the countries and territories defined in ISO 3166-1. This level of representation is sometimes referred to as "United Nations Administration Level 1". Typically, these would be the states or provinces of a particular country. There are approximately 3700 different codes.

The ISO 3166-2 code is a compound code consisting of the ISO 3166-1 alpha-2 code of the parent country and a second, country-specific string, separated by a hyphen. For instance, the code for the State of California in the United States of America is "US-CA". Similarly, the code for Kabul province in Afghanistan is "AF-KAB".

There is a standard called ISO 3166-3 which one might expect would provide codes for subdivisions of Level 1, but it doesn't. Instead, it defines how codes have replaced each other as political boundaries have evolved.

STEM still needs codes that reflect Level 2 divisions (e.g., counties in the United States). For these, STEM adopts the convention embodied in the ISO 3166-2 standard, extending the Level 1 code (e.g., "US-CA") with additional country-specific identifiers separated by hyphens. In the United States, the Federal Information Processing Standard (FIPS) is the appropriate choice: it defines unique identifiers for each of the counties in the country. For instance, the FIPS code for Santa Clara County in California is "06085" (the leading "0" is significant). Thus, the "ISO" identifier used in STEM for Santa Clara County is "US-CA-06085".

In cases where the Level 2 codes for a particular country are not known (an opportunity for an open source contribution), a special "generated" code was created with the intention that it would later be replaced by the correct code. This is the case for Afghanistan, where the code for the city of Kabul was generated as "G140001" (the leading "G" identifies the code as generated, and it should be replaced when the correct code is obtained). Thus, the "ISO" code used in STEM for Kabul (the city) is "AF-KAB-G140001".

The ISO code used in STEM for a higher resolution area, such as a Census tract, would follow the same convention by appending a code to the code of the containing area separated by a hyphen. There are no such codes defined in STEM at this time.
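The code convention lends itself to a few small helper functions. The Python sketch below is hypothetical (it does not describe functions STEM is known to provide): `compose` appends a country-specific identifier to the code of the containing area, `parent` strips the last segment, `level` counts hyphens to give the administration level, and `is_generated` flags placeholder codes such as "G140001". It looks for a "G" followed only by digits at Level 2 or deeper, so that alpha-2 codes like "GB" or subdivision codes like "US-GA" are not mistaken for generated ones.

```python
from typing import Optional


def compose(parent_code: str, local_id: str) -> str:
    """Append a country-specific identifier to the containing area's code."""
    if "-" in local_id:
        raise ValueError("local identifiers must not contain hyphens")
    return f"{parent_code}-{local_id}"


def parent(code: str) -> Optional[str]:
    """Code of the containing area, or None for a Level 0 (ISO 3166-1) code."""
    head, sep, _ = code.rpartition("-")
    return head if sep else None


def level(code: str) -> int:
    """0 for ISO 3166-1 alpha-2, 1 for ISO 3166-2, 2 for e.g. US counties."""
    return code.count("-")


def is_generated(code: str) -> bool:
    """True for placeholder codes like "G140001" at Level 2 or deeper."""
    last = code.rsplit("-", 1)[-1]
    return level(code) >= 2 and last[:1] == "G" and last[1:].isdigit()


santa_clara = compose(compose("US", "CA"), "06085")
```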

#### Dublin Core Meta Data

The Dublin Core Meta Data standard was developed by the library science community to define a consistent set of attributes that can be assigned to a "Resource". The standard deliberately leaves the definition of a Resource open, but it would typically be something like a book, a movie, a video, or a musical score. Most of the attributes are simple and specify familiar details of something like a book, such as its "Title", its "Creator", and the "Date" it was created. The remaining attributes extend the standard's usefulness by describing such things as "Spatial" characteristics, a "License", and a date range for which a Resource is valid, along with several others.

STEM makes use of the standard by requiring Dublin Core meta data on each composable component of the graph that serves as the representational framework for each Simulation. Thus, when a Simulation is created, every component (i.e., all data and all computation) is specifically labeled and identified as to its origin and applicability.
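A minimal sketch of attaching Dublin Core meta data to a component. The field names are modeled on DCMI Metadata Terms (`title`, `creator`, `description`, `date`, `spatial`, `license`, `valid`, `bibliographicCitation`), but the `DublinCore` and `Component` classes, and the choice of which fields count as required, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DublinCore:
    title: str
    creator: str
    description: str = ""
    date: Optional[str] = None
    spatial: Optional[str] = None      # e.g. the region a Graph covers
    license: Optional[str] = None
    valid: Optional[str] = None        # date range for which the resource is valid
    bibliographic_citation: Optional[str] = None  # e.g. a paper describing a disease model

    def missing(self, required: tuple = ("title", "creator")) -> list:
        """Names of required fields that are empty (the required set is an assumption)."""
        return [name for name in required if not getattr(self, name)]


@dataclass
class Component:
    # Every composable component carries its own meta data
    name: str
    dublin_core: DublinCore


model = Component("disease-model", DublinCore(title="Example disease model", creator=""))
```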

## Plug-in Descriptions

This section contains descriptions of the different plug-ins that comprise STEM.

### org.eclipse.ohf.stem.core

The org.eclipse.ohf.stem.core plug-in contains the definitions and implementations of the Unified Modeling Language (UML) models that are used to represent the composable components that define a simulation in STEM. These models are Common, Graph, Model, Sequencer and Scenario. These models are implemented using the Eclipse Modeling Framework (EMF).

#### Common Model

Common Model UML Diagram goes here.

Describe the Dublin Core standard.

Describe how Sanity Checking works.

#### Graph Model

Graph Model UML Diagram goes here

#### Model Model

Model Model UML Diagram goes here

#### Sequencer Model

Sequencer Model UML Diagram goes here

#### Scenario Model

Scenario Model UML Diagram goes here