STEM Design Document
Overview
This document is intended to help new (and old) developers find their way in the STEM development environment and become effective contributors to the code base. The main problem with learning a new system is figuring out where to start: at the beginning it is very hard to decide what is important and what is not, because it all looks the same.
There are likely to be two types of developers who contribute to STEM: those who understand the Eclipse Modeling Framework (EMF), and those who don't. If you don't understand EMF, that's "Okay". Much of the system doesn't require it, and you can use STEM as an example to help you learn this very powerful technology. EMF is used to implement the core components of the Simulation engine in STEM and provides generated implementations of the system's serialization code and the visual editors for each component, among many other things.
The STEM Modeling Framework
At its core STEM is a discrete event simulation system. It begins with an initial "simulation state" and then proceeds in step-wise fashion to determine the next state of the Simulation as a function of the current state and a parameter that specifies the current "time". It uses a "graph" to represent the state of the Simulation at each of its steps.
A graph is a simple but powerful mathematical abstraction for representing "entities" (i.e., things in the world) and their relationships. More formally, a graph is a set of "nodes", "edges" and "labels", where nodes generally correspond to entities and edges link two nodes and represent some relationship between them. Labels are attached to either a node or an edge and represent some aspect of their host (like the name of the entity, or the name of the relationship). Each node may have more than one label, but each edge will only have one.
The many meanings of the word "Model"
- The word “Model” is used with several different meanings
- Depending upon context, it refers to one of the following:
  - Data-modeled, code-generated data structures (STEM Data Model or EMF Models)
  - A container in the STEM UI for composing scenarios (Scenario Model or Geographic Model)
  - Dynamic, integrated, programmatic models (disease or population models, like Influenza)
  - The Model builder, a tool for authoring disease models
Eclipse Modeling Framework
- EMF is a collection of tools for building software on a structured data model
- The majority of STEM is (data) modeled using EMF
- Modeled data structures with EMF Ecore
- Generated Java code for API data bindings
- Serialize/deserialize data objects (primarily XML)
- Leverage and extend EMF UI components (Editors, Wizards)
Core STEM Components
- Core
  - Core data (EMF) models
    - Identifiable, Scenario, Model, Graph (Node/Edge), Decorators, Labels, …
  - Simulation Engine (org.eclipse.stem.jobs)
    - Solver, sequencer, logger, graph generator implementations
- Data
  - Geography, Population, Transportation, Earth Science
  - Tools for generating fixed plug-ins of data components (“built-in data”)
- Epidemiological models
  - Disease Model implementations (SI, SIR, SEIR, Multi-population, Malaria, Polio, etc.)
  - Sample diseases
- Population models
  - Dynamic population models (e.g. mosquito population)
- Food Production models
- Model Generator – graphical tool for authoring disease models
- User Interface for all of the above
The Canonical Graph
- The STEM Canonical Graph is the runtime representation of any simulation’s state
- simulation.getScenario().getCanonicalGraph()
- Generated from the Scenario when a simulation is started
- Flattened and resolved version of the combined Graphs supplied to the Scenario
- The Canonical Graph contains
  - Decorators
  - Nodes
    - Nodes represent areas that contain populations impacted by models
    - Node examples include:
      - Geographic regions
      - Transportation carriers (airplanes)
    - Nodes also contain Labels and are connected by Edges
  - Edges
  - Labels
  - Time
In STEM, nodes typically represent geographic regions while edges represent relationships between geographic regions. There may be any number of edges between any two nodes, as there may be any number of relationships between any two geographic locations. A label on a node might represent the physical area of the corresponding geographic location, the number of population members of a particular type that live there, or a mathematical representation of the state of a disease at that location. An edge between two nodes might represent a relationship such as the sharing of a common border (i.e., two regions are physically adjacent and could easily exchange population members); a different edge between the same two nodes might represent a road that connects the two locations. In the first case, the label for a border edge might record the length of the border between the two locations; in the latter case, the label might indicate the type of road and how much traffic it carries. Completely different edges could also exist, for instance ones that represent the flight path of migratory birds. The graph that includes all of the nodes, edges, and labels necessary to represent the state of the Simulation is called the canonical graph.
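The node, edge and label vocabulary above can be illustrated with a small labeled multigraph holding the border-and-road example. This is a minimal sketch in Java (16+, for records), not STEM's API: `Label`, `Node`, `Edge`, `LabeledGraph` and the label keys are hypothetical stand-ins for the EMF-generated classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; STEM's real Node, Edge and Label types are EMF-generated.
record Label(String name, Map<String, Object> values) {}

// A node may carry any number of labels.
record Node(String id, List<Label> labels) {}

// An edge joins two nodes and carries exactly one label.
record Edge(String kind, Node a, Node b, Label label) {}

class LabeledGraph {
    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    // Multigraph lookup: every edge, of any kind, joining x and y in either direction.
    List<Edge> edgesBetween(Node x, Node y) {
        List<Edge> result = new ArrayList<>();
        for (Edge e : edges) {
            boolean joins = (e.a() == x && e.b() == y) || (e.a() == y && e.b() == x);
            if (joins) {
                result.add(e);
            }
        }
        return result;
    }
}

class GraphExample {
    public static void main(String[] args) {
        Node north = new Node("REGION-N", List.of(
                new Label("area", Map.of("km2", 1200.0)),
                new Label("population", Map.of("humans", 50_000))));
        Node south = new Node("REGION-S", List.of(
                new Label("area", Map.of("km2", 900.0))));

        LabeledGraph g = new LabeledGraph();
        g.nodes.add(north);
        g.nodes.add(south);

        // Two different relationships between the same pair of regions.
        g.edges.add(new Edge("commonBorder", north, south,
                new Label("border", Map.of("lengthKm", 35.0))));
        g.edges.add(new Edge("road", north, south,
                new Label("road", Map.of("type", "highway", "vehiclesPerDay", 8000))));

        System.out.println(g.edgesBetween(north, south).size()); // prints 2
    }
}
```

The key design point is that edges are stored as a list rather than keyed by node pair, so two regions can share a border edge and a road edge at the same time.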
Simulation
A Simulation in STEM is created from a canonical graph. The specification of how to create the graph is represented by a Scenario (defined below).
The execution of a Simulation begins with an initialized canonical graph and a starting time. The first step in a Simulation is to determine the next point in time that will be used to update the state of the canonical graph. With this value determined the internal STEM Simulation "engine" invokes, in a very specific order, a set of computations associated with the canonical graph. These computations take the time point as input and compute the "next" state of the graph as it will be at that time. When these computations are complete, the state of the entire graph is changed to the "next" value just computed. This process continues until stopped by the user, or, if specified, a predetermined end-time is reached.
The STEM Compositional Modeling Framework
One of the main features of STEM, and the one that makes it such a powerful modeling system, is its framework for specifying the features of a Simulation. This framework allows a Simulation to be composed from many different reusable components that combine to form the Simulation's canonical graph, computational elements and sequence of time points.
This approach is extremely flexible and powerful. The components can be from many different sources and can be exchanged among users. The components can also be combined to create larger structures which then can become reusable components themselves. For instance, it is possible to create a detailed model of a country and then reuse that model as a component in many different Simulations. Similarly, computational aspects of a Simulation such as a specialized disease model can be developed by an individual researcher and then used by many others in their models. The ability of STEM to combine components from different sources makes it possible to leverage the varied expertise of different model builders in a way that has never before been possible.
To foster reuse and collaboration, each component in the framework has a set of Dublin Core Meta Data associated with it. This metadata records important attributes of the component such as its title, descriptive text, the name of the original creator, any literature citations associated with it (e.g., a paper describing a particular disease model) and other attributes such as important dates, spatial characteristics, etc. This information allows modelers to know exactly what they are using in their models.
STEM Components
The STEM compositional modeling framework consists of eight components that constitute the "building blocks" of any Simulation. These are: Scenarios, Sequencers, Decorators, Models, Graphs, Modifiers, Triggers, and Predicates. In addition, a sequence of related Simulations, called a Batch, can be defined using an Experiment together with one or more Modifiers.
Scenario
The main component of a Simulation is a Scenario. When a Simulation is run in STEM, it is always created from a Scenario. A Scenario logically collects together three other types of components: a single Sequencer, a set of Decorators, and a single Model. Together, these components can be used to create a Simulation.
A Scenario can be created using the Scenario Wizard and edited using the Scenario Editor.
Sequencer
A Sequencer is the component of a Simulation that determines the sequence of time values that will be used to compute the next state of the canonical graph. It may produce values that are at fixed intervals of time, or it may vary the duration of the intervals between points. The values it creates are in simulated "STEM Time", but there is no restriction preventing the values from reflecting "wall clock" time. This can be useful for "Simulations" that incorporate external "real-time" data values from databases or other sources such as weather observations.
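A fixed-interval Sequencer can be sketched as a generator of time points that stops at an optional end time. `FixedIntervalSequencer`, `isComplete()` and `next()` are hypothetical names chosen for illustration, and `java.time.Instant` stands in for STEM Time; a variable-interval or real-time sequencer would only change how `next()` picks the following point.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical fixed-interval sequencer: a start time, a positive step and an optional end time.
class FixedIntervalSequencer {
    private final Duration step;
    private final Instant end;   // null means "run until stopped by the user"
    private Instant current;

    FixedIntervalSequencer(Instant start, Duration step, Instant end) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.current = start;
        this.step = step;
        this.end = end;
    }

    // The sequence is complete once the next time point would pass the end time.
    boolean isComplete() {
        return end != null && current.plus(step).isAfter(end);
    }

    // Advances to, and returns, the time of the next simulation cycle.
    Instant next() {
        if (isComplete()) {
            throw new IllegalStateException("sequence complete");
        }
        current = current.plus(step);
        return current;
    }
}

class SequencerExample {
    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        FixedIntervalSequencer seq = new FixedIntervalSequencer(
                start, Duration.ofDays(1), start.plus(Duration.ofDays(3)));
        List<Instant> times = new ArrayList<>();
        while (!seq.isComplete()) {
            times.add(seq.next());
        }
        System.out.println(times.size()); // prints 3 (days 1, 2 and 3 after the start)
    }
}
```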
A Sequencer can be created using the Sequential Sequencer Wizard or the Real-time Sequencer Wizard and edited using the Sequencer Editor.
Decorators
A Decorator is the framework's computational component. Every Decorator can participate in the initialization of the canonical graph, for instance by setting the values of existing labels or adding additional ones (i.e., they "decorate" the graph). Also, at each simulation cycle, they are responsible for determining the next state of the canonical graph by computing the values of labels as a function of the current Simulation time. In an epidemiological Simulation, a disease model would be implemented as a Decorator. There is no restriction on what a Decorator can do; it can, for instance, issue a query to a database or invoke a web service. There are several types of Decorators available in STEM.
- Decorators are objects that act on the canonical graph by creating and/or updating Labels
- Decorators “decorate” the graph
- They can act on the graph, nodes, or edges
- They are applied to the (geographic) model or scenario
- Examples of Decorators include:
  - Disease models (Integration Decorator)
  - Population models (Integration Decorator)
  - Dynamic population models
  - Infectors, Inoculators, Initializers
  - Triggers and modifiers
Integration Decorators
Integration Decorators are graph decorators, or operators, that change values in graph labels in an integrable way. Disease models and population models are defined in terms of ordinary differential equations that define the continuous disease and population state variables (population count vs. time, herd immunity vs. time, etc.). These equations are integrable, and STEM provides a choice of solvers (finite difference, Dormand-Prince, etc.), or engines, that solve the equations defined within an integration decorator and update the appropriate dynamic labels.
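The split between a decorator that supplies derivatives and a solver that integrates them can be sketched for an SIR model. This sketch assumes frequency-dependent transmission (`beta * s * i / n`) and uses forward Euler, the simplest finite-difference scheme; `SirState`, `SirModel.deltas` and `EulerSolver` are hypothetical, with `deltas` playing roughly the role the text later gives to `calculateDeltas(…)`. STEM's actual decorators and adaptive solvers such as Dormand-Prince are more involved.

```java
// Hypothetical compartment state for one node: susceptible, infectious, recovered counts.
record SirState(double s, double i, double r) {
    double total() {
        return s + i + r;
    }
}

class SirModel {
    final double beta;   // transmission rate per day (assumed value supplied by the caller)
    final double gamma;  // recovery rate per day (assumed value supplied by the caller)

    SirModel(double beta, double gamma) {
        this.beta = beta;
        this.gamma = gamma;
    }

    // Right-hand side of the SIR equations: returns (ds/dt, di/dt, dr/dt).
    SirState deltas(SirState x) {
        double n = x.total();
        double infections = beta * x.s() * x.i() / n;
        double recoveries = gamma * x.i();
        return new SirState(-infections, infections - recoveries, recoveries);
    }
}

class EulerSolver {
    // One forward-Euler step: next = current + dt * f(current).
    static SirState step(SirModel model, SirState current, double dt) {
        SirState d = model.deltas(current);
        return new SirState(current.s() + dt * d.s(),
                            current.i() + dt * d.i(),
                            current.r() + dt * d.r());
    }

    public static void main(String[] args) {
        SirModel flu = new SirModel(0.3, 0.1);
        SirState x = new SirState(990, 10, 0);
        for (int day = 0; day < 100; day++) {
            x = step(flu, x, 1.0);
        }
        System.out.printf("S=%.1f I=%.1f R=%.1f total=%.1f%n", x.s(), x.i(), x.r(), x.total());
    }
}
```

Because the model only reports rates of change, the same `deltas` method could be handed to a different integrator unchanged, which is the point of keeping solvers separate from decorators.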
Transformation Decorators
Sometimes a user needs to define operations that affect graph labels in a discontinuous or non-integrable way. Transformation decorators are like integration decorators in that they change values in graph labels, but they are intended for situations that are not numerically integrable. An example of this might be a "Slaughter House". A slaughter house that processes cattle into meat at a constant rate could be represented as an integration decorator, but if the user wants the transformation to occur instantaneously, then a Transformation decorator must be used. The difference is in how the numerical solver engine handles the transformation. Numerical integration requires computation at variable time steps (the time step is dynamically reduced to ensure the computation converges). Instantaneous or discontinuous transformations occur after one time step has converged (and before the next time step is started). A developer can choose to use transformation decorators where a transformation must be made discontinuously or instantaneously.
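The ordering described above (integrate first, then apply the discontinuous change once the step is complete) can be sketched with the slaughter-house example. `Farm`, the birth rate and the batch size are hypothetical, and the sketch uses a fixed one-day step rather than an adaptive one.

```java
// Hypothetical farm state: live cattle and processed carcasses.
record Farm(double cattle, double carcasses) {}

class TransformationExample {
    // Continuous, integrable part: the herd grows at a constant birth rate.
    static Farm integrate(Farm f, double birthsPerDay, double dt) {
        return new Farm(f.cattle() + birthsPerDay * dt, f.carcasses());
    }

    // Discontinuous part: a whole batch is processed instantly. It is applied
    // only after a step has completed, so the integrator never sees the jump.
    static Farm slaughterBatch(Farm f, double batchSize) {
        if (f.cattle() < batchSize) {
            return f;
        }
        return new Farm(f.cattle() - batchSize, f.carcasses() + batchSize);
    }

    static Farm run(Farm start, int days, double birthsPerDay, double batchSize) {
        Farm f = start;
        for (int d = 0; d < days; d++) {
            f = integrate(f, birthsPerDay, 1.0);   // the step converges first
            f = slaughterBatch(f, batchSize);      // then the transformation fires
        }
        return f;
    }

    public static void main(String[] args) {
        Farm end = run(new Farm(0, 0), 10, 5.0, 20.0);
        System.out.println(end); // prints Farm[cattle=10.0, carcasses=40.0]
    }
}
```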
Initializers
Initializers are used to set the state of graph labels (usually the initial state at runtime). Initializers include:
- Population Initializer (used, for example, to set the initial population of chickens in a farm)
- Disease Model Initializers
  - Infectors (set the initial infectious number or fraction in a population at a single node or for the whole graph)
  - Inoculators (set the initial herd immunity (number or fraction) in a population at a single node or for the whole graph)
  - Disease Initializers (a more flexible way to set the state of all disease compartments in a single view, e.g., Susceptible, Exposed, Infectious, Recovered, etc.)
  - Disease Initializers from an external file: a way to initialize a graph from a spreadsheet. Useful when defining different initial conditions for a large number of nodes.
- Graphical Editor: Graphs can also be initialized in other ways, for example using the graphical editor to create a user defined graph with label values already configured in a specific way.
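The infector and inoculator entries amount to simple assignments of compartment values. A minimal sketch, assuming an SEIR node whose infectious (or immune) individuals are taken from the susceptibles; `Compartments` and `Initializers` are hypothetical names, and exactly how STEM's initializers interpret a "fraction" is not specified here.

```java
// Hypothetical SEIR compartment counts at a single node.
record Compartments(double s, double e, double i, double r) {
    double total() {
        return s + e + i + r;
    }
}

class Initializers {
    // Infector: make a fraction of the node's population infectious, taken from the susceptibles.
    static Compartments infectFraction(double population, double fraction) {
        if (fraction < 0 || fraction > 1) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        double infectious = population * fraction;
        return new Compartments(population - infectious, 0, infectious, 0);
    }

    // Inoculator: make a fraction of the node's total population immune (recovered),
    // capped at the number of susceptibles available to move.
    static Compartments inoculate(Compartments c, double fraction) {
        double immune = Math.min(c.s(), c.total() * fraction);
        return new Compartments(c.s() - immune, c.e(), c.i(), c.r() + immune);
    }

    public static void main(String[] args) {
        Compartments seeded = infectFraction(10_000, 0.001);
        Compartments protectedNode = inoculate(seeded, 0.25);
        System.out.println(protectedNode);
    }
}
```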
Model
A Model is the component responsible for representing the contents of the canonical graph and for creating an instance of it when a Simulation is started from a Scenario. It combines with the final component of the framework called a Graph to form a tree. This tree is a hierarchical organization of the different contributions to the canonical graph. Model instances form the root and interior nodes of the tree while Graph instances form the leaves. The Model referenced by a Scenario is the root of such a tree.
Each Model contains three different collections. The first is a collection of "sub-Model" instances, each of which is essentially the root of a sub-tree. The second is a collection of Graph instances, and the third is another collection of Decorators.
A Model can be created using the Graph Wizard and edited using the Graph Editor.
Graph
Graph instances contain the actual components, Nodes, Edges, and Labels, that eventually will be contributed to a canonical graph. In the compositional framework, graph instances are not true mathematical "graphs"; they are better described as "graph fragments" as they may contain unresolved sets of Edges or Labels (and no Nodes). When the canonical graph is created, these fragments are combined and their content's connections eventually resolved (i.e., Edges and Labels will be mated with their appropriate "missing" Nodes). The resulting canonical graph is a true mathematical graph.
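Resolving graph fragments amounts to pooling the nodes from every fragment and then mating each edge with its end nodes by identifier. In this hypothetical sketch, edges name their nodes by URI strings such as the ISO-style codes STEM uses; the record names are invented, labels would be resolved the same way, and rejecting a dangling edge is a choice of the sketch rather than documented STEM behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fragment contents: an edge refers to its end nodes only by URI.
record NodeRef(String uri) {}
record EdgeSpec(String uri, String nodeAUri, String nodeBUri) {}
record Fragment(List<NodeRef> nodes, List<EdgeSpec> edges) {}

// An edge after resolution, holding the actual node objects.
record ResolvedEdge(String uri, NodeRef a, NodeRef b) {}

class FragmentResolver {
    // Pools the nodes of every fragment, then mates each edge with its two end nodes.
    static List<ResolvedEdge> resolve(List<Fragment> fragments) {
        Map<String, NodeRef> nodesByUri = new HashMap<>();
        List<EdgeSpec> pending = new ArrayList<>();
        for (Fragment f : fragments) {
            for (NodeRef n : f.nodes()) {
                nodesByUri.put(n.uri(), n);
            }
            pending.addAll(f.edges());
        }
        List<ResolvedEdge> resolved = new ArrayList<>();
        for (EdgeSpec e : pending) {
            NodeRef a = nodesByUri.get(e.nodeAUri());
            NodeRef b = nodesByUri.get(e.nodeBUri());
            if (a == null || b == null) {
                // This sketch treats a still-dangling edge as an error.
                throw new IllegalStateException("unresolved edge " + e.uri());
            }
            resolved.add(new ResolvedEdge(e.uri(), a, b));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A geography fragment with nodes only, and a separate fragment with only an edge.
        Fragment geography = new Fragment(
                List.of(new NodeRef("US-CA"), new NodeRef("US-NV")), List.of());
        Fragment borders = new Fragment(
                List.of(), List.of(new EdgeSpec("border/US-CA/US-NV", "US-CA", "US-NV")));
        System.out.println(resolve(List.of(geography, borders)));
    }
}
```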
STEM contains prebuilt graphs for geographic administrative regions. A Graph may also be created using the Graph Wizard.
Nodes
Nodes within a graph typically represent places. These might be entire countries, or smaller regions within countries.
Edges
Edges connect nodes. Physical containment (provinces within countries) is represented by Containment Edges. Physical adjacency is represented by Common Border Edges. These edges represent geometric or geographic facts. Today STEM uses these physical edges to model transportation. In the future we will provide other edges to represent connections that apply only to a particular population. For example, a model of Geese Migration will be represented by a graph of Migration Edges that apply only to Geese.
Labels
- Labels are objects containing simulation state data
- They are attached to nodes and edges
- Some labels contain static (fixed) values; others contain dynamic (variable) values
  - Integration labels provide for numerical integration of current and next values
- Labels contain one or more LabelValue objects with the label’s property values (the actual data)
- Examples of Static Labels include:
  - Node Labels
    - Census human population
    - Area
    - Elevation
    - Historic Earth Science Data
      - Day and Nighttime Temperature, precipitation, etc.
  - Edge labels
    - Physical Relationships
      - Common Border
      - Physical Containment
    - Migration rate between nodes
    - Transportation rate
- Examples of Dynamic Labels include:
  - Disease model compartment states
  - Dynamic population models
    - for example, vector capacity models of mosquitoes
How Labels Work
- A disease model provides a Label and LabelValue
- A STEM simulation contains
  - A canonical graph. The graph contains
    - A single instance of the disease model (Decorator)
    - A label for each region in the geographic model
- A disease model’s label contains current and next LabelValue objects that store the compartment states for the node
  - Example: an SIR disease model’s LabelValues contain s, i, and r properties
- The Solver calls calculateDeltas(…) on the Decorator, which is responsible for calculating the Label’s nextValue
  - The solver integrates between the values
Labels hold values that may be static or dynamic. Multiple Labels can be added to any node or edge within STEM. A static label might encode, for example, the land area of a geographic node. A dynamic label might store the current disease state of a population including, for example, herd immunity for a particular disease. This could change over time. Values within a (dynamic) label can be updated by other decorators. Dynamic Labels play a special role in the framework in that they can store two state values simultaneously. They have a "current" value which, collectively, records the current state of the graph. They can also have a "next" value which is used, collectively, to store the next state of the graph. This allows computation to occur synchronously and self consistently across the entire canonical graph.
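Why two values per label matter can be seen with a tiny sketch: each node on a chain moves toward the mean of its neighbours, reading only `current` values, and all labels swap at the end of the cycle. `DynamicLabel` and `SynchronousUpdate` are hypothetical, and the averaging rule is just a stand-in for any decorator computation.

```java
import java.util.List;

// Hypothetical dynamic label holding two copies of its value.
class DynamicLabel {
    double current;
    double next;

    DynamicLabel(double initial) {
        current = initial;
    }

    // Exchanges the computed "next" value with the "current" one.
    void swap() {
        double old = current;
        current = next;
        next = old;
    }
}

class SynchronousUpdate {
    // One cycle on a chain of nodes. Every node reads only CURRENT values
    // (an end node uses itself as its missing neighbour) and writes only NEXT.
    static void cycle(List<DynamicLabel> chain, double rate) {
        for (int k = 0; k < chain.size(); k++) {
            DynamicLabel label = chain.get(k);
            double left = k > 0 ? chain.get(k - 1).current : label.current;
            double right = k < chain.size() - 1 ? chain.get(k + 1).current : label.current;
            label.next = label.current + rate * ((left + right) / 2 - label.current);
        }
        // Only after every next value is known does the whole graph change state.
        for (DynamicLabel label : chain) {
            label.swap();
        }
    }

    public static void main(String[] args) {
        List<DynamicLabel> chain = List.of(new DynamicLabel(0), new DynamicLabel(100), new DynamicLabel(0));
        cycle(chain, 1.0);
        for (DynamicLabel label : chain) {
            System.out.println(label.current);
        }
    }
}
```

With `[0, 100, 0]` and a rate of 1, one cycle yields `[50, 0, 50]`, symmetric as it should be. Updating in place instead would let later nodes see values already changed in the same cycle, so the result would depend on iteration order.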
Model Decorators
The collection of Decorators that may exist in each Model is similar to that contained in a Scenario instance. Its contents represent the computational component of the Model. The difference is that Model Decorators are only able to modify the parts of the canonical graph that are contributed by the tree rooted at the Model. The Decorators in the Scenario, being above the root of the tree, are able to access the entire canonical graph. There is also a strict execution order of Model Decorators. The ones that are contributed lower in the tree are invoked before ones contributed above them. Scenario Decorators are invoked last. The order of invocation for Decorators at the same "level" is arbitrary.
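The invocation order can be sketched as a post-order walk of the Model tree followed by the Scenario's own Decorators. `Model`, `Scenario` and `Decorator` here are hypothetical, stripped-down records (graphs are reduced to URI strings). The sketch guarantees that a Model's descendants run before the Model itself, which is one reading of "lower in the tree"; it visits siblings in list order, although the framework treats that order as arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down composition types.
record Decorator(String name) {}

record Model(String name, List<Model> subModels, List<String> graphs, List<Decorator> decorators) {}

record Scenario(Model model, List<Decorator> decorators) {}

class DecoratorOrder {
    // Post-order walk: sub-Models contribute their Decorators before the Model's own.
    static void collect(Model m, List<Decorator> out) {
        for (Model child : m.subModels()) {
            collect(child, out);
        }
        out.addAll(m.decorators());
    }

    // Model Decorators first (descendants before ancestors), Scenario Decorators last.
    static List<Decorator> invocationOrder(Scenario s) {
        List<Decorator> order = new ArrayList<>();
        collect(s.model(), order);
        order.addAll(s.decorators());
        return order;
    }

    public static void main(String[] args) {
        Model country = new Model("country", List.of(), List.of("graph:country-geography"),
                List.of(new Decorator("SIR")));
        Model world = new Model("world", List.of(country), List.of(),
                List.of(new Decorator("population")));
        Scenario outbreak = new Scenario(world, List.of(new Decorator("infector")));
        System.out.println(invocationOrder(outbreak).stream().map(Decorator::name).toList());
        // prints [SIR, population, infector]
    }
}
```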
A Model Decorator would typically be some computation that needs to be executed for each cycle of the Simulation, while a Scenario Decorator would typically be used to modify the initial state of the canonical graph to customize it for a particular scenario. For example, in an epidemiological Simulation, a disease model would be added to a Simulation as a Model Decorator while the exact location of an outbreak of a disease would be added to the Simulation by a Scenario Decorator. Many different Scenario instances could refer to the same Model (with its disease model), but provide different Decorators to specify different starting locations.
Decorators typically compute the next value for Labels in the canonical graph as a function of the current contents of the graph and the time. The Figure below illustrates how these components are combined to create a Model that is used by two different Scenarios.
[Figure: img/ScenarioComposition.jpg, a Model shared by two Scenarios]
Experiment
An Experiment is a specification of how to take a base Scenario and systematically modify it to create and run a sequence of related Simulations. For instance, one might want to explore the effect of different transmission rates in a particular Scenario. An Experiment allows one to specify the (base) Scenario and a collection of Modifiers that know how to modify it in specific ways. The modified Scenario instances can then be used to create Simulations.
To initiate the creation of the derivative Scenarios and subsequent Simulations, an Experiment can be run just like a Scenario to create a type of execution unit called a Batch.
An Experiment can be created using the Experiment Wizard and edited using the Modifier Wizard.
Modifier
A Modifier is a specification of how to systematically change the values of one or more features of a Scenario. There are two types of Modifiers. A Range Modifier modifies numeric features by assigning them values from a specified "range" of values, while a Sequence Modifier modifies features by assigning them successive values from a prespecified ordered collection. A Modifier may be created using the Modifier Wizard and edited using the Modifier Editor.
Range Modifier
A Range Modifier specifies a range of values for a numeric feature and an increment value. It modifies a numeric feature by first assigning it a starting value and then assigning values generated by adding the increment to the previously assigned value, until the result passes a specified end value. If the start value is greater than the end value, then the increment must be negative.
Sequence Modifier
A Sequence Modifier specifies a sequence of values for an arbitrary feature. It modifies the feature by assigning it, in sequential order, the values in the sequence.
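Both kinds of Modifier can be viewed as iterables of values to assign. This sketch assumes the end value of a Range Modifier is inclusive and that the increment must move from start toward end; `RangeModifier` and `SequenceModifier` are hypothetical classes, not the EMF components.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical Range Modifier: start, start + inc, start + 2*inc, ... until a value passes end.
class RangeModifier implements Iterable<Double> {
    private final double start;
    private final double end;
    private final double increment;

    RangeModifier(double start, double end, double increment) {
        boolean wrongDirection = (start < end && increment < 0) || (start > end && increment > 0);
        if (increment == 0 || wrongDirection) {
            throw new IllegalArgumentException("increment must move start toward end");
        }
        this.start = start;
        this.end = end;
        this.increment = increment;
    }

    @Override
    public Iterator<Double> iterator() {
        List<Double> values = new ArrayList<>();
        // Computing start + k * increment avoids accumulating rounding error, although a
        // step such as 0.1 can still land just past an inexact end value and drop it.
        for (int k = 0; ; k++) {
            double v = start + k * increment;
            boolean passed = increment > 0 ? v > end : v < end;
            if (passed) {
                break;
            }
            values.add(v);
        }
        return values.iterator();
    }
}

// Hypothetical Sequence Modifier: yields a prespecified ordered collection.
class SequenceModifier<T> implements Iterable<T> {
    private final List<T> sequence;

    SequenceModifier(List<T> sequence) {
        this.sequence = List.copyOf(sequence);
    }

    @Override
    public Iterator<T> iterator() {
        return sequence.iterator();
    }
}

class ModifierExample {
    public static void main(String[] args) {
        for (double scale : new RangeModifier(1.0, 3.0, 0.5)) {
            System.out.println("scale = " + scale);
        }
        for (String level : new SequenceModifier<>(List.of("low", "medium", "high"))) {
            System.out.println("level = " + level);
        }
    }
}
```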
Trigger
A Trigger is a special kind of Decorator that combines a Predicate with a reference to another Decorator. Its role is to conditionally execute the Decorator it references if certain conditions exist in the Simulation. Typically, the Decorator referenced by the Trigger will be a Modifier which will be configured to alter some aspect of the running Simulation. For instance, a Modifier could change the values of Labels on an Edge or Node in the canonical graph. Those values could represent such things as the operational status of an airport, or the status (open or closed) of a road between two regions. A Modifier can also modify another Decorator active in a Simulation. An example would be changing the configuration values of a Disease Model.
The operation of a Trigger is conceptually simple. On each simulation cycle, the Triggers contained by a Scenario are executed, just like the other Decorators, to update the canonical graph. The first thing the Trigger does is evaluate the Predicate it references. If the Predicate evaluates to False, then the Trigger simply returns without performing any other actions. If, however, the Predicate evaluates to True, then the Trigger allows the Decorator it references to update the canonical graph. So long as the Predicate returns True, the referenced Decorator will be executed. If the Decorator is a Modifier, then it will step through each of its configured modifications each time it is activated until all modifications have been completed.
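The Trigger logic is a guard around a referenced Decorator. In this hypothetical sketch, `TimedDecorator` and `Trigger` are invented names, elapsed time in days is the only input (matching the Predicate limitation described in the Predicate section), and the Predicate is represented by the standard `java.util.function.DoublePredicate`.

```java
import java.util.function.DoublePredicate;

// Hypothetical decorator contract: update the graph for the given elapsed time in days.
interface TimedDecorator {
    void update(double elapsedDays);
}

// Hypothetical Trigger: a Decorator that gates another Decorator with a Predicate.
class Trigger implements TimedDecorator {
    private final DoublePredicate predicate;   // condition on elapsed time
    private final TimedDecorator target;       // e.g. a Modifier

    Trigger(DoublePredicate predicate, TimedDecorator target) {
        this.predicate = predicate;
        this.target = target;
    }

    @Override
    public void update(double elapsedDays) {
        if (!predicate.test(elapsedDays)) {
            return;                      // False: do nothing this cycle
        }
        target.update(elapsedDays);      // True: let the referenced Decorator act
    }
}

class TriggerExample {
    public static void main(String[] args) {
        boolean[] roadOpen = {true};
        // Close a road from day 30 onward.
        Trigger closeRoad = new Trigger(days -> days >= 30, days -> roadOpen[0] = false);
        for (int day = 1; day <= 40; day++) {
            closeRoad.update(day);
        }
        System.out.println(roadOpen[0]); // prints false
    }
}
```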
A Trigger can be created using the Trigger Wizard and edited using the Trigger Editor.
Predicate
A Predicate is a boolean expression that returns True or False depending on testable conditions in a running Simulation and the expression itself. Instances of Predicates are referenced by Triggers, and their logical values are used to control the execution of the Decorator referenced by each Trigger.
Arbitrary logical expressions can be expressed in a Predicate, but currently, due to implementation limitations, the only testable conditions are the current and elapsed times in a running Simulation. In future versions of STEM, a richer set of conditions will be available. The testing of time is still an extremely useful condition as it allows for specific modifications to be made to a Simulation at precise points in time.
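Since a Predicate is an arbitrary logical expression over current and elapsed time, the standard `java.util.function.Predicate` combinators (`and`, `or`, `negate`) are enough to sketch one. `SimTime` and `TimePredicates` are hypothetical; the example builds a window that is true from day 30 of a run until 1 June 2024.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Predicate;

// Hypothetical snapshot of the only conditions the text says are testable.
record SimTime(Instant current, Duration elapsed) {}

class TimePredicates {
    static Predicate<SimTime> elapsedAtLeast(Duration d) {
        return t -> t.elapsed().compareTo(d) >= 0;
    }

    static Predicate<SimTime> currentBefore(Instant when) {
        return t -> t.current().isBefore(when);
    }

    // True from day 30 of the run until (but excluding) 1 June 2024.
    static Predicate<SimTime> closureWindow() {
        return elapsedAtLeast(Duration.ofDays(30))
                .and(currentBefore(Instant.parse("2024-06-01T00:00:00Z")));
    }
}

class PredicateExample {
    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        Predicate<SimTime> window = TimePredicates.closureWindow();
        for (int day : new int[] {10, 40, 200}) {
            SimTime t = new SimTime(start.plus(Duration.ofDays(day)), Duration.ofDays(day));
            System.out.println("day " + day + ": " + window.test(t));
        }
    }
}
```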
A Predicate can be created using the Predicate Wizard and edited using the Predicate Editor.
Simulation Execution
When a Simulation is started, the first operation is to create the canonical graph. This is accomplished by recursively descending the tree rooted by the Model referenced by the Scenario. As the canonical graph is constructed, connections between Labels and Edges in the graph fragments are resolved and then each Model Decorator is invoked and given the opportunity to "decorate" the canonical graph as part of its initialization. When this is complete, the Scenario Decorators are invoked to give them the opportunity to decorate the canonical graph.
When the canonical graph has been constructed and initialized by the Model and Scenario Decorators, the Simulation can begin its first cycle. The first step of the Simulation is to determine if the Simulation has completed its sequence of cycles. The answer to this question is provided by the Sequencer that is referenced by the Scenario from which the Simulation was started. If the answer is "No," and the Simulation should continue, the Sequencer will provide a value that represents the "time" of the next cycle. The STEM Simulation engine then takes that value and invokes each of the Model Decorators (in proper order) passing them the time value. They perform their computations and then the Scenario Decorators are invoked in the same manner. Frequently, the Scenario Decorators will not have any computations to perform after they have done their initialization, but there is no restriction that enforces this.
When the Decorators are finished, the Simulation engine tells the canonical graph to switch to its "next" state. This means that all of the Labels exchange their "current" and "next" values. This completes the first cycle. The process begins the next cycle with the Sequencer determining if the sequence of simulation cycles is complete.
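The initialization and cycle described above can be sketched as a small loop. Everything here is hypothetical: labels are flattened into two `double` arrays for "current" and "next", the `Sequencer` and `Decorator` interfaces are reduced to the calls the text mentions, and copying `current` into `next` before each cycle (so labels no decorator touches keep their value) is a choice of the sketch, not documented STEM behavior.

```java
import java.util.List;

// Hypothetical minimal roles; the real STEM interfaces are richer.
interface Sequencer {
    boolean isComplete();
    double nextTime();   // time of the next cycle
}

interface Decorator {
    void decorateGraph(double[] current);                              // initialization
    void updateLabels(double time, double[] current, double[] next);   // per-cycle computation
}

class SimulationEngine {
    private double[] current;   // stand-in for every label's "current" value
    private double[] next;      // stand-in for every label's "next" value

    SimulationEngine(int labelCount) {
        current = new double[labelCount];
        next = new double[labelCount];
    }

    // modelDecorators are assumed to be pre-sorted (lower Models first).
    int run(Sequencer seq, List<Decorator> modelDecorators, List<Decorator> scenarioDecorators) {
        modelDecorators.forEach(d -> d.decorateGraph(current));
        scenarioDecorators.forEach(d -> d.decorateGraph(current));
        int cycles = 0;
        while (!seq.isComplete()) {
            double t = seq.nextTime();
            // Sketch choice: labels that no decorator writes keep their current value.
            System.arraycopy(current, 0, next, 0, current.length);
            for (Decorator d : modelDecorators) d.updateLabels(t, current, next);
            for (Decorator d : scenarioDecorators) d.updateLabels(t, current, next);
            double[] swap = current;   // every label exchanges current and next
            current = next;
            next = swap;
            cycles++;
        }
        return cycles;
    }

    double[] state() {
        return current.clone();
    }
}

class EngineExample {
    public static void main(String[] args) {
        Sequencer threeDays = new Sequencer() {
            private int day = 0;
            public boolean isComplete() { return day >= 3; }
            public double nextTime() { return ++day; }
        };
        Decorator doubling = new Decorator() {
            public void decorateGraph(double[] cur) { cur[0] = 1.0; }
            public void updateLabels(double t, double[] cur, double[] nxt) { nxt[0] = 2 * cur[0]; }
        };
        SimulationEngine engine = new SimulationEngine(1);
        int cycles = engine.run(threeDays, List.of(doubling), List.of());
        System.out.println(cycles + " " + engine.state()[0]); // prints "3 8.0"
    }
}
```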
Batch Execution
A Batch is created when an Experiment is executed. When a Batch runs, it first takes the base Scenario of the Experiment, applies the Modifiers referenced by the Experiment and creates a new derivative Scenario. It then initiates the execution of a Simulation from the derived Scenario. When that Simulation completes, the Batch repeats the process until the Modifiers indicate that there are no additional modifications to make, at which point the execution of the Batch is complete and it exits.
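A Batch loop can be sketched by treating a Scenario as a map from feature names to values. This assumes a single Modifier and a caller-supplied `simulate` function; `Batch.run` and the feature names are hypothetical, and how STEM combines several Modifiers is not covered here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class Batch {
    // Derives one Scenario per modifier value from an unchanged base Scenario and runs each.
    static <R> List<R> run(Map<String, Double> baseScenario, String feature,
                           Iterable<Double> modifierValues,
                           Function<Map<String, Double>, R> simulate) {
        List<R> results = new ArrayList<>();
        for (double value : modifierValues) {
            Map<String, Double> derived = new HashMap<>(baseScenario); // base stays untouched
            derived.put(feature, value);
            results.add(simulate.apply(derived));
        }
        return results;   // the Batch ends when the Modifier has no more values
    }
}

class BatchExample {
    public static void main(String[] args) {
        Map<String, Double> base = Map.of("transmissionRate", 0.2, "recoveryRate", 0.1);
        // Stand-in "simulation": report transmissionRate / recoveryRate for each derived Scenario.
        List<Double> ratios = Batch.run(base, "transmissionRate", List.of(0.2, 0.3, 0.4),
                s -> s.get("transmissionRate") / s.get("recoveryRate"));
        System.out.println(ratios);
    }
}
```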
Example: Creating your own STEM Scenario from a user defined disease model
See the tutorial on Creating a STEM Scenario.
Standards
ISO-3166
The ISO 3166-1 Standard defines names and short alphabetic codes for "countries" and "territories" that correspond to the geopolitical divisions of the Earth. At the time of writing there were 244 such entries, a larger number than the membership of the United Nations (191), which reflects the inclusion of territories in the mix. For instance, the standard includes the United States and Puerto Rico as separate entities. This level of representation is sometimes referred to as "United Nations Administration Level 0".
The standard defines two and three letter codes for each entry. A two letter code is referred to as an "alpha-2" code while a three letter code is an "alpha-3" code. Thus, the "ISO 3166-1 alpha-3" code for the United States is "USA" while the "ISO 3166-1 alpha-2" code is "US".
The ISO 3166-2 Standard defines names and alphabetic codes for subdivisions of countries and territories as defined in ISO 3166-1. This level of representation is sometimes referred to as "United Nations Administration Level 1". Typically, these would be the states or provinces of a particular country. There are approximately 3700 different codes.
The ISO 3166-2 code is a compound code consisting of the ISO 3166-1 alpha-2 code of the parent country and a second, country-specific string, separated by a hyphen. For instance, the code for the State of California in the United States of America is "US-CA". Similarly, the code for Kabul province in Afghanistan is "AF-KAB".
There is a standard called ISO 3166-3 which one might expect would provide codes for subdivisions of Level 1, but it doesn't. Instead, it defines how codes have replaced each other as political boundaries have evolved.
Because STEM still needs codes that reflect Level 2 divisions (e.g., counties in the United States), it adopts the convention embodied in the ISO 3166-2 standard by extending the Level 1 code (e.g., "US-CA") with additional country-specific identifiers separated by hyphens. In the United States, the Federal Information Processing Standard (FIPS) is the appropriate choice. It defines unique identifiers for each of the counties in the country. For instance, the FIPS code for Santa Clara County in California is "06085" (the leading "0" is significant). Thus, the "ISO" identifier used in STEM for Santa Clara County is "US-CA-06085".
In cases where the Level 2 codes for a particular country are unknown (an opportunity for an open source contribution), a special "generated" code was created with the intention that it would be replaced later by the correct code. This is the case for Afghanistan, where the code for the city of Kabul was generated as "G140001" (the leading "G" identifies the code as generated, meaning it should be replaced when the correct code is obtained). Thus, the "ISO" code used in STEM for Kabul (the city) is "AF-KAB-G140001".
The ISO code used in STEM for a higher resolution area, such as a Census tract, would follow the same convention by appending a code to the code of the containing area separated by a hyphen. There are no such codes defined in STEM at this time.
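The hyphen-extension convention is easy to handle programmatically. A hypothetical helper, assuming codes are kept as strings (so FIPS leading zeros survive) and that generated placeholders look like `G` followed by digits, as in the Kabul example; `StemAreaCode` and its methods are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for STEM's hyphen-separated area identifiers.
final class StemAreaCode {
    // Placeholder segments such as "G140001".
    private static final Pattern GENERATED = Pattern.compile("G\\d+");

    private StemAreaCode() {}

    // Appends one more level, e.g. ("US-CA", "06085") -> "US-CA-06085".
    static String child(String parentCode, String localCode) {
        return parentCode + "-" + localCode;
    }

    // "US-CA-06085" -> [US, CA, 06085]; element 0 is the ISO 3166-1 alpha-2 code.
    static List<String> levels(String code) {
        return Arrays.asList(code.split("-"));
    }

    // 0 for a country, 1 for a state or province, 2 for a county, and so on.
    static int adminLevel(String code) {
        return levels(code).size() - 1;
    }

    // True if any segment below Level 1 is a generated placeholder.
    static boolean hasGeneratedSegment(String code) {
        List<String> parts = levels(code);
        for (int k = 2; k < parts.size(); k++) {
            if (GENERATED.matcher(parts.get(k)).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String santaClara = child("US-CA", "06085");
        System.out.println(santaClara + " level " + adminLevel(santaClara)); // US-CA-06085 level 2
        System.out.println(hasGeneratedSegment("AF-KAB-G140001"));         // true
    }
}
```

Checking only segments from Level 2 down keeps ordinary ISO 3166-2 codes such as "US-GA" from being mistaken for placeholders.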
Dublin Core Meta Data
The Dublin Core Meta Data standard was developed by the library science community to define a consistent set of attributes that could be assigned to a "Resource". The definition of a Resource is deliberately unspecified by the standard, but would typically be something like a book, a movie, a video, or a musical score. Most of the attributes are simple and easy to understand and basically specify familiar details of something like a book such as its "Title", its "Creator", and the "Date" it was created. The usefulness of the standard is extended by the rest of the attributes that define such things as "Spatial" characteristics, a "License" and a date range for which a Resource is valid; there are several others.
STEM makes use of the standard by incorporating it as a required feature of each composable component of the graph it uses as its representational framework for each Simulation. Thus, when a simulation is created, every component (i.e., all data and all computation) is specifically labeled and identified as to its origin and applicability.
The properties files that contain the data sets that define the built-in components of STEM can also contain Dublin Core Meta Data attributes.
Conventions
Sanity Checking
The Common Model contains an interface called SanityChecker
. This interface has a single method called sane()
which returns a boolean
. The method sane()
should return true
if the instance of the implementing class is not in error (i.e., it is "sane"). The definition of what "in error" means is completely up to the implementor of the method. The idea behind implementing this method is that it allows the state of the entire running system to be checked at run-time for errors. It is really an implementation of the class invariant paradigm of Design by Contract programming practice. The advantage of this approach to implementing it is that it is light weight, doesn't require preprocessing, is language independent and can be checked under direct programmer control.
In the case of STEM, all of the internal data structures representing the state of a Simulation implement the SanityChecker
interface. If STEM is run with assertions enabled, it will invoke sanity checking at the start of each simulation cycle and at a few other specific points. A single call of the sane()
method of a running Simulation instance will propagate throughout the entire set of class instances that represents the state of the canonical graph that represents the state of the Simulation. If any test fails, for instance, a population value was found to be negative, an assertion exception will be thrown and the Simulation will be halted. The exact location where the problem was found will be reported in the error log. This self-checking behavior makes STEM very robust.
Typically, the sane() method will check that numeric values are within range or that other field values are appropriate. If the values are found to be correct, the method returns true; if not, it returns false.
The checking of the sanity of a class instance should also propagate to its children, such that the sane() method of the parent calls the sane() method of each child. If a child returns false, the parent should return false as well.
Typically, an assert statement verifies that the return value is true. These assertions make it easy to find the point in the code where a problem is detected. In practice, each test in a sane() method asserts that its result is true, so an assertion failure pinpoints precisely the test that failed.
In the example below, the sane() method first calls the sane() method of its superclass. It then tests that each field is not null, asserting after each test that the test resulted in a true value.
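public boolean sane() {
    // Start from the result of the superclass's own checks.
    boolean retValue = super.sane();
    // Each field must be non-null; asserting after every test makes a failure point at that test.
    retValue = retValue && nodeAURI != null;
    assert retValue;
    retValue = retValue && nodeBURI != null;
    assert retValue;
    retValue = retValue && label != null;
    assert retValue;
    return retValue;
} // sane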
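To show the parent-to-child propagation end to end, here is a small self-contained sketch. The SanityChecker interface mirrors the one in the Common Model, but PopulationLabel and RegionNode are hypothetical stand-ins invented for this illustration, not STEM's generated graph classes.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors the Common Model interface: one method reporting whether an instance is "sane".
interface SanityChecker {
    boolean sane();
}

// Hypothetical leaf component: a population count must never be negative.
class PopulationLabel implements SanityChecker {
    private final double count;

    PopulationLabel(double count) {
        this.count = count;
    }

    @Override
    public boolean sane() {
        boolean retValue = count >= 0.0;
        assert retValue : "negative population: " + count;
        return retValue;
    }
}

// Hypothetical parent component: sane only if its own fields and all of its children are sane.
class RegionNode implements SanityChecker {
    private final String id;
    private final List<SanityChecker> children = new ArrayList<>();

    RegionNode(String id) {
        this.id = id;
    }

    void add(SanityChecker child) {
        children.add(child);
    }

    @Override
    public boolean sane() {
        boolean retValue = id != null;
        assert retValue;
        for (SanityChecker child : children) {
            // Propagate to each child; once any check fails, retValue stays false.
            retValue = retValue && child.sane();
            assert retValue;
        }
        return retValue;
    }

    public static void main(String[] args) {
        RegionNode california = new RegionNode("US-CA");
        california.add(new PopulationLabel(33871648));
        System.out.println("sane: " + california.sane());
    }
}
```

Run with java -ea so the assert statements fire; without that flag, sane() still returns false for a bad child but no error is raised.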
STEM Project Structure
- Split across three Git repositories
- org.eclipse.stem
- Most of the “code”, including core libraries, disease models, model generator, UI
- org.eclipse.stem.data
- The required STEM data, including geographies, maps, and human population data
- org.eclipse.stem.data.earthscience
- The larger Earth Science static datasets from 2000-2010
Plug-in Descriptions
This section contains descriptions of each STEM plug-in/project.
org.eclipse.stem.core
The org.eclipse.stem.core plug-in contains the definitions and implementations of the Unified Modeling Language (UML) models that are used to represent the composable components that define a simulation in STEM. These models are Common, Graph, Model, Sequencer and Scenario. These models are implemented using the Eclipse Modeling Framework (EMF).
This project is tested by the project org.eclipse.stem.tests.core.
Common Model
One goal of STEM is to provide a framework for the interchange of components that can be combined to compose a model. To do this, the components need to be uniquely identified, and they need to be described in a consistent manner. The "common model" is an EMF model that addresses these two requirements. It defines a class Identifiable that is extended by each of the other modeled components in STEM. Each instance of Identifiable is uniquely "identified" by a URI and is described by a set of Dublin Core Meta Data attributes specified in the class DublinCore.
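The identification scheme can be pictured as a URI plus a bag of Dublin Core attributes, with identity determined by the URI alone. The sketch below is a hypothetical simplification (SimpleIdentifiable and its attribute map are invented names); STEM's real Identifiable and DublinCore classes are EMF-generated and considerably richer.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for the common model's Identifiable and DublinCore classes.
class SimpleIdentifiable {
    private final URI uri;                        // the component's unique identity
    private final Map<String, String> dublinCore; // e.g. "title", "format", "creator"

    SimpleIdentifiable(URI uri, Map<String, String> dublinCore) {
        this.uri = Objects.requireNonNull(uri, "uri");
        this.dublinCore = new HashMap<>(dublinCore); // defensive copy
    }

    URI getURI() {
        return uri;
    }

    String getDublinCore(String attribute) {
        return dublinCore.get(attribute);
    }

    // Identity is defined by the URI alone; descriptive attributes do not affect equality.
    @Override
    public boolean equals(Object other) {
        return other instanceof SimpleIdentifiable
                && uri.equals(((SimpleIdentifiable) other).uri);
    }

    @Override
    public int hashCode() {
        return uri.hashCode();
    }

    public static void main(String[] args) {
        URI id = URI.create(
                "platform:/plugin/org.eclipse.stem.geography/resources/data/country/AFG/AFG_0_area.graph");
        SimpleIdentifiable graph = new SimpleIdentifiable(id, Map.of("title", "example title"));
        System.out.println(graph.getURI() + " -> " + graph.getDublinCore("title"));
    }
}
```

Keying equality on the URI means two copies of the same component with differently worded descriptions are still treated as one component, which is what interchange requires.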
The common model package also includes a non-modeled Java interface called SanityChecker that defines the semantics of Sanity Checking.
Common Model UML Diagram goes here.
The ecore file that specifies the common model is: org.eclipse.stem.core/model/common.ecore.
Graph Model
Graph Model UML Diagram goes here
Model Model
Model Model UML Diagram goes here
Sequencer Model
Sequencer Model UML Diagram goes here
Scenario Model
Scenario Model UML Diagram goes here
Core GenModel
The EMF "genmodel" for the core plug-in is in the file: org.eclipse.stem.core/model/core.genmodel. It defines how to generate the code for all of the models.
Core Extension Points
The "core" plug-in contains five extension points. They provide a way to extend STEM: a plug-in extends one of them by contributing a set of XML files that contain serialized instances of components defined in the core EMF models.
There are two common schemas used to define the extension points.
Catagory Schema
The catagory (sic) schema contains attributes that specify the location of a serialized component in a hierarchical name space.
Dublin Core Schema
The dublin_core schema contains attributes defined by the Dublin Core Meta Data standard that describe the serialized component. There are three required attributes: "title", "identifier" and "format". The title is the text string displayed to the user when the component is presented in various views. The identifier is a URI that specifies the location of the serialized XML file so that it can be located and deserialized to create a Simulation (e.g., "platform:/plugin/org.eclipse.stem.geography/resources/data/country/AFG/AFG_0_area.graph"). The value of the format attribute is the "package namespace URI" (i.e., "eNS_URI") of the component's ecore model (e.g., "http:///org/eclipse/stem/core/graph.ecore").
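Because title, identifier and format are mandatory, a contribution can be checked before it is used. The following is a hypothetical validator (DublinCoreCheck is not a STEM class): it treats the attributes as a plain Map, reports any required attribute that is missing or blank, and reports an identifier or format that does not parse as a java.net.URI.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-registration check for the dublin_core attributes of an extension.
class DublinCoreCheck {
    static final List<String> REQUIRED = List.of("title", "identifier", "format");

    // Returns a description of every problem found; an empty list means the attributes look usable.
    static List<String> problems(Map<String, String> attrs) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = attrs.get(key);
            if (value == null || value.isBlank()) {
                result.add("missing required attribute: " + key);
            }
        }
        // identifier locates the serialized file; format is the eNS_URI of the ecore model.
        for (String key : List.of("identifier", "format")) {
            String value = attrs.get(key);
            if (value == null || value.isBlank()) {
                continue; // already reported above
            }
            try {
                new URI(value);
            } catch (URISyntaxException e) {
                result.add(key + " is not a valid URI: " + value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of(
                "title", "example title",
                "identifier", "platform:/plugin/org.eclipse.stem.geography/resources/data/country/AFG/AFG_0_area.graph",
                "format", "http:///org/eclipse/stem/core/graph.ecore");
        System.out.println(problems(attrs).isEmpty() ? "ok" : problems(attrs).toString());
    }
}
```

Collecting all problems, rather than stopping at the first, gives a contributor the full list of fixes in one pass.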
org.eclipse.stem.core.graph
The org.eclipse.stem.core.graph extension point is extended by plug-ins that provide serialized Graph files that are intended to be part of the collection of "built-in" Graphs.
org.eclipse.stem.core.model
The org.eclipse.stem.core.model extension point is extended by plug-ins that provide serialized Model files that are intended to be part of the collection of "built-in" Models.
org.eclipse.stem.core.scenario
The org.eclipse.stem.core.scenario extension point is extended by plug-ins that provide serialized Scenario files that are intended to be part of the collection of "built-in" Scenarios.
org.eclipse.stem.core.sequencer
The org.eclipse.stem.core.sequencer extension point is extended by plug-ins that provide serialized Sequencer files that are intended to be part of the collection of "built-in" Sequencers.
org.eclipse.stem.core.decorator
The org.eclipse.stem.core.decorator extension point is extended by plug-ins that provide serialized Decorator files that are intended to be part of the collection of "built-in" Decorators.
org.eclipse.stem.definitions
The org.eclipse.stem.definitions project contains two UML/EMF models that define components of graphs that are used in most simulations. The two models are called "Labels" and "Nodes". The project is also home to important "adapters" that adapt the contents of the canonical graph and isolate other code from the implementation details of the graph.
labels EMF Model
The Labels EMF Model defines a set of common labels that can be used in simulations. These labels essentially provide a set of "types" for representing common data values.
The ecore file that specifies the model is: org.eclipse.stem.definitions/model/labels.ecore.
nodes EMF Model
The Nodes EMF Model defines two sub-classes of org.eclipse.stem.core.graph.Node. The first is "GeographicFeature", which is intended to represent any type of geographic feature (road, river, region, etc.); the second is "Region", a direct sub-class of GeographicFeature. A Region is intended to represent an enclosed area such as a country, state or county.
The ecore file that specifies the model is: org.eclipse.stem.definitions/model/nodes.ecore.
Adapters
Spatial Adapters
This category includes adapters that provide latitude and longitude values for features in the canonical graph. The Map view and the org.eclipse.stem.ui.ge (Google Earth) project make use of spatial adapters.
Relative Value Adapters
A relative value adapter interprets the numeric values stored in labels and provides a relative value between 0.0 (zero) and 1.0 (one). This can be used in generic viewers that provide visualizations of the state of the canonical graph.
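One simple way to turn a raw label value into a relative value is linear min-max scaling against a reference range, clamped to [0.0, 1.0]. This is an assumption for illustration only: the text above does not say how STEM's relative value adapters pick their reference range, and RelativeValue is a hypothetical name.

```java
// Hypothetical illustration: map a raw label value onto [0.0, 1.0] by min-max scaling.
final class RelativeValue {
    private RelativeValue() {}

    static double of(double value, double min, double max) {
        if (!(max > min)) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        double scaled = (value - min) / (max - min);
        // Clamp so values outside the reference range still yield a valid relative value.
        return Math.max(0.0, Math.min(1.0, scaled));
    }

    public static void main(String[] args) {
        // Monterey County's population relative to Orange County's, using figures from this document.
        System.out.println(of(401762, 0, 2846289));
    }
}
```

Clamping keeps a viewer's color or height scale well defined even when a value drifts outside the expected range.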
org.eclipse.stem.diseasemodels
The org.eclipse.stem.diseasemodels project contains UML/EMF models of the "standard" disease models that are implemented for STEM. These models include deterministic implementations of "SI", "SIR" and "SEIR" disease models.
Disease Model Extension Points
disease
diseasemodel
org.eclipse.stem.diseases
The org.eclipse.stem.diseases project/plug-in contains serialized (XMI) instances of configured disease models. These serialized files each define a single disease such as the Spanish Flu. They are identified ("plugged in") to STEM by extensions to the org.eclipse.stem.diseasemodel.disease extension point.
org.eclipse.stem.doc
The org.eclipse.stem.doc project contains the STEM help files.
org.eclipse.stem.geography
The org.eclipse.stem.geography project contains data sets that define the geography of the Earth. It contains XML files that are serialized instances of EMF models defined in the org.eclipse.stem.definitions and org.eclipse.stem.core projects. It also contains XML files in GML format that contain Latitude/Longitude data for the political and geographic boundaries of countries and their territories as defined by the ISO 3166 standard.
Its contents are intended to be generated entirely from the org.eclipse.stem.internal.data project by running the update.xml Ant build file. This is not currently true: the GML files still reside in the geography project, but the plan is to move them to the internal.data project so that files of different resolutions can be produced.
org.eclipse.stem.internal.data
The org.eclipse.stem.internal.data project contains all of the basic data sets used by STEM. These are transformed using an Ant build file called update.xml to populate the org.eclipse.stem.geography project.
update.xml
The file update.xml is an Ant build file that transforms the data sets described below and populates the org.eclipse.stem.geography project.
Properties Files
In STEM, the data sets that define the built-in components that represent countries, populations, transportation networks and their relationships and other attributes are specified in a set of "properties files". A property file is simply a plain-text file that contains a set of keyword/value assignments. They are human readable and easy to edit.
Each property file needs to have a key of RECORD_CLASSNAME with its value being the name of the Java class that interprets the contents of the properties file. An instance of the named class will be created and then given the responsibility of processing the rest of the file's contents.
A property file specifying details of a particular country must also provide the keys ISOKEY and ADMIN_LEVEL. The value of ISOKEY is the ISO 3166-1 alpha-3 code of the country (e.g., BHS). The value of ADMIN_LEVEL is the administration level of the ISO keys in the file (e.g., 0).
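The key requirements above map naturally onto the standard java.util.Properties class. The sketch below loads a country file, validates ISOKEY and ADMIN_LEVEL, and reflectively instantiates the class named by RECORD_CLASSNAME. The RecordProcessor interface, CountingProcessor and CountryPropertiesLoader are hypothetical; the document does not describe the interface STEM's processing classes actually implement.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical contract for the class named by RECORD_CLASSNAME.
interface RecordProcessor {
    void process(Properties properties);
}

// Hypothetical processor used only for this example: counts the keys it is given.
class CountingProcessor implements RecordProcessor {
    int keys;

    @Override
    public void process(Properties properties) {
        keys = properties.size();
    }
}

final class CountryPropertiesLoader {
    private CountryPropertiesLoader() {}

    static RecordProcessor load(String text) throws IOException, ReflectiveOperationException {
        Properties props = new Properties();
        props.load(new StringReader(text));

        String className = require(props, "RECORD_CLASSNAME");
        // ISOKEY is an ISO 3166-1 alpha-3 code, e.g. BHS.
        String isoKey = require(props, "ISOKEY");
        if (!isoKey.matches("[A-Z]{3}")) {
            throw new IllegalArgumentException("ISOKEY is not an alpha-3 code: " + isoKey);
        }
        // ADMIN_LEVEL is the administration level of the keys in the file, e.g. 0.
        int level = Integer.parseInt(require(props, "ADMIN_LEVEL"));
        if (level < 0) {
            throw new IllegalArgumentException("ADMIN_LEVEL must be non-negative: " + level);
        }

        // Create the named class and hand it the file's contents.
        RecordProcessor processor = (RecordProcessor) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        processor.process(props);
        return processor;
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing required key: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws Exception {
        String text = "RECORD_CLASSNAME = CountingProcessor\nISOKEY = BHS\nADMIN_LEVEL = 0\n";
        CountingProcessor p = (CountingProcessor) load(text);
        System.out.println("keys processed: " + p.keys);
    }
}
```

Delegating to a class named inside the file lets each kind of data set carry its own interpretation logic without changing the loader.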
Each property file can contain a full set of Dublin Core Meta Data attributes. These are used, among other things, to identify and describe the nature of the data set. The values of the attributes are set by assigning values to special keys recognized as Dublin Core attributes. These values flow through the processing of the data sets and ultimately become integrated into the canonical graph.
The recognized attributes are:
Key | Description | Example | Automatically Generated if not supplied | Used/Recommended |
BIBLIOGRAPHIC_CITATION | Specifies a citation for the data. | |||
CONTRIBUTOR | Specifies a contributor to the data. | |||
COVERAGE | Specifies the coverage of the data. Typically, this is a comma separated list of administration levels. | Yes | ||
CREATED | Specifies when the data was created. | 1900-01-01 | Yes | |
CREATOR | Specifies who created the data. | Yes | ||
DATE | Specifies a date of the data. | 1900-01-01 | ||
DESCRIPTION | Specifies a description of the data. | |||
FORMAT | Specifies the format of the data. | Yes | ||
IDENTIFIER | Specifies the URI of the data. | Yes | No | |
LANGUAGE | Specifies the language of the data. | No | ||
LICENSE | Specifies the license of the data. | |||
PUBLISHER | Specifies the publisher of the data. | |||
RELATION | Specifies any relationships of the data. | |||
REQUIRED | Specifies any requirements of the data. | |||
RIGHTS | Specifies any rights associated with the data. | |||
SOURCE | Specifies the source of the data. | http://www.census.gov/geo/www/tiger/ | ||
SPATIAL_URI | Specifies the file that contains the latitude/longitude data. | platform:/plugin/org.eclipse.stem.geography/resources/data/geo/country/USA/USA_2_MAP.xml | ||
SUBJECT | Specifies the subject of the data. | |||
TITLE | Specifies the title of the relationship data. | All USA States (1) (except Puerto Rico) | Yes | |
TYPE | Specifies the type of the data. | No | ||
VALID | Specifies the date range that the data is valid for. | start=1900-01-01; end=2001-12-31; |
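The VALID attribute packs a date range into one string. A hypothetical parser for the form shown in the table (start=YYYY-MM-DD; end=YYYY-MM-DD;) can use java.time.LocalDate. The exact grammar STEM accepts, and the choice to treat both ends as inclusive, are assumptions of this sketch.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for VALID values such as "start=1900-01-01; end=2001-12-31;".
final class ValidRange {
    final LocalDate start;
    final LocalDate end;

    private ValidRange(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    static ValidRange parse(String value) {
        Map<String, String> parts = new HashMap<>();
        for (String token : value.split(";")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue; // tolerate empty tokens, e.g. from a doubled ';'
            }
            int eq = t.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected key=value but got: " + t);
            }
            parts.put(t.substring(0, eq).trim(), t.substring(eq + 1).trim());
        }
        LocalDate start = LocalDate.parse(require(parts, "start")); // ISO-8601, e.g. 1900-01-01
        LocalDate end = LocalDate.parse(require(parts, "end"));
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end precedes start");
        }
        return new ValidRange(start, end);
    }

    private static String require(Map<String, String> parts, String key) {
        String v = parts.get(key);
        if (v == null) {
            throw new IllegalArgumentException("missing '" + key + "' in VALID value");
        }
        return v;
    }

    // True when the date falls inside the range, both ends inclusive.
    boolean contains(LocalDate date) {
        return !date.isBefore(start) && !date.isAfter(end);
    }

    public static void main(String[] args) {
        ValidRange r = parse("start=1900-01-01; end=2001-12-31;");
        System.out.println(r.start + " .. " + r.end);
    }
}
```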
Administrative levels correspond to political divisions of a country. A level 0 administration area identifies an entire country (e.g. USA or Mexico). A level 1 administration area corresponds to a subdivision of a level 0 area, and can be a state, territory, parish, or a province. As an example, the state of California is a level 1 administration area of the USA. Level 1 administration areas are in turn subdivided into level 2 areas, which in the United States would be a "county" such as Santa Clara County in California.
Country Definitions
The files defining attributes for a particular country are organized by the ISO 3166-1 definition of a country, using the alpha-3 codes as the names of the directories containing the files. The country directories can be found in org.eclipse.stem.internal.data/resources/data/country. For each country, there are four types of properties files. These define the administrative subdivisions of the country, the areas of the subdivisions, their populations, and finally their names. The contents of these files ultimately define Nodes and their Labels in the canonical graph.
Administrative Areas
The administrative subdivisions of a country are specified in a set of property files, one for each level. The names of these files are of the form: "XXX_Y_nodes.properties", where "XXX" is the ISO 3166-1 alpha-3 code of the country (the same as the name of the parent directory of the file) and "Y" is a digit representing the administrative level of the subdivisions defined in the file. Thus, the file "USA_0_node.properties" contains the specification of the United States at level 0. This file defines a single Node that represents the entire country. Similarly, the file "USA_1_node.properties" contains the specification of the States of the United States and the District of Columbia and defines fifty-one nodes. The file "USA_2_node.properties" defines the Counties of the United States and contains definitions for 3142 nodes. Other countries are defined similarly, though the data set is incomplete, so some countries have fewer levels of specification.
The properties files defining the administrative areas of a country consist of a series of keys corresponding to the standard ISO 3166 identifier for the level of the area.
Names
This file defines the identifiers for every administrative division in a country at each level. For example, for the USA, at level 0 we would have "USA = United States". At level 1, we would have "US-CA = California", "US-CO = Colorado", and "US-NY = New York". At level 2, for Orange, Monterey, and Napa counties within California we have "US-CA-06059 = Orange County", "US-CA-06053 = Monterey County", and "US-CA-06055 = Napa County". There is a single names property file for every country. The corresponding names property file for the USA is USA_names.properties. For level 2 administrations, the five digits found in the identifiers (i.e., "06053" for identifier "US-CA-06053") are defined as follows: the leftmost two digits identify the level 1 administration ("06" -> California), while the remaining digits, which can be up to four, identify the level 2 administration ("053" -> Monterey County).
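These identifier conventions can be decoded mechanically. The hypothetical helper below (AdminId is not a STEM class) infers the administrative level from the number of hyphens and, for level 2 identifiers, splits the numeric part into the level 1 code (first two digits) and the level 2 code (the rest). It assumes the USA-style pattern shown in the examples; other countries may format level 2 identifiers differently.

```java
// Hypothetical decoder for identifiers such as "USA", "US-CA" and "US-CA-06053".
final class AdminId {
    final String id;
    final int level;

    AdminId(String id) {
        this.id = id;
        // Level = number of hyphens: "USA" -> 0, "US-CA" -> 1, "US-CA-06053" -> 2.
        this.level = id.split("-").length - 1;
    }

    // For a level 2 identifier, the leftmost two digits name the level 1 area ("06" -> California).
    String level1Code() {
        return digits().substring(0, 2);
    }

    // The remaining digits name the level 2 area ("053" -> Monterey County).
    String level2Code() {
        return digits().substring(2);
    }

    private String digits() {
        if (level != 2) {
            throw new IllegalStateException("not a level 2 identifier: " + id);
        }
        return id.substring(id.lastIndexOf('-') + 1);
    }

    public static void main(String[] args) {
        AdminId monterey = new AdminId("US-CA-06053");
        System.out.println(monterey.level + " " + monterey.level1Code() + " " + monterey.level2Code());
    }
}
```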
Area
This file contains area data (in square kilometers) for administrative divisions. There is an area property file for each administration level. In the case of the USA at level 0, we have "USA = 9161923". At level 1, we have "US-CA = 163695.57", "US-CO = 104093.57", and "US-NY = 54556.00" for California, Colorado and New York respectively. At level 2, for Orange, Monterey, and Napa counties within California we have "US-CA-06059 = 2043.5006", "US-CA-06053 = 8603.9405", and "US-CA-06055 = 1952.8510". The area property files corresponding to the USA are USA_0_area.properties, USA_1_area.properties, and USA_2_area.properties.
Populations
This file contains population data for administrative divisions. There is a population property file for each administration level. In the case of the USA at level 0, we have "USA = 298444215". At level 1, we have "US-CA = 33871648", "US-CO = 4301261", and "US-NY = 18976457" for California, Colorado and New York respectively. At level 2, for Orange, Monterey, and Napa counties within California we have "US-CA-06059 = 2846289", "US-CA-06053 = 401762", and "US-CA-06055 = 124279".
The population property files corresponding to the USA are USA_0_human.properties, USA_1_human.properties, and USA_2_human.properties.
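Because the area and population files for a given level share the same identifiers as keys, they can be joined key by key. As an illustration (not a STEM feature described in this document), the hypothetical DensityJoin below loads two properties snippets and computes people per square kilometer for every identifier present in both, using the California county figures quoted above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical example: join area-style and population-style properties by identifier.
final class DensityJoin {
    private DensityJoin() {}

    static Map<String, Double> density(String areaText, String populationText) throws IOException {
        Properties area = new Properties();
        area.load(new StringReader(areaText));
        Properties population = new Properties();
        population.load(new StringReader(populationText));

        Map<String, Double> result = new TreeMap<>();
        for (String id : population.stringPropertyNames()) {
            String km2 = area.getProperty(id);
            if (km2 == null) {
                continue; // no matching area entry for this identifier
            }
            double people = Double.parseDouble(population.getProperty(id).trim());
            result.put(id, people / Double.parseDouble(km2.trim()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String area = "US-CA-06059 = 2043.5006\nUS-CA-06053 = 8603.9405\nUS-CA-06055 = 1952.8510\n";
        String pop = "US-CA-06059 = 2846289\nUS-CA-06053 = 401762\nUS-CA-06055 = 124279\n";
        density(area, pop).forEach((id, d) -> System.out.printf("%s %.1f people/km2%n", id, d));
    }
}
```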
Relationship Definitions
The property files that define relationships are organized by relationship. They are contained in directories whose names reflect the nature of the relationship. For instance, "airtransporthuman2006" defines the transportation of humans using aircraft for the year 2006. The relationship directories are in org.eclipse.stem.internal.data/resources/data/relationship. The contents of the relationship files ultimately define Edges in the canonical graph.
Built-in Model Definitions
Built-in Scenario Definitions
Built-in Sequencer Definitions
org.eclipse.stem.jobs
The org.eclipse.stem.jobs plug-in contains the implementations of Eclipse Jobs used by STEM. Currently, there is only a single Simulation job that implements the main simulation cycle. STEM can run multiple simulations simultaneously and uses a separate job instance for each one.
A running instance of STEM contains a singleton instance of a class called SimulationManager. This class has the responsibility for managing the life-cycle of each Simulation running in STEM. It allows other components in STEM to discover what Simulations currently exist and to monitor changes that affect their existence.
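The singleton-plus-listener arrangement can be sketched in a few lines. Every name here (SimpleSimulationManager, LifecycleListener, add, remove) is hypothetical; the sketch only illustrates the responsibilities described above and is not the actual SimulationManager API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener notified when simulations are added or removed.
interface LifecycleListener {
    void changed(String simulationName, boolean added);
}

final class SimpleSimulationManager {
    // Eagerly created singleton: one manager per running application.
    private static final SimpleSimulationManager INSTANCE = new SimpleSimulationManager();

    private final List<String> active = new ArrayList<>();
    // Copy-on-write so listeners can be registered while notifications are in flight.
    private final List<LifecycleListener> listeners = new CopyOnWriteArrayList<>();

    private SimpleSimulationManager() {}

    static SimpleSimulationManager getInstance() {
        return INSTANCE;
    }

    // Lets other components discover which simulations currently exist.
    synchronized List<String> getActiveSimulations() {
        return Collections.unmodifiableList(new ArrayList<>(active));
    }

    void addListener(LifecycleListener listener) {
        listeners.add(listener);
    }

    void add(String name) {
        synchronized (this) {
            active.add(name);
        }
        listeners.forEach(l -> l.changed(name, true));
    }

    void remove(String name) {
        boolean removed;
        synchronized (this) {
            removed = active.remove(name);
        }
        if (removed) {
            listeners.forEach(l -> l.changed(name, false));
        }
    }

    public static void main(String[] args) {
        SimpleSimulationManager mgr = SimpleSimulationManager.getInstance();
        mgr.addListener((name, added) -> System.out.println(name + (added ? " added" : " removed")));
        mgr.add("Simulation 1");
        mgr.remove("Simulation 1");
    }
}
```

Listeners are notified outside the synchronized blocks so that a slow listener cannot hold the manager's lock.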
org.eclipse.stem.jobs.nl1
The org.eclipse.stem.jobs.nl1 project is a plug-in fragment containing NLS files for the project org.eclipse.stem.jobs.
org.eclipse.stem.sample
org.eclipse.stem.sequencers
The org.eclipse.stem.sequencers project contains serialized XML files of instances of EMF models that define Sequencers.
org.eclipse.stem.tests.core
The org.eclipse.stem.tests.core project contains JUnit tests for the project org.eclipse.stem.core.
org.eclipse.stem.tests.definitions
The org.eclipse.stem.tests.definitions project contains JUnit tests for the project org.eclipse.stem.definitions.
org.eclipse.stem.tests.diseasemodels
The org.eclipse.stem.tests.diseasemodels project contains JUnit tests for the project org.eclipse.stem.diseasemodels.
org.eclipse.stem.tests.jobs
The org.eclipse.stem.tests.jobs project contains JUnit tests for the project org.eclipse.stem.jobs.
org.eclipse.stem.tests.ui
The org.eclipse.stem.tests.ui project contains JUnit tests for the project org.eclipse.stem.ui.
org.eclipse.stem.tests.util
The org.eclipse.stem.tests.util project contains utility code for testing purposes.
org.eclipse.stem.ui
Perspectives
Simulation
Designer
Analysis
The Analysis and Validation Perspective allows users to perform analysis, fitting, model comparison, and validation across multiple simulations and data sets. The Analysis Perspective uses data from either complete simulation runs or data imported into STEM. The latter could be real bio-surveillance data in the appropriate CSV format or data from a completed simulation.
This perspective holds the windows (and associated tools) that support various analysis operations on existing data. If you don't see the perspective when you start STEM, go to the menu bar and click Window > Open Perspective > Analysis.
By default the Analysis Perspective contains three views. However, you can close any of these views and open others.
STEM currently enables the following types of analysis:
1. Estimating Model Parameters from External Data. This view is used to estimate model parameters from an existing data set, either simulated or real.
2. The Epidemic View. This view aggregates data from across locations for a given epidemic scenario and plots the aggregated data and disease incidence.
3. Root Mean Square (RMS) Comparison Between Data Sets. This measures the RMS difference between two existing data sets (simulated or real).
4. Lyapunov Analysis. This compares data from two existing scenarios (or real data sets) based on their trajectories in a Lyapunov Phase Space.
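The RMS comparison in item 3 reduces two aligned series to one number: the square root of the mean of the squared point-by-point differences. A minimal sketch, assuming both data sets have already been aligned to the same locations and time steps (how STEM's view performs that alignment or treats missing values is not covered here), with Rms as a hypothetical class name:

```java
// Root mean square difference between two equally long, aligned series.
final class Rms {
    private Rms() {}

    static double difference(double[] a, double[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / a.length);
    }

    public static void main(String[] args) {
        double[] simulated = {10, 20, 30, 40};
        double[] observed = {12, 18, 33, 39};
        System.out.println(difference(simulated, observed));
    }
}
```

Squaring before averaging keeps positive and negative deviations from cancelling, so a result of 0.0 means the two series agree exactly.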
As shown in the Figure below, each type of analysis is available as a separate "View" or tabbed pane within the Analysis Perspective.
[Figure: the Analysis Perspective (Analysisperspective.jpg)]
Views
Active Simulations
Control
Decorators
Graphs
Google Earth
Map
Models
Reports
Scenarios
Wizards
New Disease Wizard
New Graph Wizard
New Model Wizard
New Project Wizard
New Scenario Wizard
New Real-Time Sequencer Wizard
New Sequencer Wizard
Editors
Graph Editor
Model Editor
Scenario Editor
org.eclipse.stem.ui.diseasemodels
org.eclipse.stem.ui.ge
The org.eclipse.stem.ui.ge project contains the implementation of the STEM/Google Earth interface.
org.eclipse.stem.ui.nl1
The org.eclipse.stem.ui.nl1 project is a plug-in fragment containing NLS files for the project org.eclipse.stem.ui.
org.eclipse.stem.ui.reports
The org.eclipse.stem.ui.reports project contains preliminary code that implements BIRT graphs and charts of STEM data sets.
org.eclipse.stem.utility
The org.eclipse.stem.utility project contains utility code used for data set transformation and related tasks.