The Model Discovery (MoDisco) Component: A Proposal to Move the Project into the Eclipse Modeling Project
By Jean Bézivin, Hugo Brunelière – INRIA (ATLAS Group)
By Gabriel Barbier, Frédéric Madiot - MIA-Software
Moving to the Modeling project takes MoDisco out of the research area and highlights its industry-ready aspects. The goal of the Generative Modeling Technologies (GMT) project is to produce a set of prototypes in the area of Model Driven Engineering (MDE). GMT is the official research incubator project of the top-level Eclipse Modeling Project. The Eclipse Modeling Project focuses on the evolution and promotion of model-based development technologies within the Eclipse community by providing a unified set of modeling frameworks, tooling, and standards implementations.
The goal of MoDisco (Model Discovery) is to enable the practical extraction of models from legacy systems. Because legacy systems differ widely in nature and technology, there are several different ways to extract models from them. MoDisco proposes a generic and extensible metamodel-driven approach to model discovery. A basic framework and a set of guidelines are provided so that Eclipse contributors can bring their own solutions for discovering models in various kinds of legacy systems.
Because the legacy systems considered are so diverse, MoDisco is a collaborative component involving many organizations. Each of them will bring its own expertise in a given area. The modular architecture of MoDisco, which integrates OMG/ADM standards such as KDM, will make it possible to integrate all these contributions.
As a GMT component, MoDisco will make good use of other GMT components or solutions available in the Eclipse Modeling Project (EMF, M2M, M2T, GMF, TMF, etc.), and more generally of any plug-in available in the Eclipse environment.
Systems are becoming more and more complex. Developing and managing such complex systems is already a major issue. The next important effort concerns reverse engineering complex legacy systems in order to migrate them, make them interoperable, or simply understand them. The proposed MoDisco component mainly provides an extensible and generic framework under the Eclipse GMT project, which is part of the top-level Eclipse Modeling Project. The Eclipse GMT project acts as a research incubator for MDE prototypes. Thus, the MoDisco component aims at providing a base framework for model-driven reverse engineering tasks. We will discuss the framework’s composition later in this document. One of the keys to the success of this extensible framework will be its adoption by leading industrial players, together with the development of a wide variety of extensions and a wide user community.
The principles of model discovery are based on a metamodel-driven approach (see Figure 1). This means that every step is guided by a metamodel. Thus, the very first step of a model discovery process is always to define the metamodel corresponding to the models you want to discover. This step is common to all kinds of systems.
The second step is to create one or more discovery tools, called “discoverers” in this document. These tools extract the necessary information from the system in order to build a model conforming to the previously defined metamodel. Discoverers are often created manually, but their creation can also be semi-automatic. The output of a discoverer is a model, in XMI format for instance.
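The two steps above can be illustrated with a minimal Python sketch. It is not MoDisco code: the `Metamodel`, `Element`, and `Model` classes, the `SQL_SCHEMA` metamodel, and the `discover_tables` function are hypothetical names introduced only for this illustration, and the "system" is reduced to a string of SQL DDL.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metamodel:
    """Hypothetical, minimal metamodel: element type -> allowed attribute names."""
    name: str
    types: dict[str, set[str]]


@dataclass
class Element:
    type: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Model:
    metamodel: Metamodel
    elements: list[Element] = field(default_factory=list)

    def conforms(self) -> bool:
        """True if every element uses a declared type and only declared attributes."""
        return all(
            e.type in self.metamodel.types
            and set(e.attributes) <= self.metamodel.types[e.type]
            for e in self.elements
        )


# Step 1: define the metamodel of the models we want to discover.
SQL_SCHEMA = Metamodel("SqlSchema", {"Table": {"name"}, "Column": {"name", "table"}})


# Step 2: write a discoverer that builds a model conforming to that metamodel.
def discover_tables(ddl: str) -> Model:
    """Toy discoverer: extracts table names from CREATE TABLE statements."""
    model = Model(SQL_SCHEMA)
    for line in ddl.splitlines():
        words = line.split()
        if len(words) >= 3 and [w.upper() for w in words[:2]] == ["CREATE", "TABLE"]:
            table_name = words[2].split("(")[0].rstrip(";")
            model.elements.append(Element("Table", {"name": table_name}))
    return model
```

Calling `discover_tables(...)` on a DDL script returns a `Model` whose `conforms()` check passes. A real discoverer would typically serialize the result (to XMI, for instance) rather than keep it in memory.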
See the existing discoverers:
And the use cases:
- Model filter
MoDisco aims at providing a platform supporting various legacy modernization use-cases for various kinds of existing technologies.
To facilitate reuse of components between several use-cases, MoDisco is organized in three layers:
- the Use-Cases layer containing components providing a solution for a specific modernization use-case.
- the Technologies layer containing components dedicated to one legacy technology but independent from the modernization use case.
- the Infrastructure layer containing generic components independent from any legacy technology.
The Use-Cases layer contains components supporting legacy modernization use-cases. The range of use-cases MoDisco could support is practically unlimited. Nevertheless, the main ones are well identified:
- Comparison: comparing two versions of the same application at a structural level.
- Quality Analysis: detection of anti-patterns in existing code and computation of metrics.
- Cartography: detection of the main components of a system and their dependencies.
- Understanding: extraction of features presented independently from the implementation (structure, behaviour, persistence, data-flow …).
- Reverse-Modeling: creation of models out of existing systems to populate modeling tools (supporting UML or Domain-Specific Languages).
- Refactoring: improvement of the source code to integrate better coding norms or design patterns.
- Migration: transformation of the source code to change the framework, the language, or the architecture of existing applications.
Other, more specific use-cases may also be supported, for example the extraction of business rules from programs to populate a business-rules engine, or the modification of an existing system to better integrate with another system.
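As an illustration of the Comparison use-case listed above, the following sketch compares two versions of an application at a structural level. It assumes, purely for the example, that each version has already been discovered into a very simple model mapping class names to sets of member names; the `compare` function and this model shape are hypothetical, not part of MoDisco.

```python
from __future__ import annotations


def compare(old: dict[str, set[str]], new: dict[str, set[str]]) -> dict[str, list[str]]:
    """Structural diff of two models given as {class name: set of member names}."""
    common = old.keys() & new.keys()
    return {
        # Classes present only in the new version.
        "added": sorted(new.keys() - old.keys()),
        # Classes present only in the old version.
        "removed": sorted(old.keys() - new.keys()),
        # Classes present in both versions whose members differ.
        "changed": sorted(name for name in common if old[name] != new[name]),
    }


v1 = {"Order": {"id", "total"}, "Legacy": {"run"}}
v2 = {"Order": {"id", "total", "currency"}, "Invoice": {"id"}}
report = compare(v1, v2)
```

Because the comparison works on models rather than on source text, it ignores formatting and comments and reports only structural differences.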
The Technologies layer contains components dedicated to one legacy technology.
These components can be reused across several use-case components involving the same legacy technology. For example, a use-case computing metrics on Java source code and another providing refactoring for Java applications could reuse the same component.
Use-cases generally involve only one legacy technology (the one used to implement the existing system). Nevertheless, some use-cases can involve several legacy technologies. It is the case when the existing system is heterogeneous, built with several languages. For example, when a system, implemented with Java, stores data into a relational database using JDBC, the use-case may need MoDisco components able to analyse Java and SQL source code.
Each technology component consists of, at least, a metamodel of the dedicated technology. This metamodel describes the elements required to support modernization use-cases for the corresponding technology. Depending on the kind of use-case, the metamodel can be complete (for refactoring or migration) or partial (for cartography or some quality analysis scenarios).
Ideally, the metamodel is complemented by a discoverer, a component that builds models conforming to the metamodel from artefacts of an existing system. The artefacts analysed by discoverers are not necessarily source code files. There are many other ways for a discoverer to find the information needed to create a model of an existing system:
- analysing parameter files
- analysing execution logs
- unzipping an archive to access its contents
- querying a database
- using APIs to access a tool with which the system has been designed
- translating data provided by a reverse-engineering tool into a model
- transforming models provided by another discoverer
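To make one of the options above concrete, here is a minimal sketch of a discoverer that opens an archive to access its contents instead of parsing source code. The `ArchiveEntry` element and the `discover_archive` function are hypothetical names chosen for this example; only the Python standard library (`zipfile`, `io`) is used.

```python
from __future__ import annotations

import io
import zipfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveEntry:
    """Hypothetical model element describing one file stored in an archive."""
    path: str
    size: int  # uncompressed size in bytes


def discover_archive(data: bytes) -> list[ArchiveEntry]:
    """Toy discoverer: builds a flat model of the files stored in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            ArchiveEntry(info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


# Usage: build a small archive in memory, then discover its contents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    z.writestr("app/Main.class", b"\xca\xfe\xba\xbe")
entries = discover_archive(buffer.getvalue())
```

The resulting list could in turn be handed to another discoverer, which corresponds to the last option in the list (transforming models provided by another discoverer).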
A MoDisco Technology component can come with utilities dedicated to its metamodel:
- Browsers to navigate the models more easily
- Viewers to represent the models graphically
- Computation of standard metrics
- Transformation to standard metamodels (ASTM, KDM, UML …)
- Generator to regenerate the initial artefact from its model
The Infrastructure layer aims at providing components independent from use-cases and legacy technologies.
There will be two kinds of components in this layer:
MoDisco Knowledge components
These components provide metamodels describing legacy systems independently from their technology. Like components of the Technology layer, these components can come with discoverers or utilities.
Examples of MoDisco Knowledge components are the metamodels from OMG/ADM:
- KDM (Knowledge Discovery Metamodel)
- ASTM (Abstract Syntax Tree Metamodel)
- SMM (Software Metrics Metamodel)
MoDisco Technical components
These components are utilities to build or facilitate the use of all the other components.
Examples of MoDisco Technical components are:
- abstract discoverers from which concrete discoverers can derive
- a file-system metamodel describing the organization of files and directories
- a model browser facilitating the visualization of MoDisco models
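The first two examples can be sketched together: an abstract discoverer from which concrete discoverers derive, and a small file-system metamodel describing files and directories. This is only an illustration under assumed names (`AbstractDiscoverer`, `FileSystemDiscoverer`, `FileNode`, `DirectoryNode`); it does not reproduce the actual MoDisco API.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class FileNode:
    name: str


@dataclass
class DirectoryNode:
    name: str
    children: list = field(default_factory=list)


class AbstractDiscoverer(ABC):
    """Hypothetical base class shared by all concrete discoverers."""

    @abstractmethod
    def is_applicable(self, source: str) -> bool:
        """Tell whether this discoverer can handle the given source."""

    @abstractmethod
    def discover(self, source: str):
        """Build and return a model of the given source."""

    def run(self, source: str):
        # Common behaviour factored out of the concrete discoverers.
        if not self.is_applicable(source):
            raise ValueError(f"{type(self).__name__} cannot handle {source!r}")
        return self.discover(source)


class FileSystemDiscoverer(AbstractDiscoverer):
    """Builds a DirectoryNode/FileNode tree from a directory on disk."""

    def is_applicable(self, source: str) -> bool:
        return os.path.isdir(source)

    def discover(self, source: str) -> DirectoryNode:
        root = DirectoryNode(os.path.basename(os.path.normpath(source)))
        with os.scandir(source) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                root.children.append(self.discover(entry.path))
            else:
                root.children.append(FileNode(entry.name))
        return root
```

Calling `FileSystemDiscoverer().run(path)` on a directory returns a tree of nodes, while calling it on a regular file raises `ValueError`. Keeping the applicability check in the base class means each new discoverer only has to describe what it accepts and how it builds its model.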
Benefits of This Approach
What are the benefits of the MoDisco approach compared to existing reverse engineering tools? First, MoDisco proposes a unified approach to model-driven reverse engineering and a metamodel-driven methodology. This way, we are able to work in the modeling world, moving from a heterogeneous world to a homogeneous one. The target model engineering space has already proved its adaptability and scalability in several experiments addressing requirements for data integration, tool interoperability and platform migration. Moreover, the well-structured modeling world allows easy manipulation of many different concepts in a unified way. For instance, every model can be transformed, woven, or extracted with the same tool set. As those operations are defined on the models’ metamodels, they are reusable across different use cases.
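The reuse argument can be sketched as follows, assuming a deliberately simplified, uniform element representation (the `Element` class, `walk`, and `count_by_type` are hypothetical names for this illustration). The same generic operation is applied unchanged to a model of Java code and to a model of a SQL schema.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical uniform model element: a type name, a name, and children."""
    type: str
    name: str
    children: list[Element] = field(default_factory=list)


def walk(element: Element):
    """Yield the element and all of its descendants, depth first."""
    yield element
    for child in element.children:
        yield from walk(child)


def count_by_type(root: Element) -> dict[str, int]:
    """Generic metric: number of elements per type, usable on any model."""
    return dict(Counter(e.type for e in walk(root)))


# Two models from different technologies, handled by the same tool.
java_model = Element("Package", "shop", [
    Element("Class", "Order", [Element("Method", "total")]),
    Element("Class", "Customer"),
])
sql_model = Element("Schema", "shop", [
    Element("Table", "orders", [Element("Column", "id"), Element("Column", "total")]),
])
```

Neither `walk` nor `count_by_type` knows anything about Java or SQL; they rely only on the shared structure of the models, which is the kind of reuse the homogeneous modeling world is meant to provide.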
We propose that the MoDisco component be undertaken as part of the Eclipse Modeling Project (EMP). The initial list of committers, contributors and interested parties will be completed and provided later. MoDisco is supported by the ModelPlex European Integrated Project (FP6-IP #034081).
A first stable implementation of the infrastructure framework is currently available. The documentation of the provided tools is varied and covers several aspects of modernization (reverse modeling, reverse documentation, migration, etc.).
- First draft, July 15, 2009