Index-based model compare match engine
Mentor: Cédric Brun
Student: Stefan Leopold
This project is part of the Google Summer of Code 2010
Matching model elements is the most performance- and memory-critical phase of model comparison. Further improving the current EMF Compare GenericMatchEngine and adopting and integrating new ideas and concepts into this part of the EMF Compare framework can greatly improve scalability.
Providing out-of-the-box solutions is essential in today's business, especially if they can be adapted and, if required, customized easily. EMF Compare already provides such an experience for use cases of different sizes and complexity. Live model editing with comparison support raises the expectations, and the requirements, regarding performance and scalability even further.
Looking at the current implementation with these changed requirements in mind reveals some room for improvement, e.g.:
- for the computation of contentSimilarity, the same "key" is computed several times for each object, even though it only changes when the model object changes (live editing!) or the filter of the GenericMatchEngine is adapted during a MatchEngine run; computing and maintaining a model object index could therefore largely improve performance,
- the current implementation of the nameSimilarityMetric in the GenericMatchEngine is not symmetric (it is in many cases, but not always, e.g. bbb->abb is a 100% match while abb->bbb is only a 50% match, because the algorithm works pair by pair!); with a guarantee of symmetry, some optimizations could easily be applied to the MatchEngine.
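The asymmetry described above can be reproduced with a simplified sketch of such a pair-by-pair metric (this is an illustration of the effect, not the exact EMF Compare implementation): the score counts how many character pairs of the first string occur anywhere in the second, so direction matters.

```java
import java.util.ArrayList;
import java.util.List;

public class PairSimilarity {

    // Split a string into its consecutive character pairs, e.g. "abb" -> ["ab", "bb"].
    static List<String> pairs(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < s.length() - 1; i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    // Fraction of pairs of 'a' that also occur somewhere in 'b'.
    // Asymmetric: pairs are taken from the first argument only.
    static double similarity(String a, String b) {
        List<String> pairsA = pairs(a);
        List<String> pairsB = pairs(b);
        int hits = 0;
        for (String p : pairsA) {
            if (pairsB.contains(p)) {
                hits++;
            }
        }
        return pairsA.isEmpty() ? 0.0 : (double) hits / pairsA.size();
    }

    public static void main(String[] args) {
        // "bbb" has pairs [bb, bb], both found in "abb" -> 100% match.
        System.out.println(similarity("bbb", "abb"));
        // "abb" has pairs [ab, bb], only "bb" is found in "bbb" -> 50% match.
        System.out.println(similarity("abb", "bbb"));
    }
}
```

A symmetric variant (e.g. averaging both directions, or a Dice-style coefficient over the union of pairs) would allow the MatchEngine to compute each similarity only once per pair of elements.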
So an in-depth analysis of the current implementation, combined with the adoption of new ideas and/or other existing concepts (there seem to be synergies, e.g. with EMF Index), may be a first step towards better integration of EMF Compare with live model editing, even for huge model graphs.
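The indexing idea mentioned above can be sketched as a small memoizing cache: each model object's similarity "key" is computed once and reused until the object is invalidated by a live-editing change. This is a minimal illustrative sketch, not EMF Compare or EMF Index API; the generic type parameter stands in for EObject, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Minimal sketch of a similarity-key index. Weak keys let entries disappear
// together with their model objects, so the index does not leak memory.
public class SimilarityKeyIndex<T> {
    private final Map<T, String> cache = new WeakHashMap<>();
    private final Function<T, String> keyFunction;

    public SimilarityKeyIndex(Function<T, String> keyFunction) {
        this.keyFunction = keyFunction;
    }

    // Returns the cached key, computing it only on the first request.
    public String keyFor(T element) {
        return cache.computeIfAbsent(element, keyFunction);
    }

    // Called when a model element changes during live editing,
    // forcing the key to be recomputed on the next lookup.
    public void invalidate(T element) {
        cache.remove(element);
    }
}
```

In an EMF setting, invalidation could be driven by a model change notification, so that an unchanged model (with an unchanged filter) never triggers a recomputation.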
The main objective of this project is to measurably improve the performance of the EMF Compare model matching algorithm. This improvement will be documented by benchmarks performed on provided real-world data.
| Milestone | Weeks | Phase | Deliverables |
| M1 | 1-4 | research | identifying performance problems and analysing bottlenecks, preparing test data, suggesting improvements |
| M2 | 5-8 | implementation | coding of prototypes and patches with accepted improvements |
| M3 | 9-12 | integration | committing work, providing benchmark report with performance impacts |