Index based model compare match engine

Revision as of 14:48, 27 April 2010

Mentor: Cédric Brun

Student: Stefan Leopold

This project is part of the Google Summer of Code 2010

Abstract

Matching model elements is the most critical phase of model comparison with respect to performance and memory consumption. Improving the current EMF Compare GenericMatchEngine, and adopting and integrating new ideas and concepts in this part of the EMF Compare framework, can substantially improve scalability.

Detailed Information

Providing out-of-the-box solutions is essential in today's business, especially if they can be adapted and, if required, customized easily. EMF Compare already provides such an experience for use cases of different sizes and complexity. Live model editing with comparison support further raises the expectations and requirements with regard to performance and scalability.

Looking at the current implementation with these changed requirements in mind reveals some room for improvement, e.g.:

  • when computing contentSimilarity, the same "key" is computed several times for each object, although it cannot change as long as the model object (live editing!) and the filter of the GenericMatchEngine (adapted during a MatchEngine run?) remain unchanged, so computing and maintaining a model object index could largely improve performance,
  • the current implementation of the nameSimilarityMetric in the GenericMatchEngine is not symmetric (in many cases it is, but not always: e.g. bbb->abb is a 100% match while abb->bbb is a 50% match, because the algorithm works pair by pair!); with a guarantee of symmetry, some optimizations could easily be applied to the MatchEngine,
  • ...
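The first point in the list above, computing each object's "key" once and reusing it until the object changes, could be sketched roughly as follows. This is only an illustration under assumptions: the class `ContentKeyIndex`, its methods, and the key function passed to it are hypothetical names and are not part of the EMF Compare API. How a key is actually derived from a model object and the engine's filter is left to the caller.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch (not EMF Compare API): caches one content key per
 * model object so the key is computed once instead of on every comparison.
 */
final class ContentKeyIndex<T> {
    // Identity-based map: two distinct model objects never share an entry,
    // even if they happen to be equal according to equals().
    private final Map<T, String> keys = new IdentityHashMap<>();
    private final Function<T, String> keyComputer;
    private int computations = 0;

    ContentKeyIndex(Function<T, String> keyComputer) {
        this.keyComputer = keyComputer;
    }

    /** Returns the cached key, computing it only on the first request. */
    String keyOf(T element) {
        if (!keys.containsKey(element)) {
            keys.put(element, keyComputer.apply(element));
            computations++;
        }
        return keys.get(element);
    }

    /** Call when a single model object was edited (live editing). */
    void invalidate(T element) {
        keys.remove(element);
    }

    /** Call when the filter changed, since then every key may be stale. */
    void clear() {
        keys.clear();
    }

    /** Number of times the key function actually ran (for measurements). */
    int computations() {
        return computations;
    }
}
```

With such an index, repeated calls to `keyOf` for an unchanged object cost one map lookup. The price is that every edit to a model object, and every filter change, has to be reported through `invalidate` or `clear`; otherwise stale keys would be used.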

So an in-depth analysis of the current implementation, combined with the adoption of new ideas and/or other existing concepts (there seem to be synergies, e.g. with EMF Index), may be a first step towards better integration of EMF Compare with live model editing, even for huge model graphs.
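The asymmetry mentioned in the list above (bbb->abb 100%, abb->bbb 50%) can be reproduced with a small pair-based sketch. The code below is an assumption about how a "pair by pair" comparison might behave. It is not the actual nameSimilarityMetric code, and the class and method names are hypothetical. `oneWay` counts how many character pairs of the first string occur anywhere in the second. `symmetric` is one possible symmetric alternative, a Dice coefficient in which each pair of the second string can be matched at most once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual EMF Compare implementation. */
final class PairSimilaritySketch {

    /** Splits a string into adjacent character pairs, e.g. "abb" -> [ab, bb]. */
    static List<String> pairs(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    /** Asymmetric: fraction of the pairs of 'from' that occur anywhere in 'to'. */
    static double oneWay(String from, String to) {
        List<String> fromPairs = pairs(from);
        List<String> toPairs = pairs(to);
        if (fromPairs.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String pair : fromPairs) {
            if (toPairs.contains(pair)) {
                hits++; // the same pair of 'to' may be counted repeatedly
            }
        }
        return (double) hits / fromPairs.size();
    }

    /** Symmetric: Dice coefficient over pairs, each pair matched at most once. */
    static double symmetric(String a, String b) {
        List<String> aPairs = pairs(a);
        List<String> remaining = pairs(b);
        int total = aPairs.size() + remaining.size();
        if (total == 0) {
            return 0.0;
        }
        int shared = 0;
        for (String pair : aPairs) {
            if (remaining.remove(pair)) {
                shared++; // consume the matched pair so it cannot match again
            }
        }
        return 2.0 * shared / total;
    }
}
```

Under these assumptions, `oneWay("bbb", "abb")` yields 1.0 and `oneWay("abb", "bbb")` yields 0.5, matching the example above, while `symmetric` yields 0.5 in both directions. A symmetric metric means a pair of elements needs to be scored only once, which is the kind of optimization the list item alludes to.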

Deliverables

The main objective of this project is to measurably improve the performance of the EMF Compare model matching algorithm. This improvement will be documented by benchmarks run on the provided "real-world" data.

Timeline

Milestone  Weeks  Step            Details
M1         1-4    research        identifying performance problems and analysing bottlenecks, preparing test data, suggesting improvements
M2         5-8    implementation  coding of prototypes and patches with accepted improvements
M3         9-12   integration     committing work, providing benchmark report with performance impacts
