(Code) Recommenders Project Proposal
This page contains the draft of the Eclipse Code Recommenders project proposal, and we appreciate your comments. We would also be glad if you added your name to the "Interested Parties" section if you would like to support this project proposal :-) If wiki comments aren't appropriate, you may send an email to email@example.com.
This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process document) and is written to declare the intent and scope of the proposed (code) recommenders project. It is also meant to solicit additional participation and input from the Eclipse community. If you have any comments on the project or want to join it, please send your feedback to the <url> forum.
This proposal is structured as follows. Section “Background” gives the motivation for the project and provides some background information about its origins, namely the Code Recommenders project developed at Darmstadt University of Technology. Section “Initial Contributions” describes the current state of the project and the initial contributions that will be made. Section “Scope” outlines the initial set of tools and platforms this project aims to deliver to its users; section “Description” gives a few more details on the intermediate goals. Section “Related Eclipse Projects” describes potential future connections between current Eclipse projects and the Code Recommenders project, as well as likely collaborations. The remaining sections (Committers, Mentors, Interested Parties, Additional Information) describe what their names suggest.
Under the right circumstances, groups are remarkably intelligent and are often better than the smartest person in them. - James Surowiecki: The Wisdom of Crowds
Application frameworks have become an integral part of today’s software development. This is hardly surprising given their promised benefits such as reduced costs, higher quality, and shorter time to market. But using an application framework is not free of cost. Before frameworks can be used efficiently, software developers have to learn their correct usage, which often results in high initial training costs.
To reduce these training costs, framework developers provide diverse documentation addressing different information needs. Tutorials, for instance, describe typical usage scenarios and thus give the application developer an initial insight into the workings of the framework. However, their benefit quickly disappears when problems have to be solved that differ from standard usage scenarios. At that point, API documentation becomes the most important resource for software developers. The documentation is scanned for hints relevant to the problem at hand, but if it does not provide the required information, the most costly part of the research begins: investigating the source code of other programs that successfully used the framework in a similar way. Learning correct framework usage from these real-world examples is difficult, however, because they also contain application-specific code that obscures what is really important for using the framework. This significantly complicates the understanding process and makes training a challenging and time-consuming task again. Nevertheless, the source code of other applications seems to be a valuable source of information. Code search engines like Google Code Search or Krugle owe their popularity not least to the fact that existing framework documentation seems insufficient to support developers in their daily work.
But despite their widespread use, it is an open question whether code search engines solve the problem of missing documentation in a satisfactory manner. When looking at how developers use code search engines [cite holmes 2009], it turns out that they rarely create a single query and study just a single example. Instead, they typically refine their queries several times, investigate a number of examples, compare them to each other, and try to extract a pattern that underlies all these examples, i.e., a common way of using the API in question.
Although this task is very time-consuming, analyzing example code seems worth doing. Apparently, example code provides some important insights into how to use a given API. Given this observation, the question arises whether such information can be extracted from example code automatically, i.e., without large manual effort. Furthermore, if valuable information can be found, how can these findings be made accessible to support developers in their daily work?
The Code Recommenders project developed at Darmstadt University of Technology investigates exactly these two questions. In a nutshell, it develops tools that automatically analyze large-scale code repositories, extract various interesting data from them, and integrate this information back into the IDE, where it is reused by developers in their daily work. The vision of the project is to create a context-sensitive IDE that learns from its users what is relevant in a given code situation and, in turn, gives this knowledge back to other users. If you like, you may think of it as a collaborative way of sharing knowledge through the IDE.
This Eclipse proposal is the next step towards the goal of building the next generation of collaborative IDE services, which we call “the IDE 2.0”, inspired by the success of Web 2.0. The complete vision and an explanation of the IDE 2.0 to Web 2.0 analogy are described in “IDE 2.0: Collective Intelligence in Software Development”, published at the Working Conference on the Future of Software Engineering Research (FoSER) 2010.
One of the major goals of this project is to make a new generation of tool ideas accessible and usable by the Eclipse community, to further improve these tools based on the user feedback obtained or even to build completely new tools based on the experiences and developer needs.
So far, a couple of steps towards IDE 2.0 have been accomplished, some of which we describe briefly in Section “Initial Contributions”. These tools, however, still have to prove their usefulness. To allow this evaluation, this project aims to (i) provide a platform for innovative IDE features that leverage the wisdom of the crowds, (ii) build a vibrant community around IDE 2.0 services based on Eclipse, and (iii) provide an open platform allowing every community member to actively contribute to these services and to build and evaluate new tools based on the data contributed by the community itself.
The initial scope of this project is to provide tools for the following topics:
- Intelligent Code Completion Systems:
Code completion systems are pretty good at showing a developer all possible completions in a given context. However, these proposals can sometimes be overwhelming for novice developers. The goal of this project is to develop completion engines that leverage information about how other developers used certain types in similar contexts and are thus capable of filtering or rearranging proposals according to some relevance criterion (similar to Mylyn's context model, but learning this relevance judgment from how thousands of users used a given API).
- Smart Template Engines:
The well-known SWT templates are pretty helpful for developers not familiar with all details of SWT. Unfortunately, creating such templates is a tedious and time-consuming task. Consequently, the number of such code templates is rather small. However, the code of existing applications contains hundreds of frequently recurring code snippets that can be extracted and shared among developers. This project will provide tools that support developers in finding (for instance) method call chains for situations like "How do I get an instance of IStatusLineManager inside a ViewPart" and will allow them to share such templates with other developers.
- Usage-Driven API Documentation:
API documentation, regardless of how much time has been spent on writing it, lacks information about how developers actually use these APIs. This information, however, can be easily extracted from code that uses the APIs in question, and thus could be used to enrich existing API documentation with real usage-driven documentation. Code Recommenders aims to develop tools for finding and sharing this kind of knowledge among developers.
- Stacktrace Search Engine:
Exceptions occur. Apache Maven, for instance, reflects this reality by providing wiki pages for frequently occurring build exceptions, which aim to explain why these exceptions may have occurred during a Maven build and how to fix them. This concept is a pretty neat idea, but its potential is not exhausted yet. Currently, the matching between an exception occurring during a build and a wiki page is based on the type of the exception (e.g., BuildException, IllegalArgumentException etc.). This matching is rather coarse-grained and neglects the fact that the same exception might occur in many different locations and may have many different causes. First experimental results have shown that leveraging much more information, like the stack frame elements and exception messages, yields a system that is capable of finding very similar exceptions and thus allows building a new kind of search engine for stacktraces. This project aims to develop such a stacktrace search engine and to provide integrations of this engine into existing web platforms like the Eclipse forums and others.
- API Misuse / Bug Detector:
When using unfamiliar APIs, we often misuse them, i.e., we forget to call certain methods, pass wrong parameters to a method call, etc. These mistakes are hard to find and debug. Tools like PMD and FindBugs do a great job of finding issues like null pointers, or of recommending that hashCode be overridden along with equals, but they aren't a big help if framework-specific usage rules are violated. However, research tools exist that are capable of finding strange API uses, i.e., usages that significantly differ from how most people used a certain API and thus may indicate possible bugs in code. This project aims to provide an evaluation of such tools and will provide an initial system as a baseline.
However, the scope of the recommenders project is not limited to these kinds of tools, and the community is encouraged to discuss new ideas for tools that might be helpful for software engineers.
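To make the idea of usage-based proposal ranking from the list above a bit more tangible, here is a minimal sketch in Java. It is not the project's actual completion engine: the class `UsageRanker`, its methods, and the notion of a plain string "context" key are all hypothetical and chosen only for illustration. The sketch simply counts how often each method was observed in a given context and sorts completion candidates by that count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: ranks completion candidates by observed usage counts. */
final class UsageRanker {

    // context key (an assumption, e.g. the receiver type) -> method name -> observed call count
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one observed call of the given method in the given context. */
    void observe(String context, String method) {
        counts.computeIfAbsent(context, k -> new HashMap<>())
              .merge(method, 1, Integer::sum);
    }

    /** Returns a copy of the candidates, most frequently observed first. */
    List<String> rank(String context, List<String> candidates) {
        Map<String, Integer> seen =
                counts.getOrDefault(context, Collections.<String, Integer>emptyMap());
        List<String> ranked = new ArrayList<>(candidates);
        // List.sort is stable, so candidates with equal counts keep their input order.
        ranked.sort(Comparator.comparingInt((String m) -> seen.getOrDefault(m, 0)).reversed());
        return ranked;
    }
}
```

A real engine would presumably use a much richer notion of context than a single string and a learned model rather than raw counts; the sketch only shows the rearranging step, in which candidates that were never observed are kept but moved to the end instead of being filtered out.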
There are dozens of (research) projects that leverage collective intelligence in one way or the other, and the code recommenders project developed at Darmstadt University of Technology is just one of them. However, an open vendor-neutral Eclipse project may be a perfect place for these tools to contribute to Eclipse and to evaluate their approaches within a vibrant user community.
But every Eclipse incubator project has to start with an initial contribution. For this project, it will consist of two existing recommender components. Each component was described in detail in its own blog post, and we refer interested parties to these blog posts and to the forum for further discussion of these tools.
Components like the stacktrace mining and search engine or an API usage bug detector are still under development and will follow when ready.
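As a rough illustration of the stack-frame matching idea behind the stacktrace search engine, the following Java sketch compares two exceptions by the overlap of their stack frames instead of by exception type alone. This is an assumption-laden example and not the component under development: the class `StackTraceSimilarity` is hypothetical, Jaccard similarity is simply one common set-overlap measure chosen here for brevity, and exception messages are ignored.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: compares exceptions by the overlap of their stack frames. */
final class StackTraceSimilarity {

    /** Collects "declaringClass.methodName" for every frame of the throwable. */
    static Set<String> frames(Throwable t) {
        Set<String> result = new HashSet<>();
        for (StackTraceElement e : t.getStackTrace()) {
            result.add(e.getClassName() + "." + e.getMethodName());
        }
        return result;
    }

    /** Jaccard similarity: intersection size divided by union size (0.0 if both sets are empty). */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Similarity of two throwables based only on their stack frames. */
    static double similarity(Throwable x, Throwable y) {
        return jaccard(frames(x), frames(y));
    }
}
```

Two exceptions of the same type thrown from unrelated locations get a score near 0 under this measure, while exceptions that share most of their call path score close to 1, which is the distinction that type-based matching cannot make.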
The goal of the (code) recommenders project is to build IDE tools like intelligent code completion, extended API docs etc. that continuously improve themselves by leveraging implicit and explicit knowledge about how APIs are used by their clients, and, in turn, give this information back to other developers to ease their work with new and unfamiliar frameworks and development environments.
In the current state of the initial contribution, these systems are fed more or less manually by an administrator who collects example applications from large code repositories like EclipseSource’s Yoxos and then starts the analysis and data extraction process to build new models. This approach may be further automated to leverage the already existing infrastructure of the Eclipse Marketplace and P2 to continuously scan and update API usages and build up-to-date models for the Eclipse APIs.
Unfortunately, such a manual approach does not scale well if potentially thousands of (non-Eclipse-based) frameworks are to be supported. It is simply too difficult to find enough example applications to make this approach work. Thus, in the long term, this manual data collection process should be replaced by a community-driven approach where users are allowed to voluntarily share their knowledge about how to use these APIs by giving either explicit or implicit feedback (cf. the position paper about user feedback and information sharing). Clearly, special requirements for privacy have to be met so that no individual’s private data or company’s critical data is collected or published. Different models of data sharing have to be developed and discussed with the community.
As one of the first steps, a platform allowing developers to share knowledge will be developed, and the existing tools (i.e., intelligent code completion and usage-driven Javadocs) will be based on these concepts as a proof of concept. A community-driven approach may follow.
Relationship with other Eclipse projects
The (code) recommenders project so far does not have any relationships with existing Eclipse projects. However, the concepts that will be developed are designed to improve existing IDE services and are thus likely to lead to strong relationships with existing projects. Likely candidates are:
- Eclipse JDT: by improving features like code completion, (API) documentation, code search etc.
- Eclipse PDE: large-scale analyses of how clients use framework APIs may help framework developers gain feedback about how clients use their APIs and, more generally, about usability issues of their APIs
- Eclipse Mylyn: builds on innovative concepts like development contexts and links user actions with tools like issue trackers, time tracking, and much more. Leveraging collective intelligence to build new services on top of Mylyn (for instance by leveraging social media or different kinds of user interactions) offers great potential for future collaboration.
This list is necessarily incomplete. However, it sketches how existing projects could benefit from (and cooperate with) the proposed project.
The Code Recommenders project is developed at Darmstadt University of Technology. The project is led by Marcel Bruch and advised by Mira Mezini. Although the number of initial committers is low, we expect this set to grow quickly. In the past, the project was supported by more than 50 students doing various hands-on trainings and bachelor's and master's theses; future contributions will be made directly under the proposed project. Thus, the initial committers will be:
- Marcel Bruch, Darmstadt University of Technology (Project Lead)
- Mira Mezini, Darmstadt University of Technology (Project Management)
- Chris Aniszczyk
Companies and individuals alike are encouraged to express their interest in this project by putting their name and affiliation on the list below.
- Chris Aniszczyk - Red Hat
- Fabian Steeg - University of Cologne
- Benjamin Muskalla - Tasktop
- Sebastian Proksch - TU Darmstadt
- Dennis Sänger - TU Darmstadt
- Beyhan Veliev - EclipseSource