Recommenders/Attic/Proposal

(Code) Recommenders Project Proposal

This page contains the working draft of the Eclipse (Code) Recommenders Project proposal, and your comments are welcome. Feel free to comment on anything you think needs revision. To make comments easy to spot, you may use a format similar to this: {marcel: paragraph needs major revision. details about the component x aren't described sufficiently. I'm missing ...}. We would be glad if you added your name to the "Interested Parties" section if you would like to support this project proposal :-) If wiki comments aren't appropriate, you may send an email to bruch@cs.tu-darmstadt.de.

Thanks, Marcel


Introduction

This proposal is in the Project Proposal Phase (as defined in the Eclipse Development Process document). It declares the intent and scope of the proposed (code) recommenders project and is meant to solicit additional participation and input from the Eclipse community. If you have any comments on the project or want to join it, please send your feedback to the <url> forum.

This proposal is structured as follows. Section “Background” gives the motivation for the project and provides some background information about its origins, namely, the Code Recommenders Project developed at Darmstadt University of Technology. Section “Initial Contributions” describes the current state of the project and the initial contributions that will be made. Section “Scope” outlines what is in the project’s scope; “Description” gives a few more details on the intermediate goals. Section “Related Eclipse Projects” describes potential future connections between current Eclipse projects and the Code Recommenders project, as well as likely collaborations. The remaining sections (Committers, Mentors, Interested Parties, Additional Information) describe what their names suggest.

Background

“Under the right circumstances, groups are remarkably intelligent and are often better than the smartest person in them.” (James Surowiecki, The Wisdom of Crowds)


Application frameworks have become an integral part of today’s software development. This is hardly surprising given their promised benefits such as reduced costs, higher quality, and shorter time to market. But using an application framework is not free of cost. Before frameworks can be used efficiently, software developers have to learn their correct usage, which often results in high initial training costs.

To reduce these training costs, framework developers provide diverse documentation addressing different information needs. Tutorials, for instance, describe typical usage scenarios and thus give the application developer an initial insight into the workings of the framework. However, their benefit quickly disappears when the problem to be solved differs from the standard usage scenarios. At this point, API documentation becomes the most important resource for software developers. Developers scan the documentation for hints relevant to the problem at hand, but if it does not provide the required information, the most costly part of the research begins: they investigate the source code of other programs that successfully use the framework in a similar way. Learning correct framework usage from these real-world examples is difficult, because the examples also contain application-specific code that obscures what is really important for using the framework. This significantly complicates the understanding process and again makes the training a challenging and time-consuming task. Nevertheless, the source code of other applications seems to be a valuable source of information. Code-search engines like Google Codesearch or Krugle owe their popularity not least to the fact that existing framework documentation seems insufficient to support developers in their daily work.

But despite their widespread use, it is an open question whether code-search engines solve the problem of missing documentation in a satisfactory manner. When looking at how developers use code-search engines [cite holmes 2009], it turns out that they rarely create a single query and study just a single example; instead, they typically refine their queries several times, investigate a number of examples, compare them to each other, and try to extract a pattern that underlies all these examples, i.e., a common way to use the API in question.

Although this task is very time-consuming, analyzing example code seems worth doing. Apparently, example code provides important insights into how to use a given API. Given this observation, the question arises whether such information can be extracted from example code automatically, i.e., without large manual effort. Furthermore, if valuable information can be found, how can these findings be made accessible to support developers in their daily work?

The Code Recommenders project developed at Darmstadt University of Technology investigates exactly these two questions. In a nutshell, it develops tools that automatically analyze large-scale code repositories, extract various interesting data from them, and integrate this information back into the IDE, where developers reuse it in their daily work. The vision of the project is to create a context-sensitive IDE that learns from its users what is relevant in a given code situation and, in turn, gives this knowledge back to other users. If you like, you may think of it as a collaborative way of sharing knowledge through the IDE.

This Eclipse proposal is the next step towards building the next generation of collaborative IDE services, which we call “the IDE 2.0”, inspired by the success of Web 2.0. The complete vision and an explanation of the IDE 2.0 to Web 2.0 analogy are given in “IDE 2.0: Collective Intelligence in Software Development”, published at the Working Conference on the Future of Software Engineering Research (FoSER) 2010.

Scope

A couple of steps towards IDE 2.0 have been accomplished, some of which we describe briefly in Section “Initial Contributions”. However, context-sensitive developer assistance can take many forms, and this project aims to (i) provide a platform for innovative IDE features that leverage the wisdom of the crowds, (ii) build a vibrant community around IDE 2.0 services based on Eclipse, and (iii) provide an open-data access platform allowing every community member to actively contribute to these services and to build and evaluate new tools based on the data contributed by the community itself.

Although a bit vague, we believe that this describes the scope of the project in sufficient detail while keeping it open to any input from the community. No further limitations beyond leveraging collective intelligence are planned yet.

Initial Contribution

The project starts with the initial contributions developed at Darmstadt University of Technology. However, we have to point out (as also outlined in the IDE 2.0 position paper) that there are dozens of projects that leverage collective intelligence in one way or another, making this project a perfect place for these tools to contribute to Eclipse and to evaluate their approaches within a vibrant user community.

However, every Eclipse incubator project has to start with an initial contribution which will consist of three existing Code Recommenders components. Each component was described in its own blog post in detail, and we refer interested parties to these blog posts and to the forum for further discussions of these tools.

  1. Intelligent Code Completion
  2. Extended, usage-driven Javadoc
  3. Self-improving Example Code Search Engine

Other components, such as a stack trace mining and search engine or an API usage bug detector, are planned but still under development. These systems will follow when ready.

Description

The goal of the (code) recommenders project is to build IDE tools, such as intelligent code completion and extended API docs, that continuously improve themselves by leveraging implicit and explicit knowledge about how APIs are used by their clients and, in turn, give this information back to other developers to ease their work with new and unfamiliar frameworks and development environments.

In the current state of the initial contribution, these systems are fed more or less manually by an administrator who collects example applications from large code repositories like EclipseSource’s Yoxos and then starts the analysis and data extraction process to build new models. This approach may be further automated to leverage the existing infrastructure of the Eclipse Marketplace and P2 to continuously scan and update API usages and build up-to-date models for the Eclipse APIs.

Unfortunately, such a manual approach does not scale well if potentially thousands of (non-Eclipse-based) frameworks are to be supported. It is simply too difficult to find enough example applications to make this approach work. Thus, in the long term this manual data collection process will be replaced by a community-driven approach in which users can voluntarily share their knowledge about how to use these APIs, through either explicit or implicit feedback (cf. the position paper about user feedback and information sharing). Clearly, special privacy requirements have to be met so that no individual’s private data or company’s critical data is collected or published. Different models of data sharing have to be developed and discussed with the community.

As one of the first steps, such a community-driven, fully automated approach will be developed, and the existing tools (i.e., intelligent code completion and usage-driven Javadoc) will be rebuilt on these concepts as a proof of concept.
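
To make the idea of building models from observed API usages a little more concrete, here is a minimal sketch in Java. It is not the Code Recommenders implementation: the class name UsageModelSketch, its methods, and the simple frequency-counting strategy are all assumptions made for illustration only. The sketch records which methods are called on which receiver types in example code and recommends the most frequently observed ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a frequency-based usage model.
// Neither the class nor the counting strategy is taken from the proposal.
public class UsageModelSketch {

    // receiver type -> (method name -> number of observed calls)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one observed call, e.g. extracted from an example application.
    public void observe(String receiverType, String methodName) {
        counts.computeIfAbsent(receiverType, k -> new HashMap<>())
              .merge(methodName, 1, Integer::sum);
    }

    // Return up to k method names for the receiver type, most frequent first.
    // Ties are broken alphabetically so the result is deterministic.
    public List<String> recommend(String receiverType, int k) {
        Map<String, Integer> perType =
                counts.getOrDefault(receiverType, Collections.<String, Integer>emptyMap());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(perType.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        UsageModelSketch model = new UsageModelSketch();
        // Made-up observations standing in for mined example code.
        model.observe("Text", "setText");
        model.observe("Text", "setText");
        model.observe("Text", "setLayoutData");
        System.out.println(model.recommend("Text", 2));
    }
}
```

In a community-driven setting, the calls to observe would be fed by shared usage data rather than by an administrator, but how such data is collected and anonymized is exactly what remains to be discussed with the community.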

Relationship with other Eclipse projects

So far, the (code) recommenders project does not have any relationships with existing Eclipse projects. However, the concepts that will be developed are designed to improve existing IDE services and are thus likely to lead to strong relationships with existing projects. Likely candidates are:

  • Eclipse JDT: by improving features like code completion, (API) documentation, code search etc.
  • Eclipse PDE: large-scale analyses of how clients use framework APIs may give framework developers feedback about how clients use their APIs and, more generally, about usability issues of their APIs
  • Eclipse Mylyn: builds on innovative concepts like development contexts and links user actions with tools like issue trackers, time tracking, and many more. Leveraging collective intelligence to build new services on top of Mylyn (for instance by leveraging social media or different kinds of user interactions) offers great potential for future collaboration.

This list is necessarily incomplete. However, it sketches how existing projects could benefit from (and cooperate with) the proposed project.

Committers

The Code Recommenders project is developed at Darmstadt University of Technology. The project is led by Marcel Bruch and advised by Mira Mezini. Although the number of initial committers is low, we expect this set to grow quickly. In the past, the project was supported by more than 50 students through various hands-on trainings and bachelor and master theses; future contributions will be made directly under the proposed project. Thus, the initial committers will be

  • Marcel Bruch, Darmstadt University of Technology (Project Lead)
  • Mira Mezini, Darmstadt University of Technology (Project Management)

Mentors

  • ---
  • ---


Interested Parties

Companies and individuals alike are encouraged to express their interest in this project by putting their name and affiliation on the list below.

  1. <your name> – <your affiliation if appropriate>

Additional information