Recommenders/Attic/ContributorTopics

Revision as of 05:12, 31 May 2011

This page contains a list of potential contributor topics. Some ideas are inspired by existing papers; others are rather vague and little more than a reminder of what else might be cool to have in Code Recommenders. Some ideas may be simple to achieve, while others may require quite a lot of work and feel more like a six-month project (e.g., a thesis). If you have any questions regarding any of these topics, please send your comments to the forum or the developer mailing list.

"Clean Code" Method Sorter / Code Formatter for Eclipse

In Progress. Mateusz, Fabian; SS 2011

Coding conventions such as those discussed in Clean Code recommend ordering methods in the sequence in which they are called. This drastically improves the readability of your code, since it reduces the number of jumps needed to understand it to a minimum. Ordering methods manually is an annoying and time-consuming task. During this hands-on you will develop an Eclipse code formatter that sorts a class's members according to Robert C. Martin's Clean Code style guide.
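Below is a minimal sketch of the ordering rule itself. It does not use the actual Eclipse/JDT formatter APIs, which are not shown here. The sketch assumes the class's call graph has already been extracted (e.g., from the AST). It reads "the sequence they get called" as a depth-first walk that starts at entry points such as public methods. Each method is placed right after its first caller, and methods that no entry point reaches keep their original relative order. `StepdownOrdering` and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders methods so each one follows the method that first calls it. */
class StepdownOrdering {

    /**
     * @param declared    methods in their current declaration order
     * @param callees     for each method, the methods of the same class it calls, in order of first call
     * @param entryPoints methods to start from (e.g. public methods), in their current order
     */
    static List<String> order(List<String> declared, Map<String, List<String>> callees,
                              List<String> entryPoints) {
        Set<String> members = new HashSet<>(declared);
        Set<String> placed = new LinkedHashSet<>(); // keeps insertion order = new member order
        for (String entry : entryPoints) {
            visit(entry, members, callees, placed);
        }
        // Methods unreachable from any entry point keep their original relative order.
        for (String method : declared) {
            visit(method, members, callees, placed);
        }
        return new ArrayList<>(placed);
    }

    private static void visit(String method, Set<String> members,
                              Map<String, List<String>> callees, Set<String> placed) {
        // Skip calls to non-members and methods already placed (this also stops recursion cycles).
        if (!members.contains(method) || !placed.add(method)) {
            return;
        }
        for (String callee : callees.getOrDefault(method, List.of())) {
            visit(callee, members, callees, placed);
        }
    }

    public static void main(String[] args) {
        List<String> declared = List.of("helper", "process", "loadConfig", "run", "init");
        Map<String, List<String>> callees = Map.of(
                "run", List.of("init", "process"),
                "init", List.of("loadConfig"));
        System.out.println(order(declared, callees, List.of("run")));
    }
}
```

For the example in `main`, the sketch prints `[run, init, loadConfig, process, helper]`. The unreferenced `helper` moves to the end.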

Developers who looked at this Java Element frequently also visited these API Elements - Building an API Element Traversal Data Collector

Partially In Progress. Stefan is working on the selection listener as part of his GSoC; SS 2011

While developers learn how to solve a specific task, they also need to learn about the API at hand. Once they have found a starting point, they have to figure out whether there are additional things they must know about or that would help them. Every developer faces this situation again and again. For instance, a developer looking at the implementation of Wizard#addPages might also be interested in the methods Wizard#addPage or Wizard#setNeedsProgressMonitor.

But how do you find out what you should have a look at? Webshops like Amazon follow a pretty simple approach: they observe how their users interact with their web application, collect information about what users looked at (i.e., clicked on), and then run data mining techniques on this data to extract valuable information. During this hands-on you will create an Eclipse plug-in that collects information about which API elements developers investigated and publishes it to a central knowledge repository. In a subsequent step, this data will be analyzed to find valuable patterns; however, this is not part of the hands-on :)
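The mining step is outside the hands-on, but a sketch shows what the collected data needs to contain. The sketch assumes each navigation session is recorded as the set of API elements a developer visited. It uses plain co-occurrence counting ("developers who visited X also visited Y") as a stand-in for whatever mining technique is eventually chosen. `CoVisitationIndex` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical item-to-item co-occurrence index over navigation sessions. */
class CoVisitationIndex {
    // element -> (other element -> number of sessions in which both were visited)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void addSession(Collection<String> visited) {
        Set<String> distinct = new TreeSet<>(visited); // count each element once per session
        for (String a : distinct) {
            for (String b : distinct) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    /** Most frequently co-visited elements first; ties are broken alphabetically. */
    List<String> alsoVisited(String element, int limit) {
        Map<String, Integer> row = counts.getOrDefault(element, Map.of());
        List<String> ranked = new ArrayList<>(row.keySet());
        ranked.sort((x, y) -> {
            int byCount = Integer.compare(row.get(y), row.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        CoVisitationIndex index = new CoVisitationIndex();
        index.addSession(List.of("Wizard#addPages", "Wizard#addPage", "Wizard#setNeedsProgressMonitor"));
        index.addSession(List.of("Wizard#addPages", "Wizard#addPage"));
        index.addSession(List.of("Wizard#addPages", "Dialog#open"));
        System.out.println(index.alsoVisited("Wizard#addPages", 2));
    }
}
```

Here `main` prints `[Wizard#addPage, Dialog#open]`. Wizard#addPage was co-visited twice, and the tie between the remaining two elements is broken alphabetically.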

Example Code-Search Engine 4 Eclipse

In Progress. Katrin; SS 2011

Finding good code examples is a difficult task. Code-search engines like Krugle or Google Code Search aim to support developers in this task, but their search capabilities are somewhat limited. Recently, the Code Recommenders project published a first prototype of a next-generation code search engine tightly integrated into Eclipse.

However, due to copyright and license issues, some companies prohibit the use of code-search engines that work on untrusted codebases. Thus, during this hands-on you will design a local, Lucene-based code-search engine that works only on the local Eclipse installation and workspace.
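As a rough illustration of what such a local engine indexes, here is a tiny in-memory inverted index with camelCase-aware tokenization. This is a stand-in for Lucene, not Lucene code. In a Lucene-based engine, an analyzer would handle tokenization, and Lucene's index and searcher would replace the hand-rolled postings map. The class name and the AND-only query semantics are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a Lucene index over workspace sources. */
class LocalCodeIndex {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");
    private final Map<String, Set<String>> postings = new HashMap<>(); // term -> files

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(text);
        while (m.find()) {
            String identifier = m.group();
            tokens.add(identifier.toLowerCase(Locale.ROOT));
            // Split camelCase so that getHelpSystem is also found via "help" or "system".
            String[] parts = identifier.split("(?<=[a-z0-9])(?=[A-Z])");
            if (parts.length > 1) {
                for (String part : parts) {
                    tokens.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }

    void add(String file, String source) {
        for (String token : tokenize(source)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(file);
        }
    }

    /** Returns the files that contain every query term (boolean AND). */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> files = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(files);
            } else {
                result.retainAll(files);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        LocalCodeIndex index = new LocalCodeIndex();
        index.add("A.java", "IWorkbenchHelpSystem h = PlatformUI.getWorkbench().getHelpSystem();");
        index.add("B.java", "Text filePath = new Text(parent, SWT.BORDER);");
        System.out.println(index.search("help system"));
    }
}
```

Here `main` prints `[A.java]`.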

Callchain Completion ++

In Progress. Christian, Ingo; SS 2011

Query Builder for Code Search

In Progress. Niklas; SS 2011

In our current implementation of Code Search, we use code completion as input to build search queries. This is an easy and intuitive way to create queries, but it offers little flexibility for adjusting the query terms. We therefore think that a view that lets developers define the elements of a query in an intuitive way would be a necessary supplement.

Additionally, the client can specify which feature weights the server should use for scoring, but since there is no view for editing these weights, this feature is not usable yet.
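This page does not specify how the server combines feature scores. Assuming a weighted linear combination, a sketch shows why a weight-editing view matters: the same two hits swap places when the weights change. The feature names (`usedTypes`, `calledMethods`) and `WeightedScorer` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical scorer: weighted sum of per-feature similarity scores. */
class WeightedScorer {
    private final Map<String, Double> weights;

    WeightedScorer(Map<String, Double> weights) {
        this.weights = new HashMap<>(weights);
    }

    /** Features without a client-supplied weight contribute nothing. */
    double score(Map<String, Double> featureScores) {
        double total = 0.0;
        for (Map.Entry<String, Double> feature : featureScores.entrySet()) {
            total += weights.getOrDefault(feature.getKey(), 0.0) * feature.getValue();
        }
        return total;
    }

    /** Hit ids sorted by descending score, ties broken by id. */
    List<String> rank(Map<String, Map<String, Double>> hits) {
        List<String> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int byScore = Double.compare(score(hits.get(b)), score(hits.get(a)));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> hits = Map.of(
                "hit1", Map.of("usedTypes", 0.9, "calledMethods", 0.1),
                "hit2", Map.of("usedTypes", 0.2, "calledMethods", 0.6));
        System.out.println(new WeightedScorer(Map.of("usedTypes", 1.0, "calledMethods", 2.0)).rank(hits));
        System.out.println(new WeightedScorer(Map.of("usedTypes", 2.0, "calledMethods", 1.0)).rank(hits));
    }
}
```

With the first weights, `hit2` scores 1.4 and `hit1` 1.1, so the output is `[hit2, hit1]`. With the second weights, the order flips to `[hit1, hit2]`.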

Flexible Configuration of Content Assist lists

In Progress. Thomas, Gerrit; SS 2011

The preference page for the Eclipse completion engines currently allows you to enable multiple engines on the default tab. On the cycling tabs, however, only one engine can be assigned to each tab. Since Code Recommenders adds multiple engines, we have to aggregate them to provide useful configurations to developers. This change has been filed as a feature request and accepted by the JDT team. Some adapted screenshots attached to the feature request hint at how this could be implemented.

Synonym based completion engine


When working with unknown APIs, developers may use code completion to search for methods applicable to what they currently want to do. Often they already have a keyword in mind and just search the list of proposed methods for a match. For example, a developer who already knows arrays but is new to collections wants to know how many elements a collection contains. Familiar with arrays, the developer looks for length and types the prefix “len” to narrow down the list of proposed completions. However, no proposal matches “len”, because in collections the method is named size. If the completion engine supported synonyms, it could propose size, knowing that size and length are interchangeable.
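Here is a minimal sketch of synonym-aware prefix filtering. It assumes a hand-maintained list of synonym groups; where such a list would come from is left open. The typed prefix is expanded to all synonyms of every known word it begins, so “len” also admits proposals starting with size. `SynonymCompletionFilter` is a hypothetical name and not part of the JDT completion API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter that accepts proposals matching the prefix or one of its synonyms. */
class SynonymCompletionFilter {
    private final Map<String, Set<String>> synonyms = new HashMap<>();

    /** Registers words that are mutually interchangeable, e.g. "length" and "size". */
    void addGroup(String... words) {
        Set<String> group = new HashSet<>();
        for (String word : words) {
            group.add(word.toLowerCase(Locale.ROOT));
        }
        for (String word : group) {
            synonyms.computeIfAbsent(word, k -> new HashSet<>()).addAll(group);
        }
    }

    List<String> filter(List<String> proposals, String prefix) {
        String typed = prefix.toLowerCase(Locale.ROOT);
        // Expand the prefix: "len" is a prefix of "length", so all synonyms of "length" are accepted.
        Set<String> accepted = new HashSet<>();
        accepted.add(typed);
        for (Map.Entry<String, Set<String>> entry : synonyms.entrySet()) {
            if (entry.getKey().startsWith(typed)) {
                accepted.addAll(entry.getValue());
            }
        }
        List<String> result = new ArrayList<>();
        for (String proposal : proposals) {
            String name = proposal.toLowerCase(Locale.ROOT);
            if (accepted.stream().anyMatch(name::startsWith)) {
                result.add(proposal);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SynonymCompletionFilter filter = new SynonymCompletionFilter();
        filter.addGroup("length", "size");
        System.out.println(filter.filter(List.of("add(E)", "clear()", "isEmpty()", "size()"), "len"));
    }
}
```

Here `main` prints `[size()]`.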

Web-Crawler for stacktraces


We are currently implementing a platform where developers who are stuck on an error can search for their stacktrace to get help. Current search engines do not handle stacktraces very well, so we set up an optimized search with an index of stacktraces pointing to help resources on the web. To find these stacktraces on the web, we need crawlers for all kinds of structured resources; for example, we currently have a crawler for Bugzilla. RSS feeds are an interesting type of resource to crawl, since many discussion and help platforms provide them.
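Once a crawler has fetched an RSS item or forum post, it still has to find the stacktraces inside free text. Below is a minimal regex-based sketch. It assumes traces follow the standard JVM format: an exception header line, possibly prefixed by `Exception in thread "..."` or `Caused by:`, followed by `at` frames. Fetching and parsing RSS is not shown, and `StacktraceExtractor` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pulls JVM stacktrace parts out of crawled text. */
class StacktraceExtractor {
    // e.g. "Exception in thread "main" java.lang.RuntimeException: boom" or "Caused by: java.io.IOException"
    private static final Pattern HEADER = Pattern.compile(
            "^\\s*(?:Exception in thread \"[^\"]*\" )?(?:Caused by: )?"
                    + "((?:[\\w$]+\\.)+[\\w$]*(?:Exception|Error))(?::.*)?$",
            Pattern.MULTILINE);
    // e.g. "\tat org.example.Wizard.addPages(Wizard.java:42)"; also matches <init> and lambda$x$0
    private static final Pattern FRAME = Pattern.compile(
            "^\\s*at\\s+((?:[\\w$]+\\.)+[\\w$<>]+)\\(([^)]*)\\)", Pattern.MULTILINE);

    /** Fully qualified exception types, in order of appearance. */
    static List<String> exceptionTypes(String text) {
        return collect(HEADER.matcher(text));
    }

    /** Fully qualified frame methods (class + method), in order of appearance. */
    static List<String> frames(String text) {
        return collect(FRAME.matcher(text));
    }

    private static List<String> collect(Matcher matcher) {
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String post = "Re: exception when opening wizard\n"
                + "Exception in thread \"main\" java.lang.RuntimeException: boom\n"
                + "\tat org.example.Wizard.addPages(Wizard.java:42)\n"
                + "Caused by: java.io.IOException: disk full\n"
                + "\tat org.example.Store.save(Store.java:3)\n"
                + "Any ideas?";
        System.out.println(exceptionTypes(post));
        System.out.println(frames(post));
    }
}
```

The extracted types and frames could then serve as the keys under which a crawled page is indexed.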

Stacktraces WebUI


The crawler is one part of the stacktrace search engine. While the engine itself and an Eclipse prototype already exist (see for details), an intuitive search interface for the web is still missing.

Neither technology nor design is fixed. However, there is a preference for Eclipse technologies ;-) Which features the UI should offer warrants further discussion. If you are interested in this just send your comments to the forum or developer mailing list. Check [1].

Subwords Completion Engine

In Progress. Paul-Emmanuel

From Stack Overflow we received a request for a full-text autocompletion engine. See this blog post and this forum thread for details. A first prototype has been implemented, but a few features are still missing, such as:

  • Relevance Sorting. Sort proposals that match the prefix entered by the user higher in the list than those that only match the simple regular expression. A first idea for implementing this is to use the Jaro-Winkler string similarity measure and rank the proposals accordingly.
  • Make it work for any proposal. So far, it works for method call completions only, but it should be straightforward (and useful) to implement this for any kind of proposal.
  • Highlight the characters in the proposal text that matched the regular expression, giving developers quick feedback on why this proposal matched.
  • Track which tokens users enter to get the 'right' proposal. Imagine: if you learn what people mean when they type 'pb' or 'opn', you can create an incredibly fast code completion engine that seems to read your mind. For great examples of this, check out [2] and [3] :)
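The following sketch covers the first item. It assumes the "simple regular expression" requires the typed characters to appear in the proposal in the same order, so “opn” matches openStream(). Matches are ranked with prefix matches first. Within each group, they are ordered by Jaro-Winkler similarity between the token and the proposal text, using the standard formulation with prefix scale 0.1 and a common prefix of at most 4. The class and method names are hypothetical, and the actual prototype may match differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical subwords matcher with relevance sorting. */
class SubwordsCompletion {

    /** "opn" -> .*o.*p.*n.* (case-insensitive): characters must appear in this order. */
    static Pattern subwordPattern(String token) {
        StringBuilder regex = new StringBuilder(".*");
        for (char c : token.toCharArray()) {
            regex.append(Pattern.quote(String.valueOf(c))).append(".*");
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(s2.length(), i + window + 1);
            for (int j = from; j < to; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        int outOfOrder = 0; // matched characters in different order; transpositions = outOfOrder / 2
        for (int i = 0, j = 0; i < s1.length(); i++) {
            if (!matched1[i]) continue;
            while (!matched2[j]) j++;
            if (s1.charAt(i) != s2.charAt(j)) outOfOrder++;
            j++;
        }
        double m = matches;
        return (m / s1.length() + m / s2.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    static double jaroWinkler(String s1, String s2) {
        double similarity = jaro(s1, s2);
        int prefix = 0;
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return similarity + prefix * 0.1 * (1.0 - similarity);
    }

    static List<String> complete(List<String> proposals, String token) {
        Pattern pattern = subwordPattern(token);
        String typed = token.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String proposal : proposals) {
            if (pattern.matcher(proposal).matches()) matches.add(proposal);
        }
        matches.sort((a, b) -> {
            String la = a.toLowerCase(Locale.ROOT), lb = b.toLowerCase(Locale.ROOT);
            boolean prefixA = la.startsWith(typed), prefixB = lb.startsWith(typed);
            if (prefixA != prefixB) return prefixA ? -1 : 1; // prefix matches first
            int bySimilarity = Double.compare(jaroWinkler(typed, lb), jaroWinkler(typed, la));
            return bySimilarity != 0 ? bySimilarity : a.compareTo(b);
        });
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(complete(List.of("toString()", "stop()", "close()"), "st"));
    }
}
```

Here `main` prints `[stop(), toString()]`. close() is dropped because it has no t after its s.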

Highlighting of related methods


The Eclipse editor already highlights usages of the selected variable or method in the open document. Think of an extension of this concept that also highlights methods of interest based on their similarity to the currently selected method. Similarity of methods can be measured by comparing which methods they call and which fields and types they use. The Outline view may highlight related methods by making them bold; for example, if a user selects a method compress() inside the class, the Outline view may highlight the method decompress() since it uses the same fields.
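One way to quantify "similarity of methods" is the Jaccard coefficient over the sets of fields, types, and methods each method uses. The sketch assumes these sets have already been extracted (e.g., from the AST). The feature encoding (`field:buffer`, `type:Deflater`) and the threshold are illustrative assumptions, and `RelatedMethods` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: finds methods that use similar fields, types, and calls. */
class RelatedMethods {

    /** |A ∩ B| / |A ∪ B|; two empty sets are treated as unrelated. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Methods whose used elements overlap with the selected method's by at least the threshold. */
    static List<String> related(String selected, Map<String, Set<String>> usedElements, double threshold) {
        Set<String> features = usedElements.getOrDefault(selected, Set.of());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> other : usedElements.entrySet()) {
            if (!other.getKey().equals(selected) && jaccard(features, other.getValue()) >= threshold) {
                result.add(other.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> used = Map.of(
                "compress", Set.of("field:buffer", "field:level", "type:Deflater"),
                "decompress", Set.of("field:buffer", "field:level", "type:Inflater"),
                "toString", Set.of("field:name"));
        System.out.println(related("compress", used, 0.5));
    }
}
```

Here `main` prints `[decompress]`. The two methods share two of four distinct elements (Jaccard 0.5).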

Automated Detection of "See Also" Tags


Somewhat related to the above task but slightly more general:

The idea has been described in more detail in “API hyperlinking via structural overlap”.

Frameworks (or "frameworks of frameworks" such as Eclipse) provide APIs consisting of hundreds or even thousands of functions. The documentation of some of these APIs comes with comprehensive and detailed cross-references for a large number of API functions, e.g., SEE ALSO sections providing hyperlinks to related functions that accomplish the same or a related task. Such an informative layout is an effective way to organize knowledge and helps developers avoid getting lost in the jungle.

However, maintaining such documentation separately is painful and labor-intensive. Tools such as doxygen and javadoc allow developers to fill in cross-references manually (@see tags in Javadoc). However, maintaining for each function a model of all related API functions is tedious, and these references may quickly become outdated. Consequently, such manual cross-references are frequently missing.

Take, for instance, the Apache HTTP Server. Its source code is fairly well documented: from 2005 to 2008, the number of documented API functions grew from 1353 to 1461. The number of functions commented with @see, however, grew only from 6 to 15! This accounts for only 0.4% to 1.0% of all API functions.
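The percentages follow directly from the counts for 2005 and 2008:

```latex
\frac{6}{1353} \approx 0.44\,\%, \qquad \frac{15}{1461} \approx 1.03\,\%
```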

Long et al. have proposed an approach that automatically identifies such cross-references by analyzing the code of the framework itself. In a first step, we would like to learn how well this approach works for Eclipse-related APIs and then improve it based on the feedback we get from the Eclipse community.

This tool aims to be integrated into the Extended Javadoc Platform for Eclipse.

Mining Interprocedural Object-Lifecycles


Framework reuse provides several benefits such as reduced costs, higher quality, and shorter time to market. By extracting the design into abstract classes and defining their responsibilities and collaborations, frameworks enable reuse not only of functionality at the code level but also at the design level. Applications are built from a framework by extending or customizing parts of the framework, for instance by inheriting from an abstract class and providing implementations for its abstract methods, which contain the application-specific behavior.

Although implementations of these methods are provided by application developers, the framework usually has some expectations about what kind of behavior these methods should exhibit, such as which methods should be called when.

For illustration, consider a dialog window that gathers some user input using a text widget. Typically, the text widget follows a two-phase lifecycle: first, it is configured and placed in a visual container during dialog creation; then, it is queried for the user input after the dialog window is closed. These two phases are typically encoded in different methods of the dialog; say, the widget is configured within a method called Dialog.create(), and the input is read within Dialog.close().

This brief example shows one major characteristic of frameworks that complicates framework understanding: some framework types follow a lifecycle with different phases, where each phase differs from the others a) in the set of methods invoked on the framework type, and b) in its location in code, i.e., the (framework) methods this phase can be encoded within.
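The sketch below assumes the interprocedural part has already been done, i.e., the same widget (e.g., a field) is tracked across the methods of a class. Under that assumption, a first approximation of the phases groups the calls observed on a framework type by the framework method they occur in. It then keeps only calls that are frequent across many observed usages. The sketch covers only this grouping step. `LifecyclePhases`, the input encoding (enclosing method mapped to the calls made there), and `<init>` for constructor calls are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical phase extraction for one framework type (e.g. a Text widget). */
class LifecyclePhases {

    /**
     * @param usages     one map per observed object: enclosing framework method -> calls made on the object there
     * @param minSupport minimum number of usages in which a call must occur in that method
     * @return enclosing method (phase) -> frequent calls in that phase
     */
    static Map<String, Set<String>> phases(List<Map<String, List<String>>> usages, int minSupport) {
        Map<String, Map<String, Integer>> support = new TreeMap<>();
        for (Map<String, List<String>> usage : usages) {
            for (Map.Entry<String, List<String>> location : usage.entrySet()) {
                for (String call : new TreeSet<>(location.getValue())) { // once per usage
                    support.computeIfAbsent(location.getKey(), k -> new TreeMap<>())
                           .merge(call, 1, Integer::sum);
                }
            }
        }
        Map<String, Set<String>> phases = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> phase : support.entrySet()) {
            for (Map.Entry<String, Integer> call : phase.getValue().entrySet()) {
                if (call.getValue() >= minSupport) {
                    phases.computeIfAbsent(phase.getKey(), k -> new TreeSet<>()).add(call.getKey());
                }
            }
        }
        return phases;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> usages = List.of(
                Map.of("Dialog.create()", List.of("<init>", "setLayoutData", "setText"),
                       "Dialog.close()", List.of("getText")),
                Map.of("Dialog.create()", List.of("<init>", "setLayoutData"),
                       "Dialog.close()", List.of("getText")));
        System.out.println(phases(usages, 2));
    }
}
```

With a minimum support of 2, `setText` drops out. What remains is a configuration phase in Dialog.create() and a query phase in Dialog.close().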

Application developers need a good understanding of these life-cycles and the framework's expectations; unless the behaviors of the provided implementations are consistent with the expectations of the framework, the application is unlikely to function properly.

As mentioned above, objects follow a lifecycle, and each phase of this lifecycle uses some characteristic methods. However, objects can be used in different ways: for instance, a button might be created as a checkbox or as a plain push button. Consequently, a checkbox might be queried for its state before being discarded, whereas a push button might only be used within a listener method, because it triggers an action but does not hold any interesting state.

The goal of this thesis is to (i) identify the different states of framework objects, (ii) use these states to determine typical object lifecycles, and (iii) visualize and evaluate these lifecycles.

Impact of Neighboring Usages on Object Usage Recommendations


So far, Code Recommenders considers the context (i.e., the name of the overridden method in which the completion occurs) and which methods are invoked on a given variable. We are currently evaluating how taking into account how an instance was obtained (for example, IWorkbenchHelpSystem h = PlatformUI.getWorkbench().getHelpSystem()) affects the precision of our systems.

In a next step, we would like to extend this approach to also leverage information about what else a developer did inside the method. For instance, if we observe that the programmer created a Button inside the same method and now wants to use a Text widget, we can conclude that this Text widget needs to be created and configured, too.

    private Text filePath;

    private void createInputArea(final Composite parent) {
        final Button browse = new Button(..);
        filePath<^Space>  // code completion (Ctrl+Space) triggered here; should offer all templates that recommend object creation, like:
        // filePath = new Text(..);
        // filePath.setLayoutData(..);
        // filePath.setText(..);
    }

With such an approach, we do not need information about how an object was created to make useful recommendations; it would be sufficient to look at other statements in the code to learn what a developer needs. This approach requires some understanding of how to apply machine learning algorithms like collaborative filtering and/or matrix factorization. The input data is already available.
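To make the idea concrete, here is a neighborhood-based (k-nearest-neighbor) form of collaborative filtering. Each observed method is described by what else happened in it (its "neighbors") and by what was done with the target variable. For a new method, the calls from the most similar observed methods are put to a similarity-weighted vote. This is one possible stand-in and not necessarily the approach of the preliminary version. The feature encoding (`new Button`, `param Composite`), cosine similarity, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical kNN recommender: "methods like this one also did X with a Text widget". */
class NeighborBasedRecommender {
    private static final class Example {
        final Set<String> neighbors; // other statements observed in the same method
        final Set<String> calls;     // what was done with the target variable
        Example(Set<String> neighbors, Set<String> calls) {
            this.neighbors = neighbors;
            this.calls = calls;
        }
    }

    private final List<Example> examples = new ArrayList<>();

    void add(Set<String> neighbors, Set<String> calls) {
        examples.add(new Example(neighbors, calls));
    }

    /** Cosine similarity of two binary feature vectors given as sets. */
    static double cosine(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        long shared = a.stream().filter(b::contains).count();
        return shared / Math.sqrt((double) a.size() * b.size());
    }

    /** Calls whose similarity-weighted vote share among the k nearest examples is at least minShare. */
    List<String> recommend(Set<String> neighbors, int k, double minShare) {
        List<Example> ranked = new ArrayList<>(examples);
        ranked.sort((x, y) -> Double.compare(cosine(neighbors, y.neighbors), cosine(neighbors, x.neighbors)));
        Map<String, Double> votes = new TreeMap<>();
        double totalWeight = 0.0;
        for (Example e : ranked.subList(0, Math.min(k, ranked.size()))) {
            double weight = cosine(neighbors, e.neighbors);
            totalWeight += weight;
            for (String call : e.calls) {
                votes.merge(call, weight, Double::sum);
            }
        }
        List<String> result = new ArrayList<>();
        if (totalWeight == 0.0) return result;
        for (Map.Entry<String, Double> vote : votes.entrySet()) {
            if (vote.getValue() / totalWeight >= minShare) result.add(vote.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        NeighborBasedRecommender r = new NeighborBasedRecommender();
        r.add(Set.of("new Button", "param Composite"), Set.of("new Text", "setLayoutData", "setText"));
        r.add(Set.of("new Button", "param Composite"), Set.of("new Text", "setLayoutData"));
        r.add(Set.of("param SelectionEvent"), Set.of("getText"));
        System.out.println(r.recommend(Set.of("new Button", "param Composite"), 2, 0.5));
    }
}
```

Here `main` prints `[new Text, setLayoutData, setText]`. The unrelated listener example contributes no votes.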

A preliminary version of this approach has been published here.

Mining Usage Documentation - Argument May / Must NOT Be NULL and Method Return May Be / Is Never NULL

In many cases it should be quite easy to figure out whether or not a method's arguments are allowed to be null, i.e., whether the method is able to deal with null values. Writing small static analyses that analyze this code and enrich the existing API documentation with such information would be extremely helpful. This information is intended to be integrated into the Extended Javadoc Platform.
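Below is a toy version of such an analysis. It assumes the method body has already been reduced to a straight-line list of simplified statements; a real analysis would work on the AST or bytecode and follow branches. A parameter is treated as must-not-be-null if the first statement touching it dereferences it or explicitly rejects null. It is treated as may-be-null if that statement gracefully handles null. A literal `return null` marks the return value as may-be-null. Note that the absence of `return null` alone does not prove "never null". All types and names are hypothetical.

```java
import java.util.List;

/** Hypothetical nullness inference over a simplified, straight-line statement list. */
class NullnessInference {
    enum Kind { DEREFERENCE, NULL_GUARD, NULL_REJECT, RETURN_NULL }

    enum Nullness { MUST_NOT_BE_NULL, MAY_BE_NULL, UNKNOWN }

    static final class Stmt {
        final Kind kind;
        final String variable; // null for RETURN_NULL
        Stmt(Kind kind, String variable) {
            this.kind = kind;
            this.variable = variable;
        }
    }

    /** Decides based on the first statement that touches the parameter. */
    static Nullness parameter(String param, List<Stmt> body) {
        for (Stmt s : body) {
            if (!param.equals(s.variable)) continue;
            switch (s.kind) {
                case NULL_GUARD:  // e.g. if (p == null) return;
                    return Nullness.MAY_BE_NULL;
                case DEREFERENCE: // e.g. p.trim(), would throw a NullPointerException
                case NULL_REJECT: // e.g. Objects.requireNonNull(p)
                    return Nullness.MUST_NOT_BE_NULL;
                default:
                    break;
            }
        }
        return Nullness.UNKNOWN;
    }

    static boolean mayReturnNull(List<Stmt> body) {
        for (Stmt s : body) {
            if (s.kind == Kind.RETURN_NULL) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Stmt> body = List.of(
                new Stmt(Kind.NULL_GUARD, "label"),
                new Stmt(Kind.DEREFERENCE, "label"),
                new Stmt(Kind.DEREFERENCE, "parent"),
                new Stmt(Kind.NULL_REJECT, "listener"),
                new Stmt(Kind.RETURN_NULL, null));
        System.out.println(parameter("label", body) + " " + parameter("parent", body) + " " + mayReturnNull(body));
    }
}
```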

Sequence Mining of Object Usage Patterns


Most object usages follow some kind of lifecycle. Within it, some methods must not be called before others, while others may be called frequently without any ordering constraints. Code Recommenders' models do not yet preserve any ordering information. Extending the current approach to support ordering constraints, at least for code templates, would be great.
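As a starting point, pairwise ordering constraints can be mined from observed call sequences. A constraint “a -> b” is kept if a is first called before b in (nearly) all sequences that contain both, and if the pair occurs often enough. This simple support/confidence scheme is an assumption standing in for full sequential pattern mining. The call sequences and `OrderingConstraintMiner` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical miner for "a is called before b" constraints. */
class OrderingConstraintMiner {

    static List<String> mine(List<List<String>> sequences, int minSupport, double minConfidence) {
        Map<String, int[]> stats = new TreeMap<>(); // "a -> b" -> {sequences with both, sequences with a first}
        for (List<String> sequence : sequences) {
            Map<String, Integer> firstIndex = new LinkedHashMap<>();
            for (int i = 0; i < sequence.size(); i++) {
                firstIndex.putIfAbsent(sequence.get(i), i);
            }
            for (String a : firstIndex.keySet()) {
                for (String b : firstIndex.keySet()) {
                    if (a.equals(b)) continue;
                    int[] s = stats.computeIfAbsent(a + " -> " + b, k -> new int[2]);
                    s[0]++;
                    if (firstIndex.get(a) < firstIndex.get(b)) s[1]++;
                }
            }
        }
        List<String> constraints = new ArrayList<>();
        stats.forEach((pair, s) -> {
            if (s[0] >= minSupport && (double) s[1] / s[0] >= minConfidence) constraints.add(pair);
        });
        return constraints;
    }

    public static void main(String[] args) {
        List<List<String>> sequences = List.of(
                List.of("<init>", "setLayoutData", "setText", "getText"),
                List.of("<init>", "setLayoutData", "getText"),
                List.of("<init>", "setText", "setLayoutData", "getText"));
        System.out.println(mine(sequences, 3, 1.0));
    }
}
```

Here `main` prints `[<init> -> getText, <init> -> setLayoutData, setLayoutData -> getText]`. No constraint is derived between setLayoutData and setText, because both orders occur.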
