
Difference between revisions of "Recommenders/Attic/ContributorTopics"

This page lists potential contributor topics. Some ideas are inspired by existing papers; others are rather vague and little more than a reminder of what else might be cool to have in Code Recommenders. Some ideas may be simple to achieve, while others may require quite a lot of work and feel more like a six-month project (e.g., a thesis). If you have any questions regarding any of these topics, please send your comments to the [http://wiki.eclipse.org/Recommenders/Contributor_Guide#Forum forum] or [http://wiki.eclipse.org/Recommenders/Contributor_Guide#Mailing_List developer mailing list].
== "Clean Code" Method Sorter / Code Formatter for Eclipse ==
Coding conventions, as discussed for example in Clean Code, recommend ordering methods in the sequence in which they are called. This improves the readability of your code considerably, since the number of jumps required to understand the code is reduced to a minimum. Manual ordering is an annoying and time-consuming task. During this hands-on you will develop an Eclipse Code Formatter that sorts a class' members according to Robert C. Martin's Clean Code style guide.
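The ordering step can be sketched as follows. This is only a minimal sketch, assuming the call graph of the class has already been extracted (for example from the JDT AST); the class <code>CallOrderSorter</code>, its methods, and the sample method names are hypothetical, and a real formatter would work on AST nodes rather than strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: places every method directly after its first caller.
final class CallOrderSorter {

    /**
     * @param declared methods of the class in their current source order
     * @param calls    for each method, the methods it calls, in call order
     */
    static List<String> sort(List<String> declared, Map<String, List<String>> calls) {
        Set<String> known = new HashSet<>(declared);
        Set<String> called = new HashSet<>();
        for (List<String> callees : calls.values()) {
            called.addAll(callees);
        }
        Set<String> ordered = new LinkedHashSet<>();
        // Start with entry points, i.e. methods that no other method of the class calls.
        for (String method : declared) {
            if (!called.contains(method)) {
                visit(method, calls, known, ordered);
            }
        }
        // Second pass picks up methods that are only reachable through call cycles.
        for (String method : declared) {
            visit(method, calls, known, ordered);
        }
        return new ArrayList<>(ordered);
    }

    // Depth-first walk: emit the method, then everything it calls.
    private static void visit(String method, Map<String, List<String>> calls,
                              Set<String> known, Set<String> ordered) {
        if (!known.contains(method) || !ordered.add(method)) {
            return; // not a member of this class, or already placed
        }
        for (String callee : calls.getOrDefault(method, List.of())) {
            visit(callee, calls, known, ordered);
        }
    }
}
```

For a class declaring <code>helperB, run, helperA, cleanup</code> where <code>run</code> calls <code>helperA</code> and <code>helperB</code>, and <code>helperA</code> calls <code>cleanup</code>, the sketch yields <code>run, helperA, cleanup, helperB</code>.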
== Developers who looked at this Java Element frequently also visited these API Elements - Building an API Element Traversal Data Collector ==
While developers learn how to solve a specific task, they also need to learn about the given API. Assuming they have found a starting point, they have to figure out whether there are additional things they must know about or that would help them. This is something every developer is confronted with again and again. For instance, a developer looking at the implementation of Wizard#addPages might also be interested in the methods Wizard#addPage or Wizard#setNeedsProgressMonitor.
But how do you find out what you should have a look at? Webshops like Amazon follow a pretty simple approach: they observe how users interact with their web application, collect information about what each user looked at (i.e., clicked on), and then run mining techniques on this data to extract valuable information. During this hands-on you will create an Eclipse plug-in that collects information about which API elements developers investigated and publishes this information to a central knowledge repository. In a subsequent step this data will be analyzed to find valuable patterns; however, this is not part of the hands-on :)
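To illustrate the kind of analysis the collected data enables, here is a minimal co-visit counter. It is only a sketch under the assumption that the plug-in reports one list of visited elements per browsing session; the class <code>CoVisitCounter</code> and its methods are hypothetical and not part of Code Recommenders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical miner: counts how often two API elements were viewed in the same session.
final class CoVisitCounter {

    private final Map<String, Map<String, Integer>> coVisits = new HashMap<>();

    // Records one session, i.e. the elements a developer looked at while solving a task.
    void addSession(List<String> visitedElements) {
        Set<String> unique = new LinkedHashSet<>(visitedElements);
        for (String a : unique) {
            for (String b : unique) {
                if (!a.equals(b)) {
                    coVisits.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Returns the elements most often visited together with the given one, best first.
    List<String> alsoVisited(String element, int limit) {
        Map<String, Integer> counts = coVisits.getOrDefault(element, Map.of());
        List<String> result = new ArrayList<>(counts.keySet());
        result.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y); // ties in alphabetical order
        });
        return new ArrayList<>(result.subList(0, Math.min(limit, result.size())));
    }
}
```

A real collector would also have to anonymize the data and decide what counts as a session; both are left open here.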
== Example Code-Search Engine 4 Eclipse ==
''In Progress. Tobias; WS 2011''
Finding good code examples is a difficult task. Code-search engines like Krugle or Google Code Search aim to support developers during this task. However, their search capabilities are somewhat limited. Recently, the Code Recommenders project published a first prototype of a next-generation code-search engine tightly integrated into Eclipse.
However, due to copyright and license issues, some companies prohibit the use of code-search engines that work on untrusted codebases. Thus, during this hands-on you will design a local, Lucene-based code-search engine that works on the local Eclipse installation and workspace only.
== Callchain Completion ++==
In progress. Gerrit.
== Query Builder for Code Search ==
''In Progress. Tobias; WS 2011''
In our current implementation of Code Search we use Code Completion as input to build search queries. This is an easy and intuitive way to create queries, but it is limited in flexibility and adjustment of terms. So we think that a view allowing developers to define the elements of the query in an intuitive way will be a necessary supplement.
Additionally, the client can determine which feature weights the server should use for scoring, but because there is no view for editing those weights, this feature is not yet usable.
== Flexible Configuration of Content Assist lists ==
''Done. Waiting for Eclipse JDT. Thomas, Gerrit; SS 2011''
The preference page for the Eclipse completion engines currently allows you to enable multiple engines on the ''default'' tab. On each of the cycling tabs, however, only one engine can be enabled. As Code Recommenders adds multiple engines, we must be able to aggregate our engines to provide useful configurations to developers. This change has been filed as a feature request and accepted by the JDT team. Some adapted screenshots are attached to the feature request to give a hint of how this could be implemented: https://bugs.eclipse.org/bugs/show_bug.cgi?id=340876
== Synonym based completion engine ==
When working with unknown APIs, developers may use code completion to search for methods applicable to what they currently want to do. They probably already have a keyword in mind and just scan the list of proposed methods for that keyword. For example, a developer who already knows arrays but is new to collections wants to know how many elements a collection contains. As known from arrays, the developer searches for length and types the prefix “len” to reduce the list of proposed completions. No proposal matches “len”, because the method in question is named size in collections. If the completion engine supported synonyms, however, it could propose size, knowing that size and length are interchangeable words.
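The length/size example can be sketched in a few lines. This is a minimal sketch with a hand-written synonym table; the class <code>SynonymCompletion</code> and the table contents are assumptions made for illustration, and a real engine would plug into the JDT completion API and might derive synonyms from a thesaurus or from mined usage data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SynonymCompletion {

    // Hypothetical synonym table: word the developer has in mind -> words used in method names.
    private static final Map<String, Set<String>> SYNONYMS = Map.of(
            "length", Set.of("size", "count"),
            "remove", Set.of("delete", "erase"));

    // Returns all method names that match the prefix directly or through a synonym.
    static List<String> propose(String prefix, List<String> methodNames) {
        String lowerPrefix = prefix.toLowerCase();
        List<String> proposals = new ArrayList<>();
        for (String method : methodNames) {
            if (matches(lowerPrefix, method.toLowerCase())) {
                proposals.add(method);
            }
        }
        return proposals;
    }

    private static boolean matches(String prefix, String method) {
        if (method.startsWith(prefix)) {
            return true; // ordinary prefix match
        }
        for (Map.Entry<String, Set<String>> entry : SYNONYMS.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                for (String synonym : entry.getValue()) {
                    if (method.startsWith(synonym)) {
                        return true; // the typed prefix expands to a synonym of the method name
                    }
                }
            }
        }
        return false;
    }
}
```

With the method names <code>add, isEmpty, size, toArray</code>, the prefix “len” yields <code>size</code> only.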
== Web-Crawler for stacktraces ==
We are currently implementing a platform where developers who are stuck on an error can search for their stacktrace to get help. Current search engines don’t work very well with stacktraces, so we set up an optimized search with an index of stacktraces pointing to help resources on the web. To find those stacktraces on the web we need crawlers for all kinds of structured resources. For example, we currently have a crawler for Bugzilla. An interesting resource type to crawl is the RSS feed, as lots of discussion and help platforms provide one.
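Whatever resource a crawler reads, it has to recognize stacktraces inside free text. The following is a minimal sketch of that step for Java stacktraces, using only <code>java.util.regex</code>; the class <code>StackTraceExtractor</code> is hypothetical and does not reflect how the existing Bugzilla crawler works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StackTraceExtractor {

    // Matches frame lines such as "at org.example.Foo.bar(Foo.java:42)".
    private static final Pattern FRAME = Pattern.compile(
            "^[ \\t]*at[ \\t]+([\\w.$<>]+)\\(([^)]*)\\)[ \\t]*$", Pattern.MULTILINE);

    // Returns the fully qualified method of every stack frame found in the text, top frame first.
    static List<String> frames(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FRAME.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }
}
```

The extracted frames could then serve as index terms that point back to the crawled page.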
== Stacktraces WebUI ==
The crawler is one part of the stacktrace search engine. While the engine itself and an Eclipse prototype already exist (see http://code-recommenders.blogspot.com/2011/05/oh-stacktrace-my-stacktrace.html for details), an intuitive search interface for the web is missing.
Neither technology nor design is fixed. However, there is a preference for Eclipse technologies ;-) Which features the UI should offer warrants further discussion. If you are interested in this just send your comments to the forum or developer mailing list. Check [http://wiki.eclipse.org/Recommenders/Contributor_Guide#Get_in_Contact].
== Subwords Completion Engine ==
''In Progress. Paul-Emmanuel''
From Stack Overflow we received a request for a fulltext autocompletion engine. See [http://code-recommenders.blogspot.com/2011/05/subword-matching-completion-engine-for.html this blog post] and [http://www.eclipse.org/forums/index.php/t/209269/ this forum thread] for details. A first prototype is implemented, but a few features are still missing:
* Relevance sorting. Sort proposals that match the prefix entered by the user higher in the list than those that match only the simple regular expression. A first guess at how to implement this may be to use the [http://en.wikipedia.org/wiki/Jaro–Winkler_distance Jaro-Winkler string distance measure] and rank the proposals accordingly.
* Make it work for any proposal. So far, it works for method call completions only. But it should be straightforward (and useful) to implement this for any kind of proposal.
* Highlight the characters in the proposal text that matched the regular expression to give developers quick feedback on why this proposal matched.
* Track which tokens users enter to get the 'right' proposal. Imagine: if you learn what people mean when they write 'pb' or 'opn', you can create an incredibly fast code completion engine that seems to read your mind. To see a great example of that, check out [http://vimeo.com/11664433] and [http://vimeo.com/19369928] :)
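The matching and the first missing feature (prefix matches before subword matches) can be sketched as follows. This is a minimal sketch and not the prototype's code: the class <code>SubwordsMatcher</code> is hypothetical, the regular expression is an assumption about what "simple regular expression" means, and the ranking uses a plain prefix check instead of the Jaro-Winkler distance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SubwordsMatcher {

    // Builds a pattern equivalent to ".*p.*b.*" for the token "pb" (characters are quoted).
    static Pattern toPattern(String token) {
        StringBuilder regex = new StringBuilder(".*");
        for (char c : token.toCharArray()) {
            regex.append(Pattern.quote(String.valueOf(c))).append(".*");
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Keeps candidates that match the token; prefix matches come before subword-only matches.
    static List<String> propose(String token, List<String> candidates) {
        Pattern pattern = toPattern(token);
        String lowerToken = token.toLowerCase();
        List<String> prefixMatches = new ArrayList<>();
        List<String> subwordMatches = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.toLowerCase().startsWith(lowerToken)) {
                prefixMatches.add(candidate);
            } else if (pattern.matcher(candidate).matches()) {
                subwordMatches.add(candidate);
            }
        }
        prefixMatches.addAll(subwordMatches);
        return prefixMatches;
    }
}
```

For the token 'pb' and the candidates <code>setProgressBar, pbMonitor, print, openBrowser</code>, the sketch proposes <code>pbMonitor</code> first, followed by <code>setProgressBar</code> and <code>openBrowser</code>.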
== Highlighting of related methods ==
The Eclipse editor already highlights usages of the selected variable or method in the open document. Think of an extension to this concept that also highlights methods of interest based on their similarity to the currently selected method. The similarity of two methods can be determined by comparing which methods they call and which fields and types they use. The Outline view may highlight related methods by making them bold. For example, if a user selects a method compress() inside a class, the Outline view may highlight the method decompress() since it uses the same fields.
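One simple way to compute such a similarity is the Jaccard coefficient over the sets of elements each method uses. The sketch below assumes these sets have already been collected (for example by an AST visitor); the class <code>MethodSimilarity</code>, the threshold, and the sample field names are hypothetical, and other similarity measures may work better in practice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MethodSimilarity {

    // Jaccard coefficient: shared elements divided by all elements used by either method.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Returns the methods whose similarity to the selected method reaches the threshold.
    static List<String> related(String selected, Map<String, Set<String>> usedElements,
                                double threshold) {
        Set<String> reference = usedElements.getOrDefault(selected, Set.of());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : usedElements.entrySet()) {
            if (!entry.getKey().equals(selected)
                    && jaccard(reference, entry.getValue()) >= threshold) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

If compress() uses the fields <code>buffer, level, out</code> and decompress() uses <code>buffer, level, in</code>, their similarity is 2/4 = 0.5, so decompress() would be highlighted at a threshold of 0.5.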
== Automated Detection of "See Also" Tags ==
Somewhat related to the above task but slightly more general:
The idea is described in more detail in [http://portal.acm.org/citation.cfm?doid=1595696.1595727 API hyperlinking via structural overlap].
Frameworks (or "Frameworks of Frameworks" such as Eclipse) provide APIs consisting of hundreds or even thousands of functions. Some API documentation comes with comprehensive and detailed cross-references for a large number of API functions, e.g., SEE ALSO sections providing hyperlinks to related functions that accomplish the same or a related task. Such an informative layout is an effective way to organize knowledge and helps developers avoid getting lost in the jungle.
However, maintaining this separate documentation is painful and labor-intensive. Tools such as doxygen and javadoc allow developers to fill in cross-references manually (@see tags in Javadoc). However, maintaining a list of all related API functions for each function is tedious, and such lists quickly become outdated. Consequently, such manual cross-references are frequently missing.
Take the Apache HTTP server as an example. Its source code is fairly well documented: from 2005 to 2008 the number of documented API functions grew from 1353 to 1461. The number of those functions commented with @see, however, grew only from 6 to 15! This accounts for only 0.4% to 1.0% of all API functions.
Long et al. have proposed an approach to automatically identify such cross-references by analyzing the code of the framework itself. In a first step, we would like to learn how well this approach works for Eclipse-related APIs and then improve it based on the feedback we get from the Eclipse community.
This tool aims to be integrated into the [[Recommenders/ExtDoc | Extended Javadoc Platform for Eclipse]].
== Mining Interprocedural Object-Lifecycles ==
Framework reuse provides several benefits such as reduced costs, higher quality, and shorter time to market. By extracting the design into abstract classes and defining their responsibilities and collaborations, frameworks not only enable the reuse of functionality at code level, but also enable reuse at the design level. Applications are built from a framework by extending or customizing parts of the framework, for instance by inheriting from an abstract class and providing implementations for its abstract methods, which contain the application-specific behavior.
Although application developers provide the implementations of these methods, the framework usually has expectations about how these methods should behave, such as which methods should be called when.
For illustration, consider a dialog window that gathers some user input using a text widget. Typically, the text widget follows a two-phase life-cycle: First, it is configured and placed in a visual container during dialog creation; then, it is queried for the user input after the dialog window is closed. These two phases are typically encoded in different methods of the dialog, say, widget configuration takes place within a method called Dialog.create() and reading the input occurs within Dialog.close() respectively.
This brief example shows one major characteristic of frameworks which complicates framework understanding: Some framework types follow a life-cycle with different phases, where each phase differs from others a) in the set of methods invoked on the framework type, and b) in its locations in code, i.e., the (framework) methods this phase can be encoded within.
Application developers need a good understanding of these life-cycles and the framework's expectations; unless the behaviors of the provided implementations are consistent with the expectations of the framework, the application is unlikely to function properly.
As mentioned above, objects follow a lifecycle and each phase of this lifecycle uses some characteristic methods. However, an object can be used in different ways: for instance, a button might be created as a checkbox or as a plain push button. Consequently, a checkbox might be queried for its state before it is discarded, whereas a push button might be used within a listener method because it triggers an action but does not hold any interesting state.
The goal of this thesis is to (i) identify the different states of framework objects, (ii) use these states to determine typical object lifecycles, and (iii) visualize and evaluate these lifecycles.
== Impact of Neighboring Usages on Object Usage Recommendations ==
So far, Code Recommenders considers the context (i.e., the name of the overridden method the completion occurs in) and which methods are invoked on a given variable. We currently have an approach in the pipeline that evaluates how taking into account the way an instance was obtained (for example, IWorkbenchHelpSystem h = PlatformUI.getWorkbench().getHelpSystem()) affects the precision of our system.
In a next step we would like to extend this approach to also leverage the information ''what else'' a developer did inside the method. For instance, if we observe that the programmer created a Button inside the same method and now wants to use a Text widget, we can conclude that this Text widget needs to be created and configured too.
<source lang="java">
    private Text filePath;

    private void createInputArea(final Composite parent) {
        final Button browse = new Button(..);
        filePath<^Space> // should evaluate to all templates that recommend object creation... like:
        // filePath = new Text(..);
        // filePath.setLayoutData(..);
        // filePath.setText(..);
    }
</source>
With such an approach, we don't need to know how an object was created to make useful recommendations. It would be sufficient to look at other statements in the code to learn what a developer needs. This approach requires some understanding of how to apply machine learning algorithms like Collaborative Filtering and/or Matrix Factorization. The input data is already available.
A preliminary version of this approach has been published [http://portal.acm.org/citation.cfm?id=1639775 here].
== Mining Usage Documentation: Can handle Null or not ... that's the open question... ==
In many cases it should be quite easy to figure out whether a method's arguments are allowed to be null, i.e., whether the method is able to deal with null values. Writing small static analyses that analyze the code and enrich the existing API documentation with this information would be extremely helpful. We are considering integrating this information into the Extended Javadoc Platform.
The challenge here will be to create an interprocedural model that allows simple propagation of "can handle null" or "may return null" over several methods. For instance, a method may call three other methods and pass null arguments to them. Will they fail? If so, the calling method will fail too. Building such models is far from trivial, and we expect some interesting insights into how developers design their APIs.
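The propagation idea can be illustrated with a small fixed-point computation. The sketch makes strong simplifying assumptions: every method has a single argument, the set of methods that dereference their argument without a null check is already known, and forwarding is unconditional. The class <code>NullPropagation</code> and the sample method names are hypothetical; a real analysis would need a per-parameter, flow-sensitive model.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NullPropagation {

    /**
     * @param failsDirectly methods that dereference their argument without a null check
     * @param passesArgTo   for each method, the methods it forwards its argument to
     * @return all methods that may fail when called with null
     */
    static Set<String> mayFailOnNull(Set<String> failsDirectly,
                                     Map<String, List<String>> passesArgTo) {
        Set<String> failing = new HashSet<>(failsDirectly);
        boolean changed = true;
        while (changed) { // repeat until no new failing method is found (fixed point)
            changed = false;
            for (Map.Entry<String, List<String>> entry : passesArgTo.entrySet()) {
                boolean forwardsToFailing = entry.getValue().stream().anyMatch(failing::contains);
                if (forwardsToFailing && failing.add(entry.getKey())) {
                    changed = true; // a caller inherits the failure of its callee
                }
            }
        }
        return failing;
    }
}
```

If <code>trim</code> fails on null, <code>normalize</code> forwards its argument to <code>trim</code>, and <code>save</code> forwards its argument to <code>normalize</code> and <code>log</code>, the result contains <code>trim</code>, <code>normalize</code>, and <code>save</code>, but not <code>log</code>.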
== Sequence Mining of Object Usage Patterns ==
Most object usages follow some kind of life-cycle. As a consequence, some methods must not be called before others, while others may be called frequently without any ordering constraints, etc. Code Recommenders' models do not yet preserve any ordering information. Extending the current approach to support ordering constraints - at least for code templates - would be great.
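As a starting point, ordering constraints can be mined by checking which pairs of calls always occur in the same order. The following minimal sketch assumes the call sequences per object have already been extracted; the class <code>OrderingMiner</code> and the sample sequences are hypothetical, and a real miner would add a minimum-support threshold so that rarely observed pairs do not become constraints.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class OrderingMiner {

    // Returns constraints "a < b": a's first call was observed before b's first call,
    // and the opposite order was never observed in any sequence.
    static Set<String> alwaysBefore(List<List<String>> sequences) {
        Set<String> observed = new HashSet<>();
        for (List<String> sequence : sequences) {
            // Keep only the first occurrence of each call, preserving order.
            List<String> firstCalls = new ArrayList<>(new LinkedHashSet<>(sequence));
            for (int i = 0; i < firstCalls.size(); i++) {
                for (int j = i + 1; j < firstCalls.size(); j++) {
                    observed.add(firstCalls.get(i) + " < " + firstCalls.get(j));
                }
            }
        }
        Set<String> constraints = new TreeSet<>();
        for (String pair : observed) {
            String[] parts = pair.split(" < ");
            if (!observed.contains(parts[1] + " < " + parts[0])) {
                constraints.add(pair); // never seen in reverse order
            }
        }
        return constraints;
    }
}
```

For the two sequences <code>new, setText, setLayoutData, getText</code> and <code>new, setLayoutData, setText, getText</code>, the sketch reports that <code>new</code> precedes everything and that both setters precede <code>getText</code>, but no constraint between <code>setText</code> and <code>setLayoutData</code>.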

Latest revision as of 02:29, 26 September 2013