Difference between revisions of "Eclipse search plugin: providing a better, faster, more relevant Eclipse search."
(New page: '''Student''': Çağatay Çallı (IRC: kynes) '''Mentor''': Francois Granade (IRC: farialima) This project is part of the [http://wiki.eclipse.org/index.php/Google_Summer_of_Code_2007|Go...)
== Secondary Goals ==
Enabling support for online code repositories ([http://www.google.com/codesearch Google Code Search], [http://www.koders.com/ Koders] etc.)
Enabling support for searching (indexed) CVS repositories
Revision as of 15:54, 28 May 2007
Student: Çağatay Çallı (IRC: kynes)
Mentor: Francois Granade (IRC: farialima)
This project is part of the Summer of Code 2007
Find/replace and search are among the most frequently used features of an IDE. Even the earliest source code editors focused on decreasing search time and increasing the relevance of results. Although the search facilities in Eclipse, especially find/replace, are adequate for most developers, plenty of usability issues and enhancement requests still arise in Bugzilla. Moreover, search in large projects seems inefficient without indexing: scanning every file individually for each query makes little sense when a project contains a large number of static files. The search feature in the Eclipse Help component already takes an indexing approach: it indexes the help pages on the first query and then serves later queries from the index, which improves response time significantly. Apache Lucene is used to implement this approach. As for the relevance of search results, projects such as OpenGrok already offer a good source code search and cross-referencing engine.
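The index-on-first-query behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a plain in-memory inverted index as a stand-in for Lucene, and the class `LazySearchIndex` and its methods are hypothetical names, not part of Eclipse or Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a Lucene-backed index: a plain in-memory inverted index
// that is built lazily on the first query and reused for every later query.
class LazySearchIndex {
    private final Map<String, String> documents;   // document name -> text
    private Map<String, Set<String>> index;        // term -> names of documents containing it
    private int buildCount = 0;                    // how many times the index was built

    LazySearchIndex(Map<String, String> documents) {
        this.documents = documents;
    }

    // Returns the names of the documents containing the term, in sorted order.
    List<String> search(String term) {
        if (index == null) {
            buildIndex();                          // only the first query pays this cost
        }
        Set<String> hits = index.getOrDefault(term.toLowerCase(), Collections.emptySet());
        return new ArrayList<>(hits);
    }

    int buildCount() {
        return buildCount;
    }

    private void buildIndex() {
        index = new HashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // Naive tokenizer: lower-case the text and split on non-word characters.
            for (String token : doc.getValue().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        buildCount++;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("Parser.java", "class Parser { void parse() {} }");
        docs.put("Lexer.java", "class Lexer { Token next() { return null; } }");
        LazySearchIndex idx = new LazySearchIndex(docs);
        System.out.println(idx.search("class"));
        System.out.println(idx.search("parse"));
        System.out.println("index builds: " + idx.buildCount());
    }
}
```

The sketch never invalidates the index, which suits static content such as help pages; frequently edited source files would need an update path as well.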
Solving usability issues
Initial Ideas about Usability
1 - Improve Ctrl-J search:
- Ctrl-J : have Ctrl-V paste the current ring in it
- Ctrl-J : if there's a selection, have a second Ctrl-J take this selection as search string
- Ctrl-J followed by Ctrl-R would switch to regex search
- Ctrl-J to global search: turn a Ctrl-J search into a global search. More generally, driving all searches from the keyboard incremental search would be good: there should be a set of keyboard shortcuts on Ctrl-J to set up a complete search and then execute it globally.
2 - Display search results. The view that displays the search results is currently very poor. A tree view is probably not ideal. It should show the matches themselves, for example, and it should have more shortcuts...
3 - Improve the search dialog. More TBD
4 - Real search/replace:
- search and replace in multiple files (all files/subset of files)
- selective (based on language elements...)
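As a rough illustration of search and replace over a subset of files, the sketch below applies a regular expression to every file accepted by a filter. It is a minimal sketch, not a proposed design: files are modelled as an in-memory map of name to content, and `MultiFileReplace` and `replaceAll` are hypothetical names. Selection by language element is not covered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: files are modelled as a map of file name -> content.
class MultiFileReplace {

    // Replaces every match of the pattern in the files accepted by fileFilter.
    // The map is updated in place; the result holds the number of replacements
    // for each file that actually changed.
    // The replacement string follows Matcher.replaceAll syntax ($1, \$, ...).
    static Map<String, Integer> replaceAll(Map<String, String> files,
                                           Predicate<String> fileFilter,
                                           Pattern pattern,
                                           String replacement) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            if (!fileFilter.test(file.getKey())) {
                continue;                          // file is outside the selected subset
            }
            Matcher matcher = pattern.matcher(file.getValue());
            int matches = 0;
            while (matcher.find()) {
                matches++;
            }
            if (matches > 0) {
                // Matcher.replaceAll resets the matcher before replacing.
                file.setValue(matcher.replaceAll(replacement));
                counts.put(file.getKey(), matches);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("A.java", "foo(); foo2(); foo();");
        files.put("notes.txt", "foo");
        Map<String, Integer> counts = replaceAll(
                files, name -> name.endsWith(".java"), Pattern.compile("\\bfoo\\b"), "bar");
        System.out.println(counts);
        System.out.println(files);
    }
}
```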
Bugzilla Reports Related to Usability
1. find/replace search should have a tick-box for "ignore comments" (enhancement request) bug 161398
2. Show as package tree bug 160481 (enhancement request)
3. Store Previous Searches for Startup (enhancement request) bug 169252
4. Search dialog "Java Search" Scope should include Hierarchy (WONTFIX bug) bug 110252
5. New text search shown line not a great help (LATER) bug 127672 (This bug is waiting for applying styles on a tree item.)
6. Result in Table (LATER) bug 129185
These two are related:
7. Search shows duplicate results in nested projects (LATER) bug 144959
8. Resource exclusion filters bug 84988
(read together with: bug 144959 - about duplicate results for nested folder structures)
9. Search Enhancements bug 108223
10. Search in files: see matched lines https://bugs.eclipse.org/bugs/show_bug.cgi?id=72575
Integrating an indexing engine like Lucene or OpenGrok into Eclipse search
"I tried to keep myself focused on research at the beginning, so I read about both Lucene and OpenGrok. I now understand how OpenGrok manages to be a great tool for source code browsing and cross-referencing. Since it makes use of Lucene, Lucene is a building block of this project whether I use OpenGrok or not. I've read a few articles and browsed a few presentations about Lucene, and I'm still trying to learn more about it.
I've been consistently asking myself questions about index sizes and frequent updates to the index. As a basic solution, I thought of keeping a partitioned index (e.g. a project whose index is distributed over multiple Lucene indexes, with these index files being accessed or changed only when necessary; in short, we may call this partitioning the index file).
This certainly involves a tradeoff, because there is a cost to not using one index for the whole project. But when a good number of partitions is selected, I think the tradeoff will seem small. (I thought of computing the ideal number in the future, like the computation of the ideal bucket size in the file processing literature.)
I thought of this “partitioning” method because an Eclipse user may react badly when told that we can “linearize” the time to search their files, but only if they keep roughly 50 MB of index in RAM. (I've tried indexing different things, and I think this value is a good estimate for large projects. A natural question is why the index has to be kept in RAM rather than written to disk. One answer: the dynamic nature of source code and its frequent updates.)
The partitioning approach gives us one advantage: we only have to index part of the project while the user is working on a set of files. Since the number of active editors is considerably smaller than the number of files in the project, this is an idea that can work. And since we can keep a relatively small index in RAM, the indexes containing the active file set can be updated quickly.
However, I still see other tradeoffs in this approach, and it is just a start. For now I'm trying not to focus on algorithmic enhancements, and I don't plan to implement this idea in my prototypes during the early days of coding. I plan to start coding by addressing usability enhancements first.
I think early search plugin prototypes will make use of Lucene's features for moving data from RAM to disk and simply write the index to a file on disk after a certain threshold."
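The partitioned-index idea in the notes above can be sketched as follows. This is an illustration under assumptions that are not in the original notes: partitions are plain in-memory inverted indexes rather than Lucene indexes, files are assigned to partitions by a hash of their path, and `PartitionedIndex` and its methods are hypothetical names. Updating a file touches only its own partition, while a search has to consult every partition, which is the tradeoff mentioned in the notes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: each partition is a small inverted index of its own.
class PartitionedIndex {

    private static final class Partition {
        final Map<String, Set<String>> postings = new HashMap<>();    // term -> file paths
        final Map<String, Set<String>> termsByFile = new HashMap<>(); // file path -> its terms
        int updates = 0;                                              // updates applied here
    }

    private final List<Partition> partitions = new ArrayList<>();

    PartitionedIndex(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new Partition());
        }
    }

    // Assumption: files are assigned to partitions by a hash of their path.
    int partitionFor(String path) {
        return Math.floorMod(path.hashCode(), partitions.size());
    }

    // Re-indexes one file; only the partition owning that file is modified.
    void update(String path, String text) {
        Partition p = partitions.get(partitionFor(path));
        Set<String> oldTerms = p.termsByFile.remove(path);
        if (oldTerms != null) {
            for (String term : oldTerms) {         // drop the file's stale postings
                Set<String> files = p.postings.get(term);
                files.remove(path);
                if (files.isEmpty()) {
                    p.postings.remove(term);
                }
            }
        }
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        for (String term : terms) {
            p.postings.computeIfAbsent(term, t -> new HashSet<>()).add(path);
        }
        p.termsByFile.put(path, terms);
        p.updates++;
    }

    // A query must visit every partition: the cost of not having a single index.
    Set<String> search(String term) {
        Set<String> hits = new TreeSet<>();
        String key = term.toLowerCase();
        for (Partition p : partitions) {
            Set<String> files = p.postings.get(key);
            if (files != null) {
                hits.addAll(files);
            }
        }
        return hits;
    }

    int updatesIn(int partition) {
        return partitions.get(partition).updates;
    }

    public static void main(String[] args) {
        PartitionedIndex index = new PartitionedIndex(4);
        index.update("src/Foo.java", "int count");
        index.update("src/Bar.java", "int total");
        index.update("src/Foo.java", "long count");   // edit in an active editor
        System.out.println(index.search("int"));
        System.out.println(index.search("long"));
    }
}
```

Hashing paths is only one possible assignment; grouping the files open in active editors into a dedicated partition would be closer to the notes, but needs editor state that this sketch does not model.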