Skip to main content

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

Jump to: navigation, search

JDT Core Programmer Guide/MetaIndex

< JDT Core Programmer Guide
Revision as of 14:57, 23 March 2021 by Gayanper.gmail.com (Talk | contribs) (updated the initial draft)

Warning2.png
Draft Content
This page is currently under construction. Community members are encouraged to maintain the page, and make sure the information is accurate.


Introduction

The new MetaIndex was introduced to reduce the time it takes to build the TypeHierarchy of java types.


Problem

Today the when building the TypeHierarchy then IndexBasedHierarchyBuilder perform a recursive search to find subtypes which the number of iterations are equal to the number of sub types found.

So in a workspace where you have 5000 indexes this could lead to a very long response time if you sub type tree is also large. For example i have a workspace which has a TypeHierarchy as follows

Interface1 |-- Interface2 |-- Class2 |-- Interface3 |-- Class3 |-- Interface4 |-- Class4 |-- Interface5 |-- Class5 |-- Interface6 |-- Class6

All these classes and interfaces are in the same project. And this workspace has nearly 1000 indexes, try to build this TypeHierarchy takes around 15 sec. Looking closer at the profiling snapshots shown that we are performing 11 pattern searches on all 1000 indexes.


Solution

The solution we look at is to reduce the number of indexes that are search in this 11 pattern searches or 11 iterations. in this particular example we should be only searching in the index document for this project. But how does the IndexBasedHierarchyBuilder select the correct indexes.

To address this we looked at where we should filter the indexes and we choose to do that at IndexSelector.getIndexLocations. Then we end up thinking what should be out criteria for this index selection and how we could find these indexes. Because we cannot search all indexes to find its eligibility.

To support this we decided to introduce a new MetaIndex which index all index document names against a qualification criteria which can be used for filter indexes, and we decided to use java package names as the qualification value. So we ended up updating the MetaIndex as follows

Category Key ContainerPath
TYPE_INDEX_Q java.lang.util 1234634.index
TYPE_INDEX_Q java.lang.util 12348756.index
TYPE_INDEX_Q com.google.guava 1234634.index

When performing a type hierarchy build we will send the package of the type in each iteration into SearchPattern.indexQualifier and use that value to perform a search in MetaIndex to find index names that needs to be included in current iteration at IndexSelector.getIndexLocations and filter out the index locations.

To support backward compatibility, when performing this MetaIndex search at IndexManager.findMatchingIndexNames(indexQualifier) we check if there are any indexes which are not part of the MetaIndex and include them by default. This handles

  • When the MetaIndex is not created in the workspace
  • When there are indexes which are not included in the MetaIndex
  • When there are prebuilt indexes

How this MetaIndex is updated

We have modified all Indexer implementation to update package usages of each class and compilation unit each indexer scan into the index it self with a new key TYPE_INDEX_Q, this keep track of all packages used as

  • Imports
  • Fully qualified type references
  • Current package name of the class or compilation unit

The capturing of package names are done in BinaryIndexes by looking at fully qualified reference names and extracting the package name from them. Basically the reference is read until the last '.' and use the value as package name. This is not a problem in class files the nested types are seperated by $. Following are the place where the package usages are extracted from

  • Class Declarations (superclass, superinterfaces)
  • Enum Declarations (superinterfaces)
  • Interface Declarations (superinterfaces)
  • Annotation Declarations (superclass)


The capturing of package names are done in SourceIndexer little bit differently. The steps are as follows

  • Extract package names from all none * imports
  • Extract package names from
    • Class Declarations (superclass, superinterfaces)
    • Enum Declarations (superinterfaces)
    • Interface Declarations (superinterfaces)
    • Annotation Declarations (superclass)

if they are qualified names, otherwise we check if the class name was available in the imports we processed, if not we collect them to be resolved later. Once we have scanned the whole compilation unit, we will try to do a diet resolving of the classes we didn't found neither as fully qualified nor in imports.

We do this diet resolving by try to match these classes in current project's package fragment roots. This is implemented at SourceIndexerRequestor.performDietResolution(), this will handle default package types, types from java.lang package as well. After performing this if we still find type which are not resolved, then that mean we have some types which might be part of inheriting hierarchy like


ProjectA
in package p1:
   class Couter {
        protected class Inner { ... }
   }
in package p2:
   class Middle extends p1.Couter {}

ProjectB
in package p3:
   class C extends p2.Middle {
        class I extends Inner {}
   }

So to resolve this Inner class usage we need to resolve the type bindings of the compilation unit, therefore we mark the source document as it needs to be IndexingResolvedDocument. Before introducing this the only criteria for IndexingResolvedDocument was when there are lambda expressions.

When doing this we found that it has an overhead on the Indexing because resolving type bindings takes the considerable amount of time. So when looking closer at this we found that most of the type binding resolution time is spend at IndexBasedJavaSearchEnvironment.create(List<IJavaProject>, ICompilationUnit[]), and also we so that this environment is create for each source file which needs to be resolved even though the source files might be from the same project. So to reduce creating this INameEnvironment repeated we introduce a LRU cache at IndexNameEnviromentCache, which is cleared at the end of indexing and also project entries are clear as soon as the project change is detected. At a glance this could be seen as a increase in memory, but since the LRU cache use SoftReferences the heavy INameEnvironment objects will be GC when they are not accessed. Also this reduce the overhead for the GC cleaning up large INameEnvironment objects which was created repeated when every you need resolution of the compilation unit.


Results

Some search results on the Type Hierarchy build are as follows

Thread[ModalContext,6,main] -> execution time: 4897ms -IndexBasedHierarchyBuilder (Without Fix)
Thread[ModalContext,6,main] -> execution time: 673ms - IndexBasedHierarchyBuilder (With Fix)

TODO: add index time and index sizes


Future

There are discussions at https://bugs.eclipse.org/bugs/show_bug.cgi?id=570078 if we should extend this MetaIndex for other kind of search patterns as well. But the initial investigations we found that it might impact the index times heavily since we need to resolve compilation units to find more information and also in binary indexes we end up in resolving binary types. One such example is to implement "find declaration of a method" which include override declarations as well. So right now we think since those search operations are linear they could be benefit by the new parallel search instead of this MetaIndex.

Back to the top