Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
Difference between revisions of "JDT Core Programmer Guide/Completion"
(→Source code range: strategy after bug 539685) |
(removed "WIP conceptualization" after integration into the main text) |
||
Line 145: | Line 145: | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
[[Category:JDT]] | [[Category:JDT]] |
Revision as of 17:34, 25 April 2021
Contents
General approach
While completion is a monster full of special case heuristics, let's start by explaining the general idea:
- The main driver is class
CompletionEngine
which acts as a variant of the compiler using these components:- a regular
LookupEnvironment
for anything related to resolving - a special parser:
CompletionParser
- a regular
- In many situations the completion parser will create one specialized ASTNode as soon as it hits the cursor location: one from the family of
CompletionOn*
classes. - Ideally, when resolving (
this.lookupEnvironment.completeTypeBindings(compilationUnit, true);
) hits the completion node, it will throwCompletionNodeFound
containing what information was available during resolving. - Based on data in that exception and depending on the kind of completion node a number of searches are performed, see the long list of
CompletionEngine.find*
methods - When suitable things are found, one
InternalCompletionProposal
is created for each, which again collect a lot of useful information:-
#completion
the string to be proposed - source positions
-
#relevance
ranks the proposals so the most relevant can be listed on top - much more information for use by JDT/UI or other adopting tools, to support display, filtering, and applying each proposal in a semantically useful way.
-
Going deeper the big monster can be sub-divided into to sub-monsters
-
CompletionParser
(together withCompletionScanner
) (6 KLOC + 1 KLOC):
- This component is responsible for trying to make sense of the current source code at and around the cursor location in a purely syntactical sense. Here one of the main challenges is to work around the fact that a regular parser would commonly just reject the input due to syntax errors -- we are by definition in the middle of editing, so syntax errors are the norm.
-
CompletionEngine
(>13 KLOC in one file).
- This guy is responsible for all semantic aspects, like figuring out all types involved, finding available members, and performing special matching strategies like camel case, subword etc. While we have a
MissingTypesGuesser
right in the engine, more guessing business happens in JDT/UI (e.g.,ParameterGuessingProposal
).
Where information is maintained
When debugging code completion, it is essential to first understand the basic working of the Parser, and in particular its various stacks. Also the story of RecoveredElement currentElement
is of great importance for completion parsing.
Additionally, the CompletionParser maintains its own data, which partly overlaps with the regular parser state.
assistNode will hold the node on which assist was invoked. It should be one of the CompletionOn*
family of classes, which are instantiated by various overrides consume*
in CompletionParser.
assistNodeParent is used in lots of places of CompletionEngine to infer additional context information. Typically this will be the directly enclosing node, a parameterized type reference when completing on a type argument, or an enclosing statement when completing on an expression etc.
enclosingNode is another ancestor node of assistNode, but with specific purpose:
- for specific casted-receiver proposals, enclosingNode will hold the enclosing IfStatement(s) (or AND_AND_Statements after bug 539685), with instanceof expressions that narrow down the actual type of a receiver.
- to determine an expected type inside a variable declaration or return statement, or in provides statements (it is also set for uses statements where it appears to be unused).
Additionally, the CompletionParser maintains yet another set of stacks all filled in lock-step as controlled by elementPtr (all inherited from AssistParser
):
- elementKindStack holds constants of the set
K_BLOCK_DELIMITER
...K_YIELD_KEYWORD
remembering what tokens have already been seen. This semantically overlaps with the automaton stack, but since the automaton stack uses generated constants that should not be interpreted by hand-written code, this addition stack is very useful for inspecting the syntactical context - elementInfoStack for each "stack frame" addressed by elementPtr, this stack holds additional information, specific to the element kind. See constants starting at
CompletionParser#IF
. Unfortunately, it has not been documented precisely, which constants are used for which element kind. Some flags also seem to be used for different kinds alike. - elementObjectInfoStack is the most tricky of these stacks, as it may hold on to actual ast nodes which may not be reachable otherwise, as these may survive a parser restart during recovery. See call hierarchy of
pushOnElementStack()
for who's putting what on this stack.
To summarize, the CompletionParser has four kinds of sources for ast nodes:
the regular astStack, expressionStack ... family of stacks | the top result of which is provided as the result of parse()
|
the individual nodes assistNode, assistNodeParent and enclosingNode | directly accessible to CompletionEngine |
the currentElement 'cursor tree' | internal to the parser, i.e., it needs to be materialized into astStack before parse() terminates
|
nodes on the elementObjectInfoStack (with interpretation hints in the sibling stacks). | same as above |
Source code range
Steps towards today's solution
- The initial strategy of
CompletionParser
was to restrict the source code range to [method.bodyStart, cursorLocation], i.e, when the scanner hits thecursorLocation
it would answerTokenNameEOF
.- This made the parser totally ignorant to any potential syntax errors after the cursor location
- OTOH it required significant machinery in and around
attachOrphanCompletionNode()
to wrap up some parts that haven't yet made it into the mainastStack
.
- With the introduction of lambdas this was no longer sufficient and so bug 423987 implemented an option to extend the source when we discover we need more.
- Later this proved problematic because at the time the source range was extended, the observed EOF had already muddled with some internal state / stacks.
- In bug 473654 first experiments to avoid setting EOF to cursorLocation caused regressions, so at that time the experiment was abandoned.
- Since bug 539685 the strategy has been reversed indeed: we always start with a range corresponding to the entire method body.
- The new method
CompletionParser.fetchNextToken()
will only ever answer EOF at or aftercursorLocation
when it's "safe" to do so.
- The new method
Unfortunately, a few other locations still manipulate Scanner.eofPosition
:
-
CompletionParser.consumeToken()
sets EOF for specific situations of fields / field initializers. -- it would be great if this could be solved in some other way. - The above is conditionally compensated inside
AssistParser.fallBackToSpringForward()
- A few more locations could be listed here, but those seem to be of less significance.
As a fine point in the implementation of CompletionParser.fetchNextToken()
conditionally delay EOF using the following strategy:
- If no lambda is in sight, answer EOF directly at cursorLocation or as the next following token.
- Otherwise, keep parsing until the expression stack is empty, as to ensure that any complex expression involving a lambda will always be parsed in entirety.
Recovery
This is the story of Parser.currentElement
.
Recovery starts when a syntax error is encountered, the chain resumeOnSyntaxError()
-> buildInitialRecoveryState()
sets things in motion:
-
buildInitialRecoveryState()
essentially converts existing ASTNodes inParser.astStack
intoRecoveredElement
s.- This step maps the nested structure indicated by astStack into a proper tree of recovered elements.
- More elements will be added later using
RecoveredElement.add(*)
-
currentElement
will always point to a current leaf in the tree, the rest being reachable via theparent
chain.
- More elements will be added later using
- Expressions are explicitly omitted.
- An exception is implemented for block lambdas, so that an empty block is inserted into the tree of recovered elements.
- If a lambda appears, e.g., in the RHS of a local declaration, the block cannot be made a child of the local, so the hierarchy has to be tweaked:
- The local declaration is considered as complete
- The enclosing block is resumed
- The new block (body of the lambda) is inserted as a sibling to the local declaration.
- This step maps the nested structure indicated by astStack into a proper tree of recovered elements.
The last expression may, however, be processed by updateRecoveryState()
(right after resumeOnSyntaxError()):
- relevant things may happen below
attachOrphanCompletionNode()
->buildMoreCompletionContext()
.-
buildMoreCompletionEnclosingContext()
has special treatment for code nested in an if statement and specifically for instanceof.
-
Supporting casted-receiver completions
Here is a special kind of completion, for which the AST may be created opportunistically, even inventing some structure may happen.
The core idea is that in statements like if (o instanceof Foo) o.|
completion should propose members of Foo
even if o
has a different static type. We want this to apply in more situations, too: nested if statements. &&
expressions where the left operand is an instanceof
test.
NB: I couldn't find similar treatment for ConditionalExpression
, which looks inconsistent / incomplete to me. Also CompletionEngine.computeExpectedTypes()
has a code comment indicating room for improvement: "for future use".
Curiously, in current master the CompletionParser will concoct even AND_AND_Expressions
(&&
) into an IfStatement
, so CompletionEngine.findFieldsAndMethodsFromCastedReceiver(..)
only handles (nested) if statements - even where the source code contained &&
.
As of bug 539685 I'm changing this to stick closer to a true AST. This strategy consists of two parts:
-
CompletionNodeDetector
will detect the outermost of a chain ofIfStatement
orAND_AND_Expression
of which the condition / left operand is known to be true at the cursorLocation (i.e., the cursor is in the then-branch or right operand). The result will be found in#enclosingNode
. -
CompletionEngine.findFieldsAndMethodsFromCastedReceiver(..)
needs to consider both kinds of nodes during traversal (findGuardingInstanceOf()
andfindGuardedInner()
). It should be straight-forward to also integrateConditionalExpression
in the same manner.
to be continued.
Lambda Specifics
Completion involving lambdas has essentially been implemented via bug 422468 and bug 423987.
fallBackToSpringForward()
This method will essentially try two strategies:
- Can we close a pending lambda, either by the next real token, or by one of a fixed set of
RECOVERY_TOKENS
? - If not, can we just jump to the next 'thing'?
For strategy (1) we first check if the eofPosition
needs to be pushed past the cursor location, allowing to scan at least up-to the end of the current method body.
For strategy (2) we keep a stack (since bug 535743) of snapshots, represented by instances of CompletionParser, which mainly hold copies of the various stacks, plus various assist-specific fields.
- The snapshots are created / updated by calling
commit()
when various elements are consumed which could contain a lambda expression (seetriggerRecoveryUponLambdaClosure()
). - In one arm,
fallBackToSpringForward()
will copy the state of the latest / matching snapshot into the current parser. After such copyState() the parser will be told, toRESUME
(rather thanRESTART
) in order to keep the parsing state reached thus far. - It's crucial that each snapShot will be removed from the stack at the right point in time (from
triggerRecoveryUponLambdaClosure
,consumeBlock
, orconsumeMethodDeclration
).
Is RESUME safe? In bug 539685 we
- start with a syntax error caused by seeing
TokenNameEOF
in the middle of the code (ie. at the cursor position). -
automatonWillShift()
will answerfalse
because the automaton is beyond the state were it could accept the actual next token (after having used EOF for reducing several rules).
This would suggest that extending the range happens too late. We cannot first say EOF, and then later modify currentToken so smth else.
actFromTokenOrSynthetic()
Via bug 473654 another strategy for closing a pending lambda has been added:
Method actFromTokenOrSynthetic()
may synthesize an arbitrary sequence of RECOVERY_TOKENS
as long as they can be accepted by the parser. Here we are feeding the tokens right into the parser as if they were coming from the scanner.