


CDT/Obsolete/C editor enhancements/Camel-case completion

Revision as of 09:59, 2 February 2011 by Jens.elmenthaler.verigy.com (Matching Rules)

This is a problem page. Please treat it as a discussion page and feel free to insert your comments anywhere. My open questions are in bold. --Tomasz Wesołowski 13:40, 17 May 2010 (UTC)

Description

When using content assist, there may be many symbols whose names begin the same way. One way to select among them quickly is to let the user search the proposals by abbreviation, as an alternative to searching by the beginning of the name. This often finds the desired proposal faster, especially for long names.

Ranking

Category: editing

Importance: 3/5 --Tomasz Wesołowski 13:40, 17 May 2010 (UTC)

3/5 --Tobias Hahn 09:44, 18 May 2010 (UTC)

5/5 --Jens Elmenthaler I use it very often in Java and really miss it in C++. It may also support the filtering you discussed on another page.

3/5 --Kirstin Weber 08:49, 10 June 2010 (UTC)

3/5 --Gil Barash 18:15, 25 September 2010 (UTC)

Please append your opinion here.

Solution

This feature can be implemented similarly to JDT. However, camelCase is only one naming convention in C++. underscore_case is also popular and must be taken into account.

For example, when the text 'get' is typed, content assist would be expected to propose results similar to:

getSomething        // match by name start
giveExampleText     // match by camel case
give_example_text   // match by underscore case
GIVE_EXAMPLE_TEXT   // match by capital underscore case

Matches by name start shall have higher priority than matches by abbreviation.

Letter case shall be ignored while matching. It shall only be used to find word boundaries in order to determine the abbreviation, not to filter the results.

This approach differs slightly from JDT, but allows some_symbol and someSymbol to be selected in a uniform way. Do you prefer this approach, or would you prefer strict case sensitivity here, e.g. sS for someSymbol and ss (or even s_s) for some_symbol? Please provide use cases.
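A minimal Java sketch of the case-insensitive abbreviation matching proposed above follows. It rests on several assumptions. Words are delimited by underscores and by lower-to-upper case transitions. An abbreviation match means the typed text is a prefix of the lower-cased word initials. `AbbreviationMatcherSketch` and its methods are hypothetical names, not CDT API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the proposal above; these names are not CDT API.
final class AbbreviationMatcherSketch {

    /** Splits a name into words at underscores and at lower-to-upper case transitions. */
    static List<String> splitWords(String name) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char prev = '_';
        for (char c : name.toCharArray()) {
            boolean boundary = c == '_'
                    || (Character.isUpperCase(c) && Character.isLowerCase(prev));
            if (boundary && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            if (c != '_') {
                current.append(c); // underscores only separate words
            }
            prev = c;
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    /** First letter of every word, lower-cased: "GIVE_EXAMPLE_TEXT" -> "get". */
    static String initials(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : splitWords(name)) {
            sb.append(Character.toLowerCase(word.charAt(0)));
        }
        return sb.toString();
    }

    /** 0 = match by name start, 1 = match by abbreviation, -1 = no match (case ignored). */
    static int rank(String typed, String name) {
        String t = typed.toLowerCase(Locale.ROOT);
        if (name.toLowerCase(Locale.ROOT).startsWith(t)) {
            return 0;
        }
        return initials(name).startsWith(t) ? 1 : -1;
    }

    /** Keeps matching candidates; name-start matches come before abbreviation matches. */
    static List<String> propose(String typed, List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (rank(typed, candidate) >= 0) {
                result.add(candidate);
            }
        }
        // List.sort is stable, so candidates of equal rank keep their original order.
        result.sort(Comparator.comparingInt(candidate -> rank(typed, candidate)));
        return result;
    }
}
```

With the candidates from the example, typing get lists getSomething first, followed by giveExampleText, give_example_text and GIVE_EXAMPLE_TEXT. Because case is ignored, ss selects someSymbol and some_symbol alike.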

Jens Elmenthaler For camelCase matching, we should use the same rules as JDT. That means someSymbol would not match ss. Whether an underscore_case name matches a given string is not so simple. My proposal would be to treat _ or an uppercase letter in the typed string as the beginning of a new word, and then match the candidates accordingly. For example, typing sS would yield both someSymbol and some_symbol, whereas s_s would yield only some_symbol.

Gil Barash I agree that a capital letter or an underscore should represent the beginning of a new word. But my examples would be a bit different:

  • 'ab' should match only symbols which start with 'ab' (ignoring case).
  • 'AB' should match symbols which start with 'ab' (ignoring case), plus 'addBlock', 'AddBlock' and 'add_block'
  • 'a_b' or 'A_B' should match symbols which start with 'a_b' (ignoring case), plus 'addBlock', 'AddBlock' and 'add_block'
  • 'adBl' or 'AdBl' or 'ad_bl' should also match 'addBlock', 'AddBlock' and 'add_block'

I would also suggest that the matching words need not be in the same order (with a lower ranking, of course). This means that 'a_b' would match 'add_block' with a high ranking and 'block_add' with a lower ranking.
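The ordering suggestion can be sketched in Java as follows. The sketch makes three assumptions. Each typed word must be a case-insensitive prefix of a distinct word in the name. "In order" means the matched words form a subsequence of the name's words. `UnorderedWordMatcherSketch` is a hypothetical name. The sketch models only the ordering aspect and does not reproduce the other rules in the list above, such as 'ab' matching by name start only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of order-independent word matching; not CDT API.
final class UnorderedWordMatcherSketch {

    /** Typed text: an underscore or any capital letter starts a new word. */
    static List<String> patternWords(String pattern) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '_' || Character.isUpperCase(c)) {
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
                if (c == '_') {
                    continue; // the underscore itself belongs to no word
                }
            }
            current.append(c);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    /** Candidate name: words end at underscores and at lower-to-upper case transitions. */
    static List<String> nameWords(String name) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char prev = '_';
        for (char c : name.toCharArray()) {
            boolean boundary = c == '_' || (Character.isUpperCase(c) && Character.isLowerCase(prev));
            if (boundary && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            if (c != '_') {
                current.append(c);
            }
            prev = c;
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    private static boolean isPrefix(String typedWord, String nameWord) {
        return nameWord.toLowerCase(Locale.ROOT).startsWith(typedWord.toLowerCase(Locale.ROOT));
    }

    /** Greedy subsequence check: typed words match name words in the same order. */
    static boolean matchesInOrder(List<String> typed, List<String> names) {
        int j = 0;
        for (String t : typed) {
            while (j < names.size() && !isPrefix(t, names.get(j))) {
                j++;
            }
            if (j == names.size()) {
                return false;
            }
            j++; // each name word is used at most once
        }
        return true;
    }

    /** Backtracking: each typed word matches a distinct name word, in any order. */
    static boolean matchesAnyOrder(List<String> typed, int i, List<String> names, boolean[] used) {
        if (i == typed.size()) {
            return true;
        }
        for (int j = 0; j < names.size(); j++) {
            if (!used[j] && isPrefix(typed.get(i), names.get(j))) {
                used[j] = true;
                if (matchesAnyOrder(typed, i + 1, names, used)) {
                    return true;
                }
                used[j] = false;
            }
        }
        return false;
    }

    /** 0 = words match in order, 1 = words match out of order, -1 = no match. */
    static int rank(String pattern, String name) {
        List<String> typed = patternWords(pattern);
        List<String> names = nameWords(name);
        if (matchesInOrder(typed, names)) {
            return 0;
        }
        return matchesAnyOrder(typed, 0, names, new boolean[names.size()]) ? 1 : -1;
    }
}
```

Here rank("a_b", "add_block") is 0 and rank("a_b", "block_add") is 1, so the in-order match would be listed first. Backtracking is used for the out-of-order case because a greedy assignment can miss a valid one. Typed abbreviations are short enough that this should be cheap.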


Design

This chapter was added by Jens Elmenthaler

Matching Rules

The matching rules provided by the JDT shall also be valid for the CDT. While the underscore notation is rarely used in Java, it is widespread in C/C++. Because of that, the matching rules cannot simply be copied from the JDT, but must be extended to cover the underscore notation as well. For instance, in the JDT you cannot type "FB" to get "FOO_BAR". For C/C++ this is a must (think of all the ugly macro business).

The original SegmentMatcher implementation did not match "FOO_BAR" when typing "FB". So I tried to understand how it works and how it should work. The result is the following set of rules.

The string typed by the user to be completed is called the *segment pattern*. Using BNF, a segment pattern can be described as follows (without whitespace between the tokens, of course):

 segmentPattern :=
   EMPTY |
   segment |
   segmentPattern segment
 segment :=
   separatedSegment |
   camelCaseSegment |
   numberSegment
 separatedSegment :=
   separator numberBody |
   separator textBody
 separator :=
   NEITHER_DIGIT_NOR_LETTER |
   separator NEITHER_DIGIT_NOR_LETTER
 textBody :=
   LETTER |
   textBody LOWER_CASE_LETTER
 numberBody :=
   DIGIT |
   numberBody DIGIT
 camelCaseSegment :=
   UPPER_CASE_LETTER |
   camelCaseSegment LOWER_CASE_LETTER
 numberSegment :=
   DIGIT |
   numberSegment DIGIT

In the first step, the user-provided pattern is decomposed into pattern segments. The BNF above defines how the pattern can be decomposed into separated segments (starting with one or more separators), camel case segments, and number segments.
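The first step can be sketched as a hand-written scanner over the grammar that always takes the longest possible segment. This is a hypothetical illustration, not the SegmentMatcher code. It goes beyond the BNF in one respect: a lower-case letter that does not follow a separator (e.g. the leading foo in fooBar) also starts a camel case segment, as the regular-expression examples for the first camelCaseSegment imply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the decomposition step; not the actual SegmentMatcher code.
final class SegmentPatternSketch {

    /** NEITHER_DIGIT_NOR_LETTER in the BNF. */
    private static boolean isSeparator(char c) {
        return !Character.isLetterOrDigit(c);
    }

    /** Splits a segment pattern into segments, always taking the longest possible segment. */
    static List<String> decompose(String pattern) {
        List<String> segments = new ArrayList<>();
        int n = pattern.length();
        int i = 0;
        while (i < n) {
            int start = i;
            // Optional leading separators turn the segment into a separatedSegment.
            while (i < n && isSeparator(pattern.charAt(i))) {
                i++;
            }
            if (i < n && Character.isDigit(pattern.charAt(i))) {
                // numberSegment or numberBody: one or more digits.
                while (i < n && Character.isDigit(pattern.charAt(i))) {
                    i++;
                }
            } else if (i < n) {
                // camelCaseSegment or textBody: one letter followed by lower-case letters.
                i++;
                while (i < n && Character.isLowerCase(pattern.charAt(i))) {
                    i++;
                }
            }
            // A trailing run of separators without a body (not covered by the BNF)
            // becomes a segment of its own.
            segments.add(pattern.substring(start, i));
        }
        return segments;
    }
}
```

For example, decompose("fooBar_baz42") yields foo, Bar, _baz and 42, and decompose("FB") yields F and B.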

In the second step, each segment can conceptually be translated into a regular expression that segments in candidate names have to match.

 separatedSegment:
   _foo -> _f[Oo][Oo].*
   _Foo -> _F[Oo][Oo].*
   _56  -> _56.*
   ___foo -> ___f[Oo][Oo].*
 camelCaseSegment (if first of pattern):
   foo  -> f[Oo][Oo].*
   Foo  -> F[Oo][Oo].*
 camelCaseSegment (any following):
   Foo  -> (_[fF]|F)[Oo][Oo].*
 numberSegment:
   42   -> 42.*

In the final step, the regular expressions of the individual pattern segments are concatenated in order. This results in a regular expression against which the candidate names are finally matched. (Note that the plain prefix match is not considered here.)
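Steps two and three can be sketched with java.util.regex, taking already decomposed segments as input. The class name is hypothetical, and the real SegmentMatcher is not necessarily regex-based. The sketch assumes the following:

- Letters after the first letter of a segment match in either case.
- Separators are matched literally (quoted).
- A following camel case segment that starts with a lower-case letter (possible only after a number segment) is treated like one starting with an upper-case letter.
- An empty pattern matches every name.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the translation and concatenation steps; not CDT API.
final class SegmentRegexSketch {

    /** A letter that may appear in either case, e.g. 'o' -> "[Oo]". */
    private static String anyCase(char c) {
        return "[" + Character.toUpperCase(c) + Character.toLowerCase(c) + "]";
    }

    /** Characters after the first one: letters in any case, digits literally. */
    private static String tail(String segment, int from) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < segment.length(); i++) {
            char c = segment.charAt(i);
            sb.append(Character.isLetter(c) ? anyCase(c) : String.valueOf(c));
        }
        return sb.toString();
    }

    /** Translates one segment; 'first' tells whether it is the first segment of the pattern. */
    static String segmentToRegex(String segment, boolean first) {
        char c0 = segment.charAt(0);
        if (!Character.isLetterOrDigit(c0)) {
            // separatedSegment: separators literally, then the first body character literally.
            int i = 0;
            while (i < segment.length() && !Character.isLetterOrDigit(segment.charAt(i))) {
                i++;
            }
            String separators = Pattern.quote(segment.substring(0, i));
            if (i == segment.length()) {
                return separators + ".*";
            }
            return separators + segment.charAt(i) + tail(segment, i + 1) + ".*";
        }
        if (first || Character.isDigit(c0)) {
            // First camelCaseSegment or numberSegment: first character literally.
            return c0 + tail(segment, 1) + ".*";
        }
        // Any following camelCaseSegment: "_" plus the letter in either case, or the letter as typed.
        return "(_" + anyCase(c0) + "|" + c0 + ")" + tail(segment, 1) + ".*";
    }

    /** Concatenates the per-segment expressions; an empty pattern matches every name. */
    static Pattern toRegex(List<String> segments) {
        StringBuilder sb = new StringBuilder(segments.isEmpty() ? ".*" : "");
        for (int i = 0; i < segments.size(); i++) {
            sb.append(segmentToRegex(segments.get(i), i == 0));
        }
        return Pattern.compile(sb.toString());
    }

    /** True if the whole name matches the concatenated expression. */
    static boolean matches(List<String> segments, String name) {
        return toRegex(segments).matcher(name).matches();
    }
}
```

With these rules, the segments F and B produce F.*(_[Bb]|B).*, which matches both FOO_BAR and FooBar. The segments s and _s require a literal underscore, so they match some_symbol but not someSymbol.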

Performance Considerations

UI-only Solution?

Required Core Support

Add SegmentMatcher and ContentAssistMatcher

Thanks to Tomasz, there is already a SegmentMatcher class. This class matches a name against a segment pattern and thus provides the foundation for camel case/underscore completion.

In addition, a ContentAssistMatcher will be added. This class provides the facade for the matching algorithms used exclusively by content assist-like features, i.e. all code for content assist-like features shall use this class instead of the SegmentMatcher directly. This approach allows user preferences to be introduced for further tweaking without touching any of the clients.
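The facade idea can be illustrated with a hypothetical sketch. Clients call only one factory method, and a preference decides which matching strategy they get. `ContentAssistMatcherSketch`, its `Mode` preference and the word-initials stand-in for the SegmentMatcher are all assumptions, not the actual CDT classes.

```java
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical sketch of the facade idea; none of these names are CDT API.
final class ContentAssistMatcherSketch {

    /** Hypothetical user preference selecting the matching strategy. */
    enum Mode { PREFIX, PREFIX_OR_INITIALS }

    /** Stand-in for a value that a real plugin would read from its preference store. */
    static Mode mode = Mode.PREFIX_OR_INITIALS;

    /** The single entry point that every content assist client would call. */
    static Predicate<String> createMatcher(String typed) {
        String t = typed.toLowerCase(Locale.ROOT);
        Predicate<String> prefix = name -> name.toLowerCase(Locale.ROOT).startsWith(t);
        if (mode == Mode.PREFIX) {
            return prefix;
        }
        // Simplistic stand-in for the SegmentMatcher: also accept a prefix of the word initials.
        return prefix.or(name -> initials(name).startsWith(t));
    }

    /** Lower-cased first letters of words separated by '_' or lower-to-upper transitions. */
    private static String initials(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            char prev = i == 0 ? '_' : name.charAt(i - 1);
            if (c != '_' && (prev == '_' || (Character.isUpperCase(c) && Character.isLowerCase(prev)))) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }
}
```

Switching `mode` to PREFIX changes the behaviour of every client at once. That is the benefit of routing all content assist code through one facade instead of calling the matcher directly.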

Both the SegmentMatcher and the ContentAssistMatcher will become part of the API, so that other plugins can reuse them (e.g. for languages other than C/C++). They are located in org.eclipse.cdt.core.parser.util.

Q: This location was chosen because the current prefix matching is located in CharArrayUtils, which resides in exactly this package. However, it does not seem natural to place them there. Is there a better place?

Index

The index provides IIndex and IIndexFragment (internal only) as API. Both interfaces already provide the means to find:

  • bindings/macros with a given name
  • bindings/macros starting with a given prefix
  • bindings/macros matching a pattern (those with '*' and '?')

In order to maintain API compatibility, the semantics of these methods will not be changed. Instead, explicit methods for content assist will be added:

org.eclipse.cdt.core.index.IIndex:

  • IIndexBinding[] findBindingsForContentAssist(char[] prefix, boolean fileScopeOnly, IndexFilter filter, IProgressMonitor monitor) throws CoreException
  • IIndexMacro[] findMacrosForContentAssist(char[] prefix, IndexFilter filter, IProgressMonitor monitor) throws CoreException

org.eclipse.cdt.internal.core.index.IIndexFragment:

  • IIndexFragmentBinding[] findBindingsForContentAssist(char[] prefix, boolean filescope, IndexFilter filter, IProgressMonitor monitor) throws CoreException
  • IIndexMacro[] findMacrosForContentAssist(char[] prefix, IndexFilter filter, IProgressMonitor monitor) throws CoreException
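The Design section's matching rules always match the first character of the pattern literally. Under that assumption, a content assist lookup could stay fast as follows: an index that keeps its names sorted can binary-search the range of names starting with that character, then run the full matcher only on that range. `SortedNameIndexSketch` and `findForContentAssist` are hypothetical. For brevity they use String instead of char[] and ignore bindings, filters and progress monitors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; not the IIndex implementation.
final class SortedNameIndexSketch {

    private final String[] sortedNames;

    SortedNameIndexSketch(String... names) {
        sortedNames = names.clone();
        Arrays.sort(sortedNames);
    }

    /** Position of the first name that is not smaller than key (lower bound). */
    private int lowerBound(String key) {
        int lo = 0;
        int hi = sortedNames.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedNames[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Narrows by the pattern's first character, then applies the full matcher. */
    List<String> findForContentAssist(String pattern, Predicate<String> matcher) {
        String first = pattern.isEmpty() ? "" : pattern.substring(0, 1);
        List<String> result = new ArrayList<>();
        // Names sharing a prefix form one contiguous range in sorted order.
        for (int i = lowerBound(first); i < sortedNames.length && sortedNames[i].startsWith(first); i++) {
            if (matcher.test(sortedNames[i])) {
                result.add(sortedNames[i]);
            }
        }
        return result;
    }
}
```

For example, with a matcher for the pattern FB, only names starting with F are tested. Names such as fooBar or getValue are never passed to the matcher.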

Q: Does it make sense to put these content assist-specific methods into IIndex and IIndexFragment? Possibly a utility class accessible to the implementers of AST nodes and scopes would be sufficient.

AST

PDOM

References

Bug 173458 and Bug 223625 request this feature.
