CDT/Obsolete/C editor enhancements/Include management

This is a problem page. Please treat it as a discussion page and feel free to insert your comments anywhere. My open questions are in bold. --Tomasz Wesołowski 14:22, 30 April 2010 (UTC)

Problem

Description

The management of include directives is probably one of the most repetitive tasks in C/C++ programming.

There's already an "Add include" feature, but it has its drawbacks: it has to be invoked manually by the programmer and is inaccurate in numerous situations. Clearly there's a lot of room for improvement.

Possible tools include:

Checkers/Quick fixes
- Find unused includes. See Marking / removing unused includes.
- Find headers that are used but only included indirectly (i.e. we use a.h without including it, but we include b.h, which includes a.h)
- Find all files that need to be included. See Headers to include.
Refactoring
- Sort includes. See Sort order.
- Remove all unused includes / flatten include hierarchy (i.e., only include directly used headers). See Marking / removing unused includes.
Editor
- Folding. See Folding.

Ranking

4/5 Sergey Prigogin 07:16, 18 May 2010 (UTC)

4/5 Tobias Hahn 09:32, 18 May 2010 (UTC)

4/5 Jens Elmenthaler

4/5 Kirstin Weber

5/5 Mathiaskunter.gmail.com 12:43, 29 April 2012 (UTC)

Inspiration

JDT features wonderful import handling. If the Java build path is set correctly, then the programmer probably never has to add an import manually and is thus not distracted from actual coding.

A new import can be added in two ways: by selecting a class which is not yet imported from Context Assist, or by invoking Organize Imports. JDT also warns about unused imports and allows removing them.

Solution proposal

A possible solution is to make CDT work like JDT. The problem gets a bit more complex in C/C++, though.

Invocation

Context Assist

A JDT-like solution would be to integrate adding includes with Context Assist.

Implementing a JDT-like solution would require little change to Context Assist behaviour. Context Assist, when invoked with part of a name already entered, would need to display a set of indexed symbols, ideally sorted in some clever way (how exactly?). Selecting a symbol that is not yet included would trigger the generation of the include directive.

Quick Fix

As in JDT, organizing includes could be done via Quick Fix (Ctrl-1), either on warnings generated for "unorganized" includes or, for missing includes, on the line after the last #include statement.

Annotations

The not-yet-included elements can be annotated with the defining header file in-place, so that users can easily invoke the include directive generation.

Menus

Organizing the include directives should also be possible via the main and context menus.

Batch processing / scripting

It's sometimes necessary to organize the includes of multiple files or even the entire project. It would therefore be nice to have some batch processing / scripting functionality for the organize includes operation.

Generation

Generating an include directive is also not as trivial as generating an import in Java. Include directives can be generated in different ways. It probably makes sense to introduce some user preference options which control the mode of inclusion.

Quotes vs. angle brackets

Include directives can either use quotes or angle brackets to specify the header which should be included. It's common to use the quoted form to include user-defined headers, and the angle bracket form to include (standard) library headers.

Since it can be problematic to determine whether a given header file is "user-defined" or not (AG: it might be possible to inspect the "built-in" flag on include paths), there should be a way to let the user specify for a given include directory whether the header files contained therein should be included by quotes or angle brackets.

A sensible default might be to include header files from the same project by using quotes, and to include all other headers by using angle brackets.
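The default suggested above could be sketched roughly as follows. This is only an illustration: the helper names isInside and makeIncludeDirective are hypothetical and not part of CDT, and the check is purely lexical (it assumes absolute, normalized paths and does not consult any per-directory user setting).

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: true if `path` lies inside directory `root`.
// Purely lexical, so both paths should be absolute and normalized.
bool isInside(const fs::path& path, const fs::path& root) {
    const fs::path rel = path.lexically_relative(root);
    return !rel.empty() && rel.begin()->string() != "..";
}

// Quotes for headers inside the project, angle brackets for all others.
// `spelledPath` is the path text to put between the delimiters.
std::string makeIncludeDirective(const fs::path& header,
                                 const fs::path& projectRoot,
                                 const std::string& spelledPath) {
    if (isInside(header, projectRoot))
        return "#include \"" + spelledPath + "\"";
    return "#include <" + spelledPath + ">";
}
```

A per-directory override, as proposed above, could be layered on top by looking up the header's include directory in a user-editable table before falling back to this default.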

Path to use

Include directives which use the quoted form can always use a relative path to specify the target header (e.g. "header.h", "dir/header.h" or "../header.h"). Depending on the include paths, it may however be possible to include the same header in different ways. For example, it might be possible to include the same header via "path/to/header/header.h" and "header.h" if "path/to/header" is on the include path.

It probably makes sense to use the shortest path by default, i.e. to prefer "header.h" over "path/to/header/header.h". However, it should be possible to fine-tune this behavior via preference options, for example as follows:

Prefer to include headers relative to the source file if located in...

...the same directory as the source file (enabled by default)
...a subdirectory of the source file's directory (enabled by default)
...a parent directory of the source file's directory (disabled by default)

Headers which use angle brackets are usually relative to only one specific (system) include path, and aren't included relative to any source files. The problem of different include paths therefore usually doesn't apply to them.
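One way the path selection for quoted includes might work is sketched below. The PathPreferences struct and the function names are hypothetical; the three flags mirror the preference options listed above, and all directory checks are lexical only (no file system access, paths assumed absolute and normalized).

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical preference flags with the defaults proposed above.
struct PathPreferences {
    bool preferSameDir = true;
    bool preferSubDir = true;
    bool preferParentDir = false;
};

// True if `dir` is `ancestor` itself or lies below it (lexical check only).
bool isWithin(const fs::path& dir, const fs::path& ancestor) {
    const fs::path rel = dir.lexically_relative(ancestor);
    return !rel.empty() && rel.begin()->string() != "..";
}

std::string chooseIncludePath(const fs::path& header, const fs::path& sourceFile,
                              const std::vector<fs::path>& includeDirs,
                              PathPreferences prefs = {}) {
    const fs::path sourceDir = sourceFile.parent_path();
    const fs::path headerDir = header.parent_path();
    const std::string fileRelative = header.lexically_relative(sourceDir).generic_string();

    const bool sameDir = headerDir == sourceDir;
    const bool subDir = !sameDir && isWithin(headerDir, sourceDir);
    const bool parentDir = !sameDir && isWithin(sourceDir, headerDir);
    if ((sameDir && prefs.preferSameDir) || (subDir && prefs.preferSubDir) ||
        (parentDir && prefs.preferParentDir))
        return fileRelative;

    // Otherwise take the shortest spelling relative to any include directory.
    std::string best;
    for (const fs::path& dir : includeDirs) {
        if (!isWithin(headerDir, dir)) continue;
        const std::string candidate = header.lexically_relative(dir).generic_string();
        if (best.empty() || candidate.size() < best.size()) best = candidate;
    }
    // Fall back to the file-relative spelling if no include directory matches.
    return best.empty() ? fileRelative : best;
}
```

With include directories /proj and /proj/lib, a header /proj/lib/net/socket.h would be spelled "net/socket.h" from /proj/src/main.cpp, since that is the shorter of the two possible spellings.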

Headers to include

Headers can be included indirectly, e.g. if file.cpp includes a.h which in turn includes b.h. This is OK as long as file.cpp itself only depends on a.h, but if file.cpp actually depends on b.h instead, it should rather include b.h directly (i.e. the hierarchy of include directives should be flattened). Each file should ideally include exactly those headers which are directly used within that file, no more, no less.

This however only works if all included headers themselves also follow this rule. The problem is that many existing headers don't. A prominent example is std::set, which (in GNU libstdc++) is actually defined in <bits/stl_set.h>. This header however isn't meant to be included directly; one has to include <set> instead. The responsibility to include the correct header is therefore moved from the author of the header file to the author of the source file. In the given example this isn't problematic since it's a well-known fact that <set> has to be included instead of <bits/stl_set.h>. From an algorithmic point of view however this is implicit knowledge which generally can't be programmatically obtained.

However, it should be possible to (partially) solve this problem with a heuristic approach like the following. Many C++ libraries follow the naming convention that headers intended for direct inclusion don't have any file name extension. It's furthermore common to name the headers after the symbol which they declare. Let's examine the std::set example again. The definition is in <bits/stl_set.h>, which in turn is included by <set>, i.e. a header which has no file name extension and is also named after the symbol which should be resolved. Chances seem to be good that we've found the header which we actually want to include.
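The heuristic described above might be sketched as a breadth-first walk up the include graph. The IncludedBy type and findPublicHeader are hypothetical names; the sketch assumes that a reverse include graph (header to the headers that directly include it) is already available, for example from the index.

```cpp
#include <filesystem>
#include <map>
#include <optional>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical reverse include graph: header -> headers that include it directly.
using IncludedBy = std::map<std::string, std::vector<std::string>>;

// Starting at the header that defines the symbol, walk up through its includers
// and return the nearest header that has no file name extension and is named
// exactly like the (unqualified) symbol. Returns nullopt if none is found.
std::optional<std::string> findPublicHeader(const IncludedBy& includedBy,
                                            const std::string& definingHeader,
                                            const std::string& symbolName) {
    std::queue<std::string> pending;
    std::set<std::string> visited{definingHeader};
    pending.push(definingHeader);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const std::filesystem::path p(current);
        if (!p.has_extension() && p.filename().string() == symbolName)
            return current;
        const auto it = includedBy.find(current);
        if (it == includedBy.end()) continue;
        for (const std::string& includer : it->second)
            if (visited.insert(includer).second) pending.push(includer);
    }
    return std::nullopt;
}
```

For the std::set example, a graph in which bits/stl_set.h is included by set yields "set" for the symbol name "set". When nothing matches, the caller would have to fall back to the defining header or to another strategy.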

Another way to handle this problem would be to maintain a header "blacklist" which lists those headers which mustn't be included directly, and also tells which headers should be included instead. Users should be able to edit this list, since they may need to add user-defined headers to it as well. Such a list should probably contain all standard library headers by default so that users don't have to add them manually.
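Such a list could be as simple as a lookup table, as in the sketch below. The type and function names are hypothetical, and the three default entries are only illustrative examples of libstdc++ internal headers, not a complete or authoritative list.

```cpp
#include <map>
#include <string>

// Hypothetical table: header that must not be included directly -> header to use instead.
using HeaderSubstitutions = std::map<std::string, std::string>;

// A few illustrative defaults (libstdc++ internals); a real list would be much longer.
HeaderSubstitutions defaultSubstitutions() {
    return {
        {"bits/stl_set.h", "set"},
        {"bits/stl_map.h", "map"},
        {"bits/stl_vector.h", "vector"},
    };
}

// Returns the replacement if the defining header is listed, otherwise the header itself.
std::string headerToInclude(const HeaderSubstitutions& subs,
                            const std::string& definingHeader) {
    const auto it = subs.find(definingHeader);
    return it != subs.end() ? it->second : definingHeader;
}
```

Because the table is plain data, user-defined entries can be added next to the defaults, e.g. mapping a project's internal "detail/widget_impl.h" to its public "widget.h" (both names made up for this example).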

It's certainly necessary to do extensive testing in order to find a suitable algorithm for detecting the required header files. Such an algorithm is also crucial for removing unused includes.

Forward declarations

We often only need a simple forward declaration instead of a full definition. The feature should decide which one is required so that compilation times can be reduced. There should be user preference options about the usage of forward declarations for classes, structs, unions and enums. This should probably be enabled by default except for enums, since forward declarations of enums require the new C++11 standard.

See the Forward declarations page for a summary of when forward declarations can be used and when not.
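As a brief illustration of the distinction (not a replacement for the summary on the Forward declarations page), the sketch below shows some typical uses for which a forward declaration is enough, and one for which it is not. Widget, Panel and Color are made-up names.

```cpp
class Widget;  // forward declaration: no header with the full definition is needed below

// Function declarations, pointers and references only need the name of the type.
void render(const Widget& widget);
Widget* findWidget(int id);

class Panel {
    Widget* child_ = nullptr;  // OK: pointer member to an incomplete type
    // Widget embedded_;       // would NOT compile here: the size of Widget is unknown
public:
    void setChild(Widget* child) { child_ = child; }
    bool hasChild() const { return child_ != nullptr; }
};

// C++11 and later: an enum can be forward-declared if its underlying type is fixed.
enum class Color : int;
Color defaultColor();
```

Code that accesses members of Widget, creates Widget objects or derives from Widget still needs the full definition and therefore the include.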

Sort order

Include directives can be sorted in various ways. It therefore would make sense to introduce some user preference options to control the sort order. Possible sort orders are:

By header file location. Include directives can be sorted by the location of the included header file. For example, the sort order might be as follows: Library headers > Project-relative headers > File-relative headers > Forward declarations. AG: "the best practice" of inclusion order is from specific to general, to ensure that the headers are self-sufficient.

Alphabetically.

None. In contrast to Java, reordering inclusions can change the meaning of the compilation unit in C/C++. Conditional compilation, comments, includes mixed within code etc. can complicate things. Another sort "order" could therefore be to aim to keep the original order of include directives as far as possible.
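The first two sort orders could be combined as in the following sketch, which groups the directives by location and sorts alphabetically within each group. IncludeKind, IncludeDirective and sortIncludes are hypothetical names; the order of the enumerators encodes the example order given above (forward declarations are left out, since they aren't include directives).

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Enumerator order defines the group order: library headers come first.
enum class IncludeKind { Library, ProjectRelative, FileRelative };

struct IncludeDirective {
    IncludeKind kind;
    std::string path;
};

// Sorts by group first and alphabetically by path within each group.
void sortIncludes(std::vector<IncludeDirective>& includes) {
    std::stable_sort(includes.begin(), includes.end(),
                     [](const IncludeDirective& a, const IncludeDirective& b) {
                         return std::tie(a.kind, a.path) < std::tie(b.kind, b.path);
                     });
}
```

The "None" order corresponds to not calling sortIncludes at all. The "specific to general" order mentioned by AG would only require reversing the order of the enumerators.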

Reporting problems

We need some feasible way to report problems to the user. In particular, the user should be able to see which symbols couldn't be resolved while trying to organize the includes. A possible solution would be to optionally insert those problem reports as comments beneath the include directives. The result of an organize includes operation could then look something like this:

// Library headers
#include <library_header_1.h>
#include <library_header_2.h>
 
// Project headers
#include "project_header_1.h"
#include "project_header_2.h"
 
// Forward declarations
class foo;
class bar;
 
// Found problems
// Problem: Couldn't find definition of symbol XXX
// Problem: Couldn't find definition of symbol YYY
// Problem: Couldn't find definition of symbol ZZZ

Marking / removing unused includes

Marking of unused includes could be implemented as a Codan checker.

This task may be trickier to implement than in Java. To decide if a header is used, we'd also need to check symbols in a compilation unit against all headers included by it, recursively. Searching a big hierarchy of headers for occurrences of a given symbol might be time-consuming on bigger projects.

Java has "flat" imports, so this problem doesn't exist there.

A different approach to solve this problem is to simply determine the headers which are used by the current file (as has to be done anyway to organize the includes), and to consider all other included headers as unused. Doing it this way eliminates the need to browse the entire header hierarchy, but requires a preceding "add missing includes" operation (which flattens out the include hierarchy). The feature to mark / remove unused includes can therefore be implemented well as part of an "Organize includes" operation, but can't be implemented so easily as a standalone feature.

An example:

File file.cpp:

#include "a.h"
 
void foo()
{
    bar();
}

File a.h:

#include "b.h"

File b.h:

void bar();

If we can determine the required includes for file.cpp ("add missing includes" operation), we will find out that file.cpp only needs to include b.h. Knowing this, we can simply insert the include directive for b.h into file.cpp and conclude that a.h is now an unused include directive (simply because it's not a used one).

So, if we can reliably tell which headers are used by a given source file then we can also reliably tell which headers are not used by this file. I think this is the way to solve this problem.
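Expressed as code, the idea boils down to two set differences. In the sketch below, OrganizeResult and organize are hypothetical names, and the set of required headers is assumed to have been computed already by the "add missing includes" step.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

struct OrganizeResult {
    std::set<std::string> toAdd;     // required but not yet included
    std::set<std::string> toRemove;  // included but not required, i.e. unused
};

// `included`: headers the file currently includes directly.
// `required`: headers the file directly depends on (assumed to be known).
OrganizeResult organize(const std::set<std::string>& included,
                        const std::set<std::string>& required) {
    OrganizeResult result;
    std::set_difference(required.begin(), required.end(),
                        included.begin(), included.end(),
                        std::inserter(result.toAdd, result.toAdd.end()));
    std::set_difference(included.begin(), included.end(),
                        required.begin(), required.end(),
                        std::inserter(result.toRemove, result.toRemove.end()));
    return result;
}
```

For the example above, included = {"a.h"} and required = {"b.h"} gives toAdd = {"b.h"} and toRemove = {"a.h"}. The objections raised in the discussion below (headers with #undef, #pragma and so on) concern how reliably the required set can be computed, not this final step.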

Do you believe that such a feature is necessary?

[Markus Schorn]: In general you will not be able to determine whether a header file needs to be included. Here are a few different examples for that:
- Including a file that contains inline function definitions: The compiler does not need the definitions, the linker will complain.
  - If the C file does not use the inlines, it should be safe to remove them? (Elaskavaia.cdt.gmail.com 18:49, 17 May 2010 (UTC))
- Including a file that contains #undef statements: You will not find references for the undefined macro, however removing the statement may change the way your code is compiled.
  - This can be detected too, i.e. if the macro is not used by any header below it or by the compilation unit, it won't affect anything (Elaskavaia.cdt.gmail.com 18:49, 17 May 2010 (UTC))
- Similar for #pragma statements that affect the way code is compiled.
- Headers that are necessary depending on compiler switches.
  - You mean headers that are included (or not) using conditional compilation? (Elaskavaia.cdt.gmail.com 18:49, 17 May 2010 (UTC))

Elaskavaia.cdt.gmail.com 18:49, 17 May 2010 (UTC)
- Yes, it is not error-proof; however, I think a checker like that would be very useful. The user should judge manually whether to remove a header or not. I have written such a checker before; it can easily take a month on its own (plus other include checkers).

[Kirstin Weber]: I think support for includes is very helpful. It is one of the great features for Java and I really miss it when I write C++ code again. In Java unnecessary imports are marked immediately. If this is too time-consuming or too hard (or impossible) to determine immediately for C++, it would be helpful if there were a "clean up" function, perhaps after compiling. You compile the code, then invoke the feature, and all unnecessary includes are removed. You could use such a feature when you have finished your implementation in order to clean up your code at the end.

Folding

A handy little detail related to this problem would be the ability to fold a block of include directives and / or a block of forward declarations.

References

The proposed "Organize Includes" feature: Bug 45203

Adding includes with Context Assist: Bug 291977

Problem with the "Add Include" feature (already fixed): Bug 182897

Problem with the "Add Include" feature (apparently also already fixed?): Bug 113063

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
