Thread: backend *.c #include cleanup (IWYU)
I played with include-what-you-use (IWYU), "a tool for use with clang to analyze #includes in C and C++ source files".[0] I came across this via clangd (the language server), because clangd (via the editor) kept suggesting a bunch of #includes to remove. And I suppose it was right. So as a test, I ran IWYU over the backend *.c files and removed all the #includes it suggested. (Note that IWYU also suggests to *add* a bunch of #includes, in fact that is its main purpose; I didn't do this here.) In some cases, a more specific #include replaces another less specific one. (To keep the patch from exploding in size, I ignored for now all the suggestions to replace catalog/pg_somecatalog.h with catalog/pg_somecatalog_d.h.) This ended up with the attached patch, which has 432 files changed, 233 insertions(+), 1023 deletions(-) I tested this with various compilation options (assert, WAL_DEBUG, LOCK_DEBUG, different geqo variants, etc.) to make sure a header wasn't just used for some conditional code section. Also, again, this patch touches just *.c files, so nothing declared from header files changes in hidden ways. Also, as a small example, in src/backend/access/transam/rmgr.c you'll see some IWYU pragma annotations to handle a special case there. The purpose of this patch is twofold: One, it's of course a nice cleanup. Two, this is a test how well IWYU might work for us. If we find either by human interpretation that a change doesn't make sense, or something breaks on some platform, then that would be useful feedback (perhaps to addressed by more pragma annotations or more test coverage). (Interestingly, IWYU has been mentioned in src/tools/pginclude/README since 2012. Has anyone else played with it? Was it not mature enough back then?) [0]: https://include-what-you-use.org/
Attachment
On Sat, Feb 10, 2024 at 08:40:43AM +0100, Peter Eisentraut wrote: > The purpose of this patch is twofold: One, it's of course a nice cleanup. > Two, this is a test how well IWYU might work for us. If we find either by > human interpretation that a change doesn't make sense, or something breaks > on some platform, then that would be useful feedback (perhaps to addressed > by more pragma annotations or more test coverage). > > (Interestingly, IWYU has been mentioned in src/tools/pginclude/README since > 2012. Has anyone else played with it? Was it not mature enough back then?) I haven't played with it at all, but the topic reminds me of this thread: https://postgr.es/m/flat/CALDaNm1B9naPDTm3ox1m_yZvOm3KA5S4kZQSWWAeLHAQ%3D3gV1Q%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On 10.02.24 21:13, Nathan Bossart wrote: >> (Interestingly, IWYU has been mentioned in src/tools/pginclude/README since >> 2012. Has anyone else played with it? Was it not mature enough back then?) > I haven't played with it at all, but the topic reminds me of this thread: > > https://postgr.es/m/flat/CALDaNm1B9naPDTm3ox1m_yZvOm3KA5S4kZQSWWAeLHAQ%3D3gV1Q%40mail.gmail.com Approaches like that as well as the in-tree pgrminclude work by "I removed the #include and it still compiled fine", which can be unreliable. IWYU on the other hand has the compiler tracking where a symbol actually came from, and so if it says that an #include is not used, then it's pretty much correct by definition.
On Mon, Feb 12, 2024 at 05:08:40PM +0100, Peter Eisentraut wrote: > On 10.02.24 21:13, Nathan Bossart wrote: >> I haven't played with it at all, but the topic reminds me of this thread: >> >> https://postgr.es/m/flat/CALDaNm1B9naPDTm3ox1m_yZvOm3KA5S4kZQSWWAeLHAQ%3D3gV1Q%40mail.gmail.com > > Approaches like that as well as the in-tree pgrminclude work by "I removed > the #include and it still compiled fine", which can be unreliable. IWYU on > the other hand has the compiler tracking where a symbol actually came from, > and so if it says that an #include is not used, then it's pretty much > correct by definition. Okay. FWIW I tend to agree that this is nice cleanup. I'd personally run this before every commit on HEAD if there was an easy way to do so and it didn't cause changes to a bunch of unrelated files, but I understand that the pgindent stuff in the buildfarm isn't super popular. I'd even like to have a tool that automatically adjusted #include ordering to match project policy, assuming one does not already exist. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Sat Feb 10, 2024 at 1:40 AM CST, Peter Eisentraut wrote: > I played with include-what-you-use (IWYU), "a tool for use with clang to > analyze #includes in C and C++ source files".[0] I came across this via > clangd (the language server), because clangd (via the editor) kept > suggesting a bunch of #includes to remove. And I suppose it was right. > > So as a test, I ran IWYU over the backend *.c files and removed all the > #includes it suggested. (Note that IWYU also suggests to *add* a bunch > of #includes, in fact that is its main purpose; I didn't do this here.) > In some cases, a more specific #include replaces another less specific > one. (To keep the patch from exploding in size, I ignored for now all > the suggestions to replace catalog/pg_somecatalog.h with > catalog/pg_somecatalog_d.h.) This ended up with the attached patch, > which has > > 432 files changed, 233 insertions(+), 1023 deletions(-) > > I tested this with various compilation options (assert, WAL_DEBUG, > LOCK_DEBUG, different geqo variants, etc.) to make sure a header wasn't > just used for some conditional code section. Also, again, this patch > touches just *.c files, so nothing declared from header files changes in > hidden ways. > > Also, as a small example, in src/backend/access/transam/rmgr.c you'll > see some IWYU pragma annotations to handle a special case there. > > The purpose of this patch is twofold: One, it's of course a nice > cleanup. Two, this is a test how well IWYU might work for us. If we > find either by human interpretation that a change doesn't make sense, or > something breaks on some platform, then that would be useful feedback > (perhaps to addressed by more pragma annotations or more test coverage). > > (Interestingly, IWYU has been mentioned in src/tools/pginclude/README > since 2012. Has anyone else played with it? Was it not mature enough > back then?) > > [0]: https://include-what-you-use.org/ I like this idea. This was something I tried to do but never finished in my last project. I have also been noticing the same issues from clangd. Having more automation around include patterns would be great! I think it would be good if there a Meson run_target()/Make .PHONY target that people could use to test too. Then, eventually the cfbot could pick this up. -- Tristan Partin Neon (https://neon.tech)
Hi, On 2024-02-10 08:40:43 +0100, Peter Eisentraut wrote: > So as a test, I ran IWYU over the backend *.c files and removed all the > #includes it suggested. (Note that IWYU also suggests to *add* a bunch of > #includes, in fact that is its main purpose; I didn't do this here.) In some > cases, a more specific #include replaces another less specific one. (To > keep the patch from exploding in size, I ignored for now all the suggestions > to replace catalog/pg_somecatalog.h with catalog/pg_somecatalog_d.h.) This > ended up with the attached patch, which has I think this kind of suggested change is why I have been wary of IWYU (and the changes that clangd suggest): By moving indirect includes to .c files you end up with implementation details more widely, which can increase the pain of refactoring. I'd be hesitant to commit to this without a) a policy about adding annotations about when indirect includes shouldn't directly be included b) annotations ensuring that. What were the parameters you used for IWYU? I'm e.g. a bit surprised about the changes to main.c, adding an include for s_lock.h. For one I don't think s_lock.h should ever be included directly, but more importantly it seems there isn't a reason to include spin.h either, but it's not removed here? There are other changes I don't understand. E.g. why is catalog/binary_upgrade.h removed from pg_enum.c? It's actually defining binary_upgrade_next_pg_enum_oid, declared in catalog/binary_upgrade.h? I think some of the changes here could be applied independently of more iwyu infrastructure, e.g. replacing includes of builtins.h with includes of fmgrprotos.h. But the larger set of changes seems somewhat hard to judge as-is. FWIW, for me the tree with the patch applied doesn't build anymore, presumably due to 8be93177c46. ../../../../../home/andres/src/postgresql/src/backend/commands/event_trigger.c: In function 'EventTriggerOnLogin': ../../../../../home/andres/src/postgresql/src/backend/commands/event_trigger.c:955:72: error: 'F_OIDEQ' undeclared (firstuse in this function) 955 | BTEqualStrategyNumber, F_OIDEQ, | ^~~~~~~ ../../../../../home/andres/src/postgresql/src/backend/commands/event_trigger.c:955:72: note: each undeclared identifier isreported only once for each function it appears in Greetings, Andres Freund
On 2024-Feb-10, Peter Eisentraut wrote: > So as a test, I ran IWYU over the backend *.c files and removed all the > #includes it suggested. (Note that IWYU also suggests to *add* a bunch of > #includes, in fact that is its main purpose; I didn't do this here.) In some > cases, a more specific #include replaces another less specific one. Sounds reasonable in principle. I looked at the access/brin changes and they seems OK. Then I noticed the execdebug.h -> executor.h changes and was surprised, until I looked at execdebug.h and though ... maybe we should just get rid of that file altogether. > Also, as a small example, in src/backend/access/transam/rmgr.c you'll see > some IWYU pragma annotations to handle a special case there. Looks pretty acceptable. > The purpose of this patch is twofold: One, it's of course a nice cleanup. > Two, this is a test how well IWYU might work for us. If we find either by > human interpretation that a change doesn't make sense, or something breaks > on some platform, then that would be useful feedback (perhaps to addressed > by more pragma annotations or more test coverage). Apparently the tool has long standing, so since the results appear to be good, I'm not opposed to adding it to our workflow. Not as much as pgindent for sure, though. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "All rings of power are equal, But some rings of power are more equal than others." (George Orwell's The Lord of the Rings)
Peter Eisentraut <peter@eisentraut.org> writes: > Approaches like that as well as the in-tree pgrminclude work by "I > removed the #include and it still compiled fine", which can be > unreliable. IWYU on the other hand has the compiler tracking where a > symbol actually came from, and so if it says that an #include is not > used, then it's pretty much correct by definition. Well, it might be correct by definition for the version of the code that the compiler processed. But it sounds to me like it's just as vulnerable as pgrminclude to taking out #includes that are needed only by #ifdef'd code sections that you didn't compile. On the whole, our experience with automated #include removal is pretty awful. I'm not sure I want to go there again. regards, tom lane
Hi, On 2024-02-12 16:46:55 -0500, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: > > Approaches like that as well as the in-tree pgrminclude work by "I > > removed the #include and it still compiled fine", which can be > > unreliable. IWYU on the other hand has the compiler tracking where a > > symbol actually came from, and so if it says that an #include is not > > used, then it's pretty much correct by definition. > > Well, it might be correct by definition for the version of the code > that the compiler processed. But it sounds to me like it's just as > vulnerable as pgrminclude to taking out #includes that are needed > only by #ifdef'd code sections that you didn't compile. I think pgrminclude is far worse than IWYU - it *maximizes* reliance on indirect includes, the opposite of what iwyu does. I share concerns about removing includes for platform/config option specific code, but I think that it'd not take too many annotations to address that. I think the proposed patch shows some good changes that are painful to do by hand, but easy with iwyu, like replacing builtins.h with fmgrprotos.h, replacing includes of heapam.h/heap.h with table.h etc where appropriate. While I can see applying some targeted changes without more work, I don't really see much point in applying a lot of the other removals without actually committing to adding the necessary IWYU annotations to our code to make iwyu actually usable. Greetings, Andres Freund
On 12.02.24 21:07, Andres Freund wrote: > On 2024-02-10 08:40:43 +0100, Peter Eisentraut wrote: >> So as a test, I ran IWYU over the backend *.c files and removed all the >> #includes it suggested. (Note that IWYU also suggests to *add* a bunch of >> #includes, in fact that is its main purpose; I didn't do this here.) In some >> cases, a more specific #include replaces another less specific one. (To >> keep the patch from exploding in size, I ignored for now all the suggestions >> to replace catalog/pg_somecatalog.h with catalog/pg_somecatalog_d.h.) This >> ended up with the attached patch, which has > > I think this kind of suggested change is why I have been wary of IWYU (and the > changes that clangd suggest): By moving indirect includes to .c files you end > up with implementation details more widely, which can increase the pain of > refactoring. I'm actually not clear on what the policy of catalog/pg_somecatalog.h versus catalog/pg_somecatalog_d.h is or should be. There doesn't appear to be any consistency, other than that older code obviously is less likely to use the _d.h headers. If we have a policy, then adding some annotations might also be a good way to describe it. > What were the parameters you used for IWYU? I'm e.g. a bit surprised about the > changes to main.c, adding an include for s_lock.h. For one I don't think > s_lock.h should ever be included directly, but more importantly it seems there > isn't a reason to include spin.h either, but it's not removed here? I think the changes in main.c might have been my transcription error. They don't make sense. > There are other changes I don't understand. E.g. why is > catalog/binary_upgrade.h removed from pg_enum.c? It's actually defining > binary_upgrade_next_pg_enum_oid, declared in catalog/binary_upgrade.h? Ah, this is a deficiency in IWYU. It keeps headers that provide function prototypes, but it doesn't keep headers that provide extern declarations of global variables. I have filed an issue about that, and it looks like a fix might already be on the way.[0] [0]: https://github.com/include-what-you-use/include-what-you-use/issues/1461 This issue also led me to discover -Wmissing-variable-declarations, about which I will post separately. In the meantime, here is an updated patch with rebase and the above issues fixed.
Attachment
On 19.02.24 08:59, Peter Eisentraut wrote: >> There are other changes I don't understand. E.g. why is >> catalog/binary_upgrade.h removed from pg_enum.c? It's actually defining >> binary_upgrade_next_pg_enum_oid, declared in catalog/binary_upgrade.h? > > Ah, this is a deficiency in IWYU. It keeps headers that provide > function prototypes, but it doesn't keep headers that provide extern > declarations of global variables. I have filed an issue about that, and > it looks like a fix might already be on the way.[0] > > [0]: > https://github.com/include-what-you-use/include-what-you-use/issues/1461 > > This issue also led me to discover -Wmissing-variable-declarations, > about which I will post separately. > > In the meantime, here is an updated patch with rebase and the above > issues fixed. Here is another rebase. Also, for extra caution, I undid all the removals of various port(ability) includes, such as "port/atomics.h", just in case they happen to have some impact in some obscure configuration (= not covered by Cirrus CI). I propose to commit this patch now, and then maybe come back with more IWYU-related proposals in the future once the above issues are fixed.