Re: Proposal: common explicit lists for installed headers - Mailing list pgsql-hackers

From Zsolt Parragi
Subject Re: Proposal: common explicit lists for installed headers
Msg-id CAN4CZFMdLeF7nRjWi=PKwsKyT7iZQidzE-Toeenq8hvpYitOeQ@mail.gmail.com
In response to Re: Proposal: common explicit lists for installed headers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> I think you've chosen the worst-common-denominator
> solution. There is nothing to like about an explicit list of
> installed files.

This is a difficult and very opinionated question. Personally, I like
to use CMake's file(GLOB_RECURSE ... CONFIGURE_DEPENDS) even for C/C++
source files in my small personal projects for simplicity, so I
completely understand and somewhat agree with your point.

In practice, however, both Meson[1] and CMake[2], the two most popular
build systems, actively discourage any kind of globbing/automatic file
discovery magic, and they have good reasons for doing so.

The main problem with globbing, which is also why Andres Freund
suggested explicit lists in the headerscheck thread, is that it breaks
dependency resolution. If a new header file appears, no build script
changes, so the build system does not re-run the glob and never picks
the file up. It won't get installed, it won't get verified by
headerscheck, etc.

This means that it can:
* lead to CI failures/misses
* lead to developers missing build errors locally, wasting CI time
* break git bisect

The workaround (e.g. CMake's CONFIGURE_DEPENDS) is to run the glob
expression / git ls-files / etc. on every build invocation instead of
only at configure time. That's fine on fast NVMe drives, but terrible
on spinning disks or slow filesystems with large projects like
postgres.
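
To make the trade-off concrete, here is a rough sketch in Python of
what such a workaround has to do. It is not how CMake or Meson
implement it; glob_headers, needs_reconfigure and demo are
hypothetical names used only for illustration. The point is that the
staleness check requires a full tree walk on every build.

```python
import tempfile
from pathlib import Path


def glob_headers(root: Path) -> list[str]:
    """Walk the whole tree and return every *.h file, relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.h"))


def needs_reconfigure(root: Path, cached: list[str]) -> bool:
    """Re-run the glob and compare it with the list cached at configure time.

    This is the per-build cost: the tree is walked again on every call.
    """
    return glob_headers(root) != cached


def demo() -> tuple[bool, bool]:
    """Simulate configure, then a new header appearing before the next build."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.h").write_text("")
        cached = glob_headers(root)              # "configure" step
        before = needs_reconfigure(root, cached)
        (root / "b.h").write_text("")            # new header, no build script change
        after = needs_reconfigure(root, cached)
        return before, after
```

Without the needs_reconfigure call on every build, the cached list
from the configure step would simply stay stale after b.h is added.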

> that we will sometimes forget to add some new header to the list.
> We won't notice such omissions until some end user complains,
> maybe years later.

This already happens. Just yesterday I submitted a patch fixing an
omission like this, which I noticed while working on this proposal [3].
We only use glob/add_subdir for some directories, not all of them. The
current behavior is inconsistent, and that inconsistency makes
omissions easier to miss. I think having a clear rule and following it
everywhere would improve things.

> At least for me,
> git ls-files 'src/include/*.h'
> seems to produce a pretty usable list of server headers.

I was thinking about adding a configure-time step that verifies the
files: during configuration, it checks whether we missed any headers
(i.e. they are not in the list files) and reports an error. That keeps
the explicit requirement but adds a guardrail, and CI would also fail
quickly if something is missed.

For now I left it out to:

a. keep the first patch simpler
b. keep things consistent. The git ls-files approach works for
src/include, but not for the other directories, because we do not
install all header files from those (e.g. src/interfaces/libpq). If I
wanted to add automatic checks there, I would also have to add exclude
lists.
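
As a rough sketch of what such a guardrail could look like (this is
not part of the patch; the function names, the exclude parameter and
the file names in the usage example are all made up for illustration):

```python
import subprocess


def tracked_headers(pattern: str = "src/include/*.h") -> set[str]:
    """Headers known to git, using the git ls-files command quoted above.

    Must be run from inside a git checkout.
    """
    result = subprocess.run(
        ["git", "ls-files", pattern],
        check=True,
        capture_output=True,
        text=True,
    )
    return set(result.stdout.splitlines())


def unlisted_headers(tracked, listed, excluded=frozenset()):
    """Return tracked headers that are neither listed nor explicitly excluded.

    `listed` stands for the contents of the explicit list files and
    `excluded` for a hypothetical per-directory exclude list.
    """
    return sorted(set(tracked) - set(listed) - set(excluded))
```

A configure step would report an error whenever the result is
non-empty, for example:

```python
missing = unlisted_headers(
    tracked={"dir/public_api.h", "dir/internal_only.h"},
    listed={"dir/public_api.h"},
    excluded={"dir/internal_only.h"},
)
if missing:
    raise SystemExit("headers missing from the install list: " + ", ".join(missing))
```

For src/include the exclude set would stay empty; the other
directories are where it would be needed.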

[1]: https://mesonbuild.com/FAQ.html#why-cant-i-specify-target-files-with-a-wildcard
[2]: https://cmake.org/cmake/help/latest/command/file.html#filesystem
[3]: https://www.postgresql.org/message-id/CAN4CZFP6NOjv__4Mx%2BiQD8StdpbHvzDAatEQn2n15UKJ%3DMySSQ%40mail.gmail.com


