Thread: ABI Compliance Checker GSoC Project
Hackers,
I’d like to introduce Mankirat Singh, a Google Summer of Code student that Pavlo Golub and I are mentoring this year. He’s started work on his project, an ABI Compliance Checker. The plan is to work out the patterns, integrate it into the BuildFarm, and get it sending regular reports by early September. He’s building on the foundation laid by Peter Geoghegan back in 2023 [1]. He’s written up his plan in his blog [2].
Since the work naturally gets into what’s considered a public API and what’s not, we feel that hackers is the best place to ask questions about bits to include and exclude, as well as other questions related to configuration settings, etc., so please watch for those. I hope you all will help Mankirat to quickly learn what’s what, figure out how it goes together, and make it a reality!
In the meantime, please give him a warm welcome!
Best,
David
PS: If one of you fine people can grant permission for Mankirat’s blog to be added to Planet Postgres, we’d super appreciate it.
[1]: https://postgr.es/m/CAH2-Wzm-W6hSn71sUkz0Rem=qDEU7TnFmc7_jG2DjrLFef_WKQ@mail.gmail.com
[2]: https://blog.mankiratsingh.com/
Thanks for the introduction :D
On Tue, 3 Jun 2025 at 00:36, David E. Wheeler <david@justatheory.com> wrote:
> Since the work naturally gets into what’s considered a public API and what’s not, we feel that hackers is the best place to ask questions about bits to include and exclude, as well as other questions related to configuration settings, etc., so please watch for those. I hope you all will help Mankirat to quickly learn what’s what, figure out how it goes together, and make it a reality!
Following up on this, I would really like to understand how to tell which of the changes reported by the abidiff tool are considered to cause ABI instability and which are not.
For example, in minor release 17.0 to 17.1, struct ResultRelInfo was changed by adding a new data member, causing the ABI instability.
But when I tried to compare the postgres binaries of version 17.4 and 17.5, which is supposed to have no ABI instability, I got the following output:
$ abidiff ./install/with_debug/REL_17_4/bin/postgres install/with_debug/REL_17_5/bin/postgres --leaf-changes-only --no-added-syms --show-bytes
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 2 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function (3 filtered out)
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct ReadStream at read_stream.c:109:1' changed:
type size hasn't changed
1 data member insertion:
'int16 io_combine_limit', at offset 2 (in bytes) at read_stream.c:112:1
there are data member changes:
'int16 ios_in_progress' offset changed from 2 to 4 (in bytes) (by +2 bytes)
'int16 queue_size' offset changed from 4 to 6 (in bytes) (by +2 bytes)
'int16 max_pinned_buffers' offset changed from 6 to 8 (in bytes) (by +2 bytes)
'int16 pinned_buffers' offset changed from 8 to 10 (in bytes) (by +2 bytes)
'int16 distance' offset changed from 10 to 12 (in bytes) (by +2 bytes)
'bool advice_enabled' offset changed from 12 to 14 (in bytes) (by +2 bytes)
'struct WalSndCtlData at walsender_private.h:91:1' changed:
type size hasn't changed
there are data member changes:
name of 'WalSndCtlData::sync_standbys_defined' changed to 'WalSndCtlData::sync_standbys_status' at walsender_private.h:110:1
In the above report, the symbols ReadStream and WalSndCtlData have no type size changes.
And similarly, the following report compares the postgres binaries for versions 17.2 and 17.3:
$ abidiff ./install/with_debug/REL_17_2/bin/postgres ./install/with_debug/REL_17_3/bin/postgres --leaf-changes-only --no-added-syms --show-bytes
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 1 Changed, 0 Added function (6 filtered out)
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
1 function with some sub-type change:
[C] 'function void mdtruncate(SMgrRelation, ForkNumber, BlockNumber)' at md.c:1153:1 has some sub-type changes:
parameter 4 of type 'typedef BlockNumber' was added
'struct IndexOptInfo at pathnodes.h:1104:1' changed:
type size hasn't changed
there are data member changes:
type 'void (*)(...)' of 'IndexOptInfo::amcostestimate' changed:
pointer type changed from: 'void (*)(...)' to: 'void (*)(PlannerInfo*, IndexPath*, double, Cost*, Cost*, Selectivity*, double*, double*)'
This report also shows one function sub-type change, in mdtruncate.
I don't know much about the PostgreSQL internals, so I would really like to know how we can "classify" what exactly is a false positive in these reports, that is, what should not be reported by the final ABI compliance checking tool.
I checked the suppression file mechanism for the libabigail tools, but it does not seem to work for the PostgreSQL case, as we would need to specify each and every symbol to be ignored in the file, which is not practical. The symbols to be suppressed don't follow any common regular expression either, and the suppression file doesn't support including whole files or folders.
Secondly, we can't use the -fvisibility=hidden flag while compiling either, as it results in a failed compilation with an error.
The only thing that seems to work, as far as I know, is to use the nm tool with the -D flag to get the list of dynamic symbols, which are more important from an ABI stability point of view (as those will be used by extensions?), and grep for the symbols from the abidiff output. If we find nothing, then ignore the warning. I am unsure about this approach and its possible side effects, though.
For example: $ nm -D ./install/with_debug/REL_17_4/bin/postgres | grep WalSndCtlData
Please correct me if I'm wrong somewhere.
Regards,
Mankirat
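The nm-based filter described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's tool: the helper names `parse_nm_dynamic` and `exported_only` are hypothetical, and the sample `nm -D` text uses made-up addresses.

```python
def parse_nm_dynamic(nm_output: str) -> set[str]:
    """Return the names of defined symbols found in `nm -D` text output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type letter, name.
        # Undefined ones ("U name") have only two fields and are skipped.
        if len(parts) == 3:
            # Drop any "@VERSION" suffix appended by symbol versioning.
            names.add(parts[2].split("@")[0])
    return names


def exported_only(reported: list[str], nm_output: str) -> list[str]:
    """Keep only the reported names that appear in the dynamic symbol table."""
    dynamic = parse_nm_dynamic(nm_output)
    return [name for name in reported if name in dynamic]


# Hypothetical sample; real input would come from `nm -D <path>/bin/postgres`.
SAMPLE_NM = """\
0000000000401000 T mdtruncate
0000000000402000 T read_stream_next_buffer
                 U memcpy@GLIBC_2.14
"""

print(exported_only(["mdtruncate", "ReadStream", "WalSndCtlData"], SAMPLE_NM))
```

One likely side effect of this approach: struct names such as ReadStream or WalSndCtlData are types, not linker symbols, so they never show up in `nm -D` output whether or not extensions depend on their layout. A filter like this would therefore drop every type-only change, including ones of the same kind as the ResultRelInfo change in 17.1.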
On Tue, 3 Jun 2025 at 20:49, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> I don't think it's the job of the tool to determine that this ABI difference is okay. Ultimately that's for a human to determine,
Yes, but it would be better if we could automate that to some extent, along with the development of the ABI compliance checker.
> and they must either add a suppression to the Postgres source code somehow, or modify or revert the commit so that the ABI change disappears.
That's exactly what I aim for with this project: automatically compare the latest commit with the previous one using abidiff and, if some ABI instability is found, immediately report it to the committer or some mailing list, so that maintainers don't have to do that manually every time. That's the reason for asking about false positives and how to suppress them.
> Also a diff worth reporting, and suppressing from the report as appropriate.
Okay. It is a change in name; I assume that's the reason it's important.
> The only kinds of ABI changes that should be silenced by default are
>
> 1) addition of struct members that cause no change to the offsets of other members in the struct (i.e. a new member that uses space that was previously padding space)
>
> 2) addition of struct members at the end of structs, changing the struct size, but only for structs that aren't used as array elements (the problem being that the size of the "stride" when scanning the array would mismatch). Not sure how easy it is to detect this case.
These sound very similar to the 17.1 minor release ABI instability. abidiff seems to do most of the work in this case:
$ abidiff ./install/with_debug/REL_17_0/bin/postgres ./install/with_debug/REL_17_1/bin/postgres --leaf-changes-only --no-added-syms --show-bytes
Leaf changes summary: 1 artifact changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function (11 filtered out)
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
'struct ResultRelInfo at execnodes.h:450:1' changed:
type size changed from 376 to 384 (in bytes)
1 data member insertion:
'bool ri_needLockTagTuple', at offset 376 (in bytes) at execnodes.h:597:1
> I'm also wondering what platforms/architectures are you thinking on running this on. For instance, thinking about padding space in structs, what is padding space in x86_64 may not be so in 32bit x86 ...
The reporting system is planned to be developed as an extension to the PostgreSQL build farm, so the abidiff reports from animals on various architectures could be received, processed and reported accordingly.
> I think in a first cut of this tooling, we should consider all ABI changes as potentially problematic.
Certainly, I will do that until I understand how to identify false positives and implement techniques to remove them.
> Please elaborate. Can you not write a suppression file that says "ignore offset changes for ios_in_progress in ReadStream", for example?
I can do that, and that's what's causing the problem. According to the documentation for these suppression files[1], we have to mention the particular symbol name we need to suppress like "ReadStream" or something particular like "ignore offset changes for ios_in_progress in ReadStream between member1 and member 2" which is humanly very hard to do as for sure there will be 100s of symbols in postgres like this which need to be ignored in that case.
> > And these symbols to be suppressed don't follow any common Regex as well, and the suppression file doesn't support including whole files or folders.
> Ummm, are you saying that it complains about changes to unexported symbols also?
>
> > Secondly, we can't use the -fvisibility=hidden flag while compiling either, as it results in a failed compilation with an error.
>
> I didn't understand what you meant here.
This leads me to another question, which might make things clearer: is every symbol that abidiff reports important to report? My initial thought was that the symbols exported by the binary created when we build postgres also include some that need to be suppressed because they are Postgres internal functions[2]. Is that true? If it's not, then things are much simpler than I thought.
Note: I mentioned the -fvisibility=hidden flag along similar lines. Compiling postgres with this flag should produce a postgres binary without the internal symbols, but instead it gives a compilation error.
Regards,
Mankirat
[1] - https://sourceware.org/libabigail/manual/suppression-specifications.html#suppr-spec-label
[2] - https://www.postgresql.org/message-id/CAH2-Wzm-W6hSn71sUkz0Rem%3DqDEU7TnFmc7_jG2DjrLFef_WKQ%40mail.gmail.com
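The two "silence by default" cases quoted in this message can be illustrated with a small layout comparison. This is a sketch under stated assumptions rather than anything abidiff does: the struct definitions and the `classify` helper are hypothetical, and `ctypes` lays structs out using the native C ABI of whatever machine runs the script, so results can differ between platforms, which is the concern raised about x86_64 versus 32-bit x86.

```python
import ctypes


class Before(ctypes.Structure):
    # One bool followed by padding, then a 4-byte integer.
    _fields_ = [("flag", ctypes.c_bool), ("count", ctypes.c_int32)]


class AfterPadding(ctypes.Structure):
    # New member placed in what used to be padding (case 1).
    _fields_ = [("flag", ctypes.c_bool), ("new_flag", ctypes.c_bool),
                ("count", ctypes.c_int32)]


class AfterAppend(ctypes.Structure):
    # New member added at the end, growing the struct (case 2).
    _fields_ = [("flag", ctypes.c_bool), ("count", ctypes.c_int32),
                ("tail", ctypes.c_bool)]


class AfterShift(ctypes.Structure):
    # New member at the front, moving every existing member.
    _fields_ = [("new_id", ctypes.c_int32), ("flag", ctypes.c_bool),
                ("count", ctypes.c_int32)]


def offsets(struct_type):
    """Map each member name to its byte offset."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}


def classify(old, new):
    """Roughly classify a layout change (member types are not compared)."""
    old_off, new_off = offsets(old), offsets(new)
    # A member that moved or disappeared (a rename counts as both).
    if any(new_off.get(name) != off for name, off in old_off.items()):
        return "existing members changed"
    if ctypes.sizeof(old) == ctypes.sizeof(new):
        return "padding reuse"
    # Only harmless if the struct is never used as an array element.
    return "size grew"


for candidate in (AfterPadding, AfterAppend, AfterShift):
    print(candidate.__name__, classify(Before, candidate))
```

Deciding whether a struct is ever used as an array element is outside what this sketch can check, so the "size grew" case would still need human review.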
On 2025-Jun-03, Mankirat Singh wrote:
> On Tue, 3 Jun 2025 at 20:49, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > Please elaborate. Can you not write a suppression file that says "ignore offset changes for ios_in_progress in ReadStream", for example?
>
> I can do that, and that's what's causing the problem. According to the documentation for these suppression files[1], we have to mention the particular symbol name we need to suppress like "ReadStream" or something particular like "ignore offset changes for ios_in_progress in ReadStream between member1 and member 2" which is humanly very hard to do as for sure there will be 100s of symbols in postgres like this which need to be ignored in that case.
Well, now that I grep the source for ReadStream, I realize that the struct is defined in a .c file, not in any .h files, so you're right that the tooling needs a way to understand that changes to this symbol must not raise any alarms; and that way must not involve a manually written suppression file.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
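The observation that ReadStream is defined in a .c file suggests one possible heuristic: read the defining file out of each abidiff type header and treat types defined in .c files as private. The sketch below is an assumption made for illustration, not a method anyone in the thread proposed; the `changed_types` helper and the regular expression are hypothetical and only cover header lines of the shape shown in the reports earlier in the thread.

```python
import re

# Matches headers such as:
#   'struct ReadStream at read_stream.c:109:1' changed:
CHANGED_TYPE = re.compile(
    r"'(?:struct|union|enum) (\w+) at ([^:']+):\d+:\d+' changed:"
)


def changed_types(report: str) -> list[tuple[str, str, bool]]:
    """Return (type name, defining file, defined_in_c_file) per changed type."""
    found = []
    for match in CHANGED_TYPE.finditer(report):
        name, path = match.group(1), match.group(2)
        found.append((name, path, path.endswith(".c")))
    return found


# Header lines copied from the 17.4 to 17.5 report earlier in the thread.
REPORT = """\
'struct ReadStream at read_stream.c:109:1' changed:
type size hasn't changed
'struct WalSndCtlData at walsender_private.h:91:1' changed:
type size hasn't changed
"""

for name, path, private in changed_types(REPORT):
    print(name, path, "skip" if private else "report")
```

With this rule WalSndCtlData would still be reported, since walsender_private.h is a header even though its name suggests it is internal, so a file-extension check alone would not settle every case.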
On Tue, 3 Jun 2025 at 23:50, David E. Wheeler <david@justatheory.com> wrote:
> >> Ummm, are you saying that it complains about changes to unexported
> >> symbols also?
>
> This is a good question.
No, it doesn’t complain about unexported symbols.
But it does complain about some exported symbols that, in my understanding, shouldn’t be flagged.
This led me to a related question (which I had raised earlier too):
I initially assumed that the Postgres binary includes exported symbols that are internal implementation functions, like _bt_pagedel(), as mentioned here[1], and that symbols like these should ideally be suppressed.
Is that correct?
If so, how do we reliably identify such internal but exported symbols?
There doesn't seem to be a consistent naming convention or regular expression that we can use to suppress many of them at once.
> What’s the error? Maybe we can fix it.
As far as I know, the Postgres internal code lacks visibility annotations on its symbols, which causes compilation errors when the -fvisibility flag is used. Adding these annotations to the codebase could not be done in just a few days.
Regards,
Mankirat
On 2025-Jun-04, Mankirat Singh wrote:
> On Tue, 3 Jun 2025 at 23:50, David E. Wheeler <david@justatheory.com> wrote:
> > >> Ummm, are you saying that it complains about changes to unexported
> > >> symbols also?
> >
> > This is a good question.
> No, it doesn’t complain about unexported symbols.
You mentioned ReadStream, but that's not exported.
> > What’s the error? Maybe we can fix it.
>
> As per my knowledge Postgres internal code lacks visibility annotations on its symbols, which causes compilation errors when fvisibility flag is used.
You're being way too vague with your responses. Please copy & paste command lines used and the error messages you get.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Las mujeres son como hondas: mientras más resistencia tienen, más lejos puedes llegar con ellas" (Jonas Nightingale, Leap of Faith)
On Jun 4, 2025, at 09:43, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> You mentioned ReadStream, but that's not exported.
Is this not an export at line 67?
```
❯ rg ReadStream src/include/storage/read_stream.h
50: * the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
66:struct ReadStream;
67:typedef struct ReadStream ReadStream;
70:typedef struct BlockRangeReadStreamPrivate
74:} BlockRangeReadStreamPrivate;
77:typedef BlockNumber (*ReadStreamBlockNumberCB) (ReadStream *stream,
81:extern BlockNumber block_range_read_stream_cb(ReadStream *stream,
84:extern ReadStream *read_stream_begin_relation(int flags,
88: ReadStreamBlockNumberCB callback,
91:extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_data);
92:extern BlockNumber read_stream_next_block(ReadStream *stream,
94:extern ReadStream *read_stream_begin_smgr_relation(int flags,
99: ReadStreamBlockNumberCB callback,
102:extern void read_stream_reset(ReadStream *stream);
103:extern void read_stream_end(ReadStream *stream);
```
Best,
David
On Wed, 4 Jun 2025 at 19:13, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > On Tue, 3 Jun 2025 at 23:50, David E. Wheeler <david@justatheory.com> wrote:
> > > What’s the error? Maybe we can fix it.
> >
> > As per my knowledge Postgres internal code lacks visibility annotations on its symbols, which causes compilation errors when fvisibility flag is used.
>
> You're being way too vague with your responses. Please copy & paste command lines used and the error messages you get.
Really sorry for that. Here's the workflow I used to compile:
$ ./configure CFLAGS="-Og -g -fvisibility=hidden" --prefix=/home/mankirat/install/REL_17_4
$ make -j$(nproc)
........
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1154: undefined reference to `PQserverVersion'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1156: undefined reference to `appendPQExpBufferChar'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1155: undefined reference to `appendPQExpBufferStr'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1164: undefined reference to `appendPQExpBufferStr'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1165: undefined reference to `appendPQExpBuffer'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1169: undefined reference to `termPQExpBuffer'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1170: undefined reference to `termPQExpBuffer'
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:51: pg_restore] Error 1
make[3]: Leaving directory '/home/mankirat/Desktop/OSS/abicc/try2/postgres/src/bin/pg_dump'
make[2]: *** [Makefile:45: all-pg_dump-recurse] Error 2
make[2]: Leaving directory '/home/mankirat/Desktop/OSS/abicc/try2/postgres/src/bin'
make[1]: *** [Makefile:42: all-bin-recurse] Error 2
make[1]: Leaving directory '/home/mankirat/Desktop/OSS/abicc/try2/postgres/src'
make: *** [GNUmakefile:11: all-src-recurse] Error 2
$ make install
.........
/usr/bin/install: cannot stat './dynloader.h': No such file or directory
make[2]: *** [Makefile:50: install] Error 1
make[2]: Leaving directory '/home/mankirat/postgres/src/include'
make[1]: *** [Makefile:42: install-include-recurse] Error 2
make[1]: Leaving directory '/home/mankirat/postgres/src'
make: *** [GNUmakefile:11: install-src-recurse] Error 2
I get this error when I try using the -fvisibility=hidden flag.
Regards,
Mankirat
Hi,
On 2025-06-04 11:15:10 -0400, David E. Wheeler wrote:
> On Jun 4, 2025, at 09:43, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> > You mentioned ReadStream, but that's not exported.
>
> Is this not an export at line 67?
>
> ```
> ❯ rg ReadStream src/include/storage/read_stream.h
>
> 50: * the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
> 66:struct ReadStream;
> 67:typedef struct ReadStream ReadStream;
No. It just makes the *name* of the struct visible. The type's definition is in the .c file and therefore not visible outside of read_stream.c.
Greetings,
Andres Freund
On Jun 4, 2025, at 12:10, Andres Freund <andres@anarazel.de> wrote:
> No. It just makes the *name* of the struct visible. The type's definition is in the .c file and therefore not visible outside of read_stream.c.
Right, got it, thanks.
David
On 2025-Jun-04, Mankirat Singh wrote:
> Here's the workflow I tried to compile
> $ ./configure CFLAGS="-Og -g -fvisibility=hidden" --prefix=/home/mankirat/install/REL_17_4
> $ make -j$(nproc)
> ........
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1154: undefined reference to `PQserverVersion'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1156: undefined reference to `appendPQExpBufferChar'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1155: undefined reference to `appendPQExpBufferStr'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1164: undefined reference to `appendPQExpBufferStr'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1165: undefined reference to `appendPQExpBuffer'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1169: undefined reference to `termPQExpBuffer'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1170: undefined reference to `termPQExpBuffer'
Ah yeah, that doesn't work. What confused me is that we do use -fvisibility=hidden in our builds already, just not in this way; what we do is put it in the CFLAGS specifically for "modules". By adding it to the overall CFLAGS I think you're breaking things for the linker somehow, though I don't understand exactly how or why.
Anyway, it doesn't look to me like adding -fvisibility=hidden to CFLAGS is a viable solution, though maybe it is possible to get the build to play nice.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument. Please choose what hell you want to involve.