Thread: ABI Compliance Checker GSoC Project

ABI Compliance Checker GSoC Project

From
"David E. Wheeler"
Date:
Hackers,

I’d like to introduce Mankirat Singh, a Google Summer of Code student whom Pavlo Golub and I are mentoring this year.
He’s started work on his project, an ABI Compliance Checker. The plan is to work out the patterns, integrate it into the
BuildFarm, and get it sending regular reports by early September. He’s building on the foundation laid by Peter
Geoghegan back in 2023 [1]. He’s written up his plan in his blog [2].

Since the work naturally gets into what’s considered a public API and what’s not, we feel that hackers is the best
place to ask questions about bits to include and exclude, as well as other questions related to configuration settings,
etc., so please watch for those. I hope you all will help Mankirat to quickly learn what’s what, figure out how it goes
together, and make it a reality!

In the meantime, please give him a warm welcome!

Best,

David

PS: If one of you fine people can grant permission for Mankirat’s blog to be added to Planet Postgres, we’d super
appreciate it.

[1]: https://postgr.es/m/CAH2-Wzm-W6hSn71sUkz0Rem=qDEU7TnFmc7_jG2DjrLFef_WKQ@mail.gmail.com
[2]: https://blog.mankiratsingh.com/



Re: ABI Compliance Checker GSoC Project

From
Mankirat Singh
Date:
Thanks for the introduction :D

On Tue, 3 Jun 2025 at 00:36, David E. Wheeler <david@justatheory.com> wrote:
> Since the work naturally gets into what’s considered a public API and what’s not, we feel that hackers is the best place to ask questions about bits to include and exclude, as well as other questions related to configuration settings, etc., so please watch for those. I hope you all will help Mankirat to quickly learn what’s what, figure out how it goes together, and make it a reality!
Following up on this, I would really like to understand how to tell which of the changes reported by the abidiff tool are considered to cause ABI instability and which are not.
For example, between minor releases 17.0 and 17.1, struct ResultRelInfo was changed by adding a new data member, which caused ABI instability.

But when I tried to compare the postgres binaries of version 17.4 and 17.5, which is supposed to have no ABI instability, I got the following output:

$ abidiff ./install/with_debug/REL_17_4/bin/postgres install/with_debug/REL_17_5/bin/postgres --leaf-changes-only --no-added-syms --show-bytes
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 2 leaf types changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function (3 filtered out)
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

'struct ReadStream at read_stream.c:109:1' changed:
  type size hasn't changed
  1 data member insertion:
    'int16 io_combine_limit', at offset 2 (in bytes) at read_stream.c:112:1
  there are data member changes:
    'int16 ios_in_progress' offset changed from 2 to 4 (in bytes) (by +2 bytes)
    'int16 queue_size' offset changed from 4 to 6 (in bytes) (by +2 bytes)
    'int16 max_pinned_buffers' offset changed from 6 to 8 (in bytes) (by +2 bytes)
    'int16 pinned_buffers' offset changed from 8 to 10 (in bytes) (by +2 bytes)
    'int16 distance' offset changed from 10 to 12 (in bytes) (by +2 bytes)
    'bool advice_enabled' offset changed from 12 to 14 (in bytes) (by +2 bytes)

'struct WalSndCtlData at walsender_private.h:91:1' changed:
  type size hasn't changed
  there are data member changes:
    name of 'WalSndCtlData::sync_standbys_defined' changed to 'WalSndCtlData::sync_standbys_status' at walsender_private.h:110:1

In the above report, the symbols ReadStream and WalSndCtlData have no type size changes.

And similarly, the following report compares the postgres binaries for versions 17.2 and 17.3:

$ abidiff ./install/with_debug/REL_17_2/bin/postgres ./install/with_debug/REL_17_3/bin/postgres --leaf-changes-only --no-added-syms --show-bytes
Leaf changes summary: 2 artifacts changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 1 Changed, 0 Added function (6 filtered out)
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

1 function with some sub-type change:

  [C] 'function void mdtruncate(SMgrRelation, ForkNumber, BlockNumber)' at md.c:1153:1 has some sub-type changes:
    parameter 4 of type 'typedef BlockNumber' was added

'struct IndexOptInfo at pathnodes.h:1104:1' changed:
  type size hasn't changed
  there are data member changes:
    type 'void (*)(...)' of 'IndexOptInfo::amcostestimate' changed:
      pointer type changed from: 'void (*)(...)' to: 'void (*)(PlannerInfo*, IndexPath*, double, Cost*, Cost*, Selectivity*, double*, double*)'

This report also shows one function sub-type change, in mdtruncate.

I don't have much idea about the PostgreSQL internal code, but I really wanted to know how we can "classify" what exactly is a false positive for this report and should not be reported by the final ABI compliance checking tool?
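
As a possible first step towards classifying such reports, the text can be split into per-type blocks and each block checked for a size change or moved members. The following is only a minimal sketch: the regular expressions are written against the report lines shown above, other abidiff versions or options may format them differently, and the names `TypeChange` and `parse_type_changes` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Patterns assumed from the report text quoted in this thread.
TYPE_HEADER = re.compile(r"^'(?:struct|union|class) (\w+) at [^']+' changed:$")
SIZE_CHANGED = re.compile(r"type size changed from (\d+) to (\d+)")
OFFSET_CHANGED = re.compile(r"offset changed from (\d+) to (\d+)")


@dataclass
class TypeChange:
    name: str
    size_changed: bool = False
    moved_members: int = 0


def parse_type_changes(report: str) -> List[TypeChange]:
    """Collect one TypeChange per "'struct X at ...' changed:" block."""
    changes: List[TypeChange] = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        header = TYPE_HEADER.match(line)
        if header:
            current = TypeChange(header.group(1))
            changes.append(current)
        elif line.startswith("["):
            # A "[C] 'function ...'" entry starts a function block, not a type.
            current = None
        elif current is not None:
            if SIZE_CHANGED.search(line):
                current.size_changed = True
            elif OFFSET_CHANGED.search(line):
                current.moved_members += 1
    return changes


SAMPLE = """\
'struct ReadStream at read_stream.c:109:1' changed:
  type size hasn't changed
  there are data member changes:
    'int16 ios_in_progress' offset changed from 2 to 4 (in bytes) (by +2 bytes)
    'int16 queue_size' offset changed from 4 to 6 (in bytes) (by +2 bytes)

'struct WalSndCtlData at walsender_private.h:91:1' changed:
  type size hasn't changed
"""

for change in parse_type_changes(SAMPLE):
    print(change.name, change.size_changed, change.moved_members)
```

This only summarises what abidiff already printed; deciding whether a given block matters would still need a rule or a human.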

I checked the suppression file for libabigail tools, but it seems not to work for the PostgreSQL case, as we need to specify each and every symbol to be ignored in the file, which is kinda not possible. And these symbols to be suppressed don't follow any common Regex as well, and the suppression file doesn't support including whole files or folders.
Secondly, we can't use the -fvisibility=hidden flag while compiling either, as it results in a failed compilation with an error.

The only thing which seems to work, as far as I know, is to use the nm tool with the -D flag to get the list of dynamic symbols, which are more important from an ABI stability POV (as those will be used by extensions?), and grep for the symbols from the abidiff output. If we find nothing, then ignore the warning. Although I am unsure about this and its possible side effects...?
For example: $ nm -D ./install/with_debug/REL_17_4/bin/postgres | grep WalSndCtlData
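
A minimal sketch of that idea follows, assuming the usual three-column `nm -D` layout (address, type letter, name) for defined symbols. The function name `defined_dynamic_symbols` is hypothetical, and the sample lines, including their addresses, are made up for illustration rather than taken from a real binary.

```python
from typing import Set


def defined_dynamic_symbols(nm_output: str) -> Set[str]:
    """Collect names that `nm -D` lists with an address (i.e. defined)."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined entries look like "<address> <type> <name>"; undefined
        # ones ("U name") have no address column and are skipped here.
        if len(fields) == 3:
            names.add(fields[2].split("@")[0])  # drop any @VERSION suffix
    return names


# Illustrative input only; addresses are invented.
SAMPLE_NM = """\
00000000004a1b20 T mdtruncate
0000000000a51c40 B WalSndCtl
                 U memcpy@GLIBC_2.14
"""

exported = defined_dynamic_symbols(SAMPLE_NM)
for name in ("mdtruncate", "WalSndCtlData"):
    print(name, name in exported)
```

One caveat with this approach: nm lists functions and variables, not type names, so a struct name such as WalSndCtlData would not be expected to appear in its output even when the struct is used by exported functions. A lookup like this can therefore only filter the function and variable parts of an abidiff report.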

Please correct me if I'm wrong somewhere.

Regards,
Mankirat

Re: ABI Compliance Checker GSoC Project

From
Mankirat Singh
Date:
On Tue, 3 Jun 2025 at 20:49, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> I don't think it's the
> job of the tool to determine that this ABI difference is okay.
> Ultimately that's for a human to determine,

Yes, but it would be better if we could automate that thing to some
extent, along with the development of the ABI compliance checker.

> and they must either add an
> suppression to the Postgres source code somehow, or modify or revert the
> commit so that the ABI change disappears.
That's exactly what I'm aiming for with this project: automatically
compare the latest commit with the previous one using abidiff and, if
any ABI instability is found, immediately report it to the committer
or a mailing list so that maintainers don't have to do that manually
every time. That's the reason for asking about false positives and
how to suppress them.


> Also a diff worth reporting, and suppressing from the report as
> appropriate.
Okay - It is a change in name, I assume that's the reason it's important.


> The only kinds of ABI changes that should be silenced by default are
>
> 1) addition of struct members that cause no change to the offsets of other
>    members in the struct (i.e. a new member that uses space that was
>    previously padding space)
>
> 2) addition of struct members at the end of structs, changing the struct
>    size, but only for structs that aren't used as array elements (the
>    problem being that the size of the "stride" when scanning the array
>    would mismatch).  Not sure how easy it is to detect this case.

These sound very similar to 17.1 minor release ABI instability.
abidiff seems to do most of the work in this case:

$ abidiff ./install/with_debug/REL_17_0/bin/postgres \
    ./install/with_debug/REL_17_1/bin/postgres --leaf-changes-only \
    --no-added-syms --show-bytes
Leaf changes summary: 1 artifact changed
Changed leaf types summary: 1 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 0 Added function (11 filtered out)
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable

'struct ResultRelInfo at execnodes.h:450:1' changed:
  type size changed from 376 to 384 (in bytes)
  1 data member insertion:
    'bool ri_needLockTagTuple', at offset 376 (in bytes) at execnodes.h:597:1
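
To make the two quoted cases concrete, here is a rough illustration using Python's ctypes, which lays structs out with the platform's native C alignment. The struct and helper names (`Before`, `FilledPadding`, `Appended`, `Inserted`, `old_members_unmoved`) are hypothetical and unrelated to any PostgreSQL type; the exact offsets depend on the platform, which is why none are hard-coded.

```python
import ctypes


def offsets(struct_cls):
    """Map each member name to its byte offset in the native layout."""
    return {name: getattr(struct_cls, name).offset
            for name, _ctype in struct_cls._fields_}


def old_members_unmoved(old, new):
    """True if every member of `old` sits at the same offset in `new`."""
    new_offsets = offsets(new)
    return all(new_offsets.get(name) == off
               for name, off in offsets(old).items())


class Before(ctypes.Structure):
    _fields_ = [("flag", ctypes.c_bool), ("count", ctypes.c_int64)]


class FilledPadding(ctypes.Structure):
    # Case 1: the new member lands in the padding after `flag`.
    _fields_ = [("flag", ctypes.c_bool), ("extra", ctypes.c_bool),
                ("count", ctypes.c_int64)]


class Appended(ctypes.Structure):
    # Case 2: the new member goes at the end, so the struct grows.
    _fields_ = [("flag", ctypes.c_bool), ("count", ctypes.c_int64),
                ("tail", ctypes.c_bool)]


class Inserted(ctypes.Structure):
    # Counter-example: a wide member in the middle pushes `count` along.
    _fields_ = [("flag", ctypes.c_bool), ("wide", ctypes.c_int64),
                ("count", ctypes.c_int64)]


for new in (FilledPadding, Appended, Inserted):
    print(new.__name__,
          "unmoved:", old_members_unmoved(Before, new),
          "size:", ctypes.sizeof(Before), "->", ctypes.sizeof(new),
          "array of 4:", ctypes.sizeof(new * 4))
```

The array size printed for `Appended` shows the stride problem from case 2: existing members keep their offsets, but code compiled against the old size would step through an array of the new struct incorrectly.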


> I'm also wondering what platforms/architectures are you thinking on
> running this on.  For instance, thinking about padding space in structs,
> what is padding space in x86_64 may not be so in 32bit x86 ...

The reporting system is planned to be developed as an extension to the
postgresql build farm, so the abidiff reports across architectures
from various animals could be received, processed and reported
accordingly.


> I think in a first cut of this tooling, we should consider all ABI
> changes as potentially problematic.

Certainly, will do that only until I understand how to identify and
implement some techniques for the removal of false positives.


> Please elaborate.  Can you not write a suppression file that says
> "ignore offset changes for ios_in_progress in ReadStream", for example?

I can do that, and that's what's causing the problem. According to
the documentation for these suppression files[1], we have to mention
the particular symbol name we need to suppress like "ReadStream" or
something particular like "ignore offset changes for ios_in_progress
in ReadStream between member1 and member 2" which is humanly very hard
to do as for sure there will be 100s of symbols in postgres like this
which needs to be ignored in that case.


> > And these symbols to be suppressed don't follow any common Regex as
> > well, and the suppression file doesn't support including whole files
> > or folders.
> Ummm, are you saying that it complains about changes to unexported
> symbols also?
>
>
> > Secondly, we can't use the -fvisibility=hidden flag while compiling
> > either, as it results in a failed compilation with an error.
>
> I didn't understand what you meant here.

This led me to another question, which might make things clearer: is
every symbol that abidiff reports important to report? My initial
thought was that the symbols exported by the binary created when we
build postgres also include some that need to be suppressed because
they are postgres internal functions[2] - is that true?
If it's not true, then things are much simpler than I thought.

Note - The -fvisibility=hidden flag I mentioned is related: my
expectation was that compiling postgres with this flag would produce
a postgres binary without the internal symbols, but instead it gives
a compilation error.

Regards,
Mankirat

[1] - https://sourceware.org/libabigail/manual/suppression-specifications.html#suppr-spec-label
[2] - https://www.postgresql.org/message-id/CAH2-Wzm-W6hSn71sUkz0Rem%3DqDEU7TnFmc7_jG2DjrLFef_WKQ%40mail.gmail.com



Re: ABI Compliance Checker GSoC Project

From
Álvaro Herrera
Date:
On 2025-Jun-03, Mankirat Singh wrote:

> On Tue, 3 Jun 2025 at 20:49, Álvaro Herrera <alvherre@kurilemu.de> wrote:

> > Please elaborate.  Can you not write a suppression file that says
> > "ignore offset changes for ios_in_progress in ReadStream", for example?
> 
> I can do that, and that's what's causing the problem. According to
> the documentation for these suppression files[1], we have to mention
> the particular symbol name we need to suppress like "ReadStream" or
> something particular like "ignore offset changes for ios_in_progress
> in ReadStream between member1 and member 2" which is humanly very hard
> to do as for sure there will be 100s of symbols in postgres like this
> which needs to be ignored in that case.

Well, now that I grep the source for ReadStream, I realize that the
struct is defined in a .c file, not in any .h files, so you're right
that the tooling needs a way to understand that changes to this symbol
must not raise any alarms; and that way must not involve a manually
written suppression file.
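
One automatic signal for this is already in the report: each changed type carries its source location. A minimal sketch, assuming the header-line format shown in the abidiff output earlier in the thread (the function name `defined_in_header` is hypothetical):

```python
import re

# Matches lines such as:
#   'struct ReadStream at read_stream.c:109:1' changed:
TYPE_HEADER = re.compile(
    r"^'(?:struct|union|class) (\w+) at ([^:']+):\d+:\d+' changed:$"
)


def defined_in_header(header_line: str) -> bool:
    """Return True if the reported type is defined in a .h file."""
    match = TYPE_HEADER.match(header_line.strip())
    if match is None:
        raise ValueError("not a changed-type header line: %r" % header_line)
    return match.group(2).endswith(".h")


for line in (
    "'struct ReadStream at read_stream.c:109:1' changed:",
    "'struct WalSndCtlData at walsender_private.h:91:1' changed:",
):
    print(line.split(" at ")[0].lstrip("'"), defined_in_header(line))
```

A .h suffix by itself does not say whether a header is meant for extensions (walsender_private.h is an example worth checking), so this could only be a first filter.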

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")



Re: ABI Compliance Checker GSoC Project

From
Mankirat Singh
Date:
On Tue, 3 Jun 2025 at 23:50, David E. Wheeler <david@justatheory.com> wrote:
> >> Ummm, are you saying that it complains about changes to unexported
> >> symbols also?
>
> This is a good question.
No, it doesn’t complain about unexported symbols.
But it does complain about some exported symbols that, in my understanding, shouldn’t be flagged.

This led me to a related question (which I had raised earlier too):
I initially assumed that the Postgres binary includes exported symbols that are internal implementation functions - like _bt_pagedel(), as mentioned here[1] - and that symbols like these should ideally be suppressed.
Is that correct?
If so, how do we reliably identify such internal but exported symbols?
There doesn't seem to be a consistent naming convention or regular expression that we can use to suppress many of them at once.

> What’s the error? Maybe we can fix it.
As per my knowledge Postgres internal code lacks visibility annotations on its symbols, which causes compilation errors when fvisibility flag is used. Adding these annotations across the codebase is not something that could be done in just a few days.

Regards,
Mankirat

Re: ABI Compliance Checker GSoC Project

From
Álvaro Herrera
Date:
On 2025-Jun-04, Mankirat Singh wrote:

> On Tue, 3 Jun 2025 at 23:50, David E. Wheeler <david@justatheory.com> wrote:
> > >> Ummm, are you saying that it complains about changes to unexported
> > >> symbols also?
> >
> > This is a good question.
> No, it doesn’t complain about unexported symbols.

You mentioned ReadStream, but that's not exported.

> > What’s the error? Maybe we can fix it.
>
> As per my knowledge Postgres internal code lacks visibility annotations on
> its symbols, which causes compilation errors when fvisibility flag is used.

You're being way too vague with your responses.  Please copy & paste
command lines used and the error messages you get.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Las mujeres son como hondas:  mientras más resistencia tienen,
 más lejos puedes llegar con ellas"  (Jonas Nightingale, Leap of Faith)



Re: ABI Compliance Checker GSoC Project

From
"David E. Wheeler"
Date:
On Jun 4, 2025, at 09:43, Álvaro Herrera <alvherre@kurilemu.de> wrote:

> You mentioned ReadStream, but that's not exported.

Is this not an export at line 67?

```
❯ rg ReadStream src/include/storage/read_stream.h

50: * the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
66:struct ReadStream;
67:typedef struct ReadStream ReadStream;
70:typedef struct BlockRangeReadStreamPrivate
74:} BlockRangeReadStreamPrivate;
77:typedef BlockNumber (*ReadStreamBlockNumberCB) (ReadStream *stream,
81:extern BlockNumber block_range_read_stream_cb(ReadStream *stream,
84:extern ReadStream *read_stream_begin_relation(int flags,
88:   ReadStreamBlockNumberCB callback,
91:extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_data);
92:extern BlockNumber read_stream_next_block(ReadStream *stream,
94:extern ReadStream *read_stream_begin_smgr_relation(int flags,
99:   ReadStreamBlockNumberCB callback,
102:extern void read_stream_reset(ReadStream *stream);
103:extern void read_stream_end(ReadStream *stream);
```

Best,

David



Re: ABI Compliance Checker GSoC Project

From
Mankirat Singh
Date:
On Wed, 4 Jun 2025 at 19:13, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> > On Tue, 3 Jun 2025 at 23:50, David E. Wheeler <david@justatheory.com> wrote:
> > > What’s the error? Maybe we can fix it.
> >
> > As per my knowledge Postgres internal code lacks visibility annotations on
> > its symbols, which causes compilation errors when fvisibility flag is used.
>
> You're being way too vague with your responses.  Please copy & paste
> command lines used and the error messages you get.
Really sorry for that.

Here's the workflow I used to compile:
$ ./configure CFLAGS="-Og -g -fvisibility=hidden" \
    --prefix=/home/mankirat/install/REL_17_4
$ make -j$(nproc)
........
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1154:
undefined reference to `PQserverVersion'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1156:
undefined reference to `appendPQExpBufferChar'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1155:
undefined reference to `appendPQExpBufferStr'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1164:
undefined reference to `appendPQExpBufferStr'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1165:
undefined reference to `appendPQExpBuffer'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1169:
undefined reference to `termPQExpBuffer'
/usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1170:
undefined reference to `termPQExpBuffer'
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:51: pg_restore] Error 1
make[3]: Leaving directory
'/home/mankirat/Desktop/OSS/abicc/try2/postgres/src/bin/pg_dump'
make[2]: *** [Makefile:45: all-pg_dump-recurse] Error 2
make[2]: Leaving directory
'/home/mankirat/Desktop/OSS/abicc/try2/postgres/src/bin'
make[1]: *** [Makefile:42: all-bin-recurse] Error 2
make[1]: Leaving directory '/home/mankirat/Desktop/OSS/abicc/try2/postgres/src'
make: *** [GNUmakefile:11: all-src-recurse] Error 2

$ make install
.........
/usr/bin/install: cannot stat './dynloader.h': No such file or directory
make[2]: *** [Makefile:50: install] Error 1
make[2]: Leaving directory '/home/mankirat/postgres/src/include'
make[1]: *** [Makefile:42: install-include-recurse] Error 2
make[1]: Leaving directory '/home/mankirat/postgres/src'
make: *** [GNUmakefile:11: install-src-recurse] Error 2


I get this error when I try using the -fvisibility=hidden flag.

Regards,
Mankirat



Re: ABI Compliance Checker GSoC Project

From
Andres Freund
Date:
Hi,

On 2025-06-04 11:15:10 -0400, David E. Wheeler wrote:
> On Jun 4, 2025, at 09:43, Álvaro Herrera <alvherre@kurilemu.de> wrote:
> 
> > You mentioned ReadStream, but that's not exported.
> 
> Is this not an export at line 67?
> 
> ```
> ❯ rg ReadStream src/include/storage/read_stream.h
> 
> 50: * the ReadStreamBlockNumberCB callback to abide by the restrictions of AIO
> 66:struct ReadStream;
> 67:typedef struct ReadStream ReadStream;

No. It just makes the *name* of the struct visible. The type's definition is
in the .c file and therefore not visible outside of read_stream.c.

Greetings,

Andres Freund



Re: ABI Compliance Checker GSoC Project

From
"David E. Wheeler"
Date:
On Jun 4, 2025, at 12:10, Andres Freund <andres@anarazel.de> wrote:

> No. It just makes the *name* of the struct visible. The type's definition is
> in the .c file and therefore not visible outside of read_stream.c.

Right, got it, thanks.

David



Re: ABI Compliance Checker GSoC Project

From
Álvaro Herrera
Date:
On 2025-Jun-04, Mankirat Singh wrote:

> Here's the workflow I tried to compile
> $ ./configure CFLAGS="-Og -g -fvisibility=hidden"
> --prefix=/home/mankirat/install/REL_17_4
> $ make -j$(nproc)
> ........
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1154:
> undefined reference to `PQserverVersion'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1156:
> undefined reference to `appendPQExpBufferChar'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1155:
> undefined reference to `appendPQExpBufferStr'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1164:
> undefined reference to `appendPQExpBufferStr'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1165:
> undefined reference to `appendPQExpBuffer'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1169:
> undefined reference to `termPQExpBuffer'
> /usr/bin/ld: /home/mankirat/postgres/src/fe_utils/string_utils.c:1170:
> undefined reference to `termPQExpBuffer'

Ah yeah, that doesn't work.  What confused me is that we do use
-fvisibility=hidden in our builds already, just not in this way; what we
do is put it in the CFLAGS specifically for "modules".  By adding it to
the overall CFLAGS I think you're breaking things for the linker
somehow, though I don't understand exactly how or why.  Anyway, it
doesn't look to me like adding -fvisibility=hidden to CFLAGS is a
viable solution, though maybe it is possible to get the build to play
nice.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.