Thread: abi-compliance-checker
Attached is a html report that was generated by a tool called abi-compliance-checker/abi-dumper [1][2] (by using "abi-compliance-checker -l libTest ... ") . I deliberately introduced 2 ABI compatibility issues affecting postgres, just to see what the tool had to say about it. The first ABI issue I mocked up involved a breaking change to the signature of a function with external linkage. Sure enough, this issue (in CheckForSerializableConflictIn(), as it happens) appears in the report as a medium severity item. The second ABI issue I mocked up involved "filling-in" a hole in a struct (a struct that appears in a header that could be included by an extension) with a new field. In other words, the "put new field in existing alignment padding" trick. This kind of difference is generally believed to be non-breaking, and so is acceptable in point releases. But the issue still appears as a low severity item in the report. The report points out (quite reasonably) that my newly added field won't be initialized by old code. In most cases this will be fine, of course. It's just not something that should be taken for granted. Overall, I like the report format -- especially its severity scale. So it seems like abi-compliance-checker has the potential to become a standard release management tool for Postgres point releases. I can imagine a community resource along the lines of https://coverage.postgresql.org; an automatically generated archive of theoretical/actual x86_64 ABI breaks in each point release. I'd appreciate having greater visibility into these issues. [1] https://github.com/lvc/abi-dumper [2] https://manpages.debian.org/unstable/abi-dumper/abi-dumper.1.en.html -- Peter Geoghegan
Attachment
Peter Geoghegan <pg@bowt.ie> writes: > Attached is a html report that was generated by a tool called > abi-compliance-checker/abi-dumper [1][2] (by using > "abi-compliance-checker -l libTest ... ") . I deliberately introduced > 2 ABI compatibility issues affecting postgres, just to see what the > tool had to say about it. This seems pretty cool. I agree that we're in dire need of some tool of this sort for checking back-branch patches. I wonder though if it'll have false-positive problems. Have you tried it on live rather than mocked-up cases, for instance 13.0 vs 13.11? regards, tom lane
On Sun, May 28, 2023 at 6:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > This seems pretty cool. I agree that we're in dire need of some > tool of this sort for checking back-branch patches. I wonder > though if it'll have false-positive problems. Have you tried it > on live rather than mocked-up cases, for instance 13.0 vs 13.11? I tried comparing REL_11_0 to REL_11_20. Attached is the report for that. I don't have time to study this in detail today, but the report seems to have a plausible variety of issues. I noticed that it warns about the breaking signature change to _bt_pagedel(). This is the theoretical ABI break that I mentioned in the commit message of commit b0229f26. This is arguably a false positive, since the tool doesn't understand my reasoning about why it's okay in this particular instance (namely "any extension that called that function was already severely broken"). Obviously the tool couldn't possibly be expected to know better in these kinds of situations, though, so whether or not it counts as a false positive is just semantics. Fortunately, there aren't very many issues in the report. Certainly not enough for false positives (however you define them) to be of great concern. This is nearly 5 years worth of ABI issues. -- Peter Geoghegan
Attachment
Peter Geoghegan <pg@bowt.ie> writes: > I tried comparing REL_11_0 to REL_11_20. Attached is the report for that. Nice! > I don't have time to study this in detail today, but the report seems > to have a plausible variety of issues. I noticed that it warns about > the breaking signature change to _bt_pagedel(). This is the > theoretical ABI break that I mentioned in the commit message of commit > b0229f26. This is arguably a false positive, since the tool doesn't > understand my reasoning about why it's okay in this particular > instance (namely "any extension that called that function was already > severely broken"). Obviously the tool couldn't possibly be expected to > know better in these kinds of situations, though, so whether or not it > counts as a false positive is just semantics. Agreed. The point of such a tool is to make sure that we notice any ABI breaks; it can't be expected to make engineering judgments about whether the alternatives are worse. For instance, I see that it noticed commit 1f28ec6be (Rename rbtree.c functions to use "rbt" prefix not "rb" prefix), which is not something we would have done of our own choosing, but on balance it seemed the best solution. I gather it'd catch things like NodeTag enum assignments changing, which is something we really need to have a check for. (Which reminds me that I forgot to turn on the ad-hoc check for that in gen_node_support.pl. I'll go do that, but it'd be better to have a more general-purpose solution.) regards, tom lane
I wrote: > (Which reminds me that I forgot to turn on the ad-hoc check for > that in gen_node_support.pl. I'll go do that, but it'd be better > to have a more general-purpose solution.) Oh, scratch that, it's not supposed to happen until we make the v16 branch. It'd still be better to not need it. regards, tom lane
On Sun, May 28, 2023 at 8:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I gather it'd catch things like NodeTag enum assignments changing, > which is something we really need to have a check for. Right. Any ABI break that involves machine-generated translation units seems particularly prone to being overlooked. Just eyeballing code (and perhaps double-checking struct layout using pahole) seems inadequate. I'll try to come up with a standard abi-compliance-checker Postgres workflow once I'm back from pgCon. It already looks like abi-compliance-checker is capable of taking most of the guesswork out of ABI compatibility in stable releases, without any real downside, which is encouraging. I have spent very little time on this, so it's quite possible that some detail or other was overlooked. -- Peter Geoghegan
On Sun, May 28, 2023 at 9:34 AM Peter Geoghegan <pg@bowt.ie> wrote: > I'll try to come up with a standard abi-compliance-checker Postgres > workflow once I'm back from pgCon. Ideally, we'd be able to produce reports that cover an entire stable release branch at once, including details about how things changed over time. It turns out that there is a tool from the same family of tools as abi-compliance-checker, that can do just that: https://github.com/lvc/abi-tracker There is an abi-tracker example report, here: https://abi-laboratory.pro/?view=timeline&l=glib It's exactly the same presentation as the report I posted recently, once you drill down. That seems ideal. -- Peter Geoghegan
On 27.05.23 02:52, Peter Geoghegan wrote: > Attached is a html report that was generated by a tool called > abi-compliance-checker/abi-dumper [1][2] (by using > "abi-compliance-checker -l libTest ... ") . I have been using the libabigail library/set of tools (abidiff, abidw) for this. I was not familiar with the tool you used. The nice thing about abidiff is that it gives you text output and a meaningful exit status, so you can make it part of the build or test process. You can also write suppression files to silence specific warnings. I think the way to use this would be to compute the ABI for the .0 release (or rc1 or something like that) and commit it into the tree. And then compute the current ABI and compare that against the recorded base ABI. Here is the workflow: # build REL_11_0 abidw src/backend/postgres > src/tools/postgres-abi-REL_11_0.xml # build REL_11_20 abidw src/backend/postgres > src/tools/postgres-abi.xml abidiff --no-added-syms src/tools/postgres-abi-REL_11_0.xml src/tools/postgres-abi.xml This prints Functions changes summary: 0 Removed, 0 Changed, 0 Added function Variables changes summary: 0 Removed, 0 Changed, 0 Added variable Function symbols changes summary: 14 Removed, 0 Added (85 filtered out) function symbols not referenced by debug info Variable symbols changes summary: 1 Removed, 0 Added (3 filtered out) variable symbols not referenced by debug info 14 Removed function symbols not referenced by debug info: [D] RelationHasUnloggedIndex [D] assign_nestloop_param_placeholdervar [D] assign_nestloop_param_var [D] logicalrep_typmap_gettypname [D] logicalrep_typmap_update [D] pqsignal_no_restart [D] rb_begin_iterate [D] rb_create [D] rb_delete [D] rb_find [D] rb_insert [D] rb_iterate [D] rb_leftmost [D] transformCreateSchemaStmt 1 Removed variable symbol not referenced by debug info: [D] wrconn This appears to be similar to what your tool produced, but I haven't checked it in detail.
On 30.05.23 06:32, Peter Eisentraut wrote: > I think the way to use this would be to compute the ABI for the .0 > release (or rc1 or something like that) and commit it into the tree. And > then compute the current ABI and compare that against the recorded base > ABI. > > Here is the workflow: > > # build REL_11_0 > abidw src/backend/postgres > src/tools/postgres-abi-REL_11_0.xml > # build REL_11_20 > abidw src/backend/postgres > src/tools/postgres-abi.xml > abidiff --no-added-syms src/tools/postgres-abi-REL_11_0.xml > src/tools/postgres-abi.xml Here is a demo patch that implements this. Right now, I have only added support for libpq and postgres. For completeness, the ecpg libraries should be covered as well. (Unlike the above example, I did not use src/tools/ but each component's own subdirectory.) The patch as currently written will actually fail the tests because it contains only one base ABI file to compare against, but the build_32 task on cirrus will of course produce a different ABI. But I left it for now to able to see the results. Eventually, the base ABI file names should include something from host_system.cpu_family(). Also, I commented out the abidiff test for postgres, because the base file is 8 MB and I don't want to send that around. Various findings while playing with these tools: * Different Linux distributions produce slightly different ABI reports. In some cases, symbols like 'optarg@GLIBC_2.17' leak out. * PostgreSQL compilation options affect the exposed ABI. This is perhaps expected to some degree, but there are some curious details. * For example, --enable-cassert exposes additional symbols, and it's maybe not impossible for those to leak into an extension. * Also, --with-openssl actually *removes* symbols from the ABI (such as pg_md5_init). So it's probably not sensible to try to get some universal ABI definition that works everywhere. Instead, I think it would be better to get one specific case working, which would be the one tested on the cirrus linux tasks and/or some equivalent buildfarm machine.
Attachment
+abidiff = find_program('abidiff', native: false, required: false) +abidw = find_program('abidw', native: false, required: false) + +abidw_flags = [ + '--drop-undefined-syms', + '--no-architecture', + '--no-comp-dir-path', + '--no-elf-needed', + '--no-show-locs', + '--type-id-style', 'hash', +] +abidw_cmd = [abidw, abidw_flags, '--out-file', '@OUTPUT@', '@INPUT@'] It would make sense to me to mark abidiff and abidw as disabler: true. +if abidw.found() + libpq_abi = custom_target('libpq.abi.xml', + input: libpq_so, + output: 'libpq.abi.xml', + command: abidw_cmd, + build_by_default: true) +endif + +if abidiff.found() + test('libpq.abidiff', + abidiff, + args: [files('libpq.base.abi.xml'), libpq_abi], + suite: 'abidiff', + verbose: true) +endif With disabler: true, you can drop the conditionals. Disablers tell Meson to disable parts of the build[0]. I also don't think it makes sense to mark the custom_targets as build_by_default: true, unless you see value in that. I would just have them built when the test is ran. However, it might make sense to create an alias_target of all the ABI XML files for people that want to interact with the files outside of the tests for whatever reason. [0]: https://mesonbuild.com/Reference-manual_returned_disabler.html -- Tristan Partin Neon (https://neon.tech)
On 06.06.23 18:52, Tristan Partin wrote: > It would make sense to me to mark abidiff and abidw as disabler: true. ok > +if abidiff.found() > + test('libpq.abidiff', > + abidiff, > + args: [files('libpq.base.abi.xml'), libpq_abi], > + suite: 'abidiff', > + verbose: true) > +endif > > With disabler: true, you can drop the conditionals. Disablers tell Meson > to disable parts of the build[0]. ok > I also don't think it makes sense to mark the custom_targets as > build_by_default: true, unless you see value in that. I would just have > them built when the test is ran. > > However, it might make sense to create an alias_target of all the ABI > XML files for people that want to interact with the files outside of the > tests for whatever reason. Thanks for the feedback. Attached is a more complete patch. I have rearranged this a bit. There are now two build options, called abidw and abidiff. The abidw option produces the xml file, that you would then at appropriate times commit into the tree as the base. The abidiff option enables the abidiff tests. This doesn't actually require abidw, since abidiff can compare the binary directly against the recorded XML file. So these options are distinct and nonoverlapping. Note that in this setup, you still need a conditional around the abidiff test() call, because otherwise meson setup will fail if the base file doesn't exist (yet), so it would be impossible to bootstrap this system. The updated patch also includes the base files for all the ecpg libraries and the files all have OS and architecture specific names. The keep the patch small, I just added a dummy base file for the postgres binary and a suppression file that suppresses everything. There is something weird going on where the cirrus linux/meson job doesn't upload the produced abidw artifacts, even though they are apparently built, and the equivalent works for the freebsd job. Maybe someone can see something that I'm not seeing there.
Attachment
Hi, On 2023-06-06 18:30:38 +0200, Peter Eisentraut wrote: > On 30.05.23 06:32, Peter Eisentraut wrote: > > I think the way to use this would be to compute the ABI for the .0 > > release (or rc1 or something like that) and commit it into the tree. And > > then compute the current ABI and compare that against the recorded base > > ABI. > > > > Here is the workflow: > > > > # build REL_11_0 > > abidw src/backend/postgres > src/tools/postgres-abi-REL_11_0.xml > > # build REL_11_20 > > abidw src/backend/postgres > src/tools/postgres-abi.xml > > abidiff --no-added-syms src/tools/postgres-abi-REL_11_0.xml > > src/tools/postgres-abi.xml > > Here is a demo patch that implements this. > > Right now, I have only added support for libpq and postgres. For > completeness, the ecpg libraries should be covered as well. I think plpgsql would also be good to include, due to things like plpgsql debuggers. > * Different Linux distributions produce slightly different ABI reports. In > some cases, symbols like 'optarg@GLIBC_2.17' leak out. Hm, that's somewhat annoying. > * PostgreSQL compilation options affect the exposed ABI. This is perhaps > expected to some degree, but there are some curious details. > > * For example, --enable-cassert exposes additional symbols, and it's maybe > not impossible for those to leak into an extension. They *definitely* leak into extensions. A single Assert() in an extension or use of an inline function or macro with an Assertion suffices to end up with a reference to ExceptionalCondition. > diff --git a/src/interfaces/libpq/libpq.base.abi.xml b/src/interfaces/libpq/libpq.base.abi.xml > new file mode 100644 > index 0000000000..691bf192af > --- /dev/null > +++ b/src/interfaces/libpq/libpq.base.abi.xml > @@ -0,0 +1,2634 @@ > +<abi-corpus path='src/interfaces/libpq/libpq.so.5.16' soname='libpq.so.5'> > + <elf-function-symbols> > + <elf-symbol name='PQbackendPID' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> > + <elf-symbol name='PQbinaryTuples' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> > + <elf-symbol name='PQcancel' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> > [...] > + <elf-symbol name='termPQExpBuffer' type='func-type' binding='global-binding' visibility='default-visibility' is-defined='yes'/> > + </elf-function-symbols> This seems somewhat painful in its verbosity. We also effectively already have it in the tree, in src/interfaces/libpq/exports.txt. But I guess that's somewhat inevitable :/ It sounds we are planning to mostly rely on CI for this, perhaps we should rely on an artifact from a prior build for a major version + specific task, instead of committing this to source? That'd automatically take care of differences in ABI across different platforms etc. If we want to commit something to the tree, I think we need a fairly complicated "fingerprint" to avoid false positives. OS, OS version, configure options, compiler, compiler version at least? > + <abi-instr version='1.0' address-size='64' path='../src/common/encnames.c' language='LANG_C99'> > + <array-type-def dimensions='1' type-id='c8dedbef' size-in-bits='5376' id='752c85d9'> > + <subrange length='42' type-id='7359adad' id='cb7c937f'/> > + </array-type-def> > + <array-type-def dimensions='1' type-id='c8dedbef' size-in-bits='infinite' id='ac835593'> > + <subrange length='infinite' id='031f2035'/> > + </array-type-def> > + <array-type-def dimensions='1' type-id='56ef96d7' size-in-bits='5376' id='728d2ee1'> > + <subrange length='42' type-id='7359adad' id='cb7c937f'/> > + </array-type-def> > + <array-type-def dimensions='1' type-id='56ef96d7' size-in-bits='infinite' id='a01b33bb'> > + <subrange length='infinite' id='031f2035'/> > + </array-type-def> > + <typedef-decl name='pg_enc2name' type-id='79f06fd8' id='7a4268c7'/> > + <class-decl name='pg_enc2name' size-in-bits='128' is-struct='yes' visibility='default' id='79f06fd8'> > + <data-member access='public' layout-offset-in-bits='0'> > + <var-decl name='name' type-id='80f4b756' visibility='default'/> > + </data-member> > + <data-member access='public' layout-offset-in-bits='64'> > + <var-decl name='encoding' type-id='66325df6' visibility='default'/> > + </data-member> > + </class-decl> > + <typedef-decl name='pg_enc' type-id='ea65169a' id='66325df6'/> > + <enum-decl name='pg_enc' id='ea65169a'> > + <underlying-type type-id='9cac1fee'/> Hm - why is all of this stuff even ending up in the external ABI? It should all be internal, unless I am missing something? I might be looking the wrong way, but to me it sure looks like none of that ends up being externally visible? Greetings, Andres Freund
Hi, On 2023-06-10 12:48:46 -0700, Andres Freund wrote: > > + <typedef-decl name='pg_enc' type-id='ea65169a' id='66325df6'/> > > + <enum-decl name='pg_enc' id='ea65169a'> > > + <underlying-type type-id='9cac1fee'/> > > Hm - why is all of this stuff even ending up in the external ABI? It should > all be internal, unless I am missing something? > > I might be looking the wrong way, but to me it sure looks like none of that > ends up being externally visible? Looks like we ought to add --exported-interfaces-only? That still seems to include things that shouldn't be there, but much less. E.g.: <class-decl name='AddrInfo' size-in-bits='1152' is-struct='yes' naming-typedef-id='79c324ab' visibility='default' id='0b3a01e2'> <data-member access='public' layout-offset-in-bits='0'> <var-decl name='family' type-id='95e97e5e' visibility='default'/> </data-member> <data-member access='public' layout-offset-in-bits='64'> <var-decl name='addr' type-id='8c37a12f' visibility='default'/> </data-member> </class-decl> and things outside of our control: <class-decl name='_IO_FILE' size-in-bits='1728' is-struct='yes' visibility='default' id='ec1ed955'> <data-member access='public' layout-offset-in-bits='0'> <var-decl name='_flags' type-id='95e97e5e' visibility='default'/> </data-member> I guess the latter would have to be suppressed via suppression file. But I don't understand why things like AddrInfo ends up being included... I tried using --header-file with --drop-private-types. But that ends up dropping all enum definitions for some reason. Independently, I'm a bit confused as to why we export pgresStatus in exports.txt - I don't see any reason for that. Looks like it might be leftover from before fa0f24165c0? We're also a bit schizophrenic about where we install pqexpbuffer.h - includedir_internal. But at the same time we export all the symbols? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > Independently, I'm a bit confused as to why we export pgresStatus in > exports.txt - I don't see any reason for that. Looks like it might be leftover > from before fa0f24165c0? It looks like before fa0f24165, the *only* way to convert ExecStatusType to text was to access that array directly. That commit invented the wrapper function PQresStatus(), but at that point our docs were so poor that there wasn't any good way to mark use of the array as deprecated. A bit later, 9ceb5d8a7 moved the array declaration to libpq-int.h (without any discussion in the commit message, but maybe there was some on-list). Maybe there's still application code out there using it, I dunno. What I do know is that removing the exports.txt entry will provoke squawks from distros' ABI checkers. regards, tom lane
On Sat Jun 10, 2023 at 9:17 AM CDT, Peter Eisentraut wrote: > I have rearranged this a bit. There are now two build options, called > abidw and abidiff. The abidw option produces the xml file, that you > would then at appropriate times commit into the tree as the base. The > abidiff option enables the abidiff tests. This doesn't actually require > abidw, since abidiff can compare the binary directly against the > recorded XML file. So these options are distinct and nonoverlapping. > > Note that in this setup, you still need a conditional around the abidiff > test() call, because otherwise meson setup will fail if the base file > doesn't exist (yet), so it would be impossible to bootstrap this system. Could you speak more to the workflow you see with managing the checked in diff files? At my previous job, I had tried to do something similar with regard to making sure we didn't break ABI[0], but I took a different approach where instead of hooking into the Meson test infrastructure, I used a CI job where I checked out the previous major version of the code and the current version of the code, built both, and checked the built binaries. The benefit of this workflow is that you don't check anything into the source repo. I think the same approach might be better here, but instead of writing it all into the CI file like I did, use a perl script. Then once you have the perl script, it could be possible to then hook into the Meson test infrastructure. > There is something weird going on where the cirrus linux/meson job > doesn't upload the produced abidw artifacts, even though they are > apparently built, and the equivalent works for the freebsd job. Maybe > someone can see something that I'm not seeing there. Nothing obvious is wrong to me. Was the failure maybe just a fluke? [0]: https://github.com/hse-project/hse/blob/master/.github/workflows/abicheck.yaml -- Tristan Partin Neon (https://neon.tech)
On Mon Jun 12, 2023 at 10:10 AM CDT, Tristan Partin wrote: > On Sat Jun 10, 2023 at 9:17 AM CDT, Peter Eisentraut wrote: > > I have rearranged this a bit. There are now two build options, called > > abidw and abidiff. The abidw option produces the xml file, that you > > would then at appropriate times commit into the tree as the base. The > > abidiff option enables the abidiff tests. This doesn't actually require > > abidw, since abidiff can compare the binary directly against the > > recorded XML file. So these options are distinct and nonoverlapping. > > > > Note that in this setup, you still need a conditional around the abidiff > > test() call, because otherwise meson setup will fail if the base file > > doesn't exist (yet), so it would be impossible to bootstrap this system. > > Could you speak more to the workflow you see with managing the checked > in diff files? Just saw your other email which talks about the workflow. -- Tristan Partin Neon (https://neon.tech)
On 10.06.23 22:24, Andres Freund wrote: > Looks like we ought to add --exported-interfaces-only? Btw., this option requires libabigail 2.1, which isn't available everywhere yet. For example, Debian oldstable (used on Cirrus) doesn't have it yet. So I'll leave this patch set as is for now. If it turns out that this is the right option and we want to proceed with this patch set, we might need to think about a version check or something.
Here is an updated version of this patch. It doesn't have any new functionality, just a rebase and some minor adjustments. I have split up the one patch into several ones, which could be considered incrementally, namely: v3-0001-abidw-option.patch This adds the abidw meson option, which produces the xml files with the ABI description. With that, you can then implement a variety of workflows, such as the abidiff proposed in the later patches, or something rigged up via CI, or you can just build various versions locally and compare them. With this patch, you get the files to compare built automatically and don't have to remember to cover all the libraries or which options to use. I think this patch is mostly pretty straightforward and agreeable, subject to technical review in detail. TODO: documentation TODO: Do we want a configure/make variant of this? v3-0002-Enable-abidw-option-on-Cirrus-CI.patch This adds the abidw option to some CI tasks. This was mostly used by me during development to get feedback from other machines and to produce base files for the subsequent abidiff patch. I'm not sure whether we need it in isolation (other than for general testing that the option works at all). v3-0003-abidiff-option.patch This adds the abidiff test suite that compares base files previously produced with the abidw option to the currently built libraries. There is clearly some uncertainty here whether the produced files are stable enough, whether we want this particular workflow, what additional burdens this would create, etc., so I'm not hung up on this right now, it's mostly a demonstration. v3-0004-abidiff-support-files.patch This contains the support files for patch 0003, just split out because they are bulky and boring. On 10.06.23 16:17, Peter Eisentraut wrote: > On 06.06.23 18:52, Tristan Partin wrote: >> It would make sense to me to mark abidiff and abidw as disabler: true. > > ok > >> +if abidiff.found() >> + test('libpq.abidiff', >> + abidiff, >> + args: [files('libpq.base.abi.xml'), libpq_abi], >> + suite: 'abidiff', >> + verbose: true) >> +endif >> >> With disabler: true, you can drop the conditionals. Disablers tell Meson >> to disable parts of the build[0]. > > ok > >> I also don't think it makes sense to mark the custom_targets as >> build_by_default: true, unless you see value in that. I would just have >> them built when the test is ran. >> >> However, it might make sense to create an alias_target of all the ABI >> XML files for people that want to interact with the files outside of the >> tests for whatever reason. > > Thanks for the feedback. Attached is a more complete patch. > > I have rearranged this a bit. There are now two build options, called > abidw and abidiff. The abidw option produces the xml file, that you > would then at appropriate times commit into the tree as the base. The > abidiff option enables the abidiff tests. This doesn't actually require > abidw, since abidiff can compare the binary directly against the > recorded XML file. So these options are distinct and nonoverlapping. > > Note that in this setup, you still need a conditional around the abidiff > test() call, because otherwise meson setup will fail if the base file > doesn't exist (yet), so it would be impossible to bootstrap this system. > > The updated patch also includes the base files for all the ecpg > libraries and the files all have OS and architecture specific names. The > keep the patch small, I just added a dummy base file for the postgres > binary and a suppression file that suppresses everything. > > There is something weird going on where the cirrus linux/meson job > doesn't upload the produced abidw artifacts, even though they are > apparently built, and the equivalent works for the freebsd job. Maybe > someone can see something that I'm not seeing there.
Attachment
On Wed, 1 Nov 2023 at 16:43, Peter Eisentraut <peter@eisentraut.org> wrote: > > Here is an updated version of this patch. It doesn't have any new > functionality, just a rebase and some minor adjustments. > > I have split up the one patch into several ones, which could be > considered incrementally, namely: > > v3-0001-abidw-option.patch > > This adds the abidw meson option, which produces the xml files with the > ABI description. With that, you can then implement a variety of > workflows, such as the abidiff proposed in the later patches, or > something rigged up via CI, or you can just build various versions > locally and compare them. With this patch, you get the files to compare > built automatically and don't have to remember to cover all the > libraries or which options to use. > > I think this patch is mostly pretty straightforward and agreeable, > subject to technical review in detail. > > TODO: documentation > TODO: Do we want a configure/make variant of this? > > v3-0002-Enable-abidw-option-on-Cirrus-CI.patch > > This adds the abidw option to some CI tasks. This was mostly used by me > during development to get feedback from other machines and to produce > base files for the subsequent abidiff patch. I'm not sure whether we > need it in isolation (other than for general testing that the option > works at all). > > v3-0003-abidiff-option.patch > > This adds the abidiff test suite that compares base files previously > produced with the abidw option to the currently built libraries. There > is clearly some uncertainty here whether the produced files are stable > enough, whether we want this particular workflow, what additional > burdens this would create, etc., so I'm not hung up on this right now, > it's mostly a demonstration. > > v3-0004-abidiff-support-files.patch > > This contains the support files for patch 0003, just split out because > they are bulky and boring. One of the test has failed in cfbot at [1] with: abi-compliance-checker [12:04:10.537] The output from the failed tests: [12:04:10.537] [12:04:10.537] 129/282 postgresql:abidiff / plpgsql.abidiff FAIL 1.25s (exit status 4) [12:04:10.537] [12:04:10.537] --- command --- [12:04:10.537] 12:03:00 /usr/bin/abidiff /tmp/cirrus-ci-build/build/../src/pl/plpgsql/src/plpgsql.x86_64-linux.abi.xml src/pl/plpgsql/src/plpgsql.so [12:04:10.537] --- Listing only the last 100 lines from a long log. --- [12:04:10.537] 'NodeTag::T_RoleSpec' from value '66' to '67' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_FuncCall' from value '67' to '68' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_A_Star' from value '68' to '69' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_A_Indices' from value '69' to '70' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_A_Indirection' from value '70' to '71' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_A_ArrayExpr' from value '71' to '72' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_ResTarget' from value '72' to '73' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_MultiAssignRef' from value '73' to '74' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_SortBy' from value '74' to '75' at nodes.h:26:1 [12:04:10.537] 'NodeTag::T_WindowDef' from value '75' to '76' at nodes.h:26:1 .... [12:04:10.592] ------- [12:04:10.592] [12:04:10.592] [12:04:10.592] Summary of Failures: [12:04:10.592] [12:04:10.592] 129/282 postgresql:abidiff / plpgsql.abidiff FAIL 1.25s (exit status 4) [1] - https://cirrus-ci.com/task/5961614579466240 Regards, Vignesh
On 06.01.24 18:25, vignesh C wrote: > One of the test has failed in cfbot at [1] with: > abi-compliance-checker > [12:04:10.537] The output from the failed tests: > [12:04:10.537] > [12:04:10.537] 129/282 postgresql:abidiff / plpgsql.abidiff FAIL 1.25s > (exit status 4) > [12:04:10.537] > [12:04:10.537] --- command --- > [12:04:10.537] 12:03:00 /usr/bin/abidiff > /tmp/cirrus-ci-build/build/../src/pl/plpgsql/src/plpgsql.x86_64-linux.abi.xml > src/pl/plpgsql/src/plpgsql.so > [12:04:10.537] --- Listing only the last 100 lines from a long log. --- > [12:04:10.537] 'NodeTag::T_RoleSpec' from value '66' to '67' at nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_FuncCall' from value '67' to '68' at nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_A_Star' from value '68' to '69' at nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_A_Indices' from value '69' to '70' at nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_A_Indirection' from value '70' to '71' at > nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_A_ArrayExpr' from value '71' to '72' at nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_ResTarget' from value '72' to '73' at nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_MultiAssignRef' from value '73' to '74' at > nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_SortBy' from value '74' to '75' at nodes.h:26:1 > [12:04:10.537] 'NodeTag::T_WindowDef' from value '75' to '76' at nodes.h:26:1 > .... > [12:04:10.592] ------- > [12:04:10.592] > [12:04:10.592] > [12:04:10.592] Summary of Failures: > [12:04:10.592] > [12:04:10.592] 129/282 postgresql:abidiff / plpgsql.abidiff FAIL 1.25s > (exit status 4) > > [1] - https://cirrus-ci.com/task/5961614579466240 This is kind of intentional, as it shows the the test catches ABI changes. If the patches were to be committed, then the base ABI file would be updated.
On 2023-Nov-01, Peter Eisentraut wrote: > v3-0001-abidw-option.patch > > This adds the abidw meson option, which produces the xml files with the ABI > description. With that, you can then implement a variety of workflows, such > as the abidiff proposed in the later patches, or something rigged up via CI, > or you can just build various versions locally and compare them. With this > patch, you get the files to compare built automatically and don't have to > remember to cover all the libraries or which options to use. > > I think this patch is mostly pretty straightforward and agreeable, subject > to technical review in detail. I like this idea and I think we should integrate it with the objective of it becoming the workhorse of ABI-stability testing. However, I do not think that the subsequent patches should be part of the tree at all; certainly not the produced .xml files in your 0004, as that would be far too unstable and would cause a lot of pointless churn. > TODO: documentation Yes, please. > TODO: Do we want a configure/make variant of this? Not needed IMO. The way I see this working, is that we set up a buildfarm animal [per architecture] that runs the new rules produced by the abidw option and stores the result locally, so that for stable branches it can turn red when an ABI-breaking change with the previous minor release of the same branch is introduced. There's no point on it ever turning red in the master branch, since we're obviously not concerned with ABI changes there. (Perhaps we do need 0003 as an easy mechanism to run the comparison, but I'm not sure to what extent that would be actually helpful.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Tue, Feb 27, 2024 at 8:25 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > The way I see this working, is that we set up a buildfarm animal [per > architecture] that runs the new rules produced by the abidw option and > stores the result locally, so that for stable branches it can turn red > when an ABI-breaking change with the previous minor release of the same > branch is introduced. There's no point on it ever turning red in the > master branch, since we're obviously not concerned with ABI changes there. ABI stability doesn't seem like something that you can alert on. There are quite a few individual cases where the ABI was technically broken, in a way that these tools will complain about. And yet it was generally understood that these changes did not really break ABI stability, for various high-context reasons that no tool can possibly be expected to understand. This will at least be true under our existing practices, or anything like them. For example, if you look at the "Problems with Symbols, High Severity" from the report I posted comparing REL_11_0 to REL_11_20, you'll see that I removed _bt_pagedel() when backpatching a fix. That was justified by the fact that any extension that was calling that function was already hopelessly broken (I pointed this out at the time). Having some tooling in this area would be extremely useful. The absolute number of false positives seems quite manageable, but the fact is that most individual complaints that the tooling makes are false positives. At least in some deeper sense. -- Peter Geoghegan
On 2024-Feb-27, Peter Geoghegan wrote: > On Tue, Feb 27, 2024 at 8:25 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > The way I see this working, is that we set up a buildfarm animal [per > > architecture] that runs the new rules produced by the abidw option and > > stores the result locally, so that for stable branches it can turn red > > when an ABI-breaking change with the previous minor release of the same > > branch is introduced. There's no point on it ever turning red in the > > master branch, since we're obviously not concerned with ABI changes there. > > ABI stability doesn't seem like something that you can alert on. Eh, I disagree. Since you can add suppression rules to the tree, I'd say it definitely is. If you commit something and it breaks ABI, we want to know as soon as possible -- for example suppose the ABI break occurs during a security patch at release time; if we get an alert about it immediately, we still have time to fix it before the mess is released. Now, if you have an intentional ABI break, then you can let the testing system know about it so that it won't complain. We could for example have src/abi/suppressions/REL_11_8.ini and src/abi/suppressions/REL_12_3.ini files (in the respective branches) with the _bt_pagedel() change. You can add this file together with the commit that introduces the change, if you know about it ahead of time, or as a separate commit after the buildfarm animal turns red. Or you can fix your ABI break, if -- as is most likely -- it was unintentional. Again -- this only matters for stable branches. We don't need to do anything for the master branch, as it would be far too noisy if we did that. Now, maybe a buildfarm animal is not the right tool, and instead we need a separate system that tests for it and emails pg-hackers when it breaks or whatever. That's fine with me, but it seems a pretty minor implementation detail. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Tue, Feb 27, 2024 at 9:03 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Now, maybe a buildfarm animal is not the right tool, and instead we need > a separate system that tests for it and emails pg-hackers when it breaks > or whatever. That's fine with me, but it seems a pretty minor > implementation detail. Anything that alerts on breakage is pretty much equivalent to having a buildfarm animal. I have a feeling that there are going to be real problems with alerting, at least if it's introduced right away. I'd feel much better about it if there was an existing body of suppressions, that more or less worked as a reference of agreed upon best practices. Can we do that part first, rather than starting out with a blanket assumption that everything that happened before now must have been perfect? -- Peter Geoghegan
On 2024-Feb-27, Peter Geoghegan wrote: > I have a feeling that there are going to be real problems with > alerting, at least if it's introduced right away. I'd feel much better > about it if there was an existing body of suppressions, that more or > less worked as a reference of agreed upon best practices. Can we do > that part first, rather than starting out with a blanket assumption > that everything that happened before now must have been perfect? Well, I was describing a possible plan, not saying that we have to assume we've been perfect all along. I think the first step should be to add the tooling now (Meson rules as in Peter E's 0001 patch upthread, or something equivalent), then figure out what suppressions we need in the supported back branches. This would let us build the corpus of best practices you want, I think. Once we have clean runs with those, we can add BF animals or whatever. The alerts don't have to be the first step. In fact, we can wait even longer for the alerts. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On 27.02.24 14:25, Alvaro Herrera wrote: > I like this idea and I think we should integrate it with the objective > of it becoming the workhorse of ABI-stability testing. However, I do > not think that the subsequent patches should be part of the tree at all; > certainly not the produced .xml files in your 0004, as that would be far > too unstable and would cause a lot of pointless churn. Looking at this again, if we don't want the .xml files in the tree, then we don't really need this patch series. Most of the delicate work in the 0001 patch was to find the right abidw options combinations to reduce the variability in the .xml output files (--no-show-locs is an obvious example). If we don't want to record the .xml files in the tree, then we don't need all these complications. For example, if we want to check the postgres backend ABI across minor versions, we could just compile it multiple times and compare the binaries directly: git checkout REL_16_0 meson setup build-0 meson compile -C build-0 git checkout REL_16_STABLE meson setup build-1 meson compile -C build-1 abidiff --no-added-syms build-0/src/backend/postgres build-1/src/backend/postgres > The way I see this working, is that we set up a buildfarm animal [per > architecture] that runs the new rules produced by the abidw option and > stores the result locally, so that for stable branches it can turn red > when an ABI-breaking change with the previous minor release of the same > branch is introduced. There's no point on it ever turning red in the > master branch, since we're obviously not concerned with ABI changes there. Maybe the way forward here is to write a buildfarm module for this, and then see from there what further needs there are.
On Mon, Mar 4, 2024 at 7:50 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Looking at this again, if we don't want the .xml files in the tree, then > we don't really need this patch series. Based on this, I've updated the status of this patch in the CommitFest application to Withdrawn. If that's not correct, please feel free to adjust. -- Robert Haas EDB: http://www.enterprisedb.com