Thread: Build failure with GCC 15 (defaults to -std=gnu23)
Hi, postgres-17.1 fails to build with upcoming GCC 15 which defaults to -std=gnu23 as follows: ``` In file included from ../../src/include/postgres_fe.h:25, from checksum_helper.c:17: ../../src/include/c.h:456:23: error: two or more data types in declaration specifiers 456 | typedef unsigned char bool; | ^~~~ ../../src/include/c.h:456:1: warning: useless type name in empty declaration 456 | typedef unsigned char bool; | ^~~~~~~ ``` src/include/c.h does attempt to check for whether bool is already defined but the check doesn't work. There may be more issues afterwards but I haven't tried papering over the above issue. It should be possible to reproduce w/ CFLAGS="... -std=gnu23" or CFLAGS="... -std=c23" on older GCC or Clang. thanks, sam
Sam James <sam@gentoo.org> writes: > postgres-17.1 fails to build with upcoming GCC 15 which defaults to > -std=gnu23 as follows: I do not think we claim to support C23 yet. Having said that, I can reproduce this on gcc 14 using -std=gnu23. It appears that configure is deciding that <stdbool.h> is not conformant to C99 because it doesn't declare "bool" as a macro. Did C23 really remove that !? regards, tom lane
On Mon, Nov 18, 2024 at 7:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sam James <sam@gentoo.org> writes: > > postgres-17.1 fails to build with upcoming GCC 15 which defaults to > > -std=gnu23 as follows: > > I do not think we claim to support C23 yet. > > Having said that, I can reproduce this on gcc 14 using -std=gnu23. > It appears that configure is deciding that <stdbool.h> is not > conformant to C99 because it doesn't declare "bool" as a macro. > Did C23 really remove that !? Yes, seems to be a general pattern: features introduced as keyword _Xxx with a library macro xxx -> _Xxx (usually where xxx is already a keyword in C++ but the C committee was afraid to unleash a new keyword directly on the world, I guess?), and now xxx is finally graduating to real keyword status. Other examples: static_assert, thread_local, alignas.
Thomas Munro <thomas.munro@gmail.com> writes: > On Mon, Nov 18, 2024 at 7:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Having said that, I can reproduce this on gcc 14 using -std=gnu23. >> It appears that configure is deciding that <stdbool.h> is not >> conformant to C99 because it doesn't declare "bool" as a macro. >> Did C23 really remove that !? > Yes, seems to be a general pattern: features introduced as keyword > _Xxx with a library macro xxx -> _Xxx (usually where xxx is already a > keyword in C++ but the C committee was afraid to unleash a new keyword > directly on the world, I guess?), and now xxx is finally graduating to > real keyword status. Other examples: static_assert, thread_local, > alignas. Fun. Well, now that we insist on C99 support in all branches, I wonder whether we can just remove all the non-stdbool support. The one thing that looks tricky is that we insist on sizeof(bool) being 1, but are there any remaining supported platforms where it isn't? The buildfarm doesn't have any examples. regards, tom lane
On Mon, Nov 18, 2024 at 9:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fun. Well, now that we insist on C99 support in all branches, > I wonder whether we can just remove all the non-stdbool support. > The one thing that looks tricky is that we insist on sizeof(bool) > being 1, but are there any remaining supported platforms where > it isn't? The buildfarm doesn't have any examples. So far I have found only Apple/Darwin PPC (RIP), where this was occasionally an issue. Some projects would apparently compile with -mone-byte-bool to unbreak local assumptions, but risk breaking ABI with other libraries (as we do?). GCC filed that switch under Darwin options rather than somewhere more general... can we call that a clue that it was highly unusual? https://gcc.gnu.org/onlinedocs/gcc/Darwin-Options.html There may be a systematic way to survey ABIs from the LLVM or GCC source trees... hmm, I am no expert so take this with a grain of salt but I found where the LLVM project defines BoolWidth and BoolAlign, starting from the commit where they removed Darwin PPC support (2fe49ea0), and it looks like it was the only target ABI that overrode the default of 8 there (it had 32, meaning bits). BTW animal "alligator" has just shown the failure. Ah, yes, due to this GCC switch being flipped a couple of days ago: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212
Thomas Munro <thomas.munro@gmail.com> writes: > On Mon, Nov 18, 2024 at 9:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Fun. Well, now that we insist on C99 support in all branches, >> I wonder whether we can just remove all the non-stdbool support. >> The one thing that looks tricky is that we insist on sizeof(bool) >> being 1, but are there any remaining supported platforms where >> it isn't? The buildfarm doesn't have any examples. > So far I have found only Apple/Darwin PPC (RIP), where this was > occasionally an issue. Yeah. Well, what say we leave the "typedef unsigned char bool" pathway in place, but set things up to use that only if sizeof the stdbool type isn't 1 --- and then it's up to any hypothetical users of that pathway to choose a compiler and compiler options that won't choke on it. regards, tom lane
On Mon, Nov 18, 2024 at 10:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah. Well, what say we leave the "typedef unsigned char bool" > pathway in place, but set things up to use that only if sizeof > the stdbool type isn't 1 --- and then it's up to any hypothetical > users of that pathway to choose a compiler and compiler options > that won't choke on it. It sounds like we should stop using the old and broken AC_CHECK_HEADER_STDBOOL macro. I think it was doing two jobs in old times: there were some systems that shipped a defective/pre-standard stdbool.h, and some systems without it. I think both classes of system are gone from the universe. Later autoconf versions check for C99 "or later", but we're stuck with the old one and I doubt we are going to upgrade it. Found in their NEWS: *** AC_HEADER_STDBOOL, AC_CHECK_HEADER_STDBOOL are obsolescent and less picky. These macros are now obsolescent, as programs can simply include stdbool.h unconditionally. If you use these macros, they now accept a stdbool.h that exists but does nothing, so long as ‘bool’, ‘true’, and ‘false’ work anyway. This is for compatibility with C 2023 and with C++.
Thomas Munro <thomas.munro@gmail.com> writes: > It sounds like we should stop using the old and broken > AC_CHECK_HEADER_STDBOOL macro. Yeah, that's what I was imagining: assume that <stdbool.h> exists and works, and check only to see if sizeof(bool) is acceptable. > ... Later autoconf versions check for > C99 "or later", but we're stuck with the old one and I doubt we are > going to upgrade it. I'm not sure --- there was some discussion a week or two ago about upgrading autoconf after all. But whether we do or not, it's hard to see what AC_HEADER_STDBOOL is buying us. regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes: > I found a few lines we could just delete in master. I wonder if we > should also just require sizeof(bool) == 1 more explicitly going > forward with an error, since we don't have coverage or any expectation > of ever getting any for the alternative code AFAICS, even if it is > small. Yeah, that's a fair criticism. I don't think we've tested that code path since I retired prairiedog, so who's to say that it works even now? Maybe it's best to just delete that code, and if we ever find a new platform with wider bool, figure out what to do at that time. regards, tom lane
On 18.11.24 02:30, Thomas Munro wrote: > On Mon, Nov 18, 2024 at 11:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: >>> It sounds like we should stop using the old and broken >>> AC_CHECK_HEADER_STDBOOL macro. >> >> Yeah, that's what I was imagining: assume that <stdbool.h> exists >> and works, and check only to see if sizeof(bool) is acceptable. > > I think this is the minimal change, which I'd push back to 13 post-freeze. Note that if we backpatch C23 support, we also need to backpatch at least commits a67a49648d9 and d2b4b4c2259.
Peter Eisentraut <peter@eisentraut.org> writes: > Note that if we backpatch C23 support, we also need to backpatch at > least commits a67a49648d9 and d2b4b4c2259. Yeah. Our normal theory for this kind of thing is "people are likely to build our old branches with modern toolchains", so we are going to have to back-patch C23 compatibility sooner or later. In fact, we'll have to back-patch to 9.2, or else decide that those branches are unbuildable on modern platforms and hence out of scope for compatibility testing. We have a little bit of grace time before this needs to happen, but perhaps not very much. regards, tom lane
On 20.11.24 16:32, Tom Lane wrote: > Peter Eisentraut <peter@eisentraut.org> writes: >> Note that if we backpatch C23 support, we also need to backpatch at >> least commits a67a49648d9 and d2b4b4c2259. > > Yeah. Our normal theory for this kind of thing is "people are > likely to build our old branches with modern toolchains", so > we are going to have to back-patch C23 compatibility sooner or > later. In fact, we'll have to back-patch to 9.2, or else > decide that those branches are unbuildable on modern platforms > and hence out of scope for compatibility testing. I have checked that with this patch and the two above (well, one is just to remove a warning), you can get PG16 and up building cleanly with gcc-14 -std=gnu23. Before that, you get a ton of warnings and errors related to the node tree walker routines. This is presumably related to commit 1c27d16e6e5. Going further back, the bool patch proposed here assumes that stdbool.h exists unconditionally, which is C99, which is not the baseline for older branches. I think for those it's probably best to leave it alone and just use gcc-15 -std=gnu89 or whatever.
On Thu, Nov 21, 2024 at 8:23 PM Peter Eisentraut <peter@eisentraut.org> wrote: > On 20.11.24 16:32, Tom Lane wrote: > > Yeah. Our normal theory for this kind of thing is "people are > > likely to build our old branches with modern toolchains", so > > we are going to have to back-patch C23 compatibility sooner or > > later. In fact, we'll have to back-patch to 9.2, or else > > decide that those branches are unbuildable on modern platforms > > and hence out of scope for compatibility testing. > > I have checked that with this patch and the two above (well, one is just > to remove a warning), you can get PG16 and up building cleanly with > gcc-14 -std=gnu23. Thanks. I pushed the <stdbool.h> thing, which didn't require going back very far. > Before that, you get a ton of warnings and errors related to the node > tree walker routines. This is presumably related to commit 1c27d16e6e5. Alligator is now getting past the bool troubles and reaching that stuff. I was expecting it to be green in master. It's OK on my slightly older "gcc version 15.0.0 20241110 (experimental) (FreeBSD Ports Collection)" with -std=gnu23, but alligator now shows a weird error with tsearch data types. Something about flexible array members (casting from non-flex to flex?, without saying where the cast is?), but IDK, it's an internal error asking for a bug to be filed, not a user-facing one. This might be relevant: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113688 > Going further back, the bool patch proposed here assumes that stdbool.h > exists unconditionally, which is C99, which is not the baseline for > older branches. I think for those it's probably best to leave it alone > and just use gcc-15 -std=gnu89 or whatever. There's only one C89 branch that knows about <stdbool.h>: REL_11_STABLE. That's recent enough that it's still easy to work with, so I just changed it to use AC_CHECK_HEADER instead. In other words, we've removed the bogus "conforms" check. Whether you still need a presence check depends on the C version, and only for 11 is the answer yes. Obviously nobody is really going to build with an actual C89 system so the presence check is never going to fail, but it would be weird on principle to suddenly require a C99 thing...
On 25.11.24 11:01, Thomas Munro wrote: >> I have checked that with this patch and the two above (well, one is just >> to remove a warning), you can get PG16 and up building cleanly with >> gcc-14 -std=gnu23. > > Thanks. I pushed the <stdbool.h> thing, which didn't require going > back very far. > >> Before that, you get a ton of warnings and errors related to the node >> tree walker routines. This is presumably related to commit 1c27d16e6e5. > > Alligator is now getting past the bool troubles and reaching that > stuff. I was expecting it to be green in master. It's OK on my > slightly older "gcc version 15.0.0 20241110 (experimental) (FreeBSD > Ports Collection)" with -std=gnu23, but alligator now shows a weird > error with tsearch data types. Something about flexible array members > (casting from non-flex to flex?, without saying where the cast is?), > but IDK, it's an internal error asking for a bug to be filed, not a > user-facing one. > > This might be relevant: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113688 > >> Going further back, the bool patch proposed here assumes that stdbool.h >> exists unconditionally, which is C99, which is not the baseline for >> older branches. I think for those it's probably best to leave it alone >> and just use gcc-15 -std=gnu89 or whatever. > > There's only one C89 branch that knows about <stdbool.h>: > REL_11_STABLE. That's recent enough that it's still easy to work > with, so I just changed it to use AC_CHECK_HEADER instead. In other > words, we've removed the bogus "conforms" check. Whether you still > need a presence check depends on the C version, and only for 11 is the > answer yes. Obviously nobody is really going to build with an actual > C89 system so the presence check is never going to fail, but it would > be weird on principle to suddenly require a C99 thing... Where does this leave us regarding backpatching the other two C23-related patches? The node tree walker issue looks like a very hard barrier. I don't want to spend too much effort backpatching anything to ancient version if there's little hope of getting the whole thing working.
On Tue, Nov 26, 2024 at 4:57 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Where does this leave us regarding backpatching the other two > C23-related patches? The node tree walker issue looks like a very hard > barrier. I don't want to spend too much effort backpatching anything to > ancient version if there's little hope of getting the whole thing working. Oh. Yeah. 1c27d16e6e5 was not back-patchable. And what f9a56e72 did in 15 and older doesn't seem to have any equivalent in C23, at least without going way overboard. -Wdeprecated-non-prototype was recognising a category of function type that no longer exists, so the code now falls into the more general case of -Wincompatible-pointer-types in C23, which you certainly wouldn't want to suppress. So perhaps we actually can't make any branch older than PostgreSQL 16 into a valid C23 program, and if that's true, I needn't have back-patched the <stdbool.h> change any further back than 16. Perhaps we should reconsider that, then. And if it can't be all the back-branches, we could even decide to focus just on master. Where do we want our C23 support to begin? Coincidental observation: We added -Wdeprecated-non-prototype back when Clang 15 invented it, but I noticed that GCC 15 has just now added it too[1], so alligator started detecting and using that in REL_15_STABLE last week. Of course it doesn't help once you're talking C23. [1] https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=701d8e7e60b85809cae348c1e9edb3b0f4924325 [2] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=alligator&dt=2024-11-18%2019%3A23%3A30&stg=configure
Thomas Munro <thomas.munro@gmail.com> writes: > Oh. Yeah. 1c27d16e6e5 was not back-patchable. And what f9a56e72 did > in 15 and older doesn't seem to have any equivalent in C23, at least > without going way overboard. -Wdeprecated-non-prototype was > recognising a category of function type that no longer exists, so the > code now falls into the more general case of > -Wincompatible-pointer-types in C23, which you certainly wouldn't want > to suppress. So perhaps we actually can't make any branch older than > PostgreSQL 16 into a valid C23 program, and if that's true, I needn't > have back-patched the <stdbool.h> change any further back than 16. > Perhaps we should reconsider that, then. And if it can't be all the > back-branches, we could even decide to focus just on master. Where do > we want our C23 support to begin? Unless somebody has a better idea than 1c27d16e6e5, it would seem reasonable to say that we'll support C23 in v16 and later, but to build an older branch you have to back off to an older C version. I don't feel a need to revert those <stdbool.h> changes. If nothing else, that saved a bit of configure runtime. If we leave it like this, alligator will need some configuration adjustments, and so will other BF animals when they migrate to new gcc, and so will individual hackers when they're trying to build old branches. A possible compromise to reduce the manual pain level could be to adjust configure to add "-std=gnu99" or so to CFLAGS in the pre-v16 branches, if the compiler accepts that. (OTOH, maybe that'd cause pain for some extensions?) regards, tom lane
On Tue, Nov 26, 2024 at 9:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If we leave it like this, alligator will need some configuration > adjustments, and so will other BF animals when they migrate to new > gcc, and so will individual hackers when they're trying to build > old branches. A possible compromise to reduce the manual pain > level could be to adjust configure to add "-std=gnu99" or so to > CFLAGS in the pre-v16 branches, if the compiler accepts that. We already have tests to see if we need to add -std=c99 to go forwards in time (from a quick look at the build farm, now used only by EOL distros testing GCC 4). Something tells me we might want to be less draconian when travelling backwards, but I dunno... our stuff is working fine with (implicit) -std=c17 all over the place, and we also have: # Do we need -std=c99 to compile C99 code? We don't want to add -std=c99 # unnecessarily, because we optionally rely on newer features. I noticed that later autoconf killed AC_PROG_CC_C99 and its AC_PROG_CC is figuring out the highest C standard available and requesting that, though it doesn't know about C23 yet. > (OTOH, maybe that'd cause pain for some extensions?) So we're talking about -std=XXX suddenly appearing in pg_config --cflags? Of course we want to try quite hard to keep emitting nothing for that if we can. We already emit -std=gnu99 from old tests that keep those GCC 4 build farm zombies at bay (did extensions ever complain about that back in the GCC 4 days? I'd guess not), but no one really uses those in real life. If we one day dare to dream about moving our own baseline to C11/C17, we'll still emit nothing as the compilers are already there by default. We will start to emit a new flag to disable C23 if required, but I think it might be unlikely to upset anyone if it works something like this: * For 16+ nothing, we're going to be C23 clean (after a couple more back-patches) * For 9.2-15 on GCC < 15 it'll stay as nothing too * For 9.2-15 on early GCC 15 adopter distros like Fedora/Gentoo etc we'll detect C23, and perhaps start spitting out -std=c17 (if you've detected C23, I think you can assume that C17 is available so we don't have to do a C17-C11-C99[-C89] search?) * When 12-15 fall out of support and all compilers are eventually C23+ compilers, they'll eventually always be getting -std=c17 by the above rules but no one will mind about that in the ancient branches I think if someone writes new extension code in C23 and wants to use it with PostgreSQL 15 that came out in 2022, they can expect a few time travel problems, but by the time C23 really starts to take off, 15 will be retired, and it's great that we got the hardest part of this into 16. I don't think the problems would be too hard to deal with if you do it. One saving grace here is that all this stuff is converging with C++, and we already ensure our headers are valid C++. As for what they actually mean, we also know that C++ extensions are happily using the tree walker stuff in the wild, which I think must be about the same level of C calling convention abuse whether you do it from C23 or C++, and apparently doesn't break. Example: https://github.com/duckdb/pg_duckdb/blob/d53247f004b154dc81275e9c4b1184c792f4865c/src/pgduckdb_hooks.cpp#L123 The C++ people aren't using --cflags of course. I guess a really-written-in-C23 extension using --cflags to compile its own code would need to append -std=c23 on the end when building against 12-15 (both clang and gcc will take the last of multiple -std flags), if we decide to start emitting -std=c17 in 12-15 because we've detected a C23 compiler. In 16+ they could put -std=c23 on the end, or not if they somehow know it's the default. We could suppress it in pg_config --cflags even though we need it to build the backend (by the arguments above, we know the headers are more acceptable as C23 than the backend .c files), but that seems bound to confuse matters. I don't know how exactly, I'm no expert in the gnarly details of extension buildfiles, and that's just some first thoughts. I might try some ideas out in a few days when my local gcc15-devel package catches up with the new defaults, and see if everything I wrote is complete nonsense.
Thomas Munro <thomas.munro@gmail.com> writes: > ... I think it might be unlikely to > upset anyone if it works something like this: > * For 16+ nothing, we're going to be C23 clean (after a couple more > back-patches) > * For 9.2-15 on GCC < 15 it'll stay as nothing too > * For 9.2-15 on early GCC 15 adopter distros like Fedora/Gentoo etc > we'll detect C23, and perhaps start spitting out -std=c17 (if you've > detected C23, I think you can assume that C17 is available so we don't > have to do a C17-C11-C99[-C89] search?) > * When 12-15 fall out of support and all compilers are eventually C23+ > compilers, they'll eventually always be getting -std=c17 by the above > rules but no one will mind about that in the ancient branches Sounds plausible to me. Will you work on making that happen? regards, tom lane
On Tue, Nov 26, 2024 at 3:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sounds plausible to me. Will you work on making that happen? OK, trying it out...
Pushed. Alligator is turning green in the back branches, only one more to go. I see that Peter also pushed the reserved word patches. Apparently that gcc bug it's blowing up on might go away with -g0 (see link earlier), but anyway it's a nightly build compiler so fingers crossed for a fix soon. The newer branches are building and running for me on { gcc14, gcc15, clang18 } -std=gnu23, and apparently the bug was even in gcc14, so it must require some unlikely conditions that alligator has stumbled on. (Yeah I had missed 10, thanks for the nudge.)
Thomas Munro <thomas.munro@gmail.com> writes: > On Mon, Nov 18, 2024 at 2:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, that's a fair criticism. I don't think we've tested that code >> path since I retired prairiedog, so who's to say that it works even >> now? Maybe it's best to just delete that code, and if we ever find a >> new platform with wider bool, figure out what to do at that time. > Here's a draft patch for that. Passes an eyeball check, but I've not really tried to test it. I suppose the only meaningful test is likely to be letting the buildfarm loose on it. regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes: > Pushed. Alligator is turning green in the back branches, only one more to go. > I see that Peter also pushed the reserved word patches. Apparently > that gcc bug it's blowing up on might go away with -g0 (see link > earlier), but anyway it's a nightly build compiler so fingers crossed > for a fix soon. The newer branches are building and running for me on > { gcc14, gcc15, clang18 } -std=gnu23, and apparently the bug was even > in gcc14, so it must require some unlikely conditions that alligator > has stumbled on. Looks like flaviventris and serinus just updated to the same broken compiler version that alligator is using :-(. Maybe we'd better file a formal bug report? regards, tom lane
Hi, On 2024-11-27 13:28:24 -0500, Tom Lane wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Pushed. Alligator is turning green in the back branches, only one more to go. > > I see that Peter also pushed the reserved word patches. Apparently > > that gcc bug it's blowing up on might go away with -g0 (see link > > earlier), but anyway it's a nightly build compiler so fingers crossed > > for a fix soon. The newer branches are building and running for me on > > { gcc14, gcc15, clang18 } -std=gnu23, and apparently the bug was even > > in gcc14, so it must require some unlikely conditions that alligator > > has stumbled on. > > Looks like flaviventris and serinus just updated to the same broken > compiler version that alligator is using :-(. Maybe we'd better > file a formal bug report? I run a development gcc locally, and I just had updated it this morning (4a868591169). Interestingly I don't see the ICE with it. But I can reproduce it with debian sid's gcc-snapshot, with exactly the same compiler arguments. The snapshot's version: gcc (Debian 20241123-1) 15.0.0 20241123 (experimental) [master r15-5606-g4aa4162e365] so it looks like the bug might have been fixed recently? I'm not sure this is really the bug linked to earlier [1]. I can't repro the issue with 14, for example. It's possible it requires specific gcc configure flags to be triggered? Luckily -g1 does, at least locally, work around the issue with gcc-snapshot. So I guess I'll make flaviventris and serinus use that for now :/ Greetings, Andres Freund [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113688
Andres Freund <andres@anarazel.de> writes: > Hi, > > On 2024-11-27 13:28:24 -0500, Tom Lane wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: >> > Pushed. Alligator is turning green in the back branches, only one more to go. >> > I see that Peter also pushed the reserved word patches. Apparently >> > that gcc bug it's blowing up on might go away with -g0 (see link >> > earlier), but anyway it's a nightly build compiler so fingers crossed >> > for a fix soon. The newer branches are building and running for me on >> > { gcc14, gcc15, clang18 } -std=gnu23, and apparently the bug was even >> > in gcc14, so it must require some unlikely conditions that alligator >> > has stumbled on. >> >> Looks like flaviventris and serinus just updated to the same broken >> compiler version that alligator is using :-(. Maybe we'd better >> file a formal bug report? > > I run a development gcc locally, and I just had updated it this morning > (4a868591169). Interestingly I don't see the ICE with it. > > But I can reproduce it with debian sid's gcc-snapshot, with exactly the same > compiler arguments. The snapshot's version: > gcc (Debian 20241123-1) 15.0.0 20241123 (experimental) [master r15-5606-g4aa4162e365] > > so it looks like the bug might have been fixed recently? > > I'm not sure this is really the bug linked to earlier [1]. I can't repro the > issue with 14, for example. > > It's possible it requires specific gcc configure flags to be triggered? > > Luckily -g1 does, at least locally, work around the issue with > gcc-snapshot. So I guess I'll make flaviventris and serinus use that for now > :/ > > Greetings, > > Andres Freund > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113688 See https://gcc.gnu.org/PR117724 as well. The issues are related in that canonicalisation of struct types keeps needing revisiting, more so in light of C23 changes. Note also that the ICE is only with "checking" (~assertions) which is enabled at a stricter level for non-releases by default, so some of it may affect 14 but not show up there. Martin Uecker has posted a patch which is currently being reviewed. I wouldn't worry about it until that lands unless the build failures continue. thanks, sam
Hi, On 2024-11-27 19:01:36 +0000, Sam James wrote: > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113688 > See https://gcc.gnu.org/PR117724 as well. The issues are related in that > canonicalisation of struct types keeps needing revisiting, more so in > light of C23 changes. Thanks. > Note also that the ICE is only with "checking" (~assertions) which is > enabled at a stricter level for non-releases by default, so some of it > may affect 14 but not show up there. Ah, that explains that. > Martin Uecker has posted a patch which is currently being reviewed. I > wouldn't worry about it until that lands unless the build failures continue. I changed my local build to use the same --checking as debian's gcc-snapshot, confirmed that that reproduces the issue. Then applied Martin's patch. It does fix the problem. Greetings, Andres Freund
Hi, On 2024-11-27 13:50:59 -0500, Andres Freund wrote: > Luckily -g1 does, at least locally, work around the issue with > gcc-snapshot. So I guess I'll make flaviventris and serinus use that for now > :/ Done. It did fix flaviventris/HEAD, I'm sure the others will follow soon. Greetings, Andres Freund
> Done. It did fix flaviventris/HEAD, I'm sure the others will follow soon.
alligator/HEAD is still failing though.
robins
Robins Tharakan <tharakan@gmail.com> writes: > alligator/HEAD is still failing though. > Let me know if changing something on alligator can help in some way. I think we're waiting on the gcc crew to fix their bug. As long as alligator is faithfully rebuilding gcc from upstream everyday, there's not much more to do than wait. (You could adopt Andres' -g1 workaround, but then we'd not know when the gcc bug is fixed. So unless this drags on quite awhile, I think alligator is best left as-is.) regards, tom lane
Robins Tharakan <tharakan@gmail.com> writes:
> alligator/HEAD is still failing though.
> Let me know if changing something on alligator can help in some way.
I think we're waiting on the gcc crew to fix their bug. As long
as alligator is faithfully rebuilding gcc from upstream everyday,
there's not much more to do than wait.
(You could adopt Andres' -g1 workaround, but then we'd not know
when the gcc bug is fixed. So unless this drags on quite awhile,
I think alligator is best left as-is.)
Robins Tharakan <tharakan@gmail.com> writes: > It's been a few days since alligator lit up the buildfarm red. IMHO at this > point, it is just noise. So I've reduced its v16+ build frequency to daily > (instead of a few minutes). I'll revert that once gcc is fixed upstream. Looks like alligator just went green with the 20241212 gcc build. regards, tom lane
Robins Tharakan <tharakan@gmail.com> writes:
> It's been a few days since alligator lit up the buildfarm red. IMHO at this
> point, it is just noise. So I've reduced its v16+ build frequency to daily
> (instead of a few minutes). I'll revert that once gcc is fixed upstream.
Looks like alligator just went green with the 20241212 gcc build.
Just forced REL_17_STABLE / REL_16_STABLE and they are pastthat failure point too (still running).
REL_17_STABLE failed on misc-recovery and one context I can add here isthat I triggered both REL_16_STABLE and REL_17_STABLE together andwere running neck-and-neck wherein v16 went past this test (in ~4 minutes)and v17 (got stuck for ~10 min) and failed.
alligator:REL_16_STABLE [12:29:40] running test misc-subscription ...
v17 log:
alligator:REL_17_STABLE [12:25:09] running bin test psql ...
alligator:REL_17_STABLE [12:25:19] running bin test scripts ...
alligator:REL_17_STABLE [12:26:00] running test misc-recovery ...
alligator:REL_17_STABLE [12:37:15] failed at stage recoveryCheck
$
Robins Tharakan <tharakan@gmail.com> writes: > Is it possible that 2 concurrent runs (of different branches) could step on > each other? Looks that way, doesn't it? But I suspect it's just a timing problem --- perhaps one triggered by the background load caused by another test run, but not directly related to it. Looking at 019_replslot_limit.pl, I see ----- # freeze walsender and walreceiver. Slot will still be active, but walreceiver # won't get anything anymore. kill 'STOP', $senderpid, $receiverpid; $node_primary3->advance_wal(2); ----- This fragment seems to assume that the effect of the kill 'STOP' is instantaneous. If it's not, it seems possible that the walsender process could manage to push out the WAL data added by "advance_wal" before it gets frozen, in which case the subsequent loop that's watching for it to get killed would watch in vain, which matches your symptoms. As far as a quick grep finds, this is the only test we have that relies on kill 'STOP'. I suspect that it's too clever for its own good. I trawled the last three months' worth of buildfarm logs and didn't find any other matches to "not ok 19 - walsender termination logged". So this is a pretty improbable failure mode, whatever the explanation is. regards, tom lane
On Fri, 13 Dec 2024 at 12:46, Robins Tharakan <tharakan@gmail.com> wrote:REL_17_STABLE failed on misc-recovery and one context I can add here isthat I triggered both REL_16_STABLE and REL_17_STABLE together andwere running neck-and-neck wherein v16 went past this test (in ~4 minutes)and v17 (got stuck for ~10 min) and failed.Is it possible that 2 concurrent runs (of different branches) could step on each other?These are the logs that I captured, and v16 [2] / v17 [1] literally ran at the sametime (seconds apart).v16 log:alligator:REL_16_STABLE [12:24:42] running bin test scripts ...alligator:REL_16_STABLE [12:25:28] running test misc-recovery ...
alligator:REL_16_STABLE [12:29:40] running test misc-subscription ...
v17 log:
alligator:REL_17_STABLE [12:25:09] running bin test psql ...
alligator:REL_17_STABLE [12:25:19] running bin test scripts ...
alligator:REL_17_STABLE [12:26:00] running test misc-recovery ...
alligator:REL_17_STABLE [12:37:15] failed at stage recoveryCheck
$
We actually have a good deal of protection against concurrent runs clobbering each other.
It's not clear to me if you're using "run_branches.pl --run-parallel" or not. If not, you might like to consider changing to that - it's the recommended way of doing concurrent runs. Apart from any other reason it removes the need for a lot of redundant git fetches. By default it staggers concurrent build starts by 60 seconds.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
We actually have a good deal of protection against concurrent runs clobbering each other.
It's not clear to me if you're using "run_branches.pl --run-parallel" or not. If not, you might like to consider changing to that - it's the recommended way of doing concurrent runs. Apart from any other reason it removes the need for a lot of redundant git fetches. By default it staggers concurrent build starts by 60 seconds.
Robins Tharakan <tharakan@gmail.com> writes: > Unrelated, for a slow system my understanding was that it's quite > inefficient to keep running older > branches every few minutes (like HEAD does) - so for some of the animals I > explicitly run older > branches (for e.g. v13) every few hours, but HEAD runs every few minutes. > Are you saying it's still a good idea to run all together every few minutes > (and let older branches > skip if there's nothing to do)? Nowadays run_branches' check for whether there's something to do is cheap enough that it's not worth skipping. So it's recommendable to just launch that every so often, and not complicate your life with manual per-branch scheduling. regards, tom lane
Nowadays run_branches' check for whether there's something to do is
cheap enough that it's not worth skipping. So it's recommendable
to just launch that every so often, and not complicate your life
with manual per-branch scheduling.
On Sun, Dec 15, 2024 at 8:04 PM Robins Tharakan <tharakan@gmail.com> wrote: > [unexplained failure on alligator] 2024-12-19 12:38:26.223 ACDT [1956030:2] [unknown] FATAL: could not open shared memory segment "/PostgreSQL.2167569412": No such file or directory That smells like systemd's RemoteIPC feature: https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC