Thread: margay fails assertion in stats/dsa/dsm code
Hi,

BF animal margay (a newly started Solaris 11.4/Sparc/GCC 11.2 box) is sometimes failing with:

TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c", Line: 1069, PID: 9038)

I can't immediately see why it's doing this, but my tool that looks for assertion failures hasn't seen it on any other system.  Example stack (trimmed from the log), in this case from a regular backend; other times it was a parallel worker:

TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c", Line: 1069, PID: 3944)
ExceptionalCondition+0x64 [0x1008bb348]
dsm_segment_address+0x44 [0x1006ff7d0]
get_segment_by_index+0x7c [0x1008ee960]
dsa_get_address+0x9c [0x1008ef754]
pgstat_get_entry_ref+0x1068 [0x10075f348]
pgstat_prep_pending_entry+0x58 [0x100758424]
pgstat_assoc_relation+0x44 [0x10075b314]
_bt_first+0x9ac [0x10036cd78]
btgettuple+0x10c [0x1003653a8]
index_getnext_tid+0x4c [0x1003531c4]
index_getnext_slot+0x78 [0x100353564]
systable_getnext+0x18 [0x1003519b4]
SearchCatCacheMiss+0x74 [0x10089ce18]
SearchCatCacheInternal+0x1c0 [0x10089d0a4]
GetSysCacheOid+0x34 [0x1008b5ca4]
get_role_oid+0x18 [0x100767444]
hba_getauthmethod+0x8 [0x100599da4]
ClientAuthentication+0x1c [0x10058cb68]
InitPostgres+0xacc [0x1008d26b8]
PostgresMain+0x94 [0x1007397f8]
ServerLoop+0x1184 [0x1006739e8]
PostmasterMain+0x1400 [0x10067520c]
main+0x2e0 [0x1005a28c0]
_start+0x64 [0x1002c5c44]

I know that on Solaris we use dynamic_shared_memory_type=posix.  The other Solaris/Sparc system is wrasse, and it's not doing this.  I don't see the cause yet, but figured I'd report this much to the list in case someone else does.
On Fri, Jun 3, 2022 at 12:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> BF animal margay (a newly started Solaris 11.4/Sparc/GCC 11.2 box) is sometimes failing with:
>
> TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c", Line: 1069, PID: 9038)

I spent some time on the GCC farm machine gcc211 (Sol 11.3, GCC 5.5), but could not reproduce this.  It's also not happening on wrasse (Sol 11.3, Sun Studio compiler).  I don't have access to a Sol 11.4 CBE/Sparc system like margay, but I have learned that CBE is the name of a very recently announced rolling release intended for open source developers[1].  I still have no idea whether the active ingredient here is Sparc, Sol 11.4, "CBE", GCC 11.2, or just timing conditions that reveal bugs in our dsm/dsa/dshash/pgstat code, bugs that show up here in about 1/4 of make check runs on this stack but miraculously nowhere else.  Perhaps margay's owner could shed some light, or has a way to provide ssh access to a similar zone with a debugger etc. installed?

[1] https://blogs.oracle.com/solaris/post/announcing-the-first-oracle-solaris-114-cbe
On 28.06.2022 at 08:27, Thomas Munro wrote:
> On Fri, Jun 3, 2022 at 12:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> BF animal margay (a newly started Solaris 11.4/Sparc/GCC 11.2 box) is sometimes failing with:
>>
>> TRAP: FailedAssertion("seg->mapped_address != NULL", File: "dsm.c", Line: 1069, PID: 9038)
>
> I spent some time on the GCC farm machine gcc211 (Sol 11.3, GCC 5.5), but could not reproduce this.  It's also not happening on wrasse (Sol 11.3, Sun Studio compiler).  I don't have access to a Sol 11.4 CBE/Sparc system like margay, but I have learned that CBE is the name of a very recently announced rolling release intended for open source developers[1].  I still have no idea whether the active ingredient here is Sparc, Sol 11.4, "CBE", GCC 11.2, or just timing conditions that reveal bugs in our dsm/dsa/dshash/pgstat code, bugs that show up here in about 1/4 of make check runs on this stack but miraculously nowhere else.  Perhaps margay's owner could shed some light, or has a way to provide ssh access to a similar zone with a debugger etc. installed?
>
> [1] https://blogs.oracle.com/solaris/post/announcing-the-first-oracle-solaris-114-cbe

Looks like a timing issue to me, because it happens only sometimes.  No problems with versions 14 and 13.  I can provide ssh access to this system.
On Thu, Jun 2, 2022 at 8:06 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I know that on Solaris we use dynamic_shared_memory_type=posix.  The other Solaris/Sparc system is wrasse, and it's not doing this.  I don't see the cause yet, but figured I'd report this much to the list in case someone else does.

My first thought was that the return value of the call to dsm_impl_op() at the end of dsm_attach() is not checked and that maybe it was returning NULL, but it seems like whoever wrote dsm_impl_posix() was pretty careful to ereport(elevel, ...) in every failure path, and elevel is ERROR here, so I don't see any issue.  My second thought was that maybe control had escaped from dsm_attach() due to an error before we got to the step where we actually map the segment, but then the dsm_segment * would not be returned to the caller.  Maybe they could retrieve it later using dsm_find_mapping(), but that function has no callers in core.

So I'm kind of stumped too, but did you by any chance check whether there are any DSM-related messages in the logs before the assertion failure?

--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jun 29, 2022 at 6:04 AM Robert Haas <robertmhaas@gmail.com> wrote:
> My first thought was that the return value of the call to dsm_impl_op() at the end of dsm_attach() is not checked and that maybe it was returning NULL, but it seems like whoever wrote dsm_impl_posix() was pretty careful to ereport(elevel, ...) in every failure path, and elevel is ERROR here, so I don't see any issue.  My second thought was that maybe control had escaped from dsm_attach() due to an error before we got to the step where we actually map the segment, but then the dsm_segment * would not be returned to the caller.  Maybe they could retrieve it later using dsm_find_mapping(), but that function has no callers in core.

Thanks for looking.  Yeah.  I also read through that code many times and drew the same conclusion.

> So I'm kind of stumped too, but did you by any chance check whether there are any DSM-related messages in the logs before the assertion failure?

Marcel kindly granted me access to his test machine, where the failure can be reproduced by running make check lots of times.  I eventually figured out that the problem control flow is ... of course ... the one path that doesn't ereport(), and that's when errno == EEXIST.  That is a path that is intended to handle DSM_OP_CREATE.  Here we are handling DSM_OP_ATTACH, and I have verified that we're passing in just O_RDWR.  EEXIST is a nonsensical error for shm_open() without flags containing O_CREAT | O_EXCL (according to POSIX and Solaris's man page).

On this OS, shm_open() opens plain files in /tmp (normally a RAM disk, so kinda like /dev/shm on Linux); that much I can tell with a plain old "ls" command.  We can also read its long lost open source cousin (which may be completely different for all I know, but I'd doubt it):

https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/rt/shm.c
https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/rt/pos4obj.c

Erm.  It looks like __pos4obj_lock() could possibly return -1 and leave errno == EEXIST, if it runs out of retries?  Then shm_open() would return -1, and we'd blow up.  However, for that to happen, one of those "SHM_LOCK_TYPE" files would have to linger for 64 sleep loops, and I'm not sure why that'd happen, or what to do about it.  (I don't immediately grok what that lock file is even for.)  I suppose this could indicate that the machine and/or RAM disk is overloaded/swapping and one of those open() or unlink() calls is taking a really long time, and that could be fixed with some system tuning.  I suppose it's also remotely possible that the process is getting peppered with signals so that funky shell script-style locking scheme is interrupted and doesn't really wait very long.  Or maybe I guessed wrong and some other closed source path is to blame *shrug*.

As for whether PostgreSQL needs to do anything, perhaps we should ereport for this unexpected error as a matter of self-preservation, to avoid the NULL dereference you'd presumably get on a non-cassert build with the current coding?  Maybe just:

-       if (errno != EEXIST)
+       if (op == DSM_OP_ATTACH || errno != EEXIST)
                ereport(elevel,
                                (errcode_for_dynamic_shared_memory(),
                                 errmsg("could not open shared memory segment \"%s\": %m",

margay would probably still fail until that underlying problem is addressed, but less mysteriously on our side at least.
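(For anyone who wants to poke at the OS-level symptom without PostgreSQL in the picture, a stand-alone probe along these lines is roughly what I have in mind; this is only a sketch, the segment name is made up, and you'd need concurrent shm_open(O_CREAT)/shm_unlink() traffic on the same name from other processes before the lock retries could come into play:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int
    main(void)
    {
        for (int i = 0; i < 100000; i++)
        {
            /* Attach-style open: no O_CREAT, no O_EXCL, so per POSIX the
             * EEXIST case should be impossible here. */
            int fd = shm_open("/probe-segment", O_RDWR, 0);

            if (fd >= 0)
                close(fd);
            else if (errno != ENOENT)
                printf("unexpected errno %d (%s)\n", errno, strerror(errno));
        }
        return 0;
    }

Compile with something like cc probe.c, adding -lrt on systems that still keep shm_open() in librt, and run a few copies alongside a create/unlink loop.)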
On Wed, Jun 29, 2022 at 4:00 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I suppose this could indicate that the machine and/or RAM disk is overloaded/swapping and one of those open() or unlink() calls is taking a really long time, and that could be fixed with some system tuning.

Hmm, I take that bit back.  Every backend that starts up is trying to attach to the same segment, the one with the new pgstats stuff in it (once the small space in the main shmem segment is used up and we create a DSM segment).  There's no fairness/queue, random back-off or guarantee of progress in that librt lock code, so you can get into lock-step with other backends retrying, and although some waiter always gets to make progress, any given backend can lose every round and run out of retries.  Even when you're lucky and don't fail with an undocumented, incomprehensible error, it's very slow, and I'd consider filing a bug report about that.

A workaround on the PostgreSQL side would be to set dynamic_shared_memory_type to mmap (= we just open our own files and map them directly), and to make pg_dynshmem a symlink to something under /tmp (or some other RAM disk) to avoid touching regular disk file systems.
On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> As for whether PostgreSQL needs to do anything, perhaps we should ereport for this unexpected error as a matter of self-preservation, to avoid the NULL dereference you'd presumably get on a non-cassert build with the current coding?  Maybe just:
>
> -       if (errno != EEXIST)
> +       if (op == DSM_OP_ATTACH || errno != EEXIST)
>                 ereport(elevel,
>                                 (errcode_for_dynamic_shared_memory(),
>                                  errmsg("could not open shared memory segment \"%s\": %m",
>
> margay would probably still fail until that underlying problem is addressed, but less mysteriously on our side at least.

That seems like a correct fix, but maybe we should also be checking the return value of dsm_impl_op(), e.g. define dsm_impl_op_error() as an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR, "the author of dsm.c is not as clever as he thinks he is").

--
Robert Haas
EDB: http://www.enterprisedb.com
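(To make that concrete, the pattern could be sketched as a variadic macro so dsm_impl_op()'s full argument list doesn't have to be repeated here; this is only an illustration of the idea, not proposed code:

    /*
     * Sketch only: forward the usual dsm_impl_op() arguments, force
     * elevel = ERROR, and treat a false return as a can't-happen case so
     * we stop before anyone dereferences an unmapped segment.
     */
    #define dsm_impl_op_error(...) \
        do { \
            if (!dsm_impl_op(__VA_ARGS__, ERROR)) \
                elog(ERROR, "dsm_impl_op() failed without reporting an error"); \
        } while (0)

)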
On Fri, Jul 1, 2022 at 4:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> -       if (errno != EEXIST)
>> +       if (op == DSM_OP_ATTACH || errno != EEXIST)
>>                 ereport(elevel,
>>                                 (errcode_for_dynamic_shared_memory(),
>>                                  errmsg("could not open shared memory segment \"%s\": %m",
>>
>> margay would probably still fail until that underlying problem is addressed, but less mysteriously on our side at least.
>
> That seems like a correct fix, but maybe we should also be checking the return value of dsm_impl_op(), e.g. define dsm_impl_op_error() as an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR, "the author of dsm.c is not as clever as he thinks he is").

Thanks.  Also the mmap and sysv paths do something similar, so I made the same change there on principle.  I didn't make the extra belt-and-braces check you suggested for now, preferring minimalism.  I think the author of dsm.c was pretty clever; it's just that the world turned out to be more hostile than expected, in one very specific way.  Pushed.  So that should get us to a state where margay still fails occasionally, but now with an ERROR rather than a crash.

Next up, I confirmed my theory about what's happening on closed Solaris by tracing syscalls.  It is indeed that clunky sleep(1) code that gives up after 64 tries.  Even in pre-shmem-stats releases that don't contend enough to reach the bogus EEXIST error, I'm pretty sure people must be getting random sleeps injected into their parallel queries in the wild by this code.  I have concluded that that implementation of shm_open() is not really usable for our purposes.  We'll have to change *something* to turn margay reliably green, not to mention to head off the bogus error reports we can expect from 15 in the wild, and the performance woes that I cannot now unsee.

So... I think we should select a different default dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is the least terrible?  For sysv, it looks like all the relevant kernel tunables that used to be required to use sysv memory became obsolete/automatic in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT, whereas for mmap mode your shared memory data is likely to cause file I/O, because we put the temporary files in your data directory.  I'm thinking perhaps we should default to dynamic_shared_memory_type=sysv for 15+.  I don't really want to change it in the back branches, since nobody has actually complained about "posix" performance and it might upset someone if we changed it for newly initdb'd DBs in a major release series.  But I'm not an expert or even a user of this OS; I'm just trying to fix the build farm.  Better ideas welcome.

Thoughts?
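(For illustration only, the __sun__ special case could be shaped something like this; the function name is invented and initdb's real default-choosing code is more involved, since it also probes whether a posix segment can actually be created, so treat this as a sketch of the idea rather than a patch:

    /*
     * Hypothetical sketch, not the committed change: pick a default
     * dynamic_shared_memory_type for initdb, avoiding "posix" on Solaris
     * where shm_open() has the lock-file/EEXIST problem described above.
     */
    static const char *
    default_dsm_implementation(void)
    {
    #if defined(__sun__)
        return "sysv";
    #elif defined(WIN32)
        return "windows";
    #else
        return "posix";
    #endif
    }

)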
On Thu, Jun 30, 2022 at 10:34 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> So... I think we should select a different default dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is the least terrible?  For sysv, it looks like all the relevant kernel tunables that used to be required to use sysv memory became obsolete/automatic in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT, whereas for mmap mode your shared memory data is likely to cause file I/O, because we put the temporary files in your data directory.  I'm thinking perhaps we should default to dynamic_shared_memory_type=sysv for 15+.  I don't really want to change it in the back branches, since nobody has actually complained about "posix" performance and it might upset someone if we changed it for newly initdb'd DBs in a major release series.  But I'm not an expert or even a user of this OS; I'm just trying to fix the build farm.  Better ideas welcome.

Boy, relying on DSM for critical stuff sure is a lot of fun!  This is exactly why I hate adding new facilities that have to be implemented in OS-dependent ways.

Changing the default on certain platforms to 'posix' or 'sysv' according to what works best on that platform seems reasonable to me.  I agree that defaulting to 'mmap' doesn't seem like a lot of fun, although I think it could be a reasonable choice on a platform where everything else is broken.  You could alternatively try to fix 'posix' by adding some kind of code to work around that platform's deficiencies.  Insert handwaving here.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Jul 2, 2022 at 1:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Changing the default on certain platforms to 'posix' or 'sysv' according to what works best on that platform seems reasonable to me.

Ok, I'm going to make that change in 15 + master.

> I agree that defaulting to 'mmap' doesn't seem like a lot of fun, although I think it could be a reasonable choice on a platform where everything else is broken.  You could alternatively try to fix 'posix' by adding some kind of code to work around that platform's deficiencies.  Insert handwaving here.

I don't think that 'posix' mode is salvageable on Solaris, but a new GUC to control where 'mmap' mode puts its files would be nice.  Then you could set it to '/tmp' (or some other RAM disk), and you'd have the same end result as shm_open() on that platform, without the lock problem.  Perhaps someone could propose a patch for 16.

As for the commit I already made, we can now see the new error:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=margay&dt=2022-07-01%2016%3A00%3A07

2022-07-01 18:25:25.848 CEST [27784:1] ERROR: could not open shared memory segment "/PostgreSQL.499018794": File exists

Unfortunately this particular run crashed anyway, for a new reason: one backend didn't like the state the new error left the dshash in, during shmem_exit:

2022-07-01 18:25:25.848 CEST [27738:21] pg_regress/prepared_xacts ERROR: could not open shared memory segment "/PostgreSQL.499018794": File exists
2022-07-01 18:25:25.848 CEST [27738:22] pg_regress/prepared_xacts STATEMENT: SELECT * FROM pxtest1;
TRAP: FailedAssertion("!hash_table->find_locked", File: "dshash.c", Line: 312, PID: 27784)
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'ExceptionalCondition+0x64 [0x1008bb8b0]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'dshash_detach+0x48 [0x10058674c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_detach_shmem+0x68 [0x10075e630]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_shutdown_hook+0x94 [0x10075989c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'shmem_exit+0x84 [0x100701198]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit_prepare+0x88 [0x100701394]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit+0x4 [0x10070148c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'StartBackgroundWorker+0x150 [0x10066957c]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'maybe_start_bgworkers+0x604 [0x1006717ec]
/home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'sigusr1_handler+0x190 [0x100672510]

So that's an exception safety problem in dshash or pgstat's new usage thereof, which is arguably independent of Solaris and probably deserves a new thread.  You don't need Solaris to see it; you can just add in some random fault injection.
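(In case it helps anyone reproduce that elsewhere: the "random fault injection" I have in mind could be as crude as something like the following dropped into the DSM attach path; the macro and message are made up, and it assumes backend code where elog() is available:

    #include <stdlib.h>             /* for random() */

    /*
     * Sketch of a random fault injector: raise an ERROR with small
     * probability wherever it is placed, so that error-path cleanup such
     * as the dshash detach above gets exercised on any platform.
     */
    #define INJECT_RANDOM_FAULT() \
        do { \
            if (random() % 1000 == 0) \
                elog(ERROR, "fault injection: simulated DSM attach failure"); \
        } while (0)

)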
Hi,

On 2022-07-02 11:10:07 +1200, Thomas Munro wrote:
> 2022-07-01 18:25:25.848 CEST [27738:21] pg_regress/prepared_xacts ERROR: could not open shared memory segment "/PostgreSQL.499018794": File exists
> 2022-07-01 18:25:25.848 CEST [27738:22] pg_regress/prepared_xacts STATEMENT: SELECT * FROM pxtest1;
> TRAP: FailedAssertion("!hash_table->find_locked", File: "dshash.c", Line: 312, PID: 27784)
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'ExceptionalCondition+0x64 [0x1008bb8b0]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'dshash_detach+0x48 [0x10058674c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_detach_shmem+0x68 [0x10075e630]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'pgstat_shutdown_hook+0x94 [0x10075989c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'shmem_exit+0x84 [0x100701198]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit_prepare+0x88 [0x100701394]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'proc_exit+0x4 [0x10070148c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'StartBackgroundWorker+0x150 [0x10066957c]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'maybe_start_bgworkers+0x604 [0x1006717ec]
> /home/marcel/build-farm-14/buildroot/HEAD/pgsql.build/tmp_install/home/marcel/build-farm-14/buildroot/HEAD/inst/bin/postgres'sigusr1_handler+0x190 [0x100672510]
>
> So that's an exception safety problem in dshash or pgstat's new usage thereof, which is arguably independent of Solaris and probably deserves a new thread.  You don't need Solaris to see it; you can just add in some random fault injection.

FWIW, a potentially relevant thread for that aspect:
https://postgr.es/m/20220311012712.botrpsikaufzteyt%40alap3.anarazel.de

What do you think about the proposal at the end of that email?

Greetings,

Andres Freund
On Sat, Jul 2, 2022 at 11:10 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, Jul 2, 2022 at 1:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> Changing the default on certain platforms to 'posix' or 'sysv' according to what works best on that platform seems reasonable to me.
>
> Ok, I'm going to make that change in 15 + master.

For the record, I asked a Solaris kernel engineer about that shm_open() problem and learned that a fix shipped about a month after we had this discussion (though I haven't tested it myself):

https://twitter.com/casperdik/status/1730288613722562986

I also reported the issue to illumos, since I'd like to be able to revert 94ebf811 eventually:

https://www.illumos.org/issues/16093