Re: margay fails assertion in stats/dsa/dsm code - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: margay fails assertion in stats/dsa/dsm code
Date
Msg-id CA+hUKGL+DNK5d5ARgFuN4kntZrur68E+DWbcm3iyF_NjTZ4DBQ@mail.gmail.com
Whole thread Raw
In response to Re: margay fails assertion in stats/dsa/dsm code  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: margay fails assertion in stats/dsa/dsm code
Re: margay fails assertion in stats/dsa/dsm code
List pgsql-hackers
On Wed, Jun 29, 2022 at 6:04 AM Robert Haas <robertmhaas@gmail.com> wrote:
> My first thought was that the return value of the call to
> dsm_impl_op() at the end of dsm_attach() is not checked and that maybe
> it was returning NULL, but it seems like whoever wrote
> dsm_impl_posix() was pretty careful to ereport(elevel, ...) in every
> failure path, and elevel is ERROR here, so I don't see any issue. My
> second thought was that maybe control had escaped from dsm_attach()
> due to an error before we got to the step where we actually map the
> segment, but then the dsm_segment * would be returned to the caller.
> Maybe they could retrieve it later using dsm_find_mapping(), but that
> function has no callers in core.

Thanks for looking.  Yeah.  I also read through that code many times
and drew the same conclusion.

> So I'm kind of stumped too, but did you by any chance check whether
> there are any DSM-related messages in the logs before the assertion
> failure?

Marcel kindly granted me access to his test machine, where the failure
can be reproduced by running make check lots of times.  I eventually
figured out that the problem control flow is ... of course ... the one
path that doesn't ereport(), and that's when errno == EEXIST.  That is
a path that is intended to handle DSM_OP_CREATE.  Here we are handling
DSM_OP_ATTACH, and I have verified that we're passing in just O_RDWR.
EEXIST is a nonsensical error for shm_open() without flags containing
O_CREAT | O_EXCL (according to POSIX and Solaris's man page).

On this OS, shm_open() opens plain files in /tmp (normally a RAM disk,
so kinda like /dev/shm on Linux), that much I can tell with a plain
old "ls" command.  We can also read its long lost open source cousin
(which may be completely different for all I know, but I'd doubt it):

https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/rt/shm.c
https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/rt/pos4obj.c

Erm.  It looks like __pos4obj_lock() could possibly return -1 and
leave errno == EEXIST, if it runs out of retries?  Then shm_open()
would return -1, and we'd blow up.  However, for that to happen, one
of those "SHM_LOCK_TYPE" files would have to linger for 64 sleep
loops, and I'm not sure why that'd happen, or what to do about it.  (I
don't immediately grok what that lock file is even for.)

I suppose this could indicate that the machine and/or RAM disk is
overloaded/swapping and one of those open() or unlink() calls is
taking a really long time, and that could be fixed with some system
tuning.  I suppose it's also remotely possible that the process is
getting peppered with signals so that funky shell script-style locking
scheme is interrupted and doesn't really wait very long.  Or maybe I
guessed wrong and some other closed source path is to blame *shrug*.

As for whether PostgreSQL needs to do anything, perhaps we should
ereport for this unexpected error as a matter of self-preservation, to
avoid the NULL dereference you'd presumably get on a non-cassert build
with the current coding?  Maybe just:

-               if (errno != EEXIST)
+               if (op == DSM_OP_ATTACH || errno != EEXIST)
                        ereport(elevel,
                                        (errcode_for_dynamic_shared_memory(),
                                         errmsg("could not open shared
memory segment \"%s\": %m",

margay would probably still fail until that underlying problem is
addressed, but less mysteriously on our side at least.



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Can we do something to help stop users mistakenly using force_parallel_mode?
Next
From: Michael Paquier
Date:
Subject: Re: pg_upgrade allows itself to be run twice