Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected
Msg-id CA+hUKGKRQrJhVYBkmLJZsScJ434qiduWzzpB0-0_FW8z1kTjcw@mail.gmail.com
In response to Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Wed, Aug 13, 2025 at 7:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's a new attempt at that.  It picks futexes automatically, unless
> you export MACOSX_DEPLOYMENT_TARGET=14.3 or lower, and then it picks
> sysv as before.

I compared my hand-rolled atomics logic against FreeBSD's user space
semaphores[1].  sem_post() and sem_trywait() are about the same, and I
guess there just aren't many ways to write those.  Minor differences:

1. They use CAS in sem_post() because they want to report EOVERFLOW if
you exceed SEM_VALUE_MAX, but POSIX doesn't seem to require that, so I
just used fetch-and-add.  Is that bad?  I noticed that their older
alternative version (remove "_new" from [1]) also uses a whole 32-bit
counter like mine, so it perhaps isn't so worried about exceeding a
narrow field of bits, and it does it my way.
2. In sem_trywait() they reload the value explicitly, whereas our
pg_atomic_compare_exchange_u32() does that for you on failure, and I
also prefer to avoid assignment expression syntax if I can.
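For comparison, here is a minimal sketch of the two styles using C11 atomics rather than the patch's pg_atomic_* functions.  The names emu_sem, emu_sem_post_faa(), emu_sem_post_cas() and emu_sem_trywait() are hypothetical, the whole 32-bit word is assumed to be the count, and waking sleepers is left out.  It shows a plain fetch-and-add post next to a CAS-loop post that can report EOVERFLOW, and a trywait loop that relies on a failed compare-exchange refreshing the expected value instead of reloading it by hand.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: the whole 32-bit word is the semaphore count. */
typedef struct
{
    _Atomic uint32_t value;
} emu_sem;

/* Fetch-and-add post: one unconditional atomic, no overflow check. */
static int
emu_sem_post_faa(emu_sem *sem)
{
    atomic_fetch_add(&sem->value, 1);
    /* A real implementation would wake a sleeper here if needed. */
    return 0;
}

/* CAS-loop post: can refuse to go past 'max' and report EOVERFLOW. */
static int
emu_sem_post_cas(emu_sem *sem, uint32_t max)
{
    uint32_t old = atomic_load(&sem->value);

    do
    {
        if (old >= max)
        {
            errno = EOVERFLOW;
            return -1;
        }
        /* On failure, the compare-exchange stores the current value in 'old'. */
    } while (!atomic_compare_exchange_weak(&sem->value, &old, old + 1));
    return 0;
}

/* Trywait: decrement if positive, else fail with EAGAIN like sem_trywait(). */
static int
emu_sem_trywait(emu_sem *sem)
{
    uint32_t old = atomic_load(&sem->value);

    /* No explicit reload: a failed CAS refreshes 'old' for the next check. */
    while (old > 0)
    {
        if (atomic_compare_exchange_weak(&sem->value, &old, old - 1))
            return 0;
    }
    errno = EAGAIN;
    return -1;
}
```

The CAS version pays for its overflow check with a retry loop under contention; the fetch-and-add version never retries, and an unsigned 32-bit counter would only wrap after an implausible number of unconsumed posts.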

On the other hand my v1 sem_wait() seemed a little janky and weird in
comparison to theirs, and I was a bit worried about bugs.  I thought
about rewriting it to look more like that, but then I realised that I
could use sem_trywait() to implement sem_wait() for the same net
effect.  It's much shorter and sweeter and more intuitively
understandable, I think?
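A minimal sketch of building sem_wait() on top of sem_trywait(), assuming Linux futex(2) via syscall() as a stand-in for whatever wait/wake primitive the patch actually uses on macOS (all names here are hypothetical).  The wait loop is just "try, and if the count was zero, sleep until it might not be"; the kernel checks the value before sleeping, which makes the post/wait race harmless.  This post always issues a wake syscall, which a real implementation would presumably avoid by tracking waiters.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct
{
    _Atomic uint32_t value;     /* hypothetical: whole word is the count */
} emu_sem;

/* Sleep only while *addr == expected; may return early, callers recheck. */
static void
futex_wait_u32(_Atomic uint32_t *addr, uint32_t expected)
{
    /* Non-private op, so it would also work on process-shared memory. */
    (void) syscall(SYS_futex, (uint32_t *) addr, FUTEX_WAIT, expected,
                   NULL, NULL, 0);
}

static void
futex_wake_one(_Atomic uint32_t *addr)
{
    (void) syscall(SYS_futex, (uint32_t *) addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

static int
emu_sem_post(emu_sem *sem)
{
    atomic_fetch_add(&sem->value, 1);
    futex_wake_one(&sem->value);    /* unconditional: simple, costs a syscall */
    return 0;
}

static int
emu_sem_trywait(emu_sem *sem)
{
    uint32_t old = atomic_load(&sem->value);

    while (old > 0)
    {
        if (atomic_compare_exchange_weak(&sem->value, &old, old - 1))
            return 0;
    }
    errno = EAGAIN;
    return -1;
}

/* sem_wait() expressed as a retry loop around sem_trywait(). */
static int
emu_sem_wait(emu_sem *sem)
{
    while (emu_sem_trywait(sem) != 0)
        futex_wait_u32(&sem->value, 0);  /* sleeps only if count is still 0 */
    return 0;
}
```

If a post lands between the failed trywait and FUTEX_WAIT, the kernel sees a nonzero value and returns EAGAIN immediately, so the loop simply tries again; spurious wakeups are handled the same way.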

I also didn't bother to supply a couple of functions that posix_sema.c
doesn't use.

There's no immediately obvious performance difference, but I wasn't
expecting there to be: it's used in the slow path of our LWLocks and
they already have similar user space atomics in front, so you don't
really avoid entering the kernel with this technique except perhaps
just occasionally if you wait just as someone posts.  This is all just
about the convenience of banishing System V cruft.  It doesn't seem
to be any slower though, in simple testing, and CI might even be a bit
faster based on spot checks of a few runs... but proper statistical
tools might be needed to see if that is a real phenomenon.

Anyone got any better ideas for how to organise the build scripts,
control the feature, naming, etc.?  What I came up with is this, which
automatically falls back to sysv if it can't find what it needs
(respecting Apple's deployment target, following what we did for
preadv):

      meson.build:        sema_type = "unnamed_posix+emulation"
      configure template: PREFERRED_SEMAPHORES="UNNAMED_POSIX+EMULATION"

The pg_futex.h abstraction layer doesn't have any support for other
OSes yet because I tried to pare this patch down to the absolute
minimum to solve this problem, but I did try at least a little bit to
anticipate that when designing the futex contracts (earlier versions
from a few years ago had such support, which I removed).  We could
trivially add
at least OpenBSD support to use emulated semaphores there.  I never
quite figured out whether the NetBSD futex API is really available
outside its Linux syscall emulator, but that might be doable too.  And
futexes might of course turn out to have applications in other
synchronisation primitives.
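Purely to illustrate the shape such a contract could take, here is a per-OS shim.  The names demo_futex_wait()/demo_futex_wake() and their semantics are hypothetical, not the patch's pg_futex.h.  The macOS branch assumes the os_sync_wait_on_address() family available from macOS 14.4, and the OpenBSD branch assumes futex(2); both are untested assumptions here (including whether OpenBSD's futexes work on process-shared memory), so check them against the system headers and man pages.

```c
#if defined(__linux__)
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__)
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#elif defined(__APPLE__)
#include <os/os_sync_wait_on_address.h>  /* assumption: macOS 14.4+ SDK */
#elif defined(__OpenBSD__)
#include <sys/futex.h>                   /* assumption: OpenBSD futex(2) */
#else
#error "this sketch has no futex implementation for this OS"
#endif

/*
 * Hypothetical contract: if *addr still holds 'expected', sleep until
 * woken.  May return early (value changed, signal, spurious wakeup), so
 * callers must always re-check their condition in a loop.
 */
static void
demo_futex_wait(volatile uint32_t *addr, uint32_t expected)
{
#if defined(__linux__)
    /* Non-private op, so it also works on process-shared memory. */
    (void) syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
#elif defined(__APPLE__)
    (void) os_sync_wait_on_address((void *) addr, expected, sizeof(*addr),
                                   OS_SYNC_WAIT_ON_ADDRESS_SHARED);
#elif defined(__OpenBSD__)
    (void) futex(addr, FUTEX_WAIT, (int) expected, NULL, NULL);
#endif
}

/* Hypothetical contract: wake at most one waiter sleeping on addr. */
static void
demo_futex_wake(volatile uint32_t *addr)
{
#if defined(__linux__)
    (void) syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
#elif defined(__APPLE__)
    (void) os_sync_wake_by_address_any((void *) addr, sizeof(*addr),
                                       OS_SYNC_WAKE_BY_ADDRESS_SHARED);
#elif defined(__OpenBSD__)
    (void) futex(addr, FUTEX_WAKE, 1, NULL, NULL);
#endif
}
```

Keeping the contract to "wait while equal, may return early" plus "wake one" is the lowest common denominator of these APIs, which is what lets a semaphore (or another primitive) be written once on top of it.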

Any thoughts on whether this is worth pursuing, and what kinds of
validation might be useful?

[1] https://github.com/freebsd/freebsd-src/blob/main/lib/libc/gen/sem_new.c

Attachment
