Thread: [PATCH] Make ENOSPC not fatal in semaphore creation

[PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail <mp39590@gmail.com>

We might be in situation when we have "just enough" semaphores in the
system limit to start but previously crashed unexpectedly, in that case
we won't be able to start again - semget() will return ENOSPC, despite
the semaphores are ours, and we can recycle them, so check this
situation and try to remove the semaphore, if we are unable - give up
and abort.
---
 src/backend/port/sysv_sema.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index 21c883ba9a..a889591dba 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -88,10 +88,6 @@ static void ReleaseSemaphores(int status, Datum arg);
  *
  * Attempt to create a new semaphore set with the specified key.
  * Will fail (return -1) if such a set already exists.
- *
- * If we fail with a failure code other than collision-with-existing-set,
- * print out an error and abort.  Other types of errors suggest nonrecoverable
- * problems.
  */
 static IpcSemaphoreId
 InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems)
@@ -118,10 +114,33 @@ InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems)
             return -1;
 
         /*
-         * Else complain and abort
+         * We might be in situation when we have "just enough" semaphores in the system
+         * limit to start but previously crashed unexpectedly, in that case we won't be
+         * able to start again - semget() will return ENOSPC, despite the semaphores
+         * are ours, and we can recycle them, so check this situation and try to remove
+         * the semaphore, if we are unable - give up and abort.
+         *
+         * We use same semkey for every start - it's gotten from inode number of the
+         * data folder. So on repeated starts we will use the same key.
          */
+        if (saved_errno == ENOSPC)
+        {
+            union semun        semun;
+
+            semId = semget(semKey, 0, 0);
+
+            semun.val = 0;            /* unused, but keep compiler quiet */
+            if (semctl(semId, 0, IPC_RMID, semun) == 0)
+            {
+                /* Recycled - get the same semaphore again */
+                semId = semget(semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection);
+
+                return semId;
+            }
+        }
+
         ereport(FATAL,
-                (errmsg("could not create semaphores: %m"),
+                (errmsg("could not create semaphores: %s", strerror(saved_errno)),
                  errdetail("Failed system call was semget(%lu, %d, 0%o).",
                            (unsigned long) semKey, numSems,
                            IPC_CREAT | IPC_EXCL | IPCProtection),
-- 
2.33.0



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Tom Lane
mp39590@gmail.com writes:
> We might be in situation when we have "just enough" semaphores in the
> system limit to start but previously crashed unexpectedly, in that case
> we won't be able to start again - semget() will return ENOSPC, despite
> the semaphores are ours, and we can recycle them, so check this
> situation and try to remove the semaphore, if we are unable - give up
> and abort.

AFAICS, this patch could be disastrous.  What if the semaphore in
question belongs to some other postmaster?

Also, you haven't explained why the existing (and much safer) recycling
logic in IpcSemaphoreCreate doesn't solve your problem.

            regards, tom lane



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail
On Sun, Oct 17, 2021 at 10:29:24AM -0400, Tom Lane wrote:
> mp39590@gmail.com writes:
> > We might be in situation when we have "just enough" semaphores in the
> > system limit to start but previously crashed unexpectedly, in that case
> > we won't be able to start again - semget() will return ENOSPC, despite
> > the semaphores are ours, and we can recycle them, so check this
> > situation and try to remove the semaphore, if we are unable - give up
> > and abort.
> 
> AFAICS, this patch could be disastrous.  What if the semaphore in
> question belongs to some other postmaster?

Is running more than one postmaster on the same PGDATA supported at
all? Currently the seed for the semaphore key is the inode number of PGDATA.

> Also, you haven't explained why the existing (and much safer) recycling
> logic in IpcSemaphoreCreate doesn't solve your problem.

The logic of creating semas:

        /* Loop till we find a free IPC key */
        for (nextSemaKey++;; nextSemaKey++)
        {
                pid_t           creatorPID;

                /* Try to create new semaphore set */
                semId = InternalIpcSemaphoreCreate(nextSemaKey, numSems + 1);
                if (semId >= 0)
                        break;                          /* successful create */

InternalIpcSemaphoreCreate:

        semId = semget(semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection);

        if (semId < 0)
        {
                int                     saved_errno = errno;

[...]
                if (saved_errno == EEXIST || saved_errno == EACCES
#ifdef EIDRM
                        || saved_errno == EIDRM
#endif
                        )
                        return -1;

                /*
                 * Else complain and abort
                 */
                ereport(FATAL, [...]

semget() fails with ENOSPC, so InternalIpcSemaphoreCreate raises FATAL
instead of returning -1, and the recycling logic in IpcSemaphoreCreate is
never reached.
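
The control flow described here can be sketched as follows. This is a minimal
illustration, not the PostgreSQL source: the enum and the function name
classify_semget_errno are hypothetical, and only the errno tests mirror the
excerpt quoted in this message. It shows that only collision-style errors are
turned into a -1 return, so ENOSPC never gets as far as the caller's loop.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names, for illustration only. */
typedef enum
{
    SEMA_KEY_COLLISION,     /* helper returns -1; caller may inspect the set */
    SEMA_FATAL              /* helper reports FATAL; caller never regains control */
} SemaCreateOutcome;

/* Classify the errno left behind by a failed semget(IPC_CREAT | IPC_EXCL). */
static SemaCreateOutcome
classify_semget_errno(int saved_errno)
{
    if (saved_errno == EEXIST || saved_errno == EACCES
#ifdef EIDRM
        || saved_errno == EIDRM
#endif
        )
        return SEMA_KEY_COLLISION;

    /* Everything else, ENOSPC included, takes the "complain and abort" path. */
    return SEMA_FATAL;
}
```

With this split, classify_semget_errno(ENOSPC) yields SEMA_FATAL, which is the
case the original patch tried to intercept.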



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Tom Lane
Mikhail <mp39590@gmail.com> writes:
> On Sun, Oct 17, 2021 at 10:29:24AM -0400, Tom Lane wrote:
>> AFAICS, this patch could be disastrous.  What if the semaphore in
>> question belongs to some other postmaster?

> Is running more than one postmaster on the same PGDATA supported at
> all? Currently the seed for the semaphore key is the inode number of PGDATA.

That hardly guarantees no collisions.  If it did, we'd never have bothered
with the PGSemaMagic business or the IpcSemaphoreGetLastPID check.

>> Also, you haven't explained why the existing (and much safer) recycling
>> logic in IpcSemaphoreCreate doesn't solve your problem.

> semget() returns ENOSPC, so InternalIpcSemaphoreCreate doesn't return -1
> so the whole logic of IpcSemaphoreCreate is not checked.

Hmm.  Maybe you could improve this by removing the first
InternalIpcSemaphoreCreate call in IpcSemaphoreCreate, and
rearranging the logic so that the first step consists of seeing
whether a sema set is already there (and can safely be zapped),
and only then proceed with creation.

I am, however, concerned that this'll just trade off one hazard for
another.  Instead of a risk of failing with ENOSPC (which the DBA
can fix), we'll have a risk of kneecapping some other process at
random (which the DBA can do nothing to prevent).

I'm also fairly unclear on when the logic you propose would trigger
at all.  If the sema set is already there, I'd expect EEXIST or
equivalent, not ENOSPC.

            regards, tom lane



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail
On Sun, Oct 17, 2021 at 10:52:38AM -0400, Tom Lane wrote:
> Mikhail <mp39590@gmail.com> writes:
> > On Sun, Oct 17, 2021 at 10:29:24AM -0400, Tom Lane wrote:
> >> AFAICS, this patch could be disastrous.  What if the semaphore in
> >> question belongs to some other postmaster?
> 
> > Is running more than one postmaster on the same PGDATA supported at
> > all? Currently the seed for the semaphore key is the inode number of PGDATA.
> 
> That hardly guarantees no collisions.  If it did, we'd never have bothered
> with the PGSemaMagic business or the IpcSemaphoreGetLastPID check.

Got it, makes sense. Also, I was shown examples where the same inode
number is reused across mount points for different clusters.

> >> Also, you haven't explained why the existing (and much safer) recycling
> >> logic in IpcSemaphoreCreate doesn't solve your problem.
> 
> > semget() returns ENOSPC, so InternalIpcSemaphoreCreate doesn't return -1
> > so the whole logic of IpcSemaphoreCreate is not checked.
> 
> Hmm.  Maybe you could improve this by removing the first
> InternalIpcSemaphoreCreate call in IpcSemaphoreCreate, and
> rearranging the logic so that the first step consists of seeing
> whether a sema set is already there (and can safely be zapped),
> and only then proceed with creation.

I think I can look into this next weekend. At first glance the
solution works for me.

> I am, however, concerned that this'll just trade off one hazard for
> another.  Instead of a risk of failing with ENOSPC (which the DBA
> can fix), we'll have a risk of kneecapping some other process at
> random (which the DBA can do nothing to prevent).

Good argument, but I'll try to make a second version of the patch with the
proposed logic change to see what we get. I think recycling our own used
semaphores is the "right" behavior, so the overall approach is
correct.

> I'm also fairly unclear on when the logic you propose would trigger
> at all.  If the sema set is already there, I'd expect EEXIST or
> equivalent, not ENOSPC.

The logic works - the initial call to semget() in
InternalIpcSemaphoreCreate returns -1 and errno is set to ENOSPC - I
tested the patch on OpenBSD 7.0, it successfully recycles sem's after
previous "pkill -6 postgres". Verified it with 'ipcs -s'.



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Thomas Munro
On Mon, Oct 18, 2021 at 4:49 AM Mikhail <mp39590@gmail.com> wrote:
> The logic works - the initial call to semget() in
> InternalIpcSemaphoreCreate returns -1 and errno is set to ENOSPC - I
> tested the patch on OpenBSD 7.0, it successfully recycles sem's after
> previous "pkill -6 postgres". Verified it with 'ipcs -s'.

Since you mentioned OpenBSD, what do you think of the idea of making
named POSIX semas the default on that platform?  You can't run out of
those practically speaking, but then you get lots of little memory
mappings (from memory, at least it does close the fd for each one,
unlike some other OSes where we wouldn't want to use this technique).
Trivial patch:

https://www.postgresql.org/message-id/CA%2BhUKGJVSjiDjbJpHwUrvA1TikFnJRfyJanrHofAWhnqcDJayQ%40mail.gmail.com

No strong opinion on the tradeoffs here, as I'm not an OpenBSD user,
but it's something I think about whenever testing portability stuff
there and having to adjust the relevant sysctls.

Note: The best kind would be *unnamed* POSIX semas, where we get to
control their placement in existing memory; that's what we do on Linux
and FreeBSD.  They weren't supported on OpenBSD last time we checked:
it rejects requests for shared ones.  I wonder if someone could
implement them with just a few lines of user space code, using atomic
counters and futex() for waiting.



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail
On Mon, Oct 18, 2021 at 10:07:40AM +1300, Thomas Munro wrote:
> On Mon, Oct 18, 2021 at 4:49 AM Mikhail <mp39590@gmail.com> wrote:
> > The logic works - the initial call to semget() in
> > InternalIpcSemaphoreCreate returns -1 and errno is set to ENOSPC - I
> > tested the patch on OpenBSD 7.0, it successfully recycles sem's after
> > previous "pkill -6 postgres". Verified it with 'ipcs -s'.
> 
> Since you mentioned OpenBSD, what do you think of the idea of making
> named POSIX semas the default on that platform?  You can't run out of
> those practically speaking, but then you get lots of little memory
> mappings (from memory, at least it does close the fd for each one,
> unlike some other OSes where we wouldn't want to use this technique).
> Trivial patch:
> 
> https://www.postgresql.org/message-id/CA%2BhUKGJVSjiDjbJpHwUrvA1TikFnJRfyJanrHofAWhnqcDJayQ%40mail.gmail.com
> 
> No strong opinion on the tradeoffs here, as I'm not an OpenBSD user,
> but it's something I think about whenever testing portability stuff
> there and having to adjust the relevant sysctls.
> 
> Note: The best kind would be *unnamed* POSIX semas, where we get to
> control their placement in existing memory; that's what we do on Linux
> and FreeBSD.  They weren't supported on OpenBSD last time we checked:
> it rejects requests for shared ones.  I wonder if someone could
> implement them with just a few lines of user space code, using atomic
> counters and futex() for waiting.

Hello, sorry for not replying earlier - I was able to think about and
test the patch only on the weekend.

I totally agree with your approach. In conversation, one of the OpenBSD
developers supported using sem_open(), because most ports use it and
consistency is desirable across our ports tree. It looks like PostgreSQL
was the only port using semget().

Switching to sem_open() also looks much safer than patching sysv_sema.c
for the ENOSPC corner case, as Tom already mentioned.

In your patch I've removed testing for 5.x versions, because official
releases are supported only for one year, no need to worry about them.
The patch is tested with 'make installcheck'. I can also confirm that
'ipcs' shows that no semaphores are used, and that the server starts
normally after 'pkill -6 postgres' with the default semmns sysctl, which
was the original motivation for the work.


diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index d74d1ed7af..2dfea0662b 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -998,21 +998,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such
        <para>
         The default shared memory settings are usually good enough, unless
         you have set <literal>shared_memory_type</literal> to <literal>sysv</literal>.
-        You will usually want to
-        increase <literal>kern.seminfo.semmni</literal>
-        and <literal>kern.seminfo.semmns</literal>,
-        as <systemitem class="osname">OpenBSD</systemitem>'s default settings
-        for these are uncomfortably small.
-       </para>
-
-       <para>
-        IPC parameters can be adjusted using <command>sysctl</command>,
-        for example:
-<screen>
-<prompt>#</prompt> <userinput>sysctl kern.seminfo.semmni=100</userinput>
-</screen>
-        To make these settings persist over reboots, modify
-        <filename>/etc/sysctl.conf</filename>.
+        System V semaphores are not used on this platform.
        </para>
 
       </listitem>
diff --git a/src/template/openbsd b/src/template/openbsd
index 365268c489..41221af382 100644
--- a/src/template/openbsd
+++ b/src/template/openbsd
@@ -2,3 +2,7 @@
 
 # Extra CFLAGS for code that will go into a shared library
 CFLAGS_SL="-fPIC -DPIC"
+
+# OpenBSD 5.5 (2014) gained named POSIX semaphores.  They work out of the box
+# without changing any sysctl settings, unlike System V semaphores.
+USE_NAMED_POSIX_SEMAPHORES=1



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Tom Lane
Mikhail <mp39590@gmail.com> writes:
> In your patch I've removed testing for 5.x versions, because official
> releases are supported only for one year, no need to worry about them.

Official support or no, we have OpenBSD 5.9 in our buildfarm, so
ignoring the case isn't going to fly.

            regards, tom lane



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Thomas Munro
On Sat, Oct 23, 2021 at 8:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mikhail <mp39590@gmail.com> writes:
> > In your patch I've removed testing for 5.x versions, because official
> > releases are supported only for one year, no need to worry about them.
>
> Official support or no, we have OpenBSD 5.9 in our buildfarm, so
> ignoring the case isn't going to fly.

It was a test for < 5.5, so that aspect's OK.



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail
On Fri, Oct 22, 2021 at 03:43:00PM -0400, Tom Lane wrote:
> Mikhail <mp39590@gmail.com> writes:
> > In your patch I've removed testing for 5.x versions, because official
> > releases are supported only for one year, no need to worry about them.
> 
> Official support or no, we have OpenBSD 5.9 in our buildfarm, so
> ignoring the case isn't going to fly.

5.9 has support for named POSIX semas. Do you think a new machine with
OpenBSD <5.5 (when named POSIX semas were introduced) can appear in the
buildfarm or be used by a real customer?

I have no objection to testing for "openbsd5.[01234]" and using SysV semas
there, and can redo and test the patch, but isn't that overly cautious?



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Tom Lane
Mikhail <mp39590@gmail.com> writes:
> On Fri, Oct 22, 2021 at 03:43:00PM -0400, Tom Lane wrote:
>> Official support or no, we have OpenBSD 5.9 in our buildfarm, so
>> ignoring the case isn't going to fly.

> 5.9 has support for named POSIX semas. Do you think a new machine with
> OpenBSD <5.5 (when named POSIX semas were introduced) can appear in the
> buildfarm or be used by a real customer?

Nah, I misunderstood you to say that 5.9 would also be affected.

            regards, tom lane



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Tom Lane
Mikhail <mp39590@gmail.com> writes:
> +# OpenBSD 5.5 (2014) gained named POSIX semaphores.  They work out of the box
> +# without changing any sysctl settings, unlike System V semaphores.
> +USE_NAMED_POSIX_SEMAPHORES=1

I tried this on an OpenBSD 6.0 image I had handy.  The good news is
that it works, and I can successfully start the postmaster with a lot
of semaphores (I tried with max_connections=10000) without any special
system configuration.  The bad news is it's *slow*.  It takes the
postmaster over a minute to start up at 10000 max_connections, and
also about 15 seconds to shut down.  The regression tests also appear
noticeably slower, even at the default max_connections=100.  I'm
afraid that those "lots of tiny mappings" that Thomas noted have
a nasty impact on our process launch times, since the kernel
presumably has to do work to clone them into the child process.

Now this lashup that I'm testing on is by no means well suited for
performance tests, so maybe my numbers are bogus.  Also, maybe it's
better in more recent OpenBSD releases.  But I think we need to take a
harder look at performance before we decide that it's okay to change
the default semaphore type for this platform.

            regards, tom lane



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail
On Fri, Oct 22, 2021 at 09:00:31PM -0400, Tom Lane wrote:
> I tried this on an OpenBSD 6.0 image I had handy.  The good news is
> that it works, and I can successfully start the postmaster with a lot
> of semaphores (I tried with max_connections=10000) without any special
> system configuration.  The bad news is it's *slow*.  It takes the
> postmaster over a minute to start up at 10000 max_connections, and
> also about 15 seconds to shut down.  The regression tests also appear
> noticeably slower, even at the default max_connections=100.  I'm
> afraid that those "lots of tiny mappings" that Thomas noted have
> a nasty impact on our process launch times, since the kernel
> presumably has to do work to clone them into the child process.
> 
> Now this lashup that I'm testing on is by no means well suited for
> performance tests, so maybe my numbers are bogus.  Also, maybe it's
> better in more recent OpenBSD releases.  But I think we need to take a
> harder look at performance before we decide that it's okay to change
> the default semaphore type for this platform.

I got the following results for "time make installcheck" on a laptop with
OpenBSD 7.0 (amd64):

Semaphores  max_connections              real      user      system
POSIX       100 (default)                1m32.39s  0m03.82s  0m05.75s
POSIX       10000                        2m13.11s  0m03.56s  0m07.06s
SysV        100 (default)                1m24.39s  0m03.30s  0m04.94s
SysV        10000                        failed to start
SysV        10000 (after sysctl tuning)  1m47.51s  0m03.78s  0m05.61s

I can confirm that start and stop of the server were slower in the POSIX
case, but not terribly different (seconds, not a minute, as in your
case).

As the OpenBSD developers said - those who use OpenBSD are never after
good performance; the system has plenty of bottlenecks besides IPC.

I see the following reasons to switch from SysV to POSIX:

- consistency in the ports tree: all major ports use POSIX, which means
  better testing of the API
- as already pointed out, OpenBSD isn't about performance, and the
  results for the default max_connections are pretty close
- crash recovery with the OS defaults is automatic and doesn't require DBA
  intervention or knowledge of ipcs and ipcrm
- higher connection counts are available without system tuning

The disadvantage is worse performance in extreme cases, but I'm
not sure OpenBSD is used for them in production.
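
The crash-recovery point in the list above follows from how named POSIX
semaphores can be used: a process may create one with sem_open() and unlink
the name right away, so nothing stays registered under that name if the
process later dies. Below is a minimal sketch of that pattern, assuming a
platform with working named semaphores. The helper name named_sema_roundtrip
and the semaphore name passed to it are made up for illustration; this is not
PostgreSQL's own semaphore code.

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/stat.h>

/* Hypothetical helper: create, unlink, use and close one named semaphore. */
static int
named_sema_roundtrip(const char *name)
{
    /* O_EXCL turns a name collision into an error instead of silent sharing. */
    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, S_IRUSR | S_IWUSR, 1);

    if (sem == SEM_FAILED)
        return -1;

    /*
     * Drop the name immediately.  The semaphore stays usable through `sem`
     * and is destroyed once every process that has it open closes it or exits.
     */
    if (sem_unlink(name) != 0)
    {
        sem_close(sem);
        return -1;
    }

    /* Initial value is 1, so this wait does not block; post restores it. */
    if (sem_wait(sem) != 0 || sem_post(sem) != 0)
    {
        sem_close(sem);
        return -1;
    }

    return sem_close(sem);
}
```

Because the name is gone after the first call, calling the helper again with
the same name is expected to succeed as well; with SysV semaphores the
equivalent cleanup after a crash is what requires ipcs and ipcrm.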



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail
On Sun, Oct 17, 2021 at 10:52:38AM -0400, Tom Lane wrote:
> I am, however, concerned that this'll just trade off one hazard for
> another.  Instead of a risk of failing with ENOSPC (which the DBA
> can fix), we'll have a risk of kneecapping some other process at
> random (which the DBA can do nothing to prevent).

I tend to agree. Along with the semas patch I would like to suggest an
error message improvement; it would have saved me about half a day of
digging. Tested on OpenBSD 7.0.

I'm not a native speaker though, so the grammar needs to be checked.

diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index 21c883ba9a..b84f70b5e2 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -133,7 +133,10 @@ InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems)
                          "respective kernel parameter.  Alternatively, reduce PostgreSQL's "
                          "consumption of semaphores by reducing its max_connections parameter.\n"
                          "The PostgreSQL documentation contains more information about "
-                         "configuring your system for PostgreSQL.") : 0));
+                         "configuring your system for PostgreSQL.\n"
+                         "If server has crashed previously there may be resources left "
+                         "after it - take a look at ipcs(1) and ipcrm(1) man pages to see "
+                         "how to remove them.") : 0));
     }
 
     return semId;



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Thomas Munro
On Mon, Oct 18, 2021 at 10:07 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Note: The best kind would be *unnamed* POSIX semas, where we get to
> control their placement in existing memory; that's what we do on Linux
> and FreeBSD.  They weren't supported on OpenBSD last time we checked:
> it rejects requests for shared ones.  I wonder if someone could
> implement them with just a few lines of user space code, using atomic
> counters and futex() for waiting.

I meant that it'd be cool if OpenBSD implemented shared memory unnamed
semas that way (as other OSes do), but just for fun I tried
implementing that in PostgreSQL.  I already had a patch to provide a
wrapper API for futexes on a bunch of OSes including OpenBSD (because
I've been looking into ways to rewrite lwlock.c to use futexes
directly and skip all the per-backend semaphore stuff).  That made it
easy to write a quick-and-dirty clone of sem_{init,wait,post}() using
atomics and futexes.

Sadly, although the attached proof-of-concept patch allows a
PREFERRED_SEMAPHORES=FUTEX build to pass tests on macOS (which also
lacks native unnamed semas), FreeBSD and Linux (which don't need this
but are interesting to test), and it also works on OpenBSD with
shared_memory_type=sysv, it doesn't work on OpenBSD with
shared_memory_type=mmap (the default).  I suspect OpenBSD's futex(2)
has a bug: inherited anonymous shared mmap memory seems to confuse it
so that wakeups are lost.  Arrrgh!
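
For readers who want a picture of what a clone of sem_{init,wait,post}()
built from atomics and futexes might look like, here is a minimal sketch. It
is not the attached proof-of-concept patch: it calls the Linux futex(2) system
call directly rather than going through a portable wrapper, the fsem_* names
are hypothetical, and it assumes that an _Atomic uint32_t has the same layout
as the 32-bit word the kernel expects.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical futex-backed counting semaphore; may live in shared memory. */
typedef struct
{
    _Atomic uint32_t value;
} fsem;

/* Thin wrapper over the raw Linux system call (no timeout, no second word). */
static long
fsem_futex(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void
fsem_init(fsem *s, uint32_t initial)
{
    atomic_init(&s->value, initial);
}

/* Decrement if positive; return 0 on success, -1 if the count is zero. */
static int
fsem_trywait(fsem *s)
{
    uint32_t v = atomic_load(&s->value);

    while (v > 0)
    {
        /* On failure, v is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&s->value, &v, v - 1))
            return 0;
    }
    return -1;
}

static void
fsem_wait(fsem *s)
{
    /* The kernel sleeps only if the word is still 0, so no wakeup is lost. */
    while (fsem_trywait(s) != 0)
        fsem_futex(&s->value, FUTEX_WAIT, 0);
}

static void
fsem_post(fsem *s)
{
    atomic_fetch_add(&s->value, 1);
    /* Unconditional wake keeps the sketch simple at the cost of a syscall. */
    fsem_futex(&s->value, FUTEX_WAKE, 1);
}
```

The sketch deliberately avoids FUTEX_PRIVATE_FLAG, since the use case in this
thread is semaphores shared between processes. A real implementation would
likely track waiters to skip the wake call when nobody is sleeping.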


Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Mikhail
On Sun, Oct 17, 2021 at 10:29:24AM -0400, Tom Lane wrote:
> Also, you haven't explained why the existing (and much safer) recycling
> logic in IpcSemaphoreCreate doesn't solve your problem.

I think I'll drop the diffs; you're right that the current, proven logic
need not be changed for such a rare corner case, which the DBA can fix.

I've added references to ipcs(1) and ipcrm(1) to OpenBSD's semget(2) man
page, so a newcomer who hits the same situation as I did won't need to
spend hours digging into SysV semaphore management.

Thanks for the reviews.



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Thomas Munro
On Sun, Oct 24, 2021 at 10:50 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Sadly, although the attached proof-of-concept patch allows a
> PREFERRED_SEMAPHORES=FUTEX build to pass tests on macOS (which also
> lacks native unnamed semas), FreeBSD and Linux (which don't need this
> but are interesting to test), and it also works on OpenBSD with
> shared_memory_type=sysv, it doesn't work on OpenBSD with
> shared_memory_type=mmap (the default).  I suspect OpenBSD's futex(2)
> has a bug: inherited anonymous shared mmap memory seems to confuse it
> so that wakeups are lost.  Arrrgh!

FWIW I'm trying to follow up with the OpenBSD list over here, because
it'd be nice to get that working:

https://marc.info/?l=openbsd-misc&m=163524454303022&w=2



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Thomas Munro
On Fri, Oct 29, 2021 at 4:54 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Oct 24, 2021 at 10:50 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Sadly, although the attached proof-of-concept patch allows a
> > PREFERRED_SEMAPHORES=FUTEX build to pass tests on macOS (which also
> > lacks native unnamed semas), FreeBSD and Linux (which don't need this
> > but are interesting to test), and it also works on OpenBSD with
> > shared_memory_type=sysv, it doesn't work on OpenBSD with
> > shared_memory_type=mmap (the default).  I suspect OpenBSD's futex(2)
> > has a bug: inherited anonymous shared mmap memory seems to confuse it
> > so that wakeups are lost.  Arrrgh!
>
> FWIW I'm trying to follow up with the OpenBSD list over here, because
> it'd be nice to get that working:
>
> https://marc.info/?l=openbsd-misc&m=163524454303022&w=2

This has been fixed.  So now there are working basic futexes on Linux,
macOS, {Free,Open,Net,Dragonfly}BSD (though capabilities beyond basic
wait/wake vary, as do APIs).  So the question is whether it would be
worth trying to do our own futex-based semaphores, as sketched above,
just for the benefit of the OSes where the available built-in
semaphores are of the awkward SysV kind, namely macOS, NetBSD and
OpenBSD.  Perhaps we shouldn't waste our time with that, and should
instead plan to use futexes for a more ambitious lwlock rewrite.



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Tom Lane
Thomas Munro <thomas.munro@gmail.com> writes:
> This has been fixed.  So now there are working basic futexes on Linux,
> macOS, {Free,Open,Net,Dragonfly}BSD (though capabilities beyond basic
> wait/wake vary, as do APIs).  So the question is whether it would be
> worth trying to do our own futex-based semaphores, as sketched above,
> just for the benefit of the OSes where the available built-in
> semaphores are of the awkward SysV kind, namely macOS, NetBSD and
> OpenBSD.  Perhaps we shouldn't waste our time with that, and should
> instead plan to use futexes for a more ambitious lwlock rewrite.

I kind of like the latter idea, but I wonder how we make it coexist
with (admittedly legacy) code for OSes that don't have usable futexes.

            regards, tom lane



Re: [PATCH] Make ENOSPC not fatal in semaphore creation

From: Thomas Munro
On Sat, Nov 20, 2021 at 9:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > This has been fixed.  So now there are working basic futexes on Linux,
> > macOS, {Free,Open,Net,Dragonfly}BSD (though capabilities beyond basic
> > wait/wake vary, as do APIs).  So the question is whether it would be
> > worth trying to do our own futex-based semaphores, as sketched above,
> > just for the benefit of the OSes where the available built-in
> > semaphores are of the awkward SysV kind, namely macOS, NetBSD and
> > OpenBSD.  Perhaps we shouldn't waste our time with that, and should
> > instead plan to use futexes for a more ambitious lwlock rewrite.
>
> I kind of like the latter idea, but I wonder how we make it coexist
> with (admittedly legacy) code for OSes that don't have usable futexes.

One very rough idea, not yet tried, is that they could keep using
semaphores, but use them to implement fake futexes.  We'd put them in
wait lists that live in a shared memory hash table (the futex address
is the key, with some extra work needed for DSM-resident futexes),
with per-bucket spinlocks so that you can perform the value check
atomically with the decision to start waiting.