Re: OpenBSD versus semaphores - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: OpenBSD versus semaphores
Date
Msg-id CAEepm=2ndy5RSABeaf3L1hFhXoBSg09RvgudfTWfbn=DMUbJ3w@mail.gmail.com
In response to OpenBSD versus semaphores  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: OpenBSD versus semaphores
List pgsql-hackers
On Tue, Jan 8, 2019 at 7:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've been toying with OpenBSD lately, and soon noticed a seriously
> annoying problem for running Postgres on it: by default, its limits
> for SysV semaphores are only SEMMNS=60, SEMMNI=10.  Not only does that
> greatly constrain the number of connections for a single installation,
> it means that our TAP tests fail because you can't start two postmasters
> concurrently (cf [1]).
>
> Raising the annoyance factor considerably, AFAICT the only way to
> increase these settings is to build your own custom kernel.
>
> So I looked around for an alternative, and found out that modern
> OpenBSD releases support named POSIX semaphores (though not unnamed
> ones, at least not shared unnamed ones).  What's more, it appears that
> in this implementation, named semaphores don't eat open file descriptors
> as they do on macOS, removing our major objection to using them.
>
> I don't have any OpenBSD installation on hardware that I'd take very
> seriously for performance testing, but some light testing with
> "pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
> has just about the same performance as a build with SysV semaphores.
>
> This all leads to the thought that maybe we should be selecting
> PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD.  At the very least,
> our docs ought to recommend it as a credible alternative for
> people who don't want to get into building custom kernels.
>
> I've checked that this works back to OpenBSD 6.0, and scanning
> their man pages suggests that the feature appeared in 5.5.
> 5.5 isn't that old (2014) so possibly people are still running
> older versions, but we could easily put in version-specific
> default logic similar to what's in src/template/darwin.
>
> Thoughts?

No OpenBSD here, but I was curious enough to peek at their
implementation.  Like others, they create a tiny file under /tmp for
each one, mmap() it and close the fd straight away.  Apparently they
don't support shared sem_init() yet (it fails with EPERM).  So your
plan seems good to me.
CC'ing Pierre-Emmanuel (OpenBSD PostgreSQL port maintainer) in case he
is interested.
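
For anyone who wants to poke at this on their own box, here is a minimal
sketch of the two calls in question.  It is not the PostgreSQL code; the
function names are made up for illustration.  The first function reports
what errno a process-shared sem_init() gives (0 if it works), and the
second opens a named semaphore and unlinks the name right away, so that
only the open handle keeps it alive.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stddef.h>

/*
 * Hypothetical helper: try to initialize an unnamed semaphore with
 * pshared = 1.  Returns 0 on success, otherwise the errno reported by
 * sem_init().  The semaphore lives on the stack here, so this only
 * probes whether the call is accepted; it is not usable across
 * processes.
 */
static int
probe_unnamed_shared(void)
{
    sem_t sem;

    if (sem_init(&sem, 1, 1) != 0)
        return errno;
    sem_destroy(&sem);
    return 0;
}

/*
 * Hypothetical helper: create a named semaphore with the given initial
 * value, then remove its name immediately.  The semaphore stays usable
 * through the returned handle until sem_close().  Returns NULL on
 * failure, with errno set by sem_open().
 */
static sem_t *
open_anonymous_named_sem(const char *name, unsigned int value)
{
    sem_t *sem;

    assert(name != NULL && name[0] == '/');

    sem = sem_open(name, O_CREAT | O_EXCL, 0600, value);
    if (sem == SEM_FAILED)
        return NULL;

    /* Drop the name so nothing is left behind if we crash later. */
    sem_unlink(name);
    return sem;
}
```

Calling probe_unnamed_shared() and printing strerror() of the result
should show whether a given platform accepts shared unnamed semaphores;
open_anonymous_named_sem("/some_unique_name", 1) gives a handle that
works with sem_wait()/sem_post() as usual.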

Wild speculation:  I wouldn't be surprised if POSIX named semas
perform better than SysV semas on a large enough system, since they'll
live on different pages.  At a glance, their sys_semget apparently
allocates arrays of struct sem without padding and I think they
probably get about 4 to a cacheline; see our experience with an
8-socket box leading to commit 2d306759, where we added our own padding.
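
To make the cacheline arithmetic concrete, here is a small sketch.  The
struct is a made-up 16-byte stand-in, not OpenBSD's actual struct sem,
and the 64-byte line size is an assumption.  It shows how an unpadded
array packs 4 entries per line, while padding each entry out to a full
line gives every semaphore a line of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; real hardware may differ. */
#define ASSUMED_CACHE_LINE_SIZE 64

/* Hypothetical 16-byte semaphore record, for illustration only. */
struct toy_sem
{
    uint32_t value;
    uint32_t waiters;
    uint32_t zero_waiters;
    int32_t  last_pid;
};

static_assert(sizeof(struct toy_sem) == 16, "toy_sem should be 16 bytes");

/* Same record, padded so that each array element fills a whole line. */
typedef union padded_sem
{
    struct toy_sem sem;
    char           pad[ASSUMED_CACHE_LINE_SIZE];
} padded_sem;

/* How many unpadded records fit in one assumed cache line. */
static size_t
sems_per_line(void)
{
    return ASSUMED_CACHE_LINE_SIZE / sizeof(struct toy_sem);
}

/*
 * Cache line index of an element relative to the start of its array,
 * assuming the array itself starts on a line boundary.
 */
static size_t
line_of(const void *base, const void *elem)
{
    size_t offset = (size_t) ((const char *) elem - (const char *) base);

    return offset / ASSUMED_CACHE_LINE_SIZE;
}
```

With these definitions, elements 0 through 3 of a plain struct toy_sem
array land on line 0, while neighbouring elements of a padded_sem array
always land on different lines, at the cost of 4x the memory.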

-- 
Thomas Munro
http://www.enterprisedb.com

