Thread: BUG #14206: Switch to using POSIX semaphores on FreeBSD

BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
sobomax@freebsd.org
Date:
The following bug has been logged on the website:

Bug reference:      14206
Logged by:          Maksym Sobolyev
Email address:      sobomax@freebsd.org
PostgreSQL version: 9.5.2
Operating system:   FreeBSD 10.3-RELEASE amd64
Description:

Traditionally, SYSV semaphores are used for synchronization on FreeBSD.

However, according to the analysis done by Konstantin Belousov here
https://www.kib.kiev.ua/kib/pgsql_perf_v2.0.pdf there is, at the very least,
some performance benefit to using POSIX semaphores instead of SYSV
semaphores in PG running on a FreeBSD host.

In addition to that performance benefit, SYSV primitives are usually a
very limited resource by default, so in order to run any reasonably
significant number of connections on your DB server you need to tweak kernel
options to increase their number. And last but not least, SYSV primitives,
once allocated, need explicit removal, which might not be performed when a PG
process dies or is SIGKILLed. None of these is an issue with POSIX
semaphores.

We've been testing this patch on the 9.1, 9.2 and 9.5 versions of PG for a few
weeks now, and it performs at least as well as the old SYSV builds. We also
see the number of semaphores in use drop to 0 in the ipcs(1) output, so the
patch actually does what it's supposed to do.

--- src/template/freebsd
+++ src/template/freebsd
@@ -3,3 +3,4 @@
 case $host_cpu in
   alpha*)   CFLAGS="-O";;  # alpha has problems with -O2
 esac
+USE_NAMED_POSIX_SEMAPHORES=1
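A minimal C sketch of the explicit-removal point, assuming a system with System V IPC enabled (the function name is hypothetical): a child process creates a semaphore set and exits without removing it, and the set survives its creator until something calls semctl(IPC_RMID) or ipcrm(1) is run.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a SysV semaphore set created by a child process that exits
 * without cleaning up is still present afterwards, 0 if it is gone,
 * -1 on setup errors. */
int sysv_set_outlives_creator(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: create a private one-semaphore set, report its id to the
         * parent, then exit without semctl(IPC_RMID), much like a process
         * that dies or is SIGKILLed. */
        int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        ssize_t w = write(fds[1], &id, sizeof id);
        _exit(w == (ssize_t) sizeof id ? 0 : 1);
    }

    int id = -1;
    close(fds[1]);
    ssize_t n = read(fds[0], &id, sizeof id);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t) sizeof id || id < 0)
        return -1;

    /* The creator is gone, but the kernel object is still there. */
    int alive = semctl(id, 0, GETVAL) >= 0;

    /* Only an explicit removal (or ipcrm(1)) releases it. */
    semctl(id, 0, IPC_RMID);
    return alive;
}
```

On a typical system with SysV IPC enabled this returns 1; between the child's exit and the IPC_RMID call, the set would show up in ipcs -s.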

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Tom Lane
Date:
sobomax@freebsd.org writes:
> However, according to the analysis done by Konstantin Belousov here
> https://www.kib.kiev.ua/kib/pgsql_perf_v2.0.pdf there is at the very least
> some performance benefit on using POSIX semaphores instead of SYSV
> semaphores in the PG running on FreeBSD host.

I wonder how thorough that performance testing was.  The reason that the
named-POSIX-semaphore code exists is that it used to be the only kind of
semaphore available on ancient OS X versions.  But we got rid of that as
soon as we could, for the reason explained in template/darwin:

# Select appropriate semaphore support.  Darwin 6.0 (Mac OS X 10.2) and up
# support System V semaphores; before that we have to use POSIX semaphores,
# which are less good for our purposes because they eat a file descriptor
# per backend per max_connection slot.

The extra FDs slow down launching of new backends (due to having to dup
all the postmaster's FDs for the semaphores) and if max_connections is
large they can take a pretty serious chunk out of your system-wide file
table, at worst max_connections squared.
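The "max_connections squared" worst case is simple arithmetic: if there is one descriptor per semaphore (roughly one per max_connections slot) and every backend inherits all of them, then N backends hold N descriptors each. A small C sketch, ignoring auxiliary processes and the postmaster's own copy (a simplifying assumption); the function name is hypothetical:

```c
/* Worst case for an fd-backed semaphore implementation: each of
 * max_connections backends inherits one descriptor per semaphore slot.
 * Simplification: auxiliary processes and the postmaster's own copy
 * are ignored. */
long long worst_case_semaphore_fds(long long max_connections)
{
    long long backends = max_connections;
    long long fds_per_backend = max_connections;  /* one per semaphore */
    return backends * fds_per_backend;
}
```

For max_connections = 1000 that is 1,000,000 descriptors system-wide.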

Now maybe FreeBSD is different enough from OSX that these are not problems
for you, but I'm dubious.

Have you got unnamed POSIX semaphores, and if so have you tried that
variant?
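A minimal C sketch of the unnamed variant, assuming the platform's sem_init(3) supports process-shared semaphores (pshared = 1), which is not universal (macOS, for one, does not implement unnamed POSIX semaphores). The semaphore lives in memory shared across fork(), so there is no name and no descriptor to inherit. The function name is hypothetical, and this is a generic sketch, not PostgreSQL's own semaphore code:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An unnamed, process-shared POSIX semaphore placed in anonymous shared
 * memory before fork(), so parent and child use the same object without
 * any name or file descriptor. Returns 0 on success, -1 on error. */
int unnamed_sem_handoff(void)
{
    sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED)
        return -1;

    /* pshared = 1: usable by every process that maps this memory;
     * initial value 0, so the first sem_wait() blocks. */
    if (sem_init(sem, 1, 0) != 0) {
        munmap(sem, sizeof(sem_t));
        return -1;
    }

    int rc = 0;
    pid_t pid = fork();
    if (pid == 0) {
        sem_post(sem);                /* child: wake the parent */
        _exit(0);
    }
    if (pid < 0) {
        rc = -1;
    } else {
        while (sem_wait(sem) != 0) {  /* parent: sleep until posted */
            if (errno != EINTR) {
                rc = -1;
                break;
            }
        }
        waitpid(pid, NULL, 0);
    }

    /* Destroy before the memory goes away, as sem_destroy(3) asks. */
    sem_destroy(sem);
    munmap(sem, sizeof(sem_t));
    return rc;
}
```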

            regards, tom lane

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Maxim Sobolev
Date:
Tom, thanks for looking at it so promptly. I am adding kib@ to the
discussion. Perhaps he can comment on SYSV vs. POSIX semaphores in FreeBSD,
and on named vs. unnamed ones.

As far as I can tell, the sem_init(3) interface is present in FreeBSD
10.3, so maybe we can use that instead?

-Max

On Tue, Jun 21, 2016 at 12:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:


Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Tom Lane
Date:
Maxim Sobolev <sobomax@freebsd.org> writes:
> Tom, thanks for looking at it so promptly. I am adding kib@ into the
> discussion. Perhaps he would comment on the SYSV vs. POSIX in FreeBSD and
> named vs. unnamed.

BTW, I trawled our archives and found this thread concerning the switch
from POSIX to SYSV on OS X:

https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com

I'm not sure what you were using to decide that POSIX semaphores were
okay, but the points in that thread about pgbench not being a very
good test case remain relevant.

> As far as I can tell, the sem_init(3) interface is present in the FreeBSD
> 10.3, so maybe we can use those instead?

If that seems like a competitive alternative for you, it'd be nice to have
a platform where we use unnamed POSIX semaphores by default.  I'm a little
worried about whether that code has suffered bit-rot, since it's been
sitting there basically unused for so long.

            regards, tom lane

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Konstantin Belousov
Date:
On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
> Maxim Sobolev <sobomax@freebsd.org> writes:
> > Tom, thanks for looking at it so promptly. I am adding kib@ into the
> > discussion. Perhaps he would comment on the SYSV vs. POSIX in FreeBSD and
> > named vs. unnamed.
>
> BTW, I trawled our archives and found this thread concerning the switch
> from POSIX to SYSV on OS X:
>
> https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
>
> I'm not sure what you were using to decide that POSIX semaphores were
> okay, but the points in that thread about pgbench not being a very
> good test case remain relevant.
>
> > As far as I can tell, the sem_init(3) interface is present in the FreeBSD
> > 10.3, so maybe we can use those instead?
>
> If that seems like a competitive alternative for you, it'd be nice to have
> a platform where we use unnamed POSIX semaphores by default.  I'm a little
> worried about whether that code has suffered bit-rot, since it's been
> sitting there basically unused for so long.

On FreeBSD, there is no practical difference in resource consumption
between named and unnamed semaphores. I mean that after a sem_open(3) call,
no open file descriptor is kept in the process fd table. The semaphore
is represented by an mmapped page; libc and the kernel operate solely on the
page content and use umtx(2) to implement a counting semaphore.
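The description above implies that the uncontended path never enters the kernel: the count lives in the shared page and is updated with atomic instructions, and only a waiter that finds it at zero needs umtx(2) to sleep. A portable C11 sketch of just that fast path follows; the kernel sleep/wakeup part is deliberately left out because umtx(2) is FreeBSD-specific, and `shared_count`, `shared_count_trywait` and `shared_count_post` are hypothetical names, not libc functions:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* The whole semaphore state: one counter that could live in an mmapped page. */
typedef struct {
    atomic_uint count;
} shared_count;

/* Decrement if positive, without any system call. Returns false when the
 * caller would have to block, which is where a real implementation would
 * ask the kernel to put it to sleep. */
bool shared_count_trywait(shared_count *s)
{
    unsigned cur = atomic_load(&s->count);
    while (cur > 0) {
        if (atomic_compare_exchange_weak(&s->count, &cur, cur - 1))
            return true;
        /* A failed CAS reloaded cur; retry with the fresh value. */
    }
    return false;
}

/* Increment; a real implementation would also wake a sleeper if one exists. */
void shared_count_post(shared_count *s)
{
    atomic_fetch_add(&s->count, 1);
}
```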

In other words, no, there is no additional overhead in starting a
connection when using either named or unnamed (sem_init(3)) POSIX
semaphores on FreeBSD, and there is no open-file overhead at all.

That said, the problem with SysV semaphores is that the API allows
operations on arbitrary sets of semaphores. Unless some unusual
and complex measures are taken, the implementation has to use a global
internal lock to synchronize semop(2). This is what I noted in the
paper.
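A minimal sketch of what "operations on arbitrary sets" means in practice, assuming System V IPC is available (the function name is hypothetical): a single semop(2) call carries an array of operations on different semaphores, and the kernel must apply all of them atomically or none.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* One semop(2) call applying operations to two semaphores of the same set.
 * Because the kernel must apply both or neither, it has to coordinate the
 * whole set, which pushes implementations toward coarse locking.
 * Stores how much each semaphore changed; returns 0 on success, -1 on error. */
int sysv_two_semaphore_op(int *delta0, int *delta1)
{
    int id = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    /* Initial values are read back rather than assumed: POSIX does not
     * require semget() to zero them. */
    int before0 = semctl(id, 0, GETVAL);
    int before1 = semctl(id, 1, GETVAL);

    struct sembuf ops[2] = {
        { .sem_num = 0, .sem_op = 1, .sem_flg = 0 },  /* +1 on semaphore 0 */
        { .sem_num = 1, .sem_op = 2, .sem_flg = 0 },  /* +2 on semaphore 1 */
    };
    int rc = semop(id, ops, 2);  /* all-or-nothing */

    if (rc == 0) {
        *delta0 = semctl(id, 0, GETVAL) - before0;
        *delta1 = semctl(id, 1, GETVAL) - before1;
    }
    semctl(id, 0, IPC_RMID);
    return rc == 0 ? 0 : -1;
}
```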

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Maxim Sobolev
Date:
Konstantin, would it be too much to ask you to run your performance tests
using unnamed semaphores instead? As far as I understand what Tom said,
the named-semaphore code was kind of a one-off workaround for a specific
ancient version of Darwin and is not used by any other platform that PG
cares about, so it might rot and/or get removed eventually. Therefore, we
might have a better chance of getting our changes accepted into PostgreSQL if
we use the unnamed option. Nothing functionally important depends on the
"named" part anyway, so an unnamed POSIX semaphore is naturally the best
primitive to use. This might also stir some interest among other OSes in
switching to it. Thanks!

-Max

On Wed, Jun 22, 2016 at 3:00 AM, Konstantin Belousov <kostikbel@gmail.com>
wrote:


Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Tom Lane
Date:
Konstantin Belousov <kostikbel@gmail.com> writes:
> On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
>> If that seems like a competitive alternative for you, it'd be nice to have
>> a platform where we use unnamed POSIX semaphores by default.  I'm a little
>> worried about whether that code has suffered bit-rot, since it's been
>> sitting there basically unused for so long.

> On FreeBSD, there is no practical difference in the resource consumption
> for named vs. unnamed semaphore. I mean that after sem_open(3) call, an
> open file descriptor is not kept in the process fd table. The semaphore
> is represented by the mmaped page, libc+kernel operate solely on the
> page content and use umtx(2) to implement counted semaphore.

Is there any kernel-side resource at all?  The thing that concerns me
about the POSIX APIs is that it's not very clear whether anything gets
left behind if the database crashes.  The Linux man page for sem_destroy
says

       An unnamed semaphore should be destroyed with sem_destroy() before  the
       memory  in  which it is located is deallocated.  Failure to do this can
       result in resource leaks on some implementations.

and while they don't say that their own implementation has such a problem,
it's worrisome.  We go to some lengths to ensure that we can recycle SysV
semaphores after a crash, but there's no equivalent logic in the POSIX
semaphore code, and I don't see how it would even be possible to identify
leftover "unnamed" semaphores.

> That said, the problem with the SysV semaphores is that API allows
> operations on arbitrary sets of the semaphores. Unless some unordinary
> and complex measures are taken, implementation has to use global
> internal lock to synchronize semop(2). This is what I noted in the
> paper.

It's certainly true that semop(2) is more complicated than we need.
But in practice, we only call semop(2) when we need to sleep, or to
awaken a sleeping process, so I'm not sure that performance of it
matters a lot to us.
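The pattern described here, where the semaphore is only a sleep/wakeup mechanism underneath cheaper atomic operations, can be illustrated with a classic "benaphore": an atomic counter handles the uncontended case, and the semaphore is touched only when a thread must sleep or wake a sleeper. This is a generic C11 + POSIX sketch, not PostgreSQL's actual lock code; all type and function names are hypothetical, and sem_init(3) with pshared = 0 is assumed to be available.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

/* Atomics handle the uncontended case; the semaphore is used only to put a
 * thread to sleep or to wake one up. */
typedef struct {
    atomic_int count;  /* lock holder plus threads waiting for it */
    sem_t sem;         /* sleep/wakeup only; starts at 0 */
} benaphore;

int benaphore_init(benaphore *b)
{
    atomic_init(&b->count, 0);
    return sem_init(&b->sem, 0, 0);  /* pshared = 0: threads of one process */
}

void benaphore_lock(benaphore *b)
{
    /* Old value > 0: someone holds the lock, so sleep on the semaphore. */
    if (atomic_fetch_add(&b->count, 1) > 0)
        while (sem_wait(&b->sem) != 0 && errno == EINTR)
            ;  /* interrupted by a signal: wait again */
}

void benaphore_unlock(benaphore *b)
{
    /* Old value > 1: at least one thread is (or is about to be) asleep. */
    if (atomic_fetch_sub(&b->count, 1) > 1)
        sem_post(&b->sem);
}

typedef struct {
    benaphore *lock;
    long *counter;
    int iters;
} worker_arg;

static void *worker(void *p)
{
    worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        benaphore_lock(a->lock);
        (*a->counter)++;               /* protected by the benaphore */
        benaphore_unlock(a->lock);
    }
    return NULL;
}

/* Runs up to 8 threads that each increment a shared counter iters times;
 * returns the final counter value, or -1 on setup errors. */
long benaphore_demo(int nthreads, int iters)
{
    benaphore b;
    long counter = 0;
    pthread_t tids[8];
    worker_arg arg = { &b, &counter, iters };
    int started = 0;

    if (nthreads > 8 || benaphore_init(&b) != 0)
        return -1;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &arg) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&b.sem);
    return counter;
}
```

The design choice is that a lock acquired without contention costs one atomic add and one atomic subtract; how often the semaphore itself is used depends entirely on contention.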

            regards, tom lane

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Maxim Sobolev
Date:
Tom, on a related note regarding the merits of SYSV vs. POSIX semaphores: the
handling of a SYSV semaphore shortage in PG is, ugh, awful. The whole server
crashes (abort()s); bumping into that on a production box is not fun at all,
and it's particularly easy since the resource is severely constrained by
default. Ideally it should just deny the particular connection request. I
don't know if the same is true for POSIX primitives, but at least those are
by design more abundant. We've experienced this with the fairly recent PG 9.1,
and we were dealing with some of those crashes just last month. Maybe it's
something you guys need to consider improving, if you have not
already.

-Max

On Tue, Jun 21, 2016 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:


Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Konstantin Belousov
Date:
On Wed, Jun 22, 2016 at 10:48:50AM -0400, Tom Lane wrote:
> Konstantin Belousov <kostikbel@gmail.com> writes:
> > On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
> >> If that seems like a competitive alternative for you, it'd be nice to have
> >> a platform where we use unnamed POSIX semaphores by default.  I'm a little
> >> worried about whether that code has suffered bit-rot, since it's been
> >> sitting there basically unused for so long.
>
> > On FreeBSD, there is no practical difference in the resource consumption
> > for named vs. unnamed semaphore. I mean that after sem_open(3) call, an
> > open file descriptor is not kept in the process fd table. The semaphore
> > is represented by the mmaped page, libc+kernel operate solely on the
> > page content and use umtx(2) to implement counted semaphore.
>
> Is there any kernel-side resource at all?  The thing that concerns me
> about the POSIX APIs is that it's not very clear whether anything gets
> left behind if the database crashes.  The Linux man page for sem_destroy
> says
>
>        An unnamed semaphore should be destroyed with sem_destroy() before  the
>        memory  in  which it is located is deallocated.  Failure to do this can
>        result in resource leaks on some implementations.
>
> and while they don't say that their own implementation has such a problem,
> it's worrisome.  We go to some lengths to ensure that we can recycle SysV
> semaphores after a crash, but there's no equivalent logic in the POSIX
> semaphore code, and I don't see how it would even be possible to identify
> leftover "unnamed" semaphores.

On FreeBSD, it is only a memory page that is mmapped into all
processes using the unnamed semaphore. Of course, if a process
is blocked on the semaphore, some bookkeeping is done in the kernel so
that a post can find all waiters. But it is lightweight and automatically
released on wakeup. In other words, there is nothing to worry about
with respect to cleanup after unnamed-semaphore consumers are killed. The
same goes for named semaphores, except that the file is left around.
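For the named variant, the leftover file is tied to the semaphore's name. One common way to keep a crash from leaving it behind, sketched here in C under the assumption that every user of the semaphore inherits it across fork() rather than reopening it by name, is to sem_unlink(3) the name immediately after sem_open(3). The name pattern and the function name are hypothetical; this is a generic sketch.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a named semaphore, removes its name at once, and then uses it
 * through the handle inherited across fork(). Returns 0 if the handoff
 * worked and the name was really gone, -1 otherwise. */
int named_sem_unlinked_handoff(void)
{
    char name[64];
    snprintf(name, sizeof name, "/sema_demo_%ld", (long) getpid());

    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, 0600, 0u);
    if (sem == SEM_FAILED)
        return -1;

    /* Drop the name right away: a crash after this point leaves no
     * leftover name (or backing file) to clean up by hand. */
    sem_unlink(name);
    int name_gone = sem_open(name, 0) == SEM_FAILED && errno == ENOENT;

    int rc = name_gone ? 0 : -1;
    pid_t pid = fork();
    if (pid == 0) {
        sem_post(sem);  /* open semaphores stay open in the child after fork() */
        _exit(0);
    }
    if (pid < 0) {
        rc = -1;
    } else {
        while (sem_wait(sem) != 0) {
            if (errno != EINTR) {
                rc = -1;
                break;
            }
        }
        waitpid(pid, NULL, 0);
    }
    sem_close(sem);
    return rc;
}
```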

>
> > That said, the problem with the SysV semaphores is that API allows
> > operations on arbitrary sets of the semaphores. Unless some unordinary
> > and complex measures are taken, implementation has to use global
> > internal lock to synchronize semop(2). This is what I noted in the
> > paper.
>
> It's certainly true that semop(2) is more complicated than we need.
> But in practice, we only call semop(2) when we need to sleep, or to
> awaken a sleeping process, so I'm not sure that performance of it
> matters a lot to us.

The issue is that sleeps and wakeups on SysV semaphores do not scale,
at least on FreeBSD.

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Tom Lane
Date:
Maxim Sobolev <sobomax@freebsd.org> writes:
> Tom, on the related note on merits of SYSV semaphores vs. POSIX the
> handling of SYSV semaphore shortage in PG is, uggh, awful. The whole server
> crashes (abort()s), bumping into that on production box is not fun at all
> and it's particularly easy since the resource is severely constrained by
> default. Ideally it should just deny the particular connection request.

This seems like nonsense, because those are acquired once at postmaster
startup, not per connection.  You will need to decrease max_connections
to start successfully in a resource-constrained system, but the same is
true of other resource limits.
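The fixed, startup-time nature of the allocation can be made concrete with a little arithmetic. This sketch assumes the SysV layout we understand PostgreSQL 9.x to use (sets of 16 usable semaphores, each set carrying one extra marker semaphore); that layout is an assumption, not something stated in this thread. The number of processes needing a semaphore is a plain input here; in reality it also covers autovacuum and other auxiliary processes. All names are hypothetical.

```c
/* Assumption: 16 usable semaphores per set, plus one extra marker
 * semaphore in every set. */
#define SEMAS_PER_SET 16

typedef struct {
    int sets;        /* counts against SEMMNI (kern.ipc.semmni) */
    int semaphores;  /* counts against SEMMNS (kern.ipc.semmns) */
} sysv_sema_needs;

/* Fixed allocation made once at server start for n_procs processes that
 * each need one semaphore; nothing is added per new connection. */
sysv_sema_needs sysv_sema_requirements(int n_procs)
{
    sysv_sema_needs r;
    r.sets = (n_procs + SEMAS_PER_SET - 1) / SEMAS_PER_SET;  /* ceiling */
    r.semaphores = r.sets * (SEMAS_PER_SET + 1);
    return r;
}
```

For example, 100 processes would need 7 sets and 119 semaphores under these assumptions.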

            regards, tom lane

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Maxim Sobolev
Date:
Tom, my diagnosis may be nonsense, but the crash is real.

Jun  5 21:47:38 sippy postgres[3744]: [2-1] PANIC:  semop(id=65608) failed: Invalid argument
Jun  5 21:47:38 sippy postgres[3743]: [2-1] PANIC:  semop(id=65608) failed: Invalid argument
Jun  5 21:47:39 sippy postgres[3725]: [2-1] PANIC:  semop(id=65609) failed: Invalid argument
Jun  5 21:47:39 sippy postgres[3742]: [2-1] PANIC:  semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3550]: [2-1] PANIC:  semop(id=65611) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3664]: [2-1] PANIC:  semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3667]: [2-1] PANIC:  semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3663]: [2-1] PANIC:  semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3666]: [2-1] PANIC:  semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3665]: [2-1] PANIC:  semop(id=65609) failed: Invalid argument
Jun  5 22:30:56 sippy postgres[3632]: [2-1] PANIC:  semop(id=65610) failed: Invalid argument
Jun  5 22:30:56 sippy postgres[3633]: [2-1] PANIC:  semop(id=65610) failed: Invalid argument

From the semop(2) man page:

     [EINVAL]           No semaphore set corresponds to semid, or the process
                        would exceed the system-defined limit for the number
                        of per-process SEM_UNDO structures.

AFAIK we've been hitting the second cause. This is with FreeBSD 10.3
and PostgreSQL 9.2.16 specifically. I think we've also seen this with 9.1,
but I am not 100% sure. The specific limit that got exceeded
was kern.ipc.semmnu.
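For reference, the first EINVAL cause in that excerpt is easy to trigger on purpose: after a set has been removed (by semctl(IPC_RMID), or from outside with ipcrm(1)), semop(2) on its old id fails. A minimal sketch, assuming System V IPC is available; it only shows the errno and says nothing about which cause produced the log lines above. POSIX also allows EIDRM for a removed set, so the check accepts either. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Creates a one-semaphore set, removes it, then tries semop() on the stale
 * id. Returns the resulting errno (0 if the call unexpectedly succeeded,
 * -1 if the set could not be created or removed). */
int semop_on_removed_set(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    if (semctl(id, 0, IPC_RMID) != 0)
        return -1;

    /* A non-blocking "post" operation on the stale identifier. */
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = IPC_NOWAIT };
    if (semop(id, &op, 1) == 0)
        return 0;
    return errno;
}
```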

I can probably dig up some stack traces, although they might have been
cleared out by now. Looking at it again, I think you are probably right that
this is not happening at primitive creation time, but during actual use. That
may make graceful handling tricky, if possible at all, but it would still be
"good to have" from my point of view as a PG user.

Nevertheless, this just makes using POSIX primitives even more attractive,
IMHO.

On Wed, Jun 22, 2016 at 8:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:



--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
Tel (Canada): +1-778-783-0474
Tel (Toll-Free): +1-855-747-7779
Fax: +1-866-857-6942
Web: http://www.sippysoft.com
MSN: sales@sippysoft.com
Skype: SippySoft

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Tom Lane
Date:
Maxim Sobolev <sobomax@sippysoft.com> writes:
> From the semop(2) man page:
>      [EINVAL]           No semaphore set corresponds to semid, or the process
>                         would exceed the system-defined limit for the number
>                         of per-process SEM_UNDO structures.

> AFAIK we've been hitting the second cause there. This is with FreeBSD 10.3
> and postgresql 9.2.16 specifically. We've also seen this with 9.1 I think,
> but I am not 100% sure. The specific limit that got exceeded
> was kern.ipc.semmnu.

We never ask semop(2) for SEM_UNDO, so are you sure this isn't a kernel
bug?  I've never heard of such a report on any other platform.
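Some context on the limit involved: SEM_UNDO is an opt-in flag, and kern.ipc.semmnu limits the undo structures the kernel allocates so it can reverse a process's semaphore adjustments when that process exits. A minimal sketch of that behavior, assuming System V IPC is available (the function name is hypothetical): a child raises a fresh semaphore by 1 and exits; with SEM_UNDO the kernel reverts the change, without it the change persists.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child adds 1 to a fresh semaphore using the given sem_flg and exits.
 * Returns the change in the semaphore's value seen by the parent afterwards,
 * or -1 on error. */
int value_change_after_child_exit(short flags)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    int before = semctl(id, 0, GETVAL);

    pid_t pid = fork();
    if (pid == 0) {
        struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = flags };
        _exit(semop(id, &op, 1) == 0 ? 0 : 1);
    }
    int status = 0;
    if (pid > 0)
        waitpid(pid, &status, 0);  /* with SEM_UNDO, exit reverts the +1 */

    int after = semctl(id, 0, GETVAL);
    semctl(id, 0, IPC_RMID);
    if (pid < 0 || before < 0 || after < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return after - before;
}
```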

            regards, tom lane

Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Maxim Sobolev
Date:
Tom,

Well, now that I think about it, no, I am certainly not sure about the root
cause. Mind you, until recently I had almost zero knowledge about postgres'
use of those and, thank God, I never had to use SYSV IPC for anything we
develop here. We've been increasing those limits to reasonably high values
for many years on the kernels we ship, just to make PG run happily. So,
except for the sporadic leakage of leftover semaphores, which we've worked
around with ipcrm, it did not bother us much. (In case it's something you
want to look into: that leakage seems to happen when postgres crashes on a
write failure after the disk runs out of space. The last time I saw it
happen was this spring, so it must affect some of the recent revisions too.)

Back to those resource shortage crashes: in fact, now that you ask, it's
certainly possible that something went south in the kernel. We also did an
OS upgrade from 10.1 to 10.3, along with a minor-version refresh of postgres,
at the same time those crashes started to happen. But you always blame the
software part first, so we did, and in our case bumping kern.ipc.semmnu did
solve it, so we closed the case and moved on. I might look through the
changes in semop() between 10.1 and 10.3 to see where that EINVAL might be
coming from. However, since we are considering switching to (un)named POSIX
primitives, I don't feel a strong urge to do so.

But again, on a general note, this kind of underlines the fact that SYSV IPC
might be seen as a somewhat legacy interface by the FreeBSD kernel people.
(Disclaimer: this is all IMHO; I am not speaking on behalf of the FreeBSD
project or any part of it.)

Some of it is related to the API design flaws that Konstantin alluded to,
and some of it to the fact that, as far as I understand, it's a big, separate
chunk of kernel code created decades ago by people who might no longer be
actively involved with the project, and designed to run in a vastly different
hardware and software environment. Nobody really owns it, and it's not easy
to regression-test. In my own limited experience, if not for postgresql we
would not even have SYSV IPC enabled in our production kernels. And we use
some 300+ other open-source packages in our product, so the sample is quite
representative, I think.

On the other hand, as far as I understood from Konstantin's explanations,
POSIX primitives share most of their code with the pthread library, so it's
mostly modern code: well maintained, production- and regression-tested, and
continuously optimized to run on modern systems.

The point I am trying to make is that perhaps postgres developers need
to acknowledge that clinging to SYSV IPC these days as the only supported
choice for synchronization is like using, say, sbrk(2) or mmap(2) to manage
the heap instead of malloc(3). Yes, I understand that the "don't touch it if
it's not broken" principle is important, and yes, there might be some valid
cases where you'd want to use sbrk() or mmap() too, but I also don't see any
technical reason for not making POSIX primitives a first-class citizen in
PG either.

-Max

On Wed, Jun 22, 2016 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:


Re: BUG #14206: Switch to using POSIX semaphores on FreeBSD

From
Bruce Momjian
Date:
On Wed, Jun 22, 2016 at 11:49:33AM -0700, Maxim Sobolev wrote:
> Some of it is related to the API design flaws that Konstantin alluded to, some
> of it to the fact that as far as I understand, it's separate big chunk of
> kernel code created decades ago by people who might no longer be actively
> involved with the project and designed to run in vastly different hardware and
> software environment. Nobody really owns it and it's not easy to regression
> test. In my own limited experience if not for postgresql, we would not even
> have that SYSV IPC enabled in our production kernels. And we use some other
> 300+ opensource packages in our product, so sample is quite representative I
> think.

FYI, databases were the primary users of SYSV IPC even back in the old
days, so I am not surprised we might still be one of the rare users. :-)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +