Thread: BUG #14206: Switch to using POSIX semaphores on FreeBSD
The following bug has been logged on the website:

Bug reference:      14206
Logged by:          Maksym Sobolyev
Email address:      sobomax@freebsd.org
PostgreSQL version: 9.5.2
Operating system:   FreeBSD 10.3-RELEASE amd64
Description:

Traditionally, SYSV semaphores are used to do synchronization on FreeBSD.

However, according to the analysis done by Konstantin Belousov here
https://www.kib.kiev.ua/kib/pgsql_perf_v2.0.pdf there is at the very least
some performance benefit on using POSIX semaphones instead of SYSV
semaphones in the PG running on FreeBSD host.

In addition to that performance benefit, the SYSV primitives are usually
very limited resource by default, so in order to run any more or less
significant amount of connections on your DB server you need to tweak kernel
option to increase number of those. And last but not least,  SYSV primitives
once allocated need explicit removal, which might not be performed when PG
process dies or SIGKILLed. None of those is an issue with POSIX
semaphores.

We've been testing that patch on 9.1, 9.2 and 9.5 versions of the PG for few
weeks now and it performs at least as good as with old SYSV builds. We also
see drop of semaphores in use to 0 in the ipcs(1) output, so that the patch
actually does what it's supposed to do.

--- src/template/freebsd
+++ src/template/freebsd
@@ -3,3 +3,4 @@
 case $host_cpu in
   alpha*)   CFLAGS="-O";;  # alpha has problems with -O2
 esac
+USE_NAMED_POSIX_SEMAPHORES=1
sobomax@freebsd.org writes:
> However, according to the analysis done by Konstantin Belousov here
> https://www.kib.kiev.ua/kib/pgsql_perf_v2.0.pdf there is at the very least
> some performance benefit on using POSIX semaphones instead of SYSV
> semaphones in the PG running on FreeBSD host.

I wonder how thorough that performance testing was. The reason that the
named-POSIX-semaphore code exists is that it used to be the only kind of
semaphore available on ancient OS X versions. But we got rid of that as
soon as we could, for the reason explained in template/darwin:

# Select appropriate semaphore support. Darwin 6.0 (Mac OS X 10.2) and up
# support System V semaphores; before that we have to use POSIX semaphores,
# which are less good for our purposes because they eat a file descriptor
# per backend per max_connection slot.

The extra FDs slow down launching of new backends (due to having to dup
all the postmaster's FDs for the semaphores) and if max_connections is
large they can take a pretty serious chunk out of your system-wide file
table, at worst max_connections squared.

Now maybe FreeBSD is different enough from OSX that these are not problems
for you, but I'm dubious.

Have you got unnamed POSIX semaphores, and if so have you tried that
variant?

regards, tom lane
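The "max_connections squared" figure above can be illustrated with a small calculation. This is only a sketch: the helper name `worst_case_semaphore_fds` is hypothetical, and it assumes exactly one semaphore per connection slot, one descriptor per semaphore in every backend, and every slot occupied at once. A real server also has auxiliary processes, so the true count would differ somewhat.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): rough upper bound on the
 * number of file-table entries consumed when every semaphore costs one
 * file descriptor in every backend.
 *
 * Assumptions: one semaphore per max_connections slot, and all slots
 * in use at the same time.
 */
uint64_t
worst_case_semaphore_fds(uint32_t max_connections)
{
    /* Descriptors held by each backend: one per semaphore. */
    uint64_t fds_per_backend = max_connections;

    /* Backends that can exist at once: one per connection slot. */
    uint64_t backends = max_connections;

    return fds_per_backend * backends;
}
```

Under these assumptions, max_connections = 1000 would already correspond to a million file-table entries, which is why the descriptor cost mattered on platforms where each named semaphore holds an FD.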
Tom, thanks for looking at it so promptly. I am adding kib@ to the
discussion. Perhaps he can comment on SYSV vs. POSIX in FreeBSD and on
named vs. unnamed.

As far as I can tell, the sem_init(3) interface is present in FreeBSD
10.3, so maybe we can use those instead?

-Max

On Tue, Jun 21, 2016 at 12:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> sobomax@freebsd.org writes:
> > However, according to the analysis done by Konstantin Belousov here
> > https://www.kib.kiev.ua/kib/pgsql_perf_v2.0.pdf there is at the very
> > least some performance benefit on using POSIX semaphones instead of SYSV
> > semaphones in the PG running on FreeBSD host.
>
> I wonder how thorough that performance testing was. The reason that the
> named-POSIX-semaphore code exists is that it used to be the only kind of
> semaphore available on ancient OS X versions. But we got rid of that as
> soon as we could, for the reason explained in template/darwin:
>
> # Select appropriate semaphore support. Darwin 6.0 (Mac OS X 10.2) and up
> # support System V semaphores; before that we have to use POSIX semaphores,
> # which are less good for our purposes because they eat a file descriptor
> # per backend per max_connection slot.
>
> The extra FDs slow down launching of new backends (due to having to dup
> all the postmaster's FDs for the semaphores) and if max_connections is
> large they can take a pretty serious chunk out of your system-wide file
> table, at worst max_connections squared.
>
> Now maybe FreeBSD is different enough from OSX that these are not problems
> for you, but I'm dubious.
>
> Have you got unnamed POSIX semaphores, and if so have you tried that
> variant?
>
> regards, tom lane
Maxim Sobolev <sobomax@freebsd.org> writes:
> Tom, thanks for looking at it so promptly. I am adding kib@ into the
> discussion. Perhaps he would comment on the SYSV vs. POSIX in FreeBSD and
> named vs. unnamed.

BTW, I trawled our archives and found this thread concerning the switch
from POSIX to SYSV on OS X:

https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com

I'm not sure what you were using to decide that POSIX semaphores were
okay, but the points in that thread about pgbench not being a very
good test case remain relevant.

> As far as I can tell, the sem_init(3) interface is present in the FreeBSD
> 10.3, so maybe we can use those instead?

If that seems like a competitive alternative for you, it'd be nice to have
a platform where we use unnamed POSIX semaphores by default. I'm a little
worried about whether that code has suffered bit-rot, since it's been
sitting there basically unused for so long.

regards, tom lane
On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
> Maxim Sobolev <sobomax@freebsd.org> writes:
> > Tom, thanks for looking at it so promptly. I am adding kib@ into the
> > discussion. Perhaps he would comment on the SYSV vs. POSIX in FreeBSD and
> > named vs. unnamed.
>
> BTW, I trawled our archives and found this thread concerning the switch
> from POSIX to SYSV on OS X:
>
> https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
>
> I'm not sure what you were using to decide that POSIX semaphores were
> okay, but the points in that thread about pgbench not being a very
> good test case remain relevant.
>
> > As far as I can tell, the sem_init(3) interface is present in the FreeBSD
> > 10.3, so maybe we can use those instead?
>
> If that seems like a competitive alternative for you, it'd be nice to have
> a platform where we use unnamed POSIX semaphores by default. I'm a little
> worried about whether that code has suffered bit-rot, since it's been
> sitting there basically unused for so long.

On FreeBSD, there is no practical difference in resource consumption
between named and unnamed semaphores. I mean that after a sem_open(3)
call, an open file descriptor is not kept in the process fd table. The
semaphore is represented by an mmaped page; libc and the kernel operate
solely on the page content and use umtx(2) to implement the counted
semaphore.

In other words, no, there is no additional overhead when starting a
connection with either named or unnamed (sem_init(3)) POSIX semaphores
on FreeBSD, and there is no open-file overhead either.

That said, the problem with SysV semaphores is that the API allows
operations on arbitrary sets of semaphores. Unless some unusual and
complex measures are taken, the implementation has to use a global
internal lock to synchronize semop(2). This is what I noted in the
paper.
Konstantin, would it be too much to ask you to run your performance tests
with unnamed semaphores instead? As far as I understand what Tom said, the
named code was kinda a one-off workaround for a specific ancient version of
Darwin and is not used by any other platform that PG cares about, so it
might rot and/or get nuked eventually. Therefore, we might have a better
chance of getting our changes accepted into PostgreSQL if we use the
unnamed option. And they are not using the "named" part for anything
functionally important anyway, so the unnamed POSIX semaphore is naturally
the best primitive to use. This might also stir some interest among other
OSes in switching to it.

Thanks!

-Max

On Wed, Jun 22, 2016 at 3:00 AM, Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
> > Maxim Sobolev <sobomax@freebsd.org> writes:
> > > Tom, thanks for looking at it so promptly. I am adding kib@ into the
> > > discussion. Perhaps he would comment on the SYSV vs. POSIX in FreeBSD
> > > and named vs. unnamed.
> >
> > BTW, I trawled our archives and found this thread concerning the switch
> > from POSIX to SYSV on OS X:
> >
> > https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
> >
> > I'm not sure what you were using to decide that POSIX semaphores were
> > okay, but the points in that thread about pgbench not being a very
> > good test case remain relevant.
> >
> > > As far as I can tell, the sem_init(3) interface is present in the
> > > FreeBSD 10.3, so maybe we can use those instead?
> >
> > If that seems like a competitive alternative for you, it'd be nice to
> > have a platform where we use unnamed POSIX semaphores by default. I'm a
> > little worried about whether that code has suffered bit-rot, since it's
> > been sitting there basically unused for so long.
>
> On FreeBSD, there is no practical difference in the resource consumption
> for named vs. unnamed semaphore. I mean that after sem_open(3) call, an
> open file descriptor is not kept in the process fd table. The semaphore
> is represented by the mmaped page, libc+kernel operate solely on the
> page content and use umtx(2) to implement counted semaphore.
>
> In other words, no, there is no additional overhead of starting
> connection when using either named or unnamed (sem_init(3)) POSIX
> semaphores on FreeBSD, and there is no any open files overhead.
>
> That said, the problem with the SysV semaphores is that API allows
> operations on arbitrary sets of the semaphores. Unless some unordinary
> and complex measures are taken, implementation has to use global
> internal lock to synchronize semop(2). This is what I noted in the
> paper.
Konstantin Belousov <kostikbel@gmail.com> writes:
> On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
>> If that seems like a competitive alternative for you, it'd be nice to have
>> a platform where we use unnamed POSIX semaphores by default. I'm a little
>> worried about whether that code has suffered bit-rot, since it's been
>> sitting there basically unused for so long.

> On FreeBSD, there is no practical difference in the resource consumption
> for named vs. unnamed semaphore. I mean that after sem_open(3) call, an
> open file descriptor is not kept in the process fd table. The semaphore
> is represented by the mmaped page, libc+kernel operate solely on the
> page content and use umtx(2) to implement counted semaphore.

Is there any kernel-side resource at all? The thing that concerns me
about the POSIX APIs is that it's not very clear whether anything gets
left behind if the database crashes. The Linux man page for sem_destroy
says

    An unnamed semaphore should be destroyed with sem_destroy() before the
    memory in which it is located is deallocated. Failure to do this can
    result in resource leaks on some implementations.

and while they don't say that their own implementation has such a problem,
it's worrisome. We go to some lengths to ensure that we can recycle SysV
semaphores after a crash, but there's no equivalent logic in the POSIX
semaphore code, and I don't see how it would even be possible to identify
leftover "unnamed" semaphores.

> That said, the problem with the SysV semaphores is that API allows
> operations on arbitrary sets of the semaphores. Unless some unordinary
> and complex measures are taken, implementation has to use global
> internal lock to synchronize semop(2). This is what I noted in the
> paper.

It's certainly true that semop(2) is more complicated than we need.
But in practice, we only call semop(2) when we need to sleep, or to
awaken a sleeping process, so I'm not sure that performance of it
matters a lot to us.

regards, tom lane
Tom, on a related note about the merits of SYSV semaphores vs. POSIX: the
handling of a SYSV semaphore shortage in PG is, uggh, awful. The whole
server crashes (abort()s). Bumping into that on a production box is not fun
at all, and it's particularly easy to do since the resource is severely
constrained by default. Ideally it should just deny the particular
connection request. I don't know if the same is true for POSIX primitives,
but at least those are by design more abundant.

We experienced this with a fairly recent PG 9.1; we were dealing with some
of those crashes just last month. Maybe it's something you guys need to
consider for improvement, if you have not already.

-Max

On Tue, Jun 21, 2016 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Maxim Sobolev <sobomax@freebsd.org> writes:
> > Tom, thanks for looking at it so promptly. I am adding kib@ into the
> > discussion. Perhaps he would comment on the SYSV vs. POSIX in FreeBSD and
> > named vs. unnamed.
>
> BTW, I trawled our archives and found this thread concerning the switch
> from POSIX to SYSV on OS X:
>
> https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
>
> I'm not sure what you were using to decide that POSIX semaphores were
> okay, but the points in that thread about pgbench not being a very
> good test case remain relevant.
>
> > As far as I can tell, the sem_init(3) interface is present in the FreeBSD
> > 10.3, so maybe we can use those instead?
>
> If that seems like a competitive alternative for you, it'd be nice to have
> a platform where we use unnamed POSIX semaphores by default. I'm a little
> worried about whether that code has suffered bit-rot, since it's been
> sitting there basically unused for so long.
>
> regards, tom lane
On Wed, Jun 22, 2016 at 10:48:50AM -0400, Tom Lane wrote:
> Konstantin Belousov <kostikbel@gmail.com> writes:
> > On Tue, Jun 21, 2016 at 04:36:02PM -0400, Tom Lane wrote:
> >> If that seems like a competitive alternative for you, it'd be nice to have
> >> a platform where we use unnamed POSIX semaphores by default. I'm a little
> >> worried about whether that code has suffered bit-rot, since it's been
> >> sitting there basically unused for so long.
>
> > On FreeBSD, there is no practical difference in the resource consumption
> > for named vs. unnamed semaphore. I mean that after sem_open(3) call, an
> > open file descriptor is not kept in the process fd table. The semaphore
> > is represented by the mmaped page, libc+kernel operate solely on the
> > page content and use umtx(2) to implement counted semaphore.
>
> Is there any kernel-side resource at all? The thing that concerns me
> about the POSIX APIs is that it's not very clear whether anything gets
> left behind if the database crashes. The Linux man page for sem_destroy
> says
>
>     An unnamed semaphore should be destroyed with sem_destroy() before the
>     memory in which it is located is deallocated. Failure to do this can
>     result in resource leaks on some implementations.
>
> and while they don't say that their own implementation has such a problem,
> it's worrisome. We go to some lengths to ensure that we can recycle SysV
> semaphores after a crash, but there's no equivalent logic in the POSIX
> semaphore code, and I don't see how it would even be possible to identify
> leftover "unnamed" semaphores.

On FreeBSD, it is only a memory page, which is mmaped into all consumer
processes of the unnamed semaphore. Of course, if a process is blocked on
the semaphore, there is some bookkeeping done in the kernel so that a post
can find all waiters. But it is lightweight and automatically released on
wakeup.

In other words, there is nothing to worry about WRT cleanup after the
consumers of an unnamed semaphore are killed. The same holds for named
semaphores, but there the file is left around.

> > That said, the problem with the SysV semaphores is that API allows
> > operations on arbitrary sets of the semaphores. Unless some unordinary
> > and complex measures are taken, implementation has to use global
> > internal lock to synchronize semop(2). This is what I noted in the
> > paper.
>
> It's certainly true that semop(2) is more complicated than we need.
> But in practice, we only call semop(2) when we need to sleep, or to
> awaken a sleeping process, so I'm not sure that performance of it
> matters a lot to us.

The issue is that sleeps and wakeups on SysV semaphores do not scale, at
least on FreeBSD.
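On the point that a named semaphore leaves its file around: one common way to avoid a leftover name is to unlink it immediately after opening, since the already-open semaphore remains usable. The sketch below shows that pattern under stated assumptions; the function `named_semaphore_roundtrip` and the name passed to it are hypothetical, and nothing here is claimed to be what the patch in this thread does.

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/stat.h>

/*
 * Illustrative only: create a named POSIX semaphore, remove its name
 * right away so nothing is left behind if the process later dies, then
 * post and wait on it once. Returns 0 on success, -1 on any failure.
 */
int
named_semaphore_roundtrip(const char *name)
{
    /* Clear any stale entry left by an earlier crashed run. */
    sem_unlink(name);

    /* O_EXCL: fail rather than silently share an existing semaphore. */
    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, (mode_t) 0600, 0u);
    if (sem == SEM_FAILED)
        return -1;

    /* Remove the name now; the open semaphore keeps working. */
    int result = (sem_unlink(name) == 0) ? 0 : -1;

    /* Count goes 0 -> 1 -> 0, so the wait does not block. */
    if (sem_post(sem) != 0 || sem_wait(sem) != 0)
        result = -1;

    if (sem_close(sem) != 0)
        result = -1;

    return result;
}
```

A call such as `named_semaphore_roundtrip("/demo_sem")` (the name is arbitrary) should leave no entry behind afterwards, whether or not the process exits cleanly after the unlink.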
Maxim Sobolev <sobomax@freebsd.org> writes:
> Tom, on the related note on merits of SYSV semaphores vs. POSIX the
> handling of SYSV semaphore shortage in PG is, uggh, awful. The whole server
> crashes (abort()s), bumping into that on production box is not fun at all
> and it's particularly easy since the resource is severely constrained by
> default. Ideally it should just deny the particular connection request.

This seems like nonsense, because those are acquired once at postmaster
startup, not per connection. You will need to decrease max_connections
to start successfully in a resource-constrained system, but the same is
true of other resource limits.

regards, tom lane
Tom, my diagnosis may be nonsense, but the crash is real.

Jun  5 21:47:38 sippy postgres[3744]: [2-1] PANIC: semop(id=65608) failed: Invalid argument
Jun  5 21:47:38 sippy postgres[3743]: [2-1] PANIC: semop(id=65608) failed: Invalid argument
Jun  5 21:47:39 sippy postgres[3725]: [2-1] PANIC: semop(id=65609) failed: Invalid argument
Jun  5 21:47:39 sippy postgres[3742]: [2-1] PANIC: semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3550]: [2-1] PANIC: semop(id=65611) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3664]: [2-1] PANIC: semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3667]: [2-1] PANIC: semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3663]: [2-1] PANIC: semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3666]: [2-1] PANIC: semop(id=65609) failed: Invalid argument
Jun  5 21:47:42 sippy postgres[3665]: [2-1] PANIC: semop(id=65609) failed: Invalid argument
Jun  5 22:30:56 sippy postgres[3632]: [2-1] PANIC: semop(id=65610) failed: Invalid argument
Jun  5 22:30:56 sippy postgres[3633]: [2-1] PANIC: semop(id=65610) failed: Invalid argument

From the semop(2) man page:

     [EINVAL]   No semaphore set corresponds to semid, or the process
                would exceed the system-defined limit for the number
                of per-process SEM_UNDO structures.

AFAIK we've been hitting the second cause there. This is with FreeBSD 10.3
and postgresql 9.2.16 specifically. We've also seen this with 9.1, I think,
but I am not 100% sure. The specific limit that got exceeded was
kern.ipc.semmnu. I can probably dig up some stack traces, although they
might be cleared out by now.

Now, looking at it again, I think you are probably right that this is not
happening at primitive creation time, but during actual use. That may make
graceful handling tricky, if it is possible at all, but it is still "good
to have" from my PG user's point of view. Nevertheless, it just makes using
POSIX primitives even more attractive IMHO.
On Wed, Jun 22, 2016 at 8:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Maxim Sobolev <sobomax@freebsd.org> writes:
> > Tom, on the related note on merits of SYSV semaphores vs. POSIX the
> > handling of SYSV semaphore shortage in PG is, uggh, awful. The whole
> > server crashes (abort()s), bumping into that on production box is not
> > fun at all and it's particularly easy since the resource is severely
> > constrained by default. Ideally it should just deny the particular
> > connection request.
>
> This seems like nonsense, because those are acquired once at postmaster
> startup, not per connection. You will need to decrease max_connections
> to start successfully in a resource-constrained system, but the same is
> true of other resource limits.
>
> regards, tom lane

--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
Tel (Canada): +1-778-783-0474
Tel (Toll-Free): +1-855-747-7779
Fax: +1-866-857-6942
Web: http://www.sippysoft.com
MSN: sales@sippysoft.com
Skype: SippySoft
Maxim Sobolev <sobomax@sippysoft.com> writes:
> From the man semopt:
>      [EINVAL]   No semaphore set corresponds to semid, or the process
>                 would exceed the system-defined limit for the number
>                 of per-process SEM_UNDO structures.

> AFAIK we've been hitting the second cause there. This is with FreeBSD 10.3
> and postgresql 9.2.16 specifically. We've also seen this with 9.1 I think,
> but I am not 100% sure. The specific limit that got exceeded
> was kern.ipc.semmnu.

We never ask semop(2) for SEM_UNDO, so are you sure this isn't a kernel
bug? I've never heard of such a report on any other platform.

regards, tom lane
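To make the SEM_UNDO remark concrete, here is a minimal sketch of a SysV semaphore wake-then-sleep cycle in which sem_flg is 0, so SEM_UNDO is never requested. It is an illustration, not PostgreSQL's code: the function `sysv_semaphore_roundtrip` and the union `demo_semun` are hypothetical names, and the sketch assumes the stand-in union has the same layout as the platform's `union semun`.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Local stand-in for "union semun". Some platforms require the caller
 * to define that union and others declare it in <sys/sem.h>, so a
 * differently named union with the same layout avoids a clash.
 */
union demo_semun
{
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Illustrative only: create a private one-semaphore set, do one wakeup
 * (+1) and one sleep (-1) without SEM_UNDO, then remove the set.
 * Returns 0 on success, -1 on any failure.
 */
int
sysv_semaphore_roundtrip(void)
{
    int result = -1;

    /* A private set holding a single semaphore. */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    union demo_semun arg;
    arg.val = 0;                /* initial count */

    /* sem_flg = 0: neither IPC_NOWAIT nor SEM_UNDO is requested. */
    struct sembuf wake = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    struct sembuf sleep_op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };

    /* Count goes 0 -> 1 -> 0, so the decrement does not block. */
    if (semctl(semid, 0, SETVAL, arg) == 0 &&
        semop(semid, &wake, 1) == 0 &&
        semop(semid, &sleep_op, 1) == 0)
        result = 0;

    /* SysV sets outlive the process unless removed explicitly. */
    if (semctl(semid, 0, IPC_RMID) != 0)
        result = -1;

    return result;
}
```

The explicit IPC_RMID at the end is the step that gets skipped when a process is killed, which is the leakage concern raised earlier in the thread.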
Tom,

Well, now that I think about it, no, I am certainly not sure about the
root cause. Mind you, until recently I had almost zero knowledge about the
use of those by postgres and, thank God, I never had to use SYSV IPC for
anything we develop here. We've been increasing those limits to reasonably
high values for many years on the kernels that we ship, just to make PG run
happily. And so, except for the sporadic leakage of leftover semaphores,
which we've worked around with ipcrm, it did not bother us much. (In case
it's something you want to look into: that leakage seems to happen when
postgres crashes upon a write failure when disk space runs out. The last
time I saw it happen was this spring, so it must affect some of the recent
revisions too.)

Back to those resource shortage crashes: in fact, now that you are asking,
it's certainly possible that something went south in the kernel. We also
did an OS upgrade from 10.1 to 10.3, along with a minor-version refresh of
postgres, at the same time those crashes started to happen. But you always
blame the software part first, so we did, and in our case bumping
kern.ipc.semmnu did solve it for us, so we closed the case and moved on. I
might look through the changes in semop() between 10.1 and 10.3 to see
where that EINVAL might be coming from. However, since we are considering
switching to (un)named POSIX primitives, I don't feel a strong urge to do
so.

But again, on a general note, this kinda underlines the fact that sysv ipc
might be seen as a somewhat legacy interface by the FreeBSD kernel people.
(Disclaimer: this is all IMHO; I am not talking on behalf of the FreeBSD
project or any part of it.) Some of it is related to the API design flaws
that Konstantin alluded to, some of it to the fact that, as far as I
understand, it's a separate big chunk of kernel code created decades ago by
people who might no longer be actively involved with the project and
designed to run in a vastly different hardware and software environment.
Nobody really owns it and it's not easy to regression test. In my own
limited experience, if not for postgresql, we would not even have SYSV IPC
enabled in our production kernels. And we use some 300+ other open-source
packages in our product, so the sample is quite representative, I think.

On the other hand, as far as I understood from Konstantin's explanations,
the POSIX primitives share most of their code with the pthread library, and
as such it's mostly modern code, well maintained, production and regression
tested, and continuously optimized to run on modern things.

The point that I am trying to make here is that perhaps postgres
developers need to acknowledge that clinging to SYSV IPC these days as the
only supported choice for synchronization is like using, say, sbrk(2) or
mmap(2) to manage the heap instead of malloc(3). Yes, I understand that the
"don't touch it if it's not broken" principle is important, and yes, there
might be some valid cases where you'd want to do sbrk() or mmap() too, but
I also don't see any technical reasons for not making POSIX primitives a
first-class citizen in PG either.

-Max

On Wed, Jun 22, 2016 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Maxim Sobolev <sobomax@sippysoft.com> writes:
> > From the man semopt:
> >      [EINVAL]   No semaphore set corresponds to semid, or the process
> >                 would exceed the system-defined limit for the number
> >                 of per-process SEM_UNDO structures.
>
> > AFAIK we've been hitting the second cause there. This is with FreeBSD
> > 10.3 and postgresql 9.2.16 specifically. We've also seen this with 9.1 I
> > think, but I am not 100% sure. The specific limit that got exceeded
> > was kern.ipc.semmnu.
>
> We never ask semop(2) for SEM_UNDO, so are you sure this isn't a kernel
> bug? I've never heard of such a report on any other platform.
>
> regards, tom lane
On Wed, Jun 22, 2016 at 11:49:33AM -0700, Maxim Sobolev wrote:
> Some of it is related to the API design flaws that Konstantin alluded to, some
> of it to the fact that as far as I understand, it's separate big chunk of
> kernel code created decades ago by people who might no longer be actively
> involved with the project and designed to run in vastly different hardware and
> software environment. Nobody really owns it and it's not easy to regression
> test. In my own limited experience if not for postgresql, we would not even
> have that SYSV IPC enabled in our production kernels. And we use some other
> 300+ opensource packages in our product, so sample is quite representative I
> think.

FYI, databases were the primary users of SYSV IPC even back in the old
days, so I am not surprised we might still be one of the rare users. :-)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +