Thread: Regression tests fail on OpenBSD due to low semmns value
Hello hackers,

A recent buildfarm timeout failure on sawshark [1] made me wonder what's
wrong with that animal. Besides that failure, this animal (running on
OpenBSD 7.4) has produced "too many clients" errors from time to time,
e.g., [2], [3].

I deployed OpenBSD 7.4 locally and reproduced "too many clients" and that
hang as well. It turned out that OpenBSD has semmns as low as 60 (see [4]),
and as a consequence initdb sets max_connections = 20 for the regression
test database. (This can be helpful sometimes; see e.g. [5].) At the same
time, parallel_schedule contains groups of 20 tests, for instance:

# parallel group (20 tests): select_into random delete select_having select_distinct_on case prepared_xacts namespace select_implicit union arrays portals transactions select_distinct subselect update join aggregates hash_index btree_index

Moreover, prepared_xacts performs "\c", which adds one more connection for
a short time, according to postmaster.log:

2024-12-16 06:18:20.290 EET [regression][1563560:91][client backend] [pg_regress/prepared_xacts] LOG: statement: rollback;
...
2024-12-16 06:18:20.290 EET [regression][1563561:2][client backend] [[unknown]] FATAL: sorry, too many clients already
...
2024-12-16 06:18:20.291 EET [regression][1563560:95][client backend] [pg_regress/prepared_xacts] LOG: disconnection: session time: 0:00:00.018 user=law database=regression host=[local]

sysctl kern.seminfo.semmns=120 makes the issue go away on this OS; on the
other hand, the "too many clients" failures can be reproduced on other OSes
with "max_connections=20" in TEMP_CONFIG.

As to the hang, it can be reproduced easily with TEMP_CONFIG containing:

max_connections=2
superuser_reserved_connections=0

and a parallel_schedule as simple as:

test: transactions prepared_xacts
test: transactions prepared_xacts

Running `TEMP_CONFIG=.../extra.config make -s check`, I can see:

# +++ regress check in src/test/regress +++
...
# parallel group (2 tests): prepared_xacts transactions
not ok 1     + transactions                    56 ms
not ok 2     + prepared_xacts                  21 ms
# (test process exited with exit code 2)
# parallel group (2 tests):
### the test is hanging here ###

with one backend waiting inside:

#0  0x000070c41ed2a007 in epoll_wait (epfd=6, events=0x629f1ce529e8, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x0000629f1410d64a in WaitEventSetWaitBlock (set=0x629f1ce52980, cur_timeout=-1, occurred_events=0x7ffd4c4ffed0, nevents=1) at latch.c:1564
#2  0x0000629f1410d534 in WaitEventSetWait (set=0x629f1ce52980, timeout=-1, occurred_events=0x7ffd4c4ffed0, nevents=1, wait_event_info=134217779) at latch.c:1510
#3  0x0000629f1410c764 in WaitLatch (latch=0x70c41b86bc24, wakeEvents=33, timeout=0, wait_event_info=134217779) at latch.c:538
#4  0x0000629f1413d032 in ProcWaitForSignal (wait_event_info=134217779) at proc.c:1893
#5  0x0000629f14132eb9 in GetSafeSnapshot (origSnapshot=0x629f147ad360 <CurrentSnapshotData>) at predicate.c:1579
#6  0x0000629f14133261 in GetSerializableTransactionSnapshot (snapshot=0x629f147ad360 <CurrentSnapshotData>) at predicate.c:1695
#7  0x0000629f143afafe in GetTransactionSnapshot () at snapmgr.c:253
#8  0x0000629f1414a7b8 in exec_simple_query (query_string=0x629f1ce580f0 "SELECT * FROM writetest;") at postgres.c:1172
...

So GetSafeSnapshot() waits indefinitely for possibleUnsafeConflicts to
become empty (that is, for the other backend to remove itself from the
list of possible conflicts inside ReleasePredicateLocks()), but that never
happens.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-12-11%2012%3A20%3A05
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-07-22%2001%3A20%3A22
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sawshark&dt=2024-11-25%2006%3A20%3A22
[4] https://man.openbsd.org/options
[5] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=73c9f91a1

Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
> I deployed OpenBSD 7.4 locally and reproduced "too many clients" and that
> hang as well. It turned out that OpenBSD has semmns as low as 60 (see [4]),
> and as a consequence initdb sets max_connections = 20 for the regression
> test database. (This can be helpful sometimes; see e.g. [5].) At the same
> time, parallel_schedule contains groups of 20 tests, for instance:

Yeah. That was more-or-less okay before we invented parallel query,
but now there needs to be some headroom. I've thought about adjusting
initdb to not allow max_connections less than 25 (can't remember if
I actually proposed that on-list though). The other way would be to
rearrange parallel_schedule to make the max group size less than 20,
but that seems like a lot of effort for little benefit.

FTR, NetBSD also has unreasonably tiny semaphore settings out of the
box. mamba's host is using

kern.ipc.semmni=100
kern.ipc.semmns=1000

and for that matter

kern.maxvnodes=60000
kern.maxproc=1000
kern.maxfiles=10000

> ...
> So GetSafeSnapshot() waits indefinitely for possibleUnsafeConflicts to
> become empty (that is, for the other backend to remove itself from the
> list of possible conflicts inside ReleasePredicateLocks()), but that
> never happens.

This seems like an actual bug?

			regards, tom lane
On 2024-12-16 Mo 12:23 AM, Tom Lane wrote:
> Alexander Lakhin <exclusion@gmail.com> writes:
>> I deployed OpenBSD 7.4 locally and reproduced "too many clients" and that
>> hang as well. It turned out that OpenBSD has semmns as low as 60 (see [4]),
>> and as a consequence initdb sets max_connections = 20 for the regression
>> test database. (This can be helpful sometimes; see e.g. [5].) At the same
>> time, parallel_schedule contains groups of 20 tests, for instance:
>
> Yeah. That was more-or-less okay before we invented parallel query,
> but now there needs to be some headroom. I've thought about adjusting
> initdb to not allow max_connections less than 25 (can't remember if
> I actually proposed that on-list though). The other way would be to
> rearrange parallel_schedule to make the max group size less than 20,
> but that seems like a lot of effort for little benefit.
25 seems perfectly reasonable, these days. The current minimum was set nearly 7 years ago.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2024-12-16 Mo 12:23 AM, Tom Lane wrote:
>> Yeah. That was more-or-less okay before we invented parallel query,
>> but now there needs to be some headroom. I've thought about adjusting
>> initdb to not allow max_connections less than 25 (can't remember if
>> I actually proposed that on-list though). The other way would be to
>> rearrange parallel_schedule to make the max group size less than 20,
>> but that seems like a lot of effort for little benefit.

> 25 seems perfectly reasonable, these days. The current minimum was set
> nearly 7 years ago.

I poked at this a bit on an OpenBSD installation. The out-of-the-box
value of kern.seminfo.semmns seems to be 60, as Alexander said. It
turns out that we can run under that with max_connections = 20, but
not any higher value, the reason being that the number of semaphores
we need is

	MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders + NUM_AUXILIARY_PROCS

or 20 + 3 + 1 + 8 + 10 + 6 = 48. We allocate semaphores in groups of
SEMAS_PER_SET (16), plus one per set for identification purposes, so
with this many semaphores needed we create 3 sets of 17 semaphores =
51 semaphores. One more requested semaphore would put us up to 68
semaphores, which is more than OpenBSD's SEMMNS. So we're already on
the hairy edge here.

Now we could just blow this off and say that we can't run on OpenBSD
at all without an increase in kern.seminfo.semmns. But that seems a
little sad, because there are easy things we could do to make this
less tight:

* Why in the world is the default value of max_wal_senders 10? I find
it hard to believe that there are installations using more than about
3, and even there you can bet they are changing a lot of other
parameters.

* There's no reason that SEMAS_PER_SET has to be a power of 2. The
commentary in sysv_sema.c says "It must be *less than* your kernel's
SEMMSL (max semaphores per set) parameter, which is often around 25".
If we made it, say, 19, then we could allocate 3 sets (really 20
semaphores apiece) and accommodate up to 57 processes without needing
an increase in kern.seminfo.semmns.

In short then, I propose:

* Increase initdb's minimum probed max_connections to 25.
* Reduce the default value of max_wal_senders to 3 (or maybe 5, if
people think that's too drastic).
* Change sysv_sema.c's SEMAS_PER_SET to 19.

On a stock OpenBSD setup, I find that this actually lets us set
max_connections to 30, so that there's some headroom for the
inevitable future growth of the number of background processes.

Of course, none of this is going to save owners of *BSD buildfarm
animals from needing to increase the kernel parameters, because the
regression tests launch multiple postmasters in places. But I think
it's friendly to novice PG users if they can launch one postmaster
without that.

			regards, tom lane
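To make that arithmetic concrete, here is a small standalone sketch of
the semaphore-set math described above (an illustration only, not
PostgreSQL source; the helper name is invented, and the counts are the
defaults cited in the message):

#include <stdio.h>

/* Total SysV semaphores created: each set holds semas_per_set usable
 * semaphores plus one extra used for identification. */
static int
semas_created(int needed, int semas_per_set)
{
	int		nsets = (needed + semas_per_set - 1) / semas_per_set;

	return nsets * (semas_per_set + 1);
}

int
main(void)
{
	/* max_connections + autovacuum_max_workers + 1 + max_worker_processes
	 * + max_wal_senders + NUM_AUXILIARY_PROCS */
	int		current = 20 + 3 + 1 + 8 + 10 + 6;	/* = 48 */
	int		proposed = 25 + 3 + 1 + 8 + 10 + 6; /* = 53 */

	printf("today:    %d\n", semas_created(current, 16));	  /* 3*17 = 51 */
	printf("one more: %d\n", semas_created(current + 1, 16)); /* 4*17 = 68 > SEMMNS 60 */
	printf("proposed: %d\n", semas_created(proposed, 19));	  /* 3*20 = 60 <= SEMMNS 60 */
	return 0;
}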
Hi,

On 2024-12-16 12:52:46 -0500, Tom Lane wrote:
> or 20 + 3 + 1 + 8 + 10 + 6 = 48. We allocate semaphores in groups of
> SEMAS_PER_SET (16), plus one per set for identification purposes, so
> with this many semaphores needed we create 3 sets of 17 semaphores =
> 51 semaphores. One more requested semaphore would put us up to 68
> semaphores, which is more than OpenBSD's SEMMNS. So we're already on
> the hairy edge here.
>
> Now we could just blow this off and say that we can't run on OpenBSD
> at all without an increase in kern.seminfo.semmns.

Given the number of users (or even testers) on OpenBSD, that seems like
it might be a reasonable answer... But, see below.

> * Why in the world is the default value of max_wal_senders 10? I find
> it hard to believe that there are installations using more than about
> 3, and even there you can bet they are changing a lot of other
> parameters.

I don't think it's that rare, as logical replication also needs a
walsender slot... I think we're going to hurt far more users by lowering
this than we'd help.

But I think it might be sane to have initdb probe a lower
max_wal_senders alongside lower max_connections settings. It seems to
make sense to have a lower max_wal_senders setting on machines that
don't have enough resources to run with max_connections=100.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2024-12-16 12:52:46 -0500, Tom Lane wrote:
>> * Why in the world is the default value of max_wal_senders 10? I find
>> it hard to believe that there are installations using more than about
>> 3, and even there you can bet they are changing a lot of other
>> parameters.

> I don't think it's that rare, as logical replication also needs a
> walsender slot... I think we're going to hurt far more users by lowering
> this than we'd help.

Hm, okay. If we just twiddle SEMAS_PER_SET we can still have
max_connections = 25 with max_wal_senders = 10 (25 + 3 + 1 + 8 + 10 + 6
= 53 semaphores, within the 57 that three 19-semaphore sets supply), so
doing that much seems free.

			regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes:
> Whenever I run into this, or my Mac requires manual ipcrm to clean up
> leaked SysV kernel junk, I rebase my patch for sema_kind = 'futex'.
> Here it goes. It could be updated to support NetBSD I believe, but I
> didn't try as its futex stuff came out later.

FWIW, I looked at a nearby NetBSD 10.0 machine. It has
/usr/include/sys/futex.h, which includes this enticing comment:

/*
 * Definitions for the __futex(2) synchronization primitive.
 *
 * These definitions are intended to be ABI-compatible with the
 * Linux futex(2) system call.
 */

However, the complete lack of any user-level documentation makes me
misdoubt the extent of their commitment to this :-(

I have the same concern about depending on undocumented macOS APIs.

Other than that, getting off of SysV semaphores would be a nice thing
to do.

			regards, tom lane
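As a rough illustration of the futex approach under discussion, a
cross-process counting semaphore can be built along the following lines.
This is only a sketch of mine, not Thomas's actual patch; it uses the
Linux futex(2) interface that NetBSD's header claims ABI compatibility
with, and the counter is assumed to live in shared memory mapped by all
backends:

#define _GNU_SOURCE
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct { _Atomic int count; } FutexSema;	/* in shared memory */

static long
futex(_Atomic int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void
sema_wait(FutexSema *s)
{
	for (;;)
	{
		int		c = atomic_load(&s->count);

		if (c > 0 && atomic_compare_exchange_weak(&s->count, &c, c - 1))
			return;				/* grabbed a unit */
		/* Sleep only while the counter is 0; spurious wakeups just retry.
		 * FUTEX_WAIT (not the _PRIVATE variant) works across processes. */
		futex(&s->count, FUTEX_WAIT, 0);
	}
}

static void
sema_post(FutexSema *s)
{
	atomic_fetch_add(&s->count, 1);
	futex(&s->count, FUTEX_WAKE, 1);	/* wake one waiter, if any */
}

The attraction for the limits discussed in this thread is that no SysV
(or file-descriptor) kernel resource is consumed per semaphore.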
On 16.12.24 19:19, Andres Freund wrote:
>> * Why in the world is the default value of max_wal_senders 10? I find
>> it hard to believe that there are installations using more than about
>> 3, and even there you can bet they are changing a lot of other
>> parameters.
>
> I don't think it's that rare, as logical replication also needs a
> walsender slot... I think we're going to hurt far more users by lowering
> this than we'd help.

Here is where this change was originally discussed:

https://www.postgresql.org/message-id/flat/CABUevEy4PR_EAvZEzsbF5s%2BV0eEvw7shJ2t-AUwbHOjT%2ByRb3A%40mail.gmail.com

The low semaphore settings on some BSD systems were also mentioned
there. Did anything change, such that it is triggering more issues now?
Peter Eisentraut <peter@eisentraut.org> writes:
>>> * Why in the world is the default value of max_wal_senders 10?

> Here is where this change was originally discussed:
> https://www.postgresql.org/message-id/flat/CABUevEy4PR_EAvZEzsbF5s%2BV0eEvw7shJ2t-AUwbHOjT%2ByRb3A%40mail.gmail.com

Hmm. There was not a lot in that thread about which specific nonzero
value of max_wal_senders to use, but I do see

>> After some testing and searching for documentation, it seems that at
>> least the BSD platforms have a very low default semmns setting
>> (apparently 60, which leads to max_connections=30).

> The low semaphore settings on some BSD systems were also mentioned
> there. Did anything change, such that it is triggering more issues now?

Yeah, we have more background-process slots reserved by default now.
There are parallel worker slots that were not there in v10, and I think
another one or two random auxiliary processes. So we fail to reach
max_connections=30 now.

As things stand today, we can allocate exactly 20 max_connections,
because there are 28 background-process slots if all other parameters
are left at default, and 48 usable semaphores is as many as we can
create under the OpenBSD/NetBSD default of SEMMNS=60. So we're skating
at the hairy edge of whether the parallel regression tests work
reliably, and the next time somebody invents a new kind of auxiliary
process, it will stop working altogether.

My proposal to increase SEMAS_PER_SET to 19 would provide us nine more
usable semaphores (3 x 19 = 57, versus 48 today) under the default *BSD
configuration. With the change to initdb to probe 25 rather than 20 for
max_connections, five of those would go into max_connections and we'd
have four spares for new background processes. Maybe by the time that
runs out, we'll have found a better alternative to SysV semaphores.

The only downside I can see is that the current setup is able to
coexist with some other service that uses a small number of SysV
semaphores, while with these changes that would not work without
raising the platform SEMMNS limit. Realistically, though, you're going
to want to raise the platform limit for any sort of production usage
of Postgres. I think this discussion is just about whether "make; make
check" will work out-of-the-box, which I think is a good goal to have.

			regards, tom lane
Hi,

On 2024-12-18 11:23:23 -0500, Tom Lane wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
>>> After some testing and searching for documentation, it seems that at
>>> least the BSD platforms have a very low default semmns setting
>>> (apparently 60, which leads to max_connections=30).
>
>> The low semaphore settings on some BSD systems were also mentioned
>> there. Did anything change, such that it is triggering more issues now?
>
> Yeah, we have more background-process slots reserved by default now.
> There are parallel worker slots that were not there in v10, and I think
> another one or two random auxiliary processes. So we fail to reach
> max_connections=30 now.
>
> As things stand today, we can allocate exactly 20 max_connections,
> because there are 28 background-process slots if all other parameters
> are left at default, and 48 usable semaphores is as many as we can
> create under the OpenBSD/NetBSD default of SEMMNS=60. So we're skating
> at the hairy edge of whether the parallel regression tests work
> reliably, and the next time somebody invents a new kind of auxiliary
> process, it will stop working altogether.
>
> My proposal to increase SEMAS_PER_SET to 19 would provide us nine more
> usable semaphores under the default *BSD configuration. With the change
> to initdb to probe 25 rather than 20 for max_connections, five of those
> would go into max_connections and we'd have four spares for new
> background processes. Maybe by the time that runs out, we'll have found
> a better alternative to SysV semaphores.
>
> The only downside I can see is that the current setup is able to
> coexist with some other service that uses a small number of SysV
> semaphores, while with these changes that would not work without
> raising the platform SEMMNS limit. Realistically, though, you're going
> to want to raise the platform limit for any sort of production usage
> of Postgres. I think this discussion is just about whether "make; make
> check" will work out-of-the-box, which I think is a good goal to have.

Maybe we should consider switching those platforms to unnamed POSIX
semaphores? There were some not-so-great performance numbers in the
past:

* openbsd, 2021: https://www.postgresql.org/message-id/3010886.1634950831%40sss.pgh.pa.us
* netbsd, 2022: https://www.postgresql.org/message-id/20220828013914.5hzc7kvcpum5h2yn%40awork3.anarazel.de

But TBH, nobody uses OpenBSD or NetBSD if performance matters even one
iota. And contemplating a bunch of Postgres changes to deal with idiotic
default SysV limits doesn't feel like a sensible thing to do in 2024.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> Maybe we should consider switching those platforms to unnamed POSIX
> semaphores?

I already looked into that. OpenBSD still doesn't have cross-process
POSIX semaphores, at least according to its man page. NetBSD does, but
they consume an FD per sema, which is actually worse because the
default max-open-files-per-process is none too large either.

> But TBH, nobody uses OpenBSD or NetBSD if performance matters even one
> iota. And contemplating a bunch of Postgres changes to deal with idiotic
> default SysV limits doesn't feel like a sensible thing to do in 2024.

Yeah, I would not expend a lot of effort on this. But two one-line
changes don't seem unreasonable.

			regards, tom lane
Hi,

On 2024-12-18 12:00:48 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Maybe we should consider switching those platforms to unnamed POSIX
>> semaphores?
>
> I already looked into that. OpenBSD still doesn't have cross-process
> POSIX semaphores, at least according to its man page.

Ugh, I had missed that:

    This implementation does not support shared semaphores, and reports
    this fact by setting errno to EPERM. This is perhaps a stretch of the
    intention of POSIX, but is compliant, with the caveat that sem_init()
    always reports a permissions error when an attempt to create a shared
    semaphore is made.

That's such a stupid argument that I kinda just want to rip OpenBSD
support out of Postgres :)

> NetBSD does, but they consume an FD per sema, which is actually worse
> because the default max-open-files-per-process is none too large either.

It doesn't seem that bad on NetBSD 10. Via Bilal's NetBSD CI patch, I
get:

# sysctl proc.curproc.rlimit.descriptors
proc.curproc.rlimit.descriptors.soft = 1024
proc.curproc.rlimit.descriptors.hard = 3404

> Yeah, I would not expend a lot of effort on this. But two one-line
> changes don't seem unreasonable.

Agreed for stuff like SEMAS_PER_SET. I just don't think it's a good idea
to invest in lowering our default semaphore requirements by lowering
various default process limits or such.

Greetings,

Andres Freund
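For reference, an unnamed POSIX semaphore of the kind being discussed is
used along the lines of this minimal sketch (mine, for illustration):
the semaphore is placed in shared memory and initialized with pshared =
1, which is exactly the call that OpenBSD rejects with EPERM per the man
page excerpt above.

#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>

int
main(void)
{
	/* Put the semaphore in anonymous shared memory so forked children
	 * operate on the same object. */
	sem_t	   *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
						   MAP_SHARED | MAP_ANON, -1, 0);

	if (sem == MAP_FAILED)
		return 1;
	if (sem_init(sem, 1 /* pshared */ , 1) != 0)
	{
		perror("sem_init");		/* EPERM on OpenBSD, as quoted above */
		return 1;
	}
	sem_wait(sem);				/* P operation */
	sem_post(sem);				/* V operation */
	sem_destroy(sem);
	return 0;
}

(On NetBSD each such semaphore is backed by a file descriptor, which is
the per-process FD cost mentioned upthread.)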
Andres Freund <andres@anarazel.de> writes:
> On 2024-12-18 12:00:48 -0500, Tom Lane wrote:
>> NetBSD does, but they consume an FD per sema, which is actually worse
>> because the default max-open-files-per-process is none too large either.
>
> It doesn't seem that bad on NetBSD 10. Via Bilal's NetBSD CI patch, I get:
> # sysctl proc.curproc.rlimit.descriptors
> proc.curproc.rlimit.descriptors.soft = 1024
> proc.curproc.rlimit.descriptors.hard = 3404

Hmm, on mamba's host I see

proc.curproc.rlimit.descriptors.soft = 128
proc.curproc.rlimit.descriptors.hard = 1772

I had actually tried building with unnamed semas there a couple of days
ago, and found that the postmaster failed to start. 21fb39cb0 should
have alleviated that (I didn't test it yet). But we're still in a very
limited-resource regime. That, together with the old performance tests
you dredged up, makes me not want to switch sema types.

>> Yeah, I would not expend a lot of effort on this. But two one-line
>> changes don't seem unreasonable.
>
> Agreed for stuff like SEMAS_PER_SET. I just don't think it's a good idea
> to invest in lowering our default semaphore requirements by lowering
> various default process limits or such.

Fair, seems like we're on the same page.

			regards, tom lane
BTW, I did a little bit of performance testing using current OpenBSD
(7.6), and it looks like they partially fixed the performance issues I
saw with their named POSIX semaphores back in 2021. "pgbench -S" seems
to show TPS rates right about on par with a SysV-sema build. There is
still a measurable hit in connection startup time, about 18.8ms versus
16.7ms according to "pgbench -S -C" (with max_connections set to 100).
But that's probably not something you'd notice if you weren't looking
for it. Postmaster start/stop time is still awful with max_connections
= 10000, but how many people are likely to try that? (It's a couple of
seconds at 1000, so I detect a strong whiff of an O(N^2) issue in there
somewhere.)

So maybe we should think about switching OpenBSD to named semas by
default. One good thing about that is we'd have some buildfarm coverage
for that code path --- right now there are no platforms that use it.

We'd still want to make the other changes I mentioned for NetBSD's
sake, though.

			regards, tom lane
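For anyone wanting to reproduce the connection-startup comparison
without pgbench, a bare libpq loop like the following sketch (mine; the
connection string and iteration count are placeholders) gives a similar
per-connection number:

#include <libpq-fe.h>
#include <stdio.h>
#include <time.h>

int
main(void)
{
	const int	iters = 100;
	struct timespec t0,
				t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < iters; i++)
	{
		PGconn	   *conn = PQconnectdb("dbname=postgres");

		if (PQstatus(conn) != CONNECTION_OK)
		{
			fprintf(stderr, "%s", PQerrorMessage(conn));
			PQfinish(conn);
			return 1;
		}
		PQfinish(conn);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.2f ms per connection\n",
		   ((t1.tv_sec - t0.tv_sec) * 1e3 +
			(t1.tv_nsec - t0.tv_nsec) / 1e6) / iters);
	return 0;
}

Build with something like
cc conntime.c -I$(pg_config --includedir) -L$(pg_config --libdir) -lpq
and run it against each build being compared.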
Hello Tom,
16.12.2024 07:23, Tom Lane wrote:
> Alexander Lakhin <exclusion@gmail.com> writes:
>> ...
>> So GetSafeSnapshot() waits indefinitely for possibleUnsafeConflicts to
>> become empty (that is, for the other backend to remove itself from the
>> list of possible conflicts inside ReleasePredicateLocks()), but that
>> never happens.
>
> This seems like an actual bug?
I've reproduced this behavior with two reduced SQL scripts.
prepared_xacts.sql:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE pxtest4 (a int);
PREPARE TRANSACTION 'regress_sub2';
\c -
COMMIT PREPARED 'regress_sub2';
-- the script ends prematurely and doesn't reach COMMIT when \c fails due
-- to the "too many clients" error.
transactions.sql:
SELECT pg_sleep(1);
CREATE TABLE writetest (a int);
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE; -- ok
SELECT * FROM writetest; -- ok
COMMIT;
and parallel_schedule:
test: transactions prepared_xacts
So the "transactions" backend just waits for the prepared transaction
to finish.
19.12.2024 01:06, Tom Lane wrote:
> We'd still want to make the other changes I mentioned for NetBSD's
> sake, though.
Thank you for fixing that shortcoming!
Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
> I've reproduced this behavior with two reduced SQL scripts.
> prepared_xacts.sql:
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> CREATE TABLE pxtest4 (a int);
> PREPARE TRANSACTION 'regress_sub2';
> \c -
> COMMIT PREPARED 'regress_sub2';
> -- the script ends prematurely and doesn't reach COMMIT when \c fails due
> -- to the "too many clients" error.

Hmm, okay. Not really a bug, or at least I don't see much we could do
about it. It does seem odd that a prepared transaction --- which, at
least in theory, we should know won't do anything more --- can block
other serializable transactions. Maybe that could be improved, but it
sounds like a research project, not a bug fix.

			regards, tom lane