Thread: pgsql: Add kqueue(2) support to the WaitEventSet API.
Add kqueue(2) support to the WaitEventSet API. Use kevent(2) to wait for events on the BSD family of operating systems and macOS. This is similar to the epoll(2) support added for Linux by commit 98a64d0bd. Author: Thomas Munro Reviewed-by: Andres Freund, Marko Tiikkaja, Tom Lane Tested-by: Mateusz Guzik, Matteo Beccati, Keith Fiske, Heikki Linnakangas, Michael Paquier, Peter Eisentraut, Rui DeSousa,Tom Lane, Mark Wong Discussion: https://postgr.es/m/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/815c2f0972c8386aba7c606f1ee6690d13b04db2 Modified Files -------------- configure | 4 +- configure.in | 2 + src/backend/storage/ipc/latch.c | 300 +++++++++++++++++++++++++++++++++++++++- src/include/pg_config.h.in | 6 + src/tools/msvc/Solution.pm | 2 + 5 files changed, 311 insertions(+), 3 deletions(-)
Hi Thomas, On Wed, Feb 05, 2020 at 04:59:10AM +0000, Thomas Munro wrote: > Add kqueue(2) support to the WaitEventSet API. > > Use kevent(2) to wait for events on the BSD family of operating > systems and macOS. This is similar to the epoll(2) support added > for Linux by commit 98a64d0bd. Worth noting this issue with the test suite of postgres_fdw for buildfarm animal coypu, running on NetBSD: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2020-02-19%2023%3A01%3A01 +ERROR: kqueue failed: Too many open files -- Michael
Attachment
On Thu, Feb 20, 2020 at 8:24 PM Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Feb 05, 2020 at 04:59:10AM +0000, Thomas Munro wrote: > > Add kqueue(2) support to the WaitEventSet API. > > > > Use kevent(2) to wait for events on the BSD family of operating > > systems and macOS. This is similar to the epoll(2) support added > > for Linux by commit 98a64d0bd. > > Worth noting this issue with the test suite of postgres_fdw for > buildfarm animal coypu, running on NetBSD: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2020-02-19%2023%3A01%3A01 > +ERROR: kqueue failed: Too many open files Hmm. So coypu just came back after 48 days, and the new kqueue() code fails for process 19829 after successfully running 265 log lines' worth of postgres_fdw tests, because it's run out of file descriptors. I can see that WaitLatchOrSocket() actually could leak an epoll/kqueue socket if WaitEventSetWait() raises an error, which is interesting, but apparently not the explanation here because we don't see a preceding error report. Another theory would be that this machine has a low max_safe_fds, and NUM_RESERVED_FDS is only just enough to handle the various sockets that postgres_fdw.sql creates and at some point kqueue()'s demand for just one more pushed it over the edge. From the error text and a look at the man page for errno, this error is EMFILE (per process limit, which could be as low as 64) rather then ENFILE (system limit). Remi, any chance you could run gmake installcheck under contrib/postgres_fdw on that host, to see if this is repeatable? Can you tell us about the relevant limits? Maybe ulimit -n (for the user that runs the build farm), and also sysctl -a | grep descriptors, sysctl -a | grep maxfiles?
> Le 20 févr. 2020 à 12:15, Thomas Munro <thomas.munro@gmail.com> a écrit : > > On Thu, Feb 20, 2020 at 8:24 PM Michael Paquier <michael@paquier.xyz> wrote: >> On Wed, Feb 05, 2020 at 04:59:10AM +0000, Thomas Munro wrote: >>> Add kqueue(2) support to the WaitEventSet API. >>> >>> Use kevent(2) to wait for events on the BSD family of operating >>> systems and macOS. This is similar to the epoll(2) support added >>> for Linux by commit 98a64d0bd. >> >> Worth noting this issue with the test suite of postgres_fdw for >> buildfarm animal coypu, running on NetBSD: >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=coypu&dt=2020-02-19%2023%3A01%3A01 >> +ERROR: kqueue failed: Too many open files > > Hmm. So coypu just came back after 48 days, and the new kqueue() code > fails for process 19829 after successfully running 265 log lines' > worth of postgres_fdw tests, because it's run out of file > descriptors. I can see that WaitLatchOrSocket() actually could leak > an epoll/kqueue socket if WaitEventSetWait() raises an error, which is > interesting, but apparently not the explanation here because we don't > see a preceding error report. Another theory would be that this > machine has a low max_safe_fds, and NUM_RESERVED_FDS is only just > enough to handle the various sockets that postgres_fdw.sql creates and > at some point kqueue()'s demand for just one more pushed it over the > edge. From the error text and a look at the man page for errno, this > error is EMFILE (per process limit, which could be as low as 64) > rather then ENFILE (system limit). > > Remi, any chance you could run gmake installcheck under > contrib/postgres_fdw on that host, to see if this is repeatable? Can > you tell us about the relevant limits? Maybe ulimit -n (for the user > that runs the build farm), and also sysctl -a | grep descriptors, > sysctl -a | grep maxfiles? Hi, Unfortunalty, coypu went offline again. I will run tests as soon as I can bring it back up. Rémi
=?utf-8?Q?R=C3=A9mi_Zara?= <remi_zara@mac.com> writes: >> Le 20 févr. 2020 à 12:15, Thomas Munro <thomas.munro@gmail.com> a écrit : >> Remi, any chance you could run gmake installcheck under >> contrib/postgres_fdw on that host, to see if this is repeatable? Can >> you tell us about the relevant limits? Maybe ulimit -n (for the user >> that runs the build farm), and also sysctl -a | grep descriptors, >> sysctl -a | grep maxfiles? > Unfortunalty, coypu went offline again. I will run tests as soon as I can bring it back up. I have a working NetBSD 8/ppc installation, will try to reproduce there. regards, tom lane
On Wed, Feb 5, 2020 at 7:59 AM Thomas Munro <tmunro@postgresql.org> wrote: > Add kqueue(2) support to the WaitEventSet API. > > Use kevent(2) to wait for events on the BSD family of operating > systems and macOS. This is similar to the epoll(2) support added > for Linux by commit 98a64d0bd. I'm not sure if it was already reported in this thread (it seems to be not at the first glance), but I've discovered following issue on macos 10.13.6. If backend is under lldb and does XactLockTableWait(), then it does proc_exit(1). The full reproduction case is following. s1# create table test (id serial primary key, value int); s1# insert into test values (1,1); s1# begin; s1# update test set value = value + 1 where id = 1; lldb attached to s2: b proc_exit s2# update test set value = value + 1 where id = 1; (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1 * frame #0: 0x000000010f9a7f5b postgres`proc_exit(code=1) at ipc.c:107 frame #1: 0x000000010f9aa3b9 postgres`WaitEventSetWaitBlock(set=0x00007fabdd847c90, cur_timeout=-1, occurred_events=0x00007ffee0796c10, nevents=1) at latch.c:1427 frame #2: 0x000000010f9a9a43 postgres`WaitEventSetWait(set=0x00007fabdd847c90, timeout=-1, occurred_events=0x00007ffee0796c10, nevents=1, wait_event_info=50331652) at latch.c:1237 frame #3: 0x000000010f9a93b5 postgres`WaitLatchOrSocket(latch=0x00000001197eacc4, wakeEvents=33, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:428 frame #4: 0x000000010f9a91c1 postgres`WaitLatch(latch=0x00000001197eacc4, wakeEvents=33, timeout=0, wait_event_info=50331652) at latch.c:368 frame #5: 0x000000010f9d65b6 postgres`ProcSleep(locallock=0x00007fabdd01d5d8, lockMethodTable=0x000000010fdb5cf8) at proc.c:1286 frame #6: 0x000000010f9c2af9 postgres`WaitOnLock(locallock=0x00007fabdd01d5d8, owner=0x00007fabdf0056d0) at lock.c:1766 frame #7: 0x000000010f9c13a1 postgres`LockAcquireExtended(locktag=0x00007ffee0797140, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0000000000000000) at lock.c:1048 frame #8: 0x000000010f9c08b5 postgres`LockAcquire(locktag=0x00007ffee0797140, lockmode=5, sessionLock=false, dontWait=false) at lock.c:713 frame #9: 0x000000010f9bef32 postgres`XactLockTableWait(xid=511, rel=0x000000011031e148, ctid=0x00007ffee0797394, oper=XLTW_Update) at lmgr.c:658 frame #10: 0x000000010f4e9cab postgres`heap_update(relation=0x000000011031e148, otid=0x00007ffee0797818, newtup=0x00007fabdd847a48, cid=0, crosscheck=0x0000000000000000, wait=true, tmfd=0x00007ffee07976f0, lockmode=0x00007ffee07976d8) at heapam.c:3239 frame #11: 0x000000010f4f9353 postgres`heapam_tuple_update(relation=0x000000011031e148, otid=0x00007ffee0797818, slot=0x00007fabdc828558, cid=0, snapshot=0x00007fabdc818170, crosscheck=0x0000000000000000, wait=true, tmfd=0x00007ffee07976f0, lockmode=0x00007ffee07976d8, update_indexes=0x00007ffee07976d6) at heapam_handler.c:326 frame #12: 0x000000010f7ba73d postgres`table_tuple_update(rel=0x000000011031e148, otid=0x00007ffee0797818, slot=0x00007fabdc828558, cid=0, snapshot=0x00007fabdc818170, crosscheck=0x0000000000000000, wait=true, tmfd=0x00007ffee07976f0, lockmode=0x00007ffee07976d8, update_indexes=0x00007ffee07976d6) at tableam.h:1293 frame #13: 0x000000010f7b8952 postgres`ExecUpdate(mtstate=0x00007fabdc826ca8, tupleid=0x00007ffee0797818, oldtuple=0x0000000000000000, slot=0x00007fabdc828558, planSlot=0x00007fabdc828408, epqstate=0x00007fabdc826da0, estate=0x00007fabdc826920, canSetTag=true) at nodeModifyTable.c:1336 frame #14: 0x000000010f7b6d5a postgres`ExecModifyTable(pstate=0x00007fabdc826ca8) at nodeModifyTable.c:2246 frame #15: 0x000000010f780e82 postgres`ExecProcNodeFirst(node=0x00007fabdc826ca8) at execProcnode.c:444 frame #16: 0x000000010f779332 postgres`ExecProcNode(node=0x00007fabdc826ca8) at executor.h:245 frame #17: 0x000000010f7751b1 postgres`ExecutePlan(estate=0x00007fabdc826920, planstate=0x00007fabdc826ca8, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x00007fabdd843840, execute_once=true) at execMain.c:1646 frame #18: 0x000000010f775072 postgres`standard_ExecutorRun(queryDesc=0x00007fabdd81ff20, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364 frame #19: 0x000000010f774e42 postgres`ExecutorRun(queryDesc=0x00007fabdd81ff20, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308 frame #20: 0x000000010f9eb63e postgres`ProcessQuery(plan=0x00007fabdd8447a8, sourceText="update test set value = value + 1 where id = 1;", params=0x0000000000000000, queryEnv=0x0000000000000000, dest=0x00007fabdd843840, qc=0x00007ffee0797d70) at pquery.c:160 frame #21: 0x000000010f9ea71d postgres`PortalRunMulti(portal=0x00007fabdd823720, isTopLevel=true, setHoldSnapshot=false, dest=0x00007fabdd843840, altdest=0x00007fabdd843840, qc=0x00007ffee0797d70) at pquery.c:1265 frame #22: 0x000000010f9e9d92 postgres`PortalRun(portal=0x00007fabdd823720, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007fabdd843840, altdest=0x00007fabdd843840, qc=0x00007ffee0797d70) at pquery.c:779 frame #23: 0x000000010f9e5279 postgres`exec_simple_query(query_string="update test set value = value + 1 where id = 1;") at postgres.c:1236 frame #24: 0x000000010f9e43b8 postgres`PostgresMain(argc=1, argv=0x00007fabdd01fe78, dbname="postgres", username="smagen") at postgres.c:4295 frame #25: 0x000000010f9147a0 postgres`BackendRun(port=0x00007fabde000320) at postmaster.c:4510 frame #26: 0x000000010f913b9a postgres`BackendStartup(port=0x00007fabde000320) at postmaster.c:4202 frame #27: 0x000000010f912aea postgres`ServerLoop at postmaster.c:1727 frame #28: 0x000000010f9104fa postgres`PostmasterMain(argc=3, argv=0x00007fabdbd009b0) at postmaster.c:1400 frame #29: 0x000000010f7fae19 postgres`main(argc=3, argv=0x00007fabdbd009b0) at main.c:210 frame #30: 0x00007fff69069015 libdyld.dylib`start + 1 ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Mar 17, 2020 at 12:55 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Wed, Feb 5, 2020 at 7:59 AM Thomas Munro <tmunro@postgresql.org> wrote: > > Add kqueue(2) support to the WaitEventSet API. > > > > Use kevent(2) to wait for events on the BSD family of operating > > systems and macOS. This is similar to the epoll(2) support added > > for Linux by commit 98a64d0bd. > > I'm not sure if it was already reported in this thread (it seems to be > not at the first glance), but I've discovered following issue on macos > 10.13.6. If backend is under lldb and does XactLockTableWait(), then > it does proc_exit(1). /me digs out a Macintosh Reproduced here. The problem seems to be that macOS's getppid() returns the debugger's PID, while the debugger is attached. This doesn't happen on FreeBSD (even though the debugger does internally become the parent, getppid() is careful to return the "real" parent PID so that user space doesn't notice this trickery; apparently Apple made a different choice). The getppid() check is there to close a vanishingly rare race condition: when creating a WaitEventSet, we ask the kernel to tell us when the postmaster exits, but there is a possibility that the postmaster has already exited; normally that causes an error with errno == ESRCH (no such PID, it's already gone), but another unrelated process might have started that has the same PID, so we check if our ppid has changed after a successful return code. That's not going to work under a debugger on this OS. Looking into some options.
On 2020-Mar-17, Thomas Munro wrote: > Reproduced here. The problem seems to be that macOS's getppid() > returns the debugger's PID, while the debugger is attached. This > doesn't happen on FreeBSD (even though the debugger does internally > become the parent, getppid() is careful to return the "real" parent > PID so that user space doesn't notice this trickery; apparently Apple > made a different choice). Wow ... Yeah, that was a known problem with FreeBSD, see https://postgr.es/m/1292851036-sup-5399@alvh.no-ip.org Evidently FreeBSD must have fixed it, but macOS has not caught up with that ... > The getppid() check is there to close a vanishingly rare race > condition: when creating a WaitEventSet, we ask the kernel to tell us > when the postmaster exits, but there is a possibility that the > postmaster has already exited; normally that causes an error with > errno == ESRCH (no such PID, it's already gone), but another unrelated > process might have started that has the same PID, so we check if our > ppid has changed after a successful return code. That's not going to > work under a debugger on this OS. Irk. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 17, 2020 at 9:30 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2020-Mar-17, Thomas Munro wrote: > > Reproduced here. The problem seems to be that macOS's getppid() > > returns the debugger's PID, while the debugger is attached. This > > doesn't happen on FreeBSD (even though the debugger does internally > > become the parent, getppid() is careful to return the "real" parent > > PID so that user space doesn't notice this trickery; apparently Apple > > made a different choice). > > Wow ... Yeah, that was a known problem with FreeBSD, see > https://postgr.es/m/1292851036-sup-5399@alvh.no-ip.org > Evidently FreeBSD must have fixed it, but macOS has not caught up with > that ... Oh, interesting. Sorry to bring a variant of this problem back. > > The getppid() check is there to close a vanishingly rare race > > condition: when creating a WaitEventSet, we ask the kernel to tell us > > when the postmaster exits, but there is a possibility that the > > postmaster has already exited; normally that causes an error with > > errno == ESRCH (no such PID, it's already gone), but another unrelated > > process might have started that has the same PID, so we check if our > > ppid has changed after a successful return code. That's not going to > > work under a debugger on this OS. > > Irk. I'm now far away from my home Mac so I can't test until later but I think we can fix this by double checking with the pipe: - else if (event->events == WL_POSTMASTER_DEATH && PostmasterPid != getppid()) + else if (event->events == WL_POSTMASTER_DEATH && + PostmasterPid != getppid() && + !PostmasterIsAliveInternal()) + { + /* + * The extra PostmasterIsAliveInternal() check prevents false alarms + * from systems where getppid() returns a debugger PID while being + * traced. + */ set->report_postmaster_not_running = true; + } The fast getppid() check will prevent the slow and redundant PostmasterIsAliveInternal() check from being reached on production systems, until the postmaster really is gone in the race scenario described. (Note that all of this per-lock-wait work will go away with https://commitfest.postgresql.org/27/2452/, so I'm glad Alexander found this now).
On Tue, Mar 17, 2020 at 10:21 AM Thomas Munro <thomas.munro@gmail.com> wrote: > I'm now far away from my home Mac so I can't test until later but I > think we can fix this by double checking with the pipe: Pushed.
Thomas Munro <thomas.munro@gmail.com> writes: > Pushed. prairiedog just turned up a different issue in this area [1]. I wondered why it hadn't reported in for awhile, and upon investigation I found that the test run was stuck in the final pg_dump step of the pg_upgrade test. pg_dump itself was waiting for a query result, while the connected backend was sitting here: (gdb) bt #0 0x9002ec88 in kevent () #1 0x0039cff8 in WaitEventSetWait (set=0x20c502c, timeout=-1, occurred_events=0xbfffdd4c, nevents=1, wait_event_info=100663296)at latch.c:1443 #2 0x00261d98 in secure_read (port=0x2401ba0, ptr=0x713558, len=8192) at be-secure.c:184 #3 0x00269d34 in pq_recvbuf () at pqcomm.c:950 #4 0x00269e24 in pq_getbyte () at pqcomm.c:993 #5 0x003cec2c in PostgresMain (argc=1, argv=0x38060ac, dbname=0x20c5154 "regression", username=0x20c5138 "buildfarm") atpostgres.c:337 #6 0x0032de0c in BackendStartup (port=0x2401ba0) at postmaster.c:4510 #7 0x0032fcf8 in PostmasterMain (argc=1585338749, argv=0x5e7e59b9) at postmaster.c:1727 #8 0x0026f32c in main (argc=6, argv=0x24009b0) at main.c:210 It'd appear that we dropped an input-is-available condition. Now prairiedog is running a museum-grade macOS release, so it's hardly impossible that this is a kernel bug not a Postgres bug. But we shouldn't jump to that conclusion, either, given that our kevent support is so new. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2020-03-27%2018%3A55%3A51 The log shows a SIGABRT trap, but that's because I manually did "kill -ABRT" to unblock the buildfarm animal.
On Sun, Mar 29, 2020 at 7:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Pushed. > > prairiedog just turned up a different issue in this area [1]. > I wondered why it hadn't reported in for awhile, and upon > investigation I found that the test run was stuck in the > final pg_dump step of the pg_upgrade test. pg_dump itself > was waiting for a query result, while the connected backend > was sitting here: > > (gdb) bt > #0 0x9002ec88 in kevent () > #1 0x0039cff8 in WaitEventSetWait (set=0x20c502c, timeout=-1, occurred_events=0xbfffdd4c, nevents=1, wait_event_info=100663296)at latch.c:1443 > #2 0x00261d98 in secure_read (port=0x2401ba0, ptr=0x713558, len=8192) at be-secure.c:184 > #3 0x00269d34 in pq_recvbuf () at pqcomm.c:950 > #4 0x00269e24 in pq_getbyte () at pqcomm.c:993 > #5 0x003cec2c in PostgresMain (argc=1, argv=0x38060ac, dbname=0x20c5154 "regression", username=0x20c5138 "buildfarm")at postgres.c:337 > #6 0x0032de0c in BackendStartup (port=0x2401ba0) at postmaster.c:4510 > #7 0x0032fcf8 in PostmasterMain (argc=1585338749, argv=0x5e7e59b9) at postmaster.c:1727 > #8 0x0026f32c in main (argc=6, argv=0x24009b0) at main.c:210 > > It'd appear that we dropped an input-is-available condition. > > Now prairiedog is running a museum-grade macOS release, so > it's hardly impossible that this is a kernel bug not a > Postgres bug. But we shouldn't jump to that conclusion, > either, given that our kevent support is so new. My first thought was that it might have been due to the EV_CLEAR flag problem discussed elsewhere, but the failing build has commit 9b8aa092 so that's not it. About the kernel bug hypothesis: I see that the libevent project doesn't use kqueue on early macOS versions due to some bug that it tests for that apparently fails on 10.4/kernel 8.11 (what you have there). Kqueue was added to macOS 10.3 (which pulled a bunch of code from FreeBSD 5 including this), so in 10.4 I suppose it was still somewhat new. I also found a few other vague complaints about bugs from that era including some claims of missing events, but without conclusions. The kernel source is mirrored on github with change history[1], but without commit log messages or a public bug tracker it's practically impossible for a drive-by reader to figure out what was broken and fixed. That seems like a bit of a wild dino-goose chase. Hmm, I see that Remi also runs an ancient PowerPC Mac on macOS 10.5/Darwin 9.8. His build farm animal "locust" hasn't reported in 22 days. Remi, is that animal down for other reasons, or could it be stuck like this? Further evidence for a version-specific problem is that there are surely many in our hacker community working on modern Macs, and I haven't heard of any problems so far. Of course that doesn't rule anything out. [1] https://github.com/apple/darwin-xnu/blob/master/bsd/kern/kern_event.c
> Le 28 mars 2020 à 23:25, Thomas Munro <thomas.munro@gmail.com> a écrit : > > Hmm, I see that Remi also runs an ancient PowerPC Mac on macOS > 10.5/Darwin 9.8. His build farm animal "locust" hasn't reported in 22 > days. Remi, is that animal down for other reasons, or could it be > stuck like this? Hi, locust is down, and due to circulation restrictions, I cannot access it for the moment. Sorry. Rémi