Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? - Mailing list pgsql-hackers
From:           Andres Freund
Subject:        Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Date:
Msg-id:         20220701201423.sic5b77ei3atjbes@alap3.anarazel.de
In response to: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?  (Andres Freund <andres@anarazel.de>)
Responses:      Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
List:           pgsql-hackers
Hi,

On 2022-06-21 19:33:01 -0700, Andres Freund wrote:
> On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:
> > Problem: I saw 031_recovery_conflict.pl time out while waiting for a
> > buffer pin conflict, but so far once only, on CI:
> >
> > https://cirrus-ci.com/task/5956804860444672
> >
> > timed out waiting for match: (?^:User was holding shared buffer pin
> > for too long) at t/031_recovery_conflict.pl line 367.
> >
> > Hrmph. Still trying to reproduce that, which may be a bug in this
> > patch, a bug in the test or a pre-existing problem. Note that
> > recovery didn't say something like:
> >
> > 2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting
> > after 11.197 ms: recovery conflict on buffer pin
> >
> > (That's what I'd expect to see in
> > https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
> > if the startup process had decided to send the signal).
> >
> > ... so it seems like the problem in that run is upstream of the interrupt stuff.
>
> Odd. The only theory I have so far is that the manual vacuum on the primary
> somehow decided to skip the page, and thus didn't trigger a conflict. Because
> clearly replay progressed past the records of the VACUUM. Perhaps we should
> use VACUUM VERBOSE? In contrast to pg_regress tests that should be
> unproblematic?

I saw a couple failures of 031 on CI for the meson patch - which obviously
doesn't change anything around this. However, it adds a lot more
distributions, and the added ones run in docker containers on a shared host,
presumably adding a lot of noise. I saw this more frequently when I
accidentally had the number of concurrent test runs set to the number of CPUs
in the host, rather than the CPUs allotted to the container. That made me
look more into these issues.

I played with adding a pg_usleep() to RecoveryConflictInterrupt() to simulate
slow signal delivery. Found a couple things:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
  completely swamp the target(s) on a busy system. This surely is exacerbated
  by the usleep I added to RecoveryConflictInterrupt(), but a 5ms signalling
  pace does seem like a bad idea. (A rough sketch of the loop in question is
  below, after the log excerpt.)

- we process the same recovery conflict (not a single signal, but a single
  "real conflict") multiple times in the target of a conflict, presumably
  while handling the error. That includes handling the same interrupt once as
  an ERROR and once as FATAL. E.g.:

2022-07-01 12:19:14.428 PDT [2000572] LOG: recovery still waiting after 10.032 ms: recovery conflict on buffer pin
2022-07-01 12:19:14.428 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery at character 15
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT: SELECT * FROM test_recovery_conflict_table2;
2022-07-01 12:19:54.778 PDT [2000572] LOG: recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
2022-07-01 12:19:54.778 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG: statement: SELECT 1;

Note that the startup process considers the conflict resolved by the time the
backend handles the interrupt.
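For illustration, the waiting/signalling loop in
ResolveRecoveryConflictWithVirtualXIDs() has roughly this shape - this is a
paraphrase from memory rather than the actual standby.c code (variables as in
that function), but it shows the ~5ms re-signalling pace once the standby
delay has been exceeded:

/* paraphrased sketch, not the actual standby.c code */
while (!VirtualXactLock(*waitlist, false))  /* conflicting vxid still running? */
{
    /* once max_standby_*_delay has been exceeded, start cancelling */
    if (WaitExceedsMaxStandbyDelay(wait_event_info))
    {
        pid_t       pid;

        /* signal the conflicting backend */
        pid = CancelVirtualTransaction(*waitlist, reason);

        /* wait a little (5 ms), then loop around and signal again */
        if (pid != 0)
            pg_usleep(5000L);
    }
}

If the target is slow to act on the interrupt - e.g. because signal handling
is delayed, as simulated with the added pg_usleep() - it keeps getting
re-signalled every few milliseconds, which on an already loaded system mostly
adds more load.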
I also see cases where a FATAL is repeated:

2022-07-01 12:43:18.190 PDT [2054721] LOG: recovery still waiting after 15.410 ms: recovery conflict on snapshot
2022-07-01 12:43:18.190 PDT [2054721] DETAIL: Conflicting process: 2054753.
2022-07-01 12:43:18.190 PDT [2054721] CONTEXT: WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
2054753: reporting recovery conflict 9
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
...
2054753: reporting recovery conflict 9
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.

The repeated FATAL seems like it might at least partially be due to
RecoveryConflictPending not being reset in at least some of the FATAL
recovery conflict paths.

It seems pretty obvious that the proc_exit_inprogress check in
RecoveryConflictInterrupt() is misplaced, and needs to be where the errors
are actually thrown. But that alone won't help, because it turns out we don't
necessarily have proc_exit_inprogress set yet at that point.
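To make that concrete, the moving parts look roughly like this - a simplified
paraphrase with the structure flattened, not the actual postgres.c code:

/* RecoveryConflictInterrupt(), running in the signal handler */
if (!proc_exit_inprogress)          /* the check that looks misplaced */
{
    RecoveryConflictReason = reason;
    RecoveryConflictPending = true;
    InterruptPending = true;        /* plus QueryCancelPending / ProcDiePending */
}
SetLatch(MyLatch);

/* ProcessInterrupts(), where the error actually gets thrown */
if (RecoveryConflictPending && DoingCommandRead)
{
    /*
     * Unless RecoveryConflictPending is reset here - and in the other FATAL
     * conflict paths - any CHECK_FOR_INTERRUPTS() reached while this error
     * is being sent to the client can land us right back in this branch.
     */
    RecoveryConflictPending = false;
    ereport(FATAL,
            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
             errmsg("terminating connection due to conflict with recovery")));
}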
Look at this stack from an assertion in ProcessInterrupts() ensuring that the
same FATAL isn't raised twice:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007fd47897b546 in __GI_abort () at abort.c:79
#2  0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion", fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259) at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3  0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
#4  0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8> "internal_flush") at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
#5  0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
#6  0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
#7  0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
#8  0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
#9  0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9> "ProcessInterrupts") at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
#10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
#11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497
#12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)

Reporting a FATAL error in the process of reporting a FATAL error. Yeah. I
assume this could lead to sending out the same message quite a few times.
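I.e. the first FATAL (frame #10, postgres.c:3266) is in the middle of being
sent to the client, flushing the output apparently fails and internal_flush()
reports a COMMERROR, and finishing that report runs CHECK_FOR_INTERRUPTS(),
which - with the conflict flags still set - tries to raise the same FATAL
again.

The assertion itself is nothing fancy, essentially a guard of roughly this
shape around the FATAL report in ProcessInterrupts() (a sketch - only the
"!in_fatal" condition is visible in the backtrace):

static bool in_fatal = false;

/* in ProcessInterrupts(), just before reporting the recovery-conflict FATAL */
Assert(!in_fatal);      /* this is what fires in frames #2/#3 above */
in_fatal = true;
ereport(FATAL,
        (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
         errmsg("terminating connection due to conflict with recovery")));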
This is quite the mess.

Greetings,

Andres Freund