Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Date
Msg-id 20220701231833.vh76xkf3udani3np@alap3.anarazel.de
Whole thread Raw
In response to Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?  (Andres Freund <andres@anarazel.de>)
Responses Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
List pgsql-hackers
HHi,

On 2022-07-01 13:14:23 -0700, Andres Freund wrote:
> I saw a couple failures of 031 on CI for the meson patch - which obviously
> doesn't change anything around this. However it adds a lot more distributions,
> and the added ones run in docker containers on a shared host, presumably
> adding a lot of noise.  I saw this more frequently when I accidentally had the
> test runs at the number of CPUs in the host, rather than the allotted CPUs in
> the container.
>
> That made me look more into these issues. I played with adding a pg_usleep()
> to RecoveryConflictInterrupt() to simulate slow signal delivery.
>
> Found a couple things:
>
> - the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
>   completely swamp the target(s) on a busy system. This surely is exascerbated
>   by the usleep I added RecoveryConflictInterrupt() but a 5ms signalling pace
>   does seem like a bad idea.

This one is also implicated in
https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
and related issues.

Besides being very short, it also seems like a bad idea to wait when we might
not need to? Seems we should only wait if we subsequently couldn't get the
lock?

Misleadingly WaitExceedsMaxStandbyDelay() also contains a usleep, which at
least I wouldn't expect given its name.


A minimal fix would be to increase the wait time, similar how it is done with
standbyWait_us?

Medium term it seems we ought to set the startup process's latch when handling
a conflict, and use a latch wait. But avoiding races probably isn't quite
trivial.


> - we process the same recovery conflict (not a single signal, but a single
>   "real conflict") multiple times in the target of a conflict, presumably
>   while handling the error. That includes handling the same interrupt once as
>   an ERROR and once as FATAL.
>
>   E.g.
>
> 2022-07-01 12:19:14.428 PDT [2000572] LOG:  recovery still waiting after 10.032 ms: recovery conflict on buffer pin
> 2022-07-01 12:19:14.428 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected
0ndead 100; blkref #0: rel 1663/16385/1>
 
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR:  canceling statement due to conflict with
recoveryat character 15
 
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User transaction caused buffer deadlock with
recovery.
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT:  SELECT * FROM
test_recovery_conflict_table2;
> 2022-07-01 12:19:54.778 PDT [2000572] LOG:  recovery finished waiting after 40359.937 ms: recovery conflict on buffer
pin
> 2022-07-01 12:19:54.778 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected
0ndead 100; blkref #0: rel 1663/16385/1>
 
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with
recovery
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User transaction caused buffer deadlock with
recovery.
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to
thedatabase and repeat your command.
 
> 2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG:  statement: SELECT 1;
>
>   note that the startup process considers the conflict resolved by the time
>   the backend handles the interrupt.

I guess the reason we first get an ERROR and then a FATAL is that the second
iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit,
because we end up there after handling the first error? And that's a FATAL.

I suspect that Thomas' fix will address at least part of this, as the check
whether we're still waiting for a lock will be made just before the error is
thrown.


>   I also see cases where a FATAL is repeated:
>
> 2022-07-01 12:43:18.190 PDT [2054721] LOG:  recovery still waiting after 15.410 ms: recovery conflict on snapshot
> 2022-07-01 12:43:18.190 PDT [2054721] DETAIL:  Conflicting process: 2054753.
> 2022-07-01 12:43:18.190 PDT [2054721] CONTEXT:  WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732
nredirected18 ndead 0; blkref #0: rel 1663/16385/>
 
> 2054753: reporting recovery conflict 9
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with
recovery
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL:  User query might have needed to see row
versionsthat must be removed.
 
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to
thedatabase and repeat your command.
 
> ...
> 2054753: reporting recovery conflict 9
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with
recovery
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL:  User query might have needed to see row
versionsthat must be removed.
 
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to
thedatabase and repeat your command.
 
>
>   the FATAL one seems like it might at least partially be due to
>   RecoveryConflictPending not being reset in at least some of the FATAL
>   recovery conflict paths.
>
>   It seems pretty obvious that the proc_exit_inprogress check in
>   RecoveryConflictInterrupt() is misplaced, and needs to be where the errors
>   are thrown. But that won't help, because it turns out, we don't yet set that
>   necessarily.
>
>   Look at this stack from an assertion in ProcessInterrupts() ensuring that
>   the same FATAL isn't raised twice:
>
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
> #1  0x00007fd47897b546 in __GI_abort () at abort.c:79
> #2  0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6
"FailedAssertion",
>     fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259)
>     at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
> #3  0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
> #4  0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0
<__func__.8>"internal_flush")
 
>     at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
> #5  0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
> #6  0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
> #7  0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at
/home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
> #8  0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
> #9  0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0
<__func__.9>"ProcessInterrupts")
 
>     at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
> #10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
> #11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at
/home/andres/src/postgresql/src/backend/tcop/postgres.c:497
> #12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)
>
>   reporting a FATAL error in process of reporting a FATAL error. Yeha.
>
>   I assume this could lead to sending out the same message quite a few
>   times.

This seems like it needs to be fixed in elog.c. ISTM that at the very least we
ought to HOLD_INTERRUPTS() before the EmitErrorReport() for FATAL.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: pg15b2: large objects lost on upgrade
Next
From: Tom Lane
Date:
Subject: Re: should check interrupts in BuildRelationExtStatistics ?