Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? - Mailing list pgsql-hackers

From: Andres Freund <andres@anarazel.de>
Subject: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Date: 2022-07-01
Msg-id: 20220701201423.sic5b77ei3atjbes@alap3.anarazel.de
In response to: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

On 2022-06-21 19:33:01 -0700, Andres Freund wrote:
> On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:
> > Problem: I saw 031_recovery_conflict.pl time out while waiting for a
> > buffer pin conflict, but so far once only, on CI:
> >
> > https://cirrus-ci.com/task/5956804860444672
> >
> > timed out waiting for match: (?^:User was holding shared buffer pin
> > for too long) at t/031_recovery_conflict.pl line 367.
> >
> > Hrmph.  Still trying to reproduce that, which may be a bug in this
> > patch, a bug in the test or a pre-existing problem.  Note that
> > recovery didn't say something like:
> >
> > 2022-06-21 17:05:40.931 NZST [57674] LOG:  recovery still waiting
> > after 11.197 ms: recovery conflict on buffer pin
> >
> > (That's what I'd expect to see in
> >
> > https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
> > if the startup process had decided to send the signal).
> >
> > ... so it seems like the problem in that run is upstream of the interrupt stuff.
>
> Odd. The only theory I have so far is that the manual vacuum on the primary
> somehow decided to skip the page, and thus didn't trigger a conflict. Because
> clearly replay progressed past the records of the VACUUM. Perhaps we should
> use VACUUM VERBOSE? In contrast to pg_regress tests that should be
> unproblematic?

I saw a couple of failures of 031 on CI for the meson patch - which obviously
doesn't change anything around this. However, it adds a lot more distributions
to the CI matrix, and the added ones run in Docker containers on a shared
host, presumably adding a lot of noise.  I saw the failures more frequently
when I accidentally set the test concurrency to the number of CPUs in the
host, rather than the number of CPUs allotted to the container.

That made me look more into these issues. I played with adding a pg_usleep()
to RecoveryConflictInterrupt() to simulate slow signal delivery.
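
The hack is nothing more than an unconditional delay at the top of the
handler, along these lines (paraphrased here rather than copied from my tree;
the 100ms is an arbitrary choice):

    void
    RecoveryConflictInterrupt(ProcSignalReason reason)
    {
        int         save_errno = errno;

        /* debugging aid only: pretend the signal is delivered/handled late */
        pg_usleep(100 * 1000L);

        /* ... unchanged body of RecoveryConflictInterrupt() ... */

        errno = save_errno;
    }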

Found a couple things:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
  completely swamp the target(s) on a busy system. This surely is exacerbated
  by the usleep I added to RecoveryConflictInterrupt(), but a 5ms signalling
  pace does seem like a bad idea regardless (rough sketch of the loop below).
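
  For context, the relevant part of ResolveRecoveryConflictWithVirtualXIDs()
  is a loop of roughly this shape (paraphrased from memory, not verbatim):

      /* for each conflicting virtual transaction on the wait list ... */
      while (!VirtualXactLock(*waitlist, false))
      {
          /* re-signal the conflicting backend, then wait a bit for it to die */
          pid_t       pid = CancelVirtualTransaction(*waitlist, reason);

          if (pid != 0)
              pg_usleep(5000L);   /* the 5ms pace complained about above */
      }

  so on a system that's too busy to deliver/handle the signal promptly, the
  target just keeps getting re-signalled every 5ms until it finally reacts.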

- we process the same recovery conflict (not just the same signal, but the
  same "real" conflict) multiple times in the target backend, presumably
  while it is still handling the error. That includes handling the same
  interrupt once as an ERROR and once as a FATAL.

  E.g.

2022-07-01 12:19:14.428 PDT [2000572] LOG:  recovery still waiting after 10.032 ms: recovery conflict on buffer pin
2022-07-01 12:19:14.428 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR:  canceling statement due to conflict with recovery at character 15
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT:  SELECT * FROM test_recovery_conflict_table2;
2022-07-01 12:19:54.778 PDT [2000572] LOG:  recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
2022-07-01 12:19:54.778 PDT [2000572] CONTEXT:  WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with recovery
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL:  User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG:  statement: SELECT 1;

  note that the startup process considers the conflict resolved by the time
  the backend handles the interrupt.

  I also see cases where a FATAL is repeated:

2022-07-01 12:43:18.190 PDT [2054721] LOG:  recovery still waiting after 15.410 ms: recovery conflict on snapshot
2022-07-01 12:43:18.190 PDT [2054721] DETAIL:  Conflicting process: 2054753.
2022-07-01 12:43:18.190 PDT [2054721] CONTEXT:  WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
2054753: reporting recovery conflict 9
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with recovery
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL:  User query might have needed to see row versions that must be removed.
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to the database and repeat your command.
...
2054753: reporting recovery conflict 9
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL:  terminating connection due to conflict with recovery
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL:  User query might have needed to see row versions that must be removed.
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT:  In a moment you should be able to reconnect to the database and repeat your command.

  the FATAL one seems like it might at least partially be due to
  RecoveryConflictPending not being reset in at least some of the FATAL
  recovery conflict paths.

  It seems pretty obvious that the proc_exit_inprogress check in
  RecoveryConflictInterrupt() is misplaced, and needs to be where the errors
  are actually thrown. But that alone won't help, because it turns out we
  don't necessarily have proc_exit_inprogress set yet at that point.
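
  Very roughly, what I mean is something along these lines in
  ProcessInterrupts() (a sketch of the idea only, not a tested patch):

      if (RecoveryConflictPending && DoingCommandRead)
      {
          /* reset where we actually throw, not in the signal handler */
          RecoveryConflictPending = false;

          /* and don't report again if we're already on the way out */
          if (!proc_exit_inprogress)
              ereport(FATAL,
                      (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                       errmsg("terminating connection due to conflict with recovery")));
      }

  but, as said, that can't be the whole story, because proc_exit_inprogress
  isn't necessarily set yet by the time we get here.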

  Look at this stack, from an assertion I added in ProcessInterrupts() to
  ensure that the same FATAL isn't raised twice:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007fd47897b546 in __GI_abort () at abort.c:79
#2  0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion",
    fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259)
    at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3  0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
#4  0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8> "internal_flush")
    at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
#5  0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
#6  0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
#7  0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
#8  0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
#9  0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9> "ProcessInterrupts")
    at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
#10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
#11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497
#12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)
  reporting a FATAL error in the process of reporting a FATAL error. Yeah.

  I assume this could lead to sending out the same message quite a few times.
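
  (The assertion itself is just a local hack along these lines - sketched
  here from memory; the in_fatal flag is not something that exists upstream:

      static bool in_fatal = false;

      /* immediately before raising the recovery-conflict FATAL: */
      Assert(!in_fatal);
      in_fatal = true;

  i.e. nothing more than a tripwire for the second FATAL report.)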



This is quite the mess.


Greetings,

Andres Freund


