Thread: Unstable tests for recovery conflict handling

From: Tom Lane

[ starting a new thread cuz the shared-stats one is way too long ]

Andres Freund <andres@anarazel.de> writes:
> Add minimal tests for recovery conflict handling.

It's been kind of hidden by other buildfarm noise, but
031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].

Three of those failures look like

[11:08:46.806](105.129s) ok 1 - buffer pin conflict: cursor with conflicting pin established
Waiting for replication conn standby's replay_lsn to pass 0/33EF190 on primary
[12:01:49.614](3182.807s) # poll_query_until timed out executing this query:
# SELECT '0/33EF190' <= replay_lsn AND state = 'streaming'
#          FROM pg_catalog.pg_stat_replication
#          WHERE application_name IN ('standby', 'walreceiver')
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for catchup at t/031_recovery_conflict.pl line 123.

In each of these examples we can see in the standby's log that it
detected the expected buffer pin conflict:

2022-04-27 11:08:46.353 UTC [1961604][client backend][2/2:0] LOG:  statement: BEGIN;
2022-04-27 11:08:46.416 UTC [1961604][client backend][2/2:0] LOG:  statement: DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
2022-04-27 11:08:46.730 UTC [1961604][client backend][2/2:0] LOG:  statement: FETCH FORWARD FROM test_recovery_conflict_cursor;
2022-04-27 11:08:47.825 UTC [1961298][startup][1/0:0] LOG:  recovery still waiting after 13.367 ms: recovery conflict on buffer pin
2022-04-27 11:08:47.825 UTC [1961298][startup][1/0:0] CONTEXT:  WAL redo at 0/33E6E80 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 1; blkref #0: rel 1663/16385/16386, blk 0

but then nothing happens until the script times out and kills the test.
I think this is showing us a real bug, ie we sometimes fail to cancel
the conflicting query.

The other example [3] looks different:

[01:02:43.582](2.357s) ok 1 - buffer pin conflict: cursor with conflicting pin established
Waiting for replication conn standby's replay_lsn to pass 0/342C000 on primary
done
[01:02:43.747](0.165s) ok 2 - buffer pin conflict: logfile contains terminated connection due to recovery conflict
[01:02:43.804](0.057s) not ok 3 - buffer pin conflict: stats show conflict on standby
[01:02:43.805](0.000s)
[01:02:43.805](0.000s) #   Failed test 'buffer pin conflict: stats show conflict on standby'
#   at t/031_recovery_conflict.pl line 295.
[01:02:43.805](0.000s) #          got: '0'
#     expected: '1'

Not sure what to make of that --- could there be a race condition in the
reporting of the conflict?

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-04-27%2007%3A16%3A51
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2022-04-21%2021%3A20%3A15
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2022-04-13%2022%3A45%3A30
[4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-04-11%2005%3A40%3A41



Re: Unstable tests for recovery conflict handling

From: Mark Dilger

> On Apr 27, 2022, at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> [ starting a new thread cuz the shared-stats one is way too long ]
>
> Andres Freund <andres@anarazel.de> writes:
>> Add minimal tests for recovery conflict handling.
>
> It's been kind of hidden by other buildfarm noise, but
> 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].
>
> Three of those failures look like

Interesting,

I have been getting failures on REL_14_STABLE:

t/012_subtransactions.pl ............. 11/12
#   Failed test 'Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby'
#   at t/012_subtransactions.pl line 211.
#          got: '3'
#     expected: '0'
t/012_subtransactions.pl ............. 12/12 # Looks like you failed 1 test of 12.
t/012_subtransactions.pl ............. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/12 subtests

And the logs, tmp_check/log/regress_log_012_subtransactions, showing:

### Enabling streaming replication for node "primary"
### Starting node "primary"
# Running: pg_ctl -D /Users/mark.dilger/recovery_test/postgresql/src/test/recovery/tmp_check/t_012_subtransactions_primary_data/pgdata -l /Users/mark.dilger/recovery_test/postgresql/src/test/recovery/tmp_check/log/012_subtransactions_primary.log -o --cluster-name=primary start
waiting for server to start.... done
server started
# Postmaster PID for node "primary" is 46270
psql:<stdin>:1: ERROR:  prepared transaction with identifier "xact_012_1" does not exist
not ok 11 - Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby

#   Failed test 'Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby'
#   at t/012_subtransactions.pl line 211.
#          got: '3'
#     expected: '0'


This is quite consistent for me, but only when I configure with --enable-coverage and --enable-dtrace.  (I haven't yet
tried one of those without the other.)

I wasn't going to report this yet, having not yet completely narrowed this down, but I wonder if anybody else is seeing
this?

I'll try again on master....

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Unstable tests for recovery conflict handling

From: Mark Dilger

> On Apr 27, 2022, at 10:11 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
> I'll try again on master....

Still with coverage and dtrace enabled, I get the same thing, except that master formats the logs a bit differently:

# Postmaster PID for node "primary" is 19797
psql:<stdin>:1: ERROR:  prepared transaction with identifier "xact_012_1" does not exist
[10:26:16.314](1.215s) not ok 11 - Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby
[10:26:16.314](0.000s)
[10:26:16.314](0.000s) #   Failed test 'Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby'
[10:26:16.314](0.000s) #   at t/012_subtransactions.pl line 208.
[10:26:16.314](0.000s) #          got: '3'
#     expected: '0'


With coverage but not dtrace enabled, I still get the error, though the log leading up to the error now has a bunch of
coverage noise lines like:

profiling: /Users/mark.dilger/recovery_test/postgresql/src/backend/utils/sort/tuplesort.gcda: cannot merge previous
GCDA file: corrupt arc tag

The error itself looks the same except the timing numbers differ a little.


With neither enabled, all tests pass.


I'm inclined to think that either the recovery code or the test has a race condition, and that enabling coverage
causes the race to come out differently.  I'll keep poking....

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Unstable tests for recovery conflict handling

From: Tom Lane

I wrote:
> It's been kind of hidden by other buildfarm noise, but
> 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].
> ...
> I think this is showing us a real bug, ie we sometimes fail to cancel
> the conflicting query.

After digging around in the code, I think this is almost certainly
some manifestation of the previously-complained-of problem [1] that
RecoveryConflictInterrupt is not safe to call in a signal handler,
leading the conflicting backend to sometimes decide that it's not
the problem.  That squares with the observation that skink is more
prone to show this than other animals: you'd have to get the SIGUSR1
while the target backend isn't idle, so a very slow machine ought to
show it more.  We don't seem to have that issue on the open items
list, but I'll go add it.
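
For context, the signal-safety concern is the usual one: a signal handler shouldn't inspect nontrivial backend state or make the "am I really the conflicting backend?" decision itself; it should only record that a conflict was signalled and let a normal interrupt-processing point act on it.  Here is a minimal sketch of that general pattern (illustrative names only, not the actual PostgreSQL code):

#include <signal.h>

/* Set from the signal handler; checked at safe points in the main loop. */
static volatile sig_atomic_t recovery_conflict_pending = 0;

static void
conflict_signal_handler(int signo)
{
	(void) signo;

	/*
	 * Async-signal-safe: only record the request (a real backend would also
	 * set its latch so it wakes up promptly).  No locks, no allocation, no
	 * decision here about whether this backend actually conflicts.
	 */
	recovery_conflict_pending = 1;
}

/* Called from a normal interrupt-processing point, not from the handler. */
static void
process_recovery_conflict_if_pending(void)
{
	if (recovery_conflict_pending)
	{
		recovery_conflict_pending = 0;

		/*
		 * Here it is safe to examine backend state and, if this backend
		 * really is the conflicting one, throw the cancellation error.
		 */
	}
}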

Not sure if the 'buffer pin conflict: stats show conflict on standby'
failure could trace to a similar cause.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com



Re: Unstable tests for recovery conflict handling

From: Andres Freund

Hi,

On 2022-04-27 10:11:53 -0700, Mark Dilger wrote:
> 
> 
> > On Apr 27, 2022, at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > [ starting a new thread cuz the shared-stats one is way too long ]
> > 
> > Andres Freund <andres@anarazel.de> writes:
> >> Add minimal tests for recovery conflict handling.
> > 
> > It's been kind of hidden by other buildfarm noise, but
> > 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].
> > 
> > Three of those failures look like
> 
> Interesting,
> 
> I have been getting failures on REL_14_STABLE:
> 
> t/012_subtransactions.pl ............. 11/12                                                            
> #   Failed test 'Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby'
> #   at t/012_subtransactions.pl line 211.
> #          got: '3'
> #     expected: '0'
> t/012_subtransactions.pl ............. 12/12 # Looks like you failed 1 test of 12.
> t/012_subtransactions.pl ............. Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/12 subtests 
> 
> And the logs, tmp_check/log/regress_log_012_subtransactions, showing:

I'm a bit confused - what's the relation of that failure to this thread / the
tests / this commit?

Greetings,

Andres Freund



Re: Unstable tests for recovery conflict handling

From: Andres Freund

Hi,

On 2022-04-27 14:08:45 -0400, Tom Lane wrote:
> I wrote:
> > It's been kind of hidden by other buildfarm noise, but
> > 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].
> > ...
> > I think this is showing us a real bug, ie we sometimes fail to cancel
> > the conflicting query.
> 
> After digging around in the code, I think this is almost certainly
> some manifestation of the previously-complained-of problem [1] that
> RecoveryConflictInterrupt is not safe to call in a signal handler,
> leading the conflicting backend to sometimes decide that it's not
> the problem.

I think at least some may actually be because of
https://postgr.es/m/20220409045515.35ypjzddp25v72ou%40alap3.anarazel.de
rather than RecoveryConflictInterrupt itself.

I'll go and finish up the comment bits that I still need to clean up in my
bugfix and commit that...

Greetings,

Andres Freund



Re: Unstable tests for recovery conflict handling

From: Mark Dilger

> On Apr 27, 2022, at 6:26 PM, Andres Freund <andres@anarazel.de> wrote:
>
> I'm a bit confused - what's the relation of that failure to this thread / the
> tests / this commit?

None, upon further reflection.  It turned out to be unrelated.  Sorry for the noise.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Unstable tests for recovery conflict handling

From: Thomas Munro

On Thu, Apr 28, 2022 at 5:50 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> psql:<stdin>:1: ERROR:  prepared transaction with identifier "xact_012_1" does not exist
> [10:26:16.314](1.215s) not ok 11 - Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby
> [10:26:16.314](0.000s)
> [10:26:16.314](0.000s) #   Failed test 'Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted standby'
> [10:26:16.314](0.000s) #   at t/012_subtransactions.pl line 208.
> [10:26:16.314](0.000s) #          got: '3'
> #     expected: '0'

FWIW I see that on my FBSD/clang system when I build with
-fsanitize=undefined -fno-sanitize-recover=all.  It's something to do
with our stack depth detection and tricks being used by -fsanitize,
because there's a stack depth exceeded message in the log.



Re: Unstable tests for recovery conflict handling

From: Tom Lane

I wrote:
>> It's been kind of hidden by other buildfarm noise, but
>> 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].

> After digging around in the code, I think this is almost certainly
> some manifestation of the previously-complained-of problem [1] that
> RecoveryConflictInterrupt is not safe to call in a signal handler,
> leading the conflicting backend to sometimes decide that it's not
> the problem.

I happened to notice that while skink continues to fail off-and-on
in 031_recovery_conflict.pl, the symptoms have changed!  What
we're getting now typically looks like [1]:

[10:45:11.475](0.023s) ok 14 - startup deadlock: lock acquisition is waiting
Waiting for replication conn standby's replay_lsn to pass 0/33FB8B0 on primary
done
timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 367.

where absolutely nothing happens in the standby log, until we time out:

2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG:  statement: SELECT * FROM test_recovery_conflict_table2;
2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG:  statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly

So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
the startup process hasn't detected the buffer conflict in the first
place.  Don't know what to make of that, but I vaguely suspect a test
timing problem.  gull has shown this once as well, although at a different
step in the script [2].

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-07-24%2007%3A00%3A29
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&dt=2022-07-23%2009%3A34%3A54



Re: Unstable tests for recovery conflict handling

From: Andres Freund

Hi,

On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> I happened to notice that while skink continues to fail off-and-on
> in 031_recovery_conflict.pl, the symptoms have changed!  What
> we're getting now typically looks like [1]:
> 
> [10:45:11.475](0.023s) ok 14 - startup deadlock: lock acquisition is waiting
> Waiting for replication conn standby's replay_lsn to pass 0/33FB8B0 on primary
> done
> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 367.
> 
> where absolutely nothing happens in the standby log, until we time out:
> 
> 2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG:  statement: SELECT * FROM test_recovery_conflict_table2;
> 2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG:  statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> 2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
> 
> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> the startup process hasn't detected the buffer conflict in the first
> place.

I wonder if this, at least partially, could be due to the elog thing
I was complaining about nearby. I.e. we decide to FATAL as part of a
recovery conflict interrupt, and then during that ERROR out as part of
another recovery conflict interrupt (because nothing holds interrupts as
part of FATAL).

Greetings,

Andres Freund



Re: Unstable tests for recovery conflict handling

From: Tom Lane

Andres Freund <andres@anarazel.de> writes:
> On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
>> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
>> the startup process hasn't detected the buffer conflict in the first
>> place.

> I wonder if this, at least partially, could be due to the elog thing
> I was complaining about nearby. I.e. we decide to FATAL as part of a
> recovery conflict interrupt, and then during that ERROR out as part of
> another recovery conflict interrupt (because nothing holds interrupts as
> part of FATAL).

There are all sorts of things one could imagine going wrong in the
backend receiving the recovery conflict interrupt, but AFAICS in these
failures, the startup process hasn't sent a recovery conflict interrupt.
It certainly hasn't logged anything suggesting it noticed a conflict.

            regards, tom lane



Re: Unstable tests for recovery conflict handling

From: Andres Freund

Hi,

On 2022-07-26 14:30:30 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> >> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> >> the startup process hasn't detected the buffer conflict in the first
> >> place.
> 
> > I wonder if this, at least partially, could be due to the elog thing
> > I was complaining about nearby. I.e. we decide to FATAL as part of a
> > recovery conflict interrupt, and then during that ERROR out as part of
> > another recovery conflict interrupt (because nothing holds interrupts as
> > part of FATAL).
> 
> There are all sorts of things one could imagine going wrong in the
> backend receiving the recovery conflict interrupt, but AFAICS in these
> failures, the startup process hasn't sent a recovery conflict interrupt.
> It certainly hasn't logged anything suggesting it noticed a conflict.

I don't think we reliably emit a log message before the recovery
conflict is resolved.

I've wondered a couple times now about making tap test timeouts somehow
trigger a core dump of all processes. Certainly would make it easier to
debug some of these kinds of issues.

Greetings,

Andres Freund



Re: Unstable tests for recovery conflict handling

From: Andres Freund

Hi,

On 2022-07-26 13:03:54 -0700, Andres Freund wrote:
> On 2022-07-26 14:30:30 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> > >> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> > >> the startup process hasn't detected the buffer conflict in the first
> > >> place.
> >
> > > I wonder if this, at least partially, could be due to the elog thing
> > > I was complaining about nearby. I.e. we decide to FATAL as part of a
> > > recovery conflict interrupt, and then during that ERROR out as part of
> > > another recovery conflict interrupt (because nothing holds interrupts as
> > > part of FATAL).
> >
> > There are all sorts of things one could imagine going wrong in the
> > backend receiving the recovery conflict interrupt, but AFAICS in these
> > failures, the startup process hasn't sent a recovery conflict interrupt.
> > It certainly hasn't logged anything suggesting it noticed a conflict.
>
> I don't think we reliably emit a log message before the recovery
> conflict is resolved.

I played around trying to reproduce this kind of issue.

One way to quickly run into trouble on a slow system is that
ResolveRecoveryConflictWithVirtualXIDs() can end up sending signals more
frequently than the target can process them. The next signal can arrive by the
time SIGUSR1 processing finishes, which, at least on Linux, causes the queued
signal to be processed immediately, without "normal" postgres code gaining
control.
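
To illustrate the pacing issue, here is a rough sketch of the "don't re-signal faster than the target can react" idea.  All helper names and the interval are hypothetical stand-ins, not the real ResolveRecoveryConflictWithVirtualXIDs() logic:

#include <stdbool.h>

/* Hypothetical helpers, for illustration only. */
extern bool target_backend_still_running(int pid);
extern void send_recovery_conflict_signal(int pid);
extern void sleep_milliseconds(int ms);

#define RESIGNAL_INTERVAL_MS 500	/* assumed value, not from this thread */

static void
wait_for_conflicting_backend(int pid)
{
	while (target_backend_still_running(pid))
	{
		/* Ask the backend to resolve the conflict (cancel / terminate). */
		send_recovery_conflict_signal(pid);

		/*
		 * Give the target a chance to finish its SIGUSR1 handling and run
		 * its normal interrupt processing before queueing another signal,
		 * instead of re-signalling in a tight loop.
		 */
		sleep_milliseconds(RESIGNAL_INTERVAL_MS);
	}
}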

The reason nothing might get logged in some cases is that
e.g. ResolveRecoveryConflictWithLock() tells
ResolveRecoveryConflictWithVirtualXIDs() to *not* report the waiting:
        /*
         * Prevent ResolveRecoveryConflictWithVirtualXIDs() from reporting
         * "waiting" in PS display by disabling its argument report_waiting
         * because the caller, WaitOnLock(), has already reported that.
         */

so ResolveRecoveryConflictWithLock() can end up looping indefinitely without
logging anything.



Another question I have about ResolveRecoveryConflictWithLock() is whether
it's ok that we don't check deadlocks around the
ResolveRecoveryConflictWithVirtualXIDs() call? It might be ok, because we'd
only block if there's a recovery conflict, in which case killing the process ought
to succeed?


I think there also might be a problem with the wait loop in ProcSleep() wrt
recovery conflicts: We rely on interrupts to be processed to throw recovery
conflict errors, but ProcSleep() is called in a bunch of places with
interrupts held. An Assert(INTERRUPTS_CAN_BE_PROCESSED()) after releasing the
partition lock triggers a bunch.  It's possible that these aren't problematic
cases for recovery conflicts, because they're all around extension locks:

#2  0x0000562032f1968d in ExceptionalCondition (conditionName=0x56203310623a "INTERRUPTS_CAN_BE_PROCESSED()", errorType=0x562033105f6c "FailedAssertion",
    fileName=0x562033105f30 "/home/andres/src/postgresql/src/backend/storage/lmgr/proc.c", lineNumber=1208)
    at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3  0x0000562032d50f41 in ProcSleep (locallock=0x562034cafaf0, lockMethodTable=0x562033281740 <default_lockmethod>)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/proc.c:1208
#4  0x0000562032d3e2ce in WaitOnLock (locallock=0x562034cafaf0, owner=0x562034d12c58)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1859
#5  0x0000562032d3cd0a in LockAcquireExtended (locktag=0x7ffc7b4d0810, lockmode=7, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1101
#6  0x0000562032d3c1c4 in LockAcquire (locktag=0x7ffc7b4d0810, lockmode=7, sessionLock=false, dontWait=false)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:752
#7  0x0000562032d3a696 in LockRelationForExtension (relation=0x7f54646b1dd8, lockmode=7)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:439
#8  0x0000562032894276 in _bt_getbuf (rel=0x7f54646b1dd8, blkno=4294967295, access=2)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtpage.c:975
#9  0x000056203288f1cb in _bt_split (rel=0x7f54646b1dd8, itup_key=0x562034ea7428, buf=770, cbuf=0, newitemoff=408, newitemsz=16, newitem=0x562034ea3fc8,
    orignewitem=0x0, nposting=0x0, postingoff=0)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:1715
#10 0x000056203288e4bb in _bt_insertonpg (rel=0x7f54646b1dd8, itup_key=0x562034ea7428, buf=770, cbuf=0, stack=0x562034ea1fb8, itup=0x562034ea3fc8, itemsz=16,
    newitemoff=408, postingoff=0, split_only_page=false)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:1212
#11 0x000056203288caf9 in _bt_doinsert (rel=0x7f54646b1dd8, itup=0x562034ea3fc8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, heapRel=0x7f546823dde0)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:258
#12 0x000056203289851f in btinsert (rel=0x7f54646b1dd8, values=0x7ffc7b4d0c50, isnull=0x7ffc7b4d0c30, ht_ctid=0x562034dd083c, heapRel=0x7f546823dde0,
    checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x562034ea71c0)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtree.c:200
#13 0x000056203288710b in index_insert (indexRelation=0x7f54646b1dd8, values=0x7ffc7b4d0c50, isnull=0x7ffc7b4d0c30, heap_t_ctid=0x562034dd083c,
    heapRelation=0x7f546823dde0, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x562034ea71c0)
    at /home/andres/src/postgresql/src/backend/access/index/indexam.c:193
#14 0x000056203292e9da in CatalogIndexInsert (indstate=0x562034dd02b0, heapTuple=0x562034dd0838)

(gdb) p num_held_lwlocks
$14 = 1
(gdb) p held_lwlocks[0]
$15 = {lock = 0x7f1a0d18d2e4, mode = LW_EXCLUSIVE}
(gdb) p held_lwlocks[0].lock->tranche
$16 = 56
(gdb) p BuiltinTrancheNames[held_lwlocks[0].lock->tranche - NUM_INDIVIDUAL_LWLOCKS]
$17 = 0x558ce5710ede "BufferContent"


Independent of recovery conflicts, isn't it dangerous that we acquire the
relation extension lock with a buffer content lock held? I *guess* it might be
ok because BufferAlloc(P_NEW) only acquires buffer content locks in a
conditional way.


Greetings,

Andres Freund



Re: Unstable tests for recovery conflict handling

From: Robert Haas

On Wed, Aug 3, 2022 at 1:57 PM Andres Freund <andres@anarazel.de> wrote:
> The reason nothing might get logged in some cases is that
> e.g. ResolveRecoveryConflictWithLock() tells
> ResolveRecoveryConflictWithVirtualXIDs() to *not* report the waiting:
>                 /*
>                  * Prevent ResolveRecoveryConflictWithVirtualXIDs() from reporting
>                  * "waiting" in PS display by disabling its argument report_waiting
>                  * because the caller, WaitOnLock(), has already reported that.
>                  */
>
> so ResolveRecoveryConflictWithLock() can end up looping indefinitely without
> logging anything.

I understand why we need to avoid adding "waiting" to the PS status
when we've already done that, but it doesn't seem like that should
imply skipping ereport() of log messages.

I think we could redesign the way the ps display works to make things
a whole lot simpler. Let's have a function set_ps_display() and
another function set_ps_display_suffix(). What gets reported to the OS
is the concatenation of the two. Calling set_ps_display() implicitly
resets the suffix to empty.

AFAICS, that'd let us get rid of this tricky logic, and some other
tricky logic as well. Here, we'd just say set_ps_display_suffix("
waiting") and not worry about whether the caller might have already
done something similar.
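
A rough sketch of that interface, just to illustrate the idea (not an actual implementation; update_os_ps_display() and the buffer sizes are made-up stand-ins for however the string ultimately reaches the OS):

#include <stdio.h>

static char ps_base[256];		/* set by set_ps_display() */
static char ps_suffix[64];		/* set by set_ps_display_suffix() */

/* Hypothetical stand-in for pushing the final string to the OS. */
extern void update_os_ps_display(const char *s);

static void
refresh_ps_display(void)
{
	char		buf[sizeof(ps_base) + sizeof(ps_suffix)];

	/* What the OS sees is always base + suffix. */
	snprintf(buf, sizeof(buf), "%s%s", ps_base, ps_suffix);
	update_os_ps_display(buf);
}

void
set_ps_display(const char *activity)
{
	snprintf(ps_base, sizeof(ps_base), "%s", activity);
	ps_suffix[0] = '\0';		/* setting the base implicitly clears the suffix */
	refresh_ps_display();
}

void
set_ps_display_suffix(const char *suffix)
{
	snprintf(ps_suffix, sizeof(ps_suffix), "%s", suffix);
	refresh_ps_display();
}

With something like that, ResolveRecoveryConflictWithLock() could unconditionally call set_ps_display_suffix(" waiting") without caring whether WaitOnLock() already did.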

> Another question I have about ResolveRecoveryConflictWithLock() is whether
> it's ok that we don't check deadlocks around the
> ResolveRecoveryConflictWithVirtualXIDs() call? It might be ok, because we'd
> only block if there's a recovery conflict, in which case killing the process ought
> to succeed?

The startup process is supposed to always "win" in any deadlock
situation, so I'm not sure what you think is a problem here. We get
the conflicting lockers. We kill them. If they don't die, that's a
bug, but killing ourselves doesn't really help anything; if we die,
the whole system goes down, which seems undesirable.

> I think there also might be a problem with the wait loop in ProcSleep() wrt
> recovery conflicts: We rely on interrupts to be processed to throw recovery
> conflict errors, but ProcSleep() is called in a bunch of places with
> interrupts held. An Assert(INTERRUPTS_CAN_BE_PROCESSED()) after releasing the
> partition lock triggers a bunch.  It's possible that these aren't problematic
> cases for recovery conflicts, because they're all around extension locks:
> [...]
> Independent of recovery conflicts, isn't it dangerous that we acquire the
> relation extension lock with a buffer content lock held? I *guess* it might be
> ok because BufferAlloc(P_NEW) only acquires buffer content locks in a
> conditional way.

These things both seem a bit sketchy but it's not 100% clear to me
that anything is actually broken. Now it's also not clear to me that
nothing is broken ...

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Unstable tests for recovery conflict handling

From: Andres Freund

Hi,

On 2022-08-03 16:33:46 -0400, Robert Haas wrote:
> On Wed, Aug 3, 2022 at 1:57 PM Andres Freund <andres@anarazel.de> wrote:
> > The reason nothing might get logged in some cases is that
> > e.g. ResolveRecoveryConflictWithLock() tells
> > ResolveRecoveryConflictWithVirtualXIDs() to *not* report the waiting:
> >                 /*
> >                  * Prevent ResolveRecoveryConflictWithVirtualXIDs() from reporting
> >                  * "waiting" in PS display by disabling its argument report_waiting
> >                  * because the caller, WaitOnLock(), has already reported that.
> >                  */
> >
> > so ResolveRecoveryConflictWithLock() can end up looping indefinitely without
> > logging anything.
> 
> I understand why we need to avoid adding "waiting" to the PS status
> when we've already done that, but it doesn't seem like that should
> imply skipping ereport() of log messages.
> 
> I think we could redesign the way the ps display works to make things
> a whole lot simpler. Let's have a function set_ps_display() and
> another function set_ps_display_suffix(). What gets reported to the OS
> is the concatenation of the two. Calling set_ps_display() implicitly
> resets the suffix to empty.
> 
> AFAICS, that'd let us get rid of this tricky logic, and some other
> tricky logic as well. Here, we'd just say set_ps_display_suffix("
> waiting") and not worry about whether the caller might have already
> done something similar.

That sounds like it'd be an improvement.  Of course we still need to fix the
fact that we can signal at a rate that doesn't allow the other side to handle
the conflict, but at least that'd be easier to identify...


> > Another question I have about ResolveRecoveryConflictWithLock() is whether
> > it's ok that we don't check deadlocks around the
> > ResolveRecoveryConflictWithVirtualXIDs() call? It might be ok, because we'd
> > only block if there's a recovery conflict, in which case killing the process ought
> > to succeed?
> 
> The startup process is supposed to always "win" in any deadlock
> situation, so I'm not sure what you think is a problem here. We get
> the conflicting lockers. We kill them. If they don't die, that's a
> bug, but killing ourselves doesn't really help anything; if we die,
> the whole system goes down, which seems undesirable.

The way deadlock timeout for the startup process works is that we wait for it
to pass and then send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK to the
backends. So it's not that the startup process would die.

The question is basically whether there are cases where
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK would resolve a conflict but
PROCSIG_RECOVERY_CONFLICT_LOCK wouldn't. It seems plausible that there aren't,
but it's also not obvious enough that I'd fully trust it.

Greetings,

Andres Freund