Re: Refactor recovery conflict signaling a little - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Refactor recovery conflict signaling a little
Date
Msg-id 25f20e5b-786e-42c2-9f0c-b253742ddbf5@gmail.com
Whole thread Raw
In response to Re: Refactor recovery conflict signaling a little  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Refactor recovery conflict signaling a little
List pgsql-hackers
Hello Heikki,

03.03.2026 17:39, Heikki Linnakangas wrote:
> On 24/02/2026 10:00, Alexander Lakhin wrote:
>> The "terminating process ..." message doesn't appear when the test passes
>> successfully.
>
> Hmm, right, looks like something wrong in signaling the recovery conflict. I can't tell if the signal is being sent,

> or it's not processed correctly. Looking at the code, I don't see anything wrong.
>
>> I've managed to reproduce this with multiple (20) test instances running
>> in a loop (it failed within 10 iterations for me); `git bisect` for this
>> anomaly pointed at 17f51ea81.
>
> I've been trying to reproduce this locally, but so far not success, after thousands of iterations.
>
> If you can still reproduce this, can you try it with code changes from the attached 
> recovery-conflict-fail-extra-logging.patch, which adds some extra logging, and send over the logs please?
>
> The recovery-conflict-fail-repro-attempt.patch contains very hacky changes to the test, to run the just the failing 
> part 100 times in a loop. That's just to show what I used to try to reproduce this, but no luck.

Thank you for your attention to it!

I can easily reproduce the failure with the attached script when using SSD
(the test didn't fail for me on tmpfs and HDD, probably some tuning needed
):
ITERATION 1
...
19      t/035_standby_logical_decoding.pl .. ok
19      All tests successful.
19      Files=1, Tests=28, 20 wallclock secs ( 0.02 usr  0.01 sys + 0.35 cusr  0.80 csys =  1.18 CPU)
19      Result: PASS
2       # poll_query_until timed out executing this query:
2       # SELECT '0/0403F950' <= replay_lsn AND state = 'streaming'
2       #          FROM pg_catalog.pg_stat_replication
2       #          WHERE application_name IN ('standby', 'walreceiver')
2       # expecting this output:
2       # t
2       # last actual query output:
2       # f
2       # with stderr:
...

035_standby_logical_decoding_standby.log contains:
2026-03-03 20:34:14.198 EET startup[844699] LOG:  invalidating obsolete replication slot "row_removal_inactiveslot"
2026-03-03 20:34:14.198 EET startup[844699] DETAIL:  The slot conflicted with xid horizon 748.
2026-03-03 20:34:14.198 EET startup[844699] CONTEXT:  WAL redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: 
snapshotConflictHorizon: 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1, dead: [33, 34, 35], 
unused: [36]; blkref #0: rel 1663/16384/16418, blk 10
2026-03-03 20:34:14.198 EET startup[844699] LOG:  terminating process 845647 to release replication slot 
"row_removal_activeslot"
2026-03-03 20:34:14.198 EET startup[844699] DETAIL:  The slot conflicted with xid horizon 748.
2026-03-03 20:34:14.198 EET startup[844699] CONTEXT:  WAL redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: 
snapshotConflictHorizon: 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1, dead: [33, 34, 35], 
unused: [36]; blkref #0: rel 1663/16384/16418, blk 10
2026-03-03 20:34:14.198 EET startup[844699] LOG:  XXX: SendProcSignal sending SIGUSR1 to pid 845647
2026-03-03 20:34:14.198 EET startup[844699] CONTEXT:  WAL redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: 
snapshotConflictHorizon: 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1, dead: [33, 34, 35], 
unused: [36]; blkref #0: rel 1663/16384/16418, blk 10
2026-03-03 20:37:26.827 EET walreceiver[844817] FATAL:  could not receive data from WAL stream: server closed the 
connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.

The full logs are attached too. I can add any other logging you want.

Best regards,
Alexander
Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pg_plan_advice (now with transparent SQL plan performance overrides - pg_stash_advice)
Next
From: "David G. Johnston"
Date:
Subject: Re: pg_plan_advice (now with transparent SQL plan performance overrides - pg_stash_advice)