On 24/02/2026 10:00, Alexander Lakhin wrote:
> Could you please look at a new failure of 035_standby_logical_decoding.pl
> produced at buildfarm [1]?:
> [05:34:51.345](233.134s) # poll_query_until timed out executing this query:
> # SELECT '0/0403F950' <= replay_lsn AND state = 'streaming'
> # FROM pg_catalog.pg_stat_replication
> # WHERE application_name IN ('standby', 'walreceiver')
> # expecting this output:
> # t
> # last actual query output:
> # f
> # with stderr:
>
> 035_standby_logical_decoding_standby.log contains:
> 2026-02-24 05:30:58.300 CET [1512377][client backend][:0] LOG:
> disconnection: session time: 0:00:00.035 user=bf database=testdb
> host=[local]
> 2026-02-24 05:30:58.425 CET [1507982][startup][32/0:0] LOG: invalidating
> obsolete replication slot "row_removal_inactiveslot"
> 2026-02-24 05:30:58.425 CET [1507982][startup][32/0:0] DETAIL: The slot
> conflicted with xid horizon 748.
> 2026-02-24 05:30:58.425 CET [1507982][startup][32/0:0] CONTEXT: WAL
> redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon:
> 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1,
> dead: [33, 34, 35], unused: [36]; blkref #0: rel 1663/16384/16418, blk 10
> 2026-02-24 05:30:58.425 CET [1507982][startup][32/0:0] LOG: terminating
> process 1512360 to release replication slot "row_removal_activeslot"
> 2026-02-24 05:30:58.425 CET [1507982][startup][32/0:0] DETAIL: The slot
> conflicted with xid horizon 748.
> 2026-02-24 05:30:58.425 CET [1507982][startup][32/0:0] CONTEXT: WAL
> redo at 0/040214F0 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon:
> 748, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 3, nunused: 1,
> dead: [33, 34, 35], unused: [36]; blkref #0: rel 1663/16384/16418, blk 10
> 2026-02-24 05:34:51.400 CET [1508227][walreceiver][:0] FATAL: could not
> receive data from WAL stream: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> The "terminating process ..." message doesn't appear when the test passes
> successfully.
Hmm, right, it looks like something is wrong in signaling the recovery
conflict. I can't tell whether the signal is not being sent, or whether
it's sent but not processed correctly. Looking at the code, I don't see
anything wrong.
> I've managed to reproduce this with multiple (20) test instances running
> in a loop (it failed within 10 iterations for me); `git bisect` for this
> anomaly pointed at 17f51ea81.
I've been trying to reproduce this locally, but so far no success, even
after thousands of iterations.
If you can still reproduce this, could you try it with the code changes
from the attached recovery-conflict-fail-extra-logging.patch, which adds
some extra logging, and send over the logs, please?
The recovery-conflict-fail-repro-attempt.patch contains very hacky
changes to the test, to run just the failing part 100 times in a loop.
That's just to show what I used to try to reproduce this, but no luck.
- Heikki