Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Msg-id CAH2-WzkZUDr8TQDPuax1SmJg9B5yz-Qhr7NdoQJD5PpXLAUA7Q@mail.gmail.com
In response to Vacuum ERRORs out considering freezing dead tuples from before OldestXmin  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
List pgsql-hackers
On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> If vacuum fails to remove a tuple with xmax older than
> VacuumCutoffs->OldestXmin and younger than
> GlobalVisState->maybe_needed, it will ERROR out when determining
> whether or not to freeze the tuple with "cannot freeze committed
> xmax".
>
> In back branches starting with 14, failing to remove tuples older than
> OldestXmin during pruning caused vacuum to infinitely loop in
> lazy_scan_prune(), as investigated on this [1] thread.

This is a great summary.
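
To make the window in that summary concrete, here is a toy model in C. It is
only a sketch, not PostgreSQL source: every name in it (ToyXid,
toy_prune_by_maybe_needed, and so on) is hypothetical, XIDs are plain 32-bit
counters, and the deleting transaction is assumed to have committed. It shows
a tuple whose xmax falls between maybe_needed and OldestXmin surviving a
pruning pass that consults only maybe_needed, and then tripping a freeze-time
check that consults OldestXmin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: not PostgreSQL source. All names here are hypothetical. */
typedef uint32_t ToyXid;

/* Wraparound-aware "a is older than b", using modulo-2^32 arithmetic. */
bool toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/* Pruning that trusts only the (possibly moved-back) maybe_needed horizon. */
bool toy_prune_by_maybe_needed(ToyXid xmax, ToyXid maybe_needed)
{
    return toy_xid_precedes(xmax, maybe_needed);
}

/* Proposed rule: also remove anything that was dead before OldestXmin. */
bool toy_prune_with_oldest_xmin(ToyXid xmax, ToyXid maybe_needed,
                                ToyXid oldest_xmin)
{
    return toy_xid_precedes(xmax, oldest_xmin) ||
           toy_xid_precedes(xmax, maybe_needed);
}

/*
 * Freeze-time check: a tuple that survived pruning but has a committed xmax
 * older than OldestXmin is the "cannot freeze committed xmax" case.
 */
bool toy_freeze_would_error(bool survived_pruning, ToyXid xmax,
                            ToyXid oldest_xmin)
{
    return survived_pruning && toy_xid_precedes(xmax, oldest_xmin);
}

/* Walk through the window: maybe_needed < xmax < OldestXmin. */
void toy_check_window(void)
{
    ToyXid maybe_needed = 100;  /* moved backwards mid-vacuum */
    ToyXid xmax = 150;          /* committed deleter of the tuple */
    ToyXid oldest_xmin = 200;   /* fixed at the start of the vacuum */

    /* Old behavior: the tuple is kept, then the freeze check fails. */
    bool removed = toy_prune_by_maybe_needed(xmax, maybe_needed);
    assert(!removed);
    assert(toy_freeze_would_error(!removed, xmax, oldest_xmin));

    /* Proposed behavior: the tuple is removed, so no freeze decision arises. */
    removed = toy_prune_with_oldest_xmin(xmax, maybe_needed, oldest_xmin);
    assert(removed);
    assert(!toy_freeze_would_error(!removed, xmax, oldest_xmin));
}
```

Calling toy_check_window() from a main() runs both cases. The model leaves
out hint bits, multixacts, and the real visibility routines, so it shows only
the ordering of the two horizons, not the actual code paths.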

> We can fix this by always removing tuples considered dead before
> VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby
> has a transaction that sees that tuple as alive, because it will
> simply wait to replay the removal until it would be correct to do so
> or recovery conflict handling will cancel the transaction that sees
> the tuple as alive and allow replay to continue.

I think that this is the right general approach.

> The repro forces a round of index vacuuming after the standby
> reconnects and before pruning a dead tuple whose xmax is older than
> OldestXmin.
>
> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> calls GetOldestNonRemovableTransactionId(), thereby updating the
> backend's GlobalVisState and moving maybe_needed backwards.

Right. I saw details exactly consistent with this when I used GDB
against a production instance.

I'm glad that you were able to come up with a repro that involves
exactly the same basic elements, including index page deletion.

--
Peter Geoghegan


