Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Date
Msg-id CAH2-Wz=tHyPne5Srw8UPHUJM02ifXOTM5JHbBY48nxXtXLhZqg@mail.gmail.com
Whole thread Raw
In response to Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List pgsql-hackers
On Wed, Nov 20, 2024 at 4:41 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > On 15 Nov 2024, at 21:33, Peter Geoghegan <pg@bowt.ie> wrote:
> > I propose this for the master branch only.
>
> The change seems correct to me: anyway cycle must be less than cycle of any future vacuum after promotion.

The cycles set in the page special area during page splits that happen
to run while a VACUUM also runs must  use that same VACUUM's cycle ID
(which is stored in shared memory for the currently running VACUUM).
That way the VACUUM will know when it must backtrack later on, to
avoid missing index tuples that it is expected to remove.

It doesn't matter if the cycle_id that VACUUM sees is less than or
greater than its own one -- only that it matches its own one when it
needs to match to get correct behavior from VACUUM. (Though it's also
possible to get a false positive, in rare cases where we get unlucky
and there's a collision.  This might waste cycles within VACUUM, but
it shouldn't lead to truly incorrect behavior.)

> I cannot say anything about beauty of resetting or not resetting the field.

The main reason to do this is simple: we should make REDO routines
match original execution, unless there is a very good reason not to do
so -- original execution is "authoritative". There's no good reason
not to do so here.

There is a secondary reason to do things this way, though: resetting
the cycle ID within the REDO routine can save future work from within
btvacuumpage, after a standby gets promoted, when nbtree VACUUM runs
against an affected index for the first time. Note that nbtree VACUUM
goes out of its way to clear an old cycle ID in passing, even when it
has nothing else to do on the page. And so by taking care of that on
the REDO side, right from the start, the newly promoted standby won't
need to dirty so many pages during the first nbtree VACUUM after it is
promoted.

(Actually, it's worse than that -- we're not just dirtying pages these
days. It's quite likely that we'll need to write a WAL record to clear
the cycle ID, since page-level checksums are enabled by default.)

> I'd suggest renaming the field into something like "btpo_split_vaccycleid". I was aware of index vacuum backtracking,
butit took me a while to build context again. 

I'm not inclined to change it now. It's been like this for many years now.

Pushed this patch just now. Thanks for the review!

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Make tuple deformation faster
Next
From: Peter Geoghegan
Date:
Subject: Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID