Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New strategies for freezing, advancing relfrozenxid early
Date
Msg-id CAH2-Wzn9MquY1=msQUaS9Rj0HMGfgZisCCoVdc38T=AZM_ZV9w@mail.gmail.com
In response to Re: New strategies for freezing, advancing relfrozenxid early  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: New strategies for freezing, advancing relfrozenxid early
List pgsql-hackers
On Thu, Sep 8, 2022 at 1:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v3. There is a new patch included here -- v3-0004-*patch,
> or "Unify aggressive VACUUM with antiwraparound VACUUM". No other
> notable changes.
>
> I decided to work on this now because it seems like it might give a
> more complete picture of the high level direction that I'm pushing
> towards. Perhaps this will make it easier to review the patch series
> as a whole, even.

This needed to be rebased over the guc.c work recently pushed to HEAD.

Attached is v4. This isn't just to fix bitrot, though; I'm also
including one new patch -- v4-0006-*.patch. This small patch teaches
VACUUM to size dead_items while capping the allocation at the space
required for "scanned_pages * MaxHeapTuplesPerPage" item pointers. In
other words, we now use scanned_pages instead of rel_pages to cap the
size of dead_items, potentially saving quite a lot of memory. There is
no possible downside to this approach, because we already know exactly
how many pages will be scanned from the VM snapshot -- there is zero
added risk of a second pass over the indexes.

This is still only scratching the surface of what is possible with
dead_items. The visibility map snapshot concept can enable a far more
sophisticated approach to resource management in vacuumlazy.c. It
could help us to replace a simple array of item pointers (the current
dead_items array) with a faster and more space-efficient data
structure. Masahiko Sawada has done a lot of work on this recently, so
this may interest him.

We don't just have up-front knowledge of the total number of
scanned_pages with VM snapshots -- we also have up-front knowledge of
which specific pages will be scanned. So we have reliable information
about the final distribution of dead_items (which specific heap blocks
might have dead_items) right from the start. While this extra
information/context is not a totally complete picture, it still seems
like it could be very useful as a way of driving how some new
dead_items data structure compresses TIDs. That will depend on the
distribution of TIDs -- the final "heap TID key space".

VM snapshots could also make it practical for the new data structure
to spill to disk to avoid multiple index scans/passes by VACUUM.
Perhaps this will result in behavior that's similar to how hash joins
spill to disk -- having 90% of the memory required to do everything
in-memory *usually* has similar performance characteristics to just
doing everything in memory. Most individual TID lookups from
ambulkdelete() will find that the TID *doesn't* need to be deleted --
a little like a hash join with low join selectivity (the common case
for hash joins). It's not like a merge join + sort, where we must
either spill everything or nothing (a merge join can be better than a
hash join with high join selectivity).

-- 
Peter Geoghegan

