Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAH2-WznrV9o9TNaQg3MUFso772m5gA_9D6oLcs_AAkTgdUVtNw@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Fri, Nov 4, 2022 at 8:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> For parallel heap pruning, multiple workers will insert key-value
> pairs to the radix tree concurrently. The simplest solution would be a
> single lock to protect writes but the performance will not be good.
> Another solution would be that we can divide the tables into multiple
> ranges so that keys derived from TIDs are not conflicted with each
> other and have parallel workers process one or more ranges. That way,
> parallel vacuum workers can build *sub-trees* and the leader process
> can merge them. In use cases of lazy vacuum, since the write phase and
> read phase are separated the readers don't need to worry about
> concurrent updates.

I think that the VM snapshot concept can eventually be used to
implement parallel heap pruning. Since every page that will become a
scanned_pages is known right from the start with VM snapshots, it will
be relatively straightforward to partition these pages into distinct
ranges with an equal number of pages, one per worker planned. The VM
snapshot structure can also be used for I/O prefetching, which will be
more important with parallel heap pruning (and with aio).

Working off of an immutable structure that describes which pages to
process right from the start is naturally easy to work with, in
general. We can "reorder work" flexibly (i.e. process individual
scanned_pages in any order that is convenient). Another example is
"changing our mind" about advancing relfrozenxid when it turns out
that we maybe should have decided to do that at the start of VACUUM
[1]. Maybe the specific "changing our mind" idea will turn out to not
be a very useful idea, but it is at least an interesting and thought
provoking concept.

[1] https://postgr.es/m/CAH2-WzkQ86yf==mgAF=cQ0qeLRWKX3htLw9Qo+qx3zbwJJkPiQ@mail.gmail.com
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Non-emergency patch for bug #17679
Next
From: Tom Lane
Date:
Subject: Out-of-date clang/llvm version lists in PGAC_LLVM_SUPPORT