Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoCKMqgUv7+tCFpAk1DF0a3h=6ysZDLOoCS8=8aOZm-BSA@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Jul 8, 2021 at 7:51 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Jul 7, 2021 at 1:24 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I wonder how much it would help to break up that loop into two loops.
> > Make the callback into a batch operation that generates state that
> > describes what to do with each and every index tuple on the leaf page.
> > The first loop would build a list of TIDs, then you'd call into
> > vacuumlazy.c and get it to process the TIDs, and finally the second
> > loop would physically delete the TIDs that need to be deleted. This
> > would mean that there would be only one call per leaf page per
> > btbulkdelete(). This would reduce the number of calls to the callback
> > by at least 100x, and maybe more than 1000x.
>
> Maybe for something like rtbm.c (which is inspired by Roaring
> bitmaps), you would really want to use an "intersection" operation for
> this. The TIDs that we need to physically delete from the leaf page
> inside btvacuumpage() are the intersection of two bitmaps: our bitmap
> of all TIDs on the leaf page, and our bitmap of all TIDs that need to
> be deleted by the ongoing btbulkdelete() call.

Agreed. In such a batch operation, what we need to do is compute the
intersection of the two bitmaps.
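For illustration, if the TIDs collected from the leaf page are kept sorted, the intersection with the sorted dead-TID set can be a single merge pass rather than one lookup per TID. This is only a minimal sketch under the assumption that each TID is encoded as one integer (the `tid_t` encoding below is hypothetical, not what the patch uses):

```c
#include <stddef.h>

/* Hypothetical TID encoding for illustration: (block << 16) | offset. */
typedef unsigned long long tid_t;

/*
 * Merge-style intersection of two sorted TID arrays.  Writes the TIDs
 * present in both arrays to "out" and returns how many were written.
 */
size_t
tid_intersect(const tid_t *a, size_t na,
			  const tid_t *b, size_t nb, tid_t *out)
{
	size_t		i = 0,
				j = 0,
				n = 0;

	while (i < na && j < nb)
	{
		if (a[i] < b[j])
			i++;				/* leaf TID not dead: skip */
		else if (a[i] > b[j])
			j++;				/* dead TID not on this leaf page */
		else
		{
			out[n++] = a[i];	/* in both sets: delete from leaf */
			i++;
			j++;
		}
	}
	return n;
}
```

One such call can classify every TID on the leaf page at once, which matches the "one call per leaf page" batching suggested above.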

>
> Obviously the typical case is that most TIDs in the index do *not* get
> deleted -- needing to delete more than ~20% of all TIDs in the index
> will be rare. Ideally it would be very cheap to figure out that a TID
> does not need to be deleted at all. Something a little like a negative
> cache (but not a true negative cache). This is a little bit like how
> hash joins can be made faster by adding a Bloom filter -- most hash
> probes don't need to join a tuple in the real world, and we can make
> these hash probes even faster by using a Bloom filter as a negative
> cache.

Agreed.
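To illustrate the negative-cache idea: a small Bloom filter can answer "definitely not dead" for most TIDs before we touch the expensive lookup. A minimal sketch; the filter size and hash functions below are arbitrary choices for illustration, not anything nbtree or the patch actually uses:

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS 4096			/* illustrative filter size */

typedef struct
{
	uint8_t		bits[BLOOM_BITS / 8];
} bloom_t;

/* Two simple hash mixes (illustrative only). */
static uint32_t
hash1(uint64_t x)
{
	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	return (uint32_t) (x ^ (x >> 29));
}

static uint32_t
hash2(uint64_t x)
{
	x *= 0x9e3779b97f4a7c15ULL;
	return (uint32_t) (x >> 32);
}

static void
bloom_add(bloom_t *f, uint64_t key)
{
	uint32_t	h1 = hash1(key) % BLOOM_BITS;
	uint32_t	h2 = hash2(key) % BLOOM_BITS;

	f->bits[h1 / 8] |= 1 << (h1 % 8);
	f->bits[h2 / 8] |= 1 << (h2 % 8);
}

/*
 * Returns false only when the key is definitely absent -- the cheap
 * negative-cache answer that lets most probes skip the full lookup.
 */
static bool
bloom_maybe_contains(const bloom_t *f, uint64_t key)
{
	uint32_t	h1 = hash1(key) % BLOOM_BITS;
	uint32_t	h2 = hash2(key) % BLOOM_BITS;

	return (f->bits[h1 / 8] & (1 << (h1 % 8))) &&
		(f->bits[h2 / 8] & (1 << (h2 % 8)));
}
```

As with the hash-join case, a "maybe" answer falls through to the real membership test, so false positives cost time but never correctness.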

>
> If you had the list of TIDs from a leaf page sorted for batch
> processing, and if you had roaring bitmap style "chunks" with
> "container" metadata stored in the data structure, you could then use
> merging/intersection -- that has some of the same advantages. I think
> that this would be a lot more efficient than having one binary search
> per TID. Most TIDs from the leaf page can be skipped over very
> quickly, in large groups. It's very rare for VACUUM to need to delete
> TIDs from completely random heap table blocks in the real world (some
> kind of pattern is much more common).
>
> When this merging process finds 1 TID that might really be deletable
> then it's probably going to find much more than 1 -- better to make
> that cache miss take care of all of the TIDs together. Also seems like
> the CPU could do some clever prefetching with this approach -- it
> could prefetch TIDs where the initial chunk metadata is insufficient
> to eliminate them early -- these are the groups of TIDs that will have
> many TIDs that we actually need to delete. ISTM that improving
> temporal locality through batching could matter a lot here.

That's a promising approach.

In rtbm, one hash entry paired with one container is used per heap
block. Therefore, if a block has no dead tuples, we can skip all TIDs
from the leaf page that point to it just by checking the hash table.
If the hash entry exists, the block has at least one dead tuple, so we
then look up the TID's offset in the container.
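The two-step lookup above could be sketched roughly as follows. This is a simplified illustration under assumed names, not the actual rtbm code: the real structure uses a hash table and multiple container representations, whereas here the table is a plain array and the container is a sorted offset array:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

/*
 * One entry per heap block that has at least one dead tuple; the
 * "container" is simplified to a sorted array of offsets.
 */
typedef struct
{
	BlockNumber block;
	const OffsetNumber *offsets;	/* sorted ascending */
	int			noffsets;
} BlockEntry;

/* Stand-in for the hash-table probe (linear scan for illustration). */
static const BlockEntry *
find_block(const BlockEntry *entries, int nentries, BlockNumber blk)
{
	for (int i = 0; i < nentries; i++)
		if (entries[i].block == blk)
			return &entries[i];
	return NULL;				/* no dead tuples on this block */
}

/* Is the TID (blk, off) marked dead? */
static bool
tid_is_dead(const BlockEntry *entries, int nentries,
			BlockNumber blk, OffsetNumber off)
{
	const BlockEntry *e = find_block(entries, nentries, blk);
	int			lo,
				hi;

	if (e == NULL)
		return false;			/* cheap skip: block has no dead tuples */

	/* binary search the container for the offset */
	lo = 0;
	hi = e->noffsets - 1;
	while (lo <= hi)
	{
		int			mid = (lo + hi) / 2;

		if (e->offsets[mid] == off)
			return true;
		if (e->offsets[mid] < off)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return false;
}
```

The point is that the per-block probe rejects most leaf-page TIDs without ever touching a container, which is where the bulk of the savings comes from.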

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


