Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoAXkuV7+57AOZXDkVUDHu6zat_cqAKcXn2Te0j=18yhFw@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Sat, Jul 10, 2021 at 2:17 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote:
> > Currently, the TIDs of dead tuples are stored in an array that is
> > collectively allocated at the start of lazy vacuum and TID lookup uses
> > bsearch(). There are the following challenges and limitations:
>
> > So I prototyped a new data structure dedicated to storing dead tuples
> > during lazy vacuum while borrowing the idea from Roaring Bitmap[2].
> > The authors provide an implementation of Roaring Bitmap[3]  (Apache
> > 2.0 license). But I've implemented this idea from scratch because we
> > need to integrate it with Dynamic Shared Memory/Area to support
> > parallel vacuum and need to support ItemPointerData, 6-bytes integer
> > in total, whereas the implementation supports only 4-bytes integers.
> > Also, when it comes to vacuum, we neither need to compute the
> > intersection, the union, nor the difference between sets, but need
> > only an existence check.
> >
> > The data structure is somewhat similar to TIDBitmap. It consists of
> > the hash table and the container area; the hash table has entries per
> > block and each block entry allocates its memory space, called a
> > container, in the container area to store its offset numbers. The
> > container area is actually an array of bytes and can be enlarged as
> > needed. In the container area, the data representation of offset
> > numbers varies depending on their cardinality. It has three container
> > types: array, bitmap, and run.
>
> How are you thinking of implementing iteration efficiently for rtbm? The
> second heap pass needs that obviously... I think the only option would
> be to qsort the whole thing?

Yes, I'm thinking that the iteration of rtbm is somewhat similar to
tbm. That is, we iterate and collect hash table entries and do qsort
hash entries by the block number. Then fetch the entry along with its
container one by one in order of the block number.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW
Next
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors