Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoDFaBQrxCQpLetqeoiJCQn7N1G+Sf+QZQy0sn6Wa8mSsg@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
Hi,

On Sun, Feb 13, 2022 at 12:39 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-02-13 12:36:13 +0900, Masahiko Sawada wrote:
> > Actually, I'm working on simplifying and improving radix tree
> > implementation for PG16 dev cycle. From the discussion so far I think
> > it's better to have a data structure that can be used for
> > general-purpose and is also good for storing TID, not very specific to
> > store TID. So I think radix tree would be a potent candidate. I have
> > done the insertion and search implementation.
>
> Awesome!

To move this project forward, I've implemented radix tree
implementation from scratch while studying Andres's implementation. It
supports insertion, search, and iteration but not deletion yet. In my
implementation, I use Datum as the value so internal and lead nodes
have the same data structure, simplifying the implementation. The
iteration on the radix tree returns keys with the value in ascending
order of the key. The patch has regression tests for radix tree but is
still in PoC state: left many debugging codes, not supported SSE2 SIMD
instructions, added -mavx2 flag is hard-coded.

I've measured the size and loading and lookup performance for each
candidate data structure with two test cases: dense and sparse, by
using the test tool[1]. Here are the results:

* Case1 - Dense (simulating the case where there are 1000 consecutive
pages each of which has 100 dead tuples, at 100 page intervals.)
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within  a page
1000, -- # of consecutive pages having dead tuples
1100 -- page interval
);

name                 size            attach             lookup
array           520 MB      248.60 ms   89891.92 ms
hash         3188 MB  28029.59 ms   50850.32 ms
intset           85 MB       644.96 ms   39801.17 ms
tbm              96 MB       474.06 ms     6641.38 ms
radix            37 MB       173.03 ms     9145.97 ms
radix_tree    36 MB       184.51 ms     9729.94 ms

* Case2 - Sparse (simulating a case where there are pages that have 2
dead tuples every 1000 pages.)
select prepare(
10000000, -- max block
2, -- # of dead tuples per page
50, -- dead tuples interval within  a page
1, -- # of consecutive pages having dead tuples
1000 -- page interval
);

name                 size           attach             lookup
array              125 kB      0.53 ms    82183.61 ms
hash             1032 kB      1.31 ms   28128.33 ms
intset              222 kB      0.51 ms    87775.68 ms
tbm                768 MB      1.24 ms   98674.60 ms
radix             1080 kB      1.66 ms    20698.07 ms
radix_tree       949 kB      1.50 ms    21465.23 ms

Each test virtually generates TIDs and loads them to the data
structure, and then searches for virtual index TIDs.
'array' is a sorted array which is the current method, 'hash' is HTAB,
'intset' is IntegerSet, and 'tbm' is TIDBitmap. The last two results
are radix tree implementations: 'radix' is Andres's radix tree
implementation and 'radix_tree' is my radix tree implementation. In
both radix tree tests, I converted TIDs into an int64 and store the
lower 6 bits in the value part of the radix tree.

Overall, radix tree implementations have good numbers. Once we got an
agreement on moving in this direction, I'll start a new thread for
that and move the implementation further; there are many things to do
and discuss: deletion, API design, SIMD support, more tests etc.

Regards,

[1] https://github.com/MasahikoSawada/pgtools/tree/master/bdbench
[2] https://www.postgresql.org/message-id/CAFiTN-visUO9VTz2%2Bh224z5QeUjKhKNdSfjaCucPhYJdbzxx0g%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: waiting for reload in tests
Next
From: Andres Freund
Date:
Subject: Re: waiting for reload in tests