Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CAD21AoDFaBQrxCQpLetqeoiJCQn7N1G+Sf+QZQy0sn6Wa8mSsg@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

On Sun, Feb 13, 2022 at 12:39 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-02-13 12:36:13 +0900, Masahiko Sawada wrote:
> > Actually, I'm working on simplifying and improving radix tree
> > implementation for PG16 dev cycle. From the discussion so far I think
> > it's better to have a data structure that can be used for
> > general-purpose and is also good for storing TID, not very specific to
> > store TID. So I think radix tree would be a potent candidate. I have
> > done the insertion and search implementation.
>
> Awesome!

To move this project forward, I've implemented a radix tree from scratch while studying Andres's implementation. It supports insertion, search, and iteration, but not deletion yet. In my implementation, I use Datum as the value type so that internal and leaf nodes have the same data structure, which simplifies the implementation. Iteration over the radix tree returns keys with their values in ascending key order.

The patch has regression tests for the radix tree but is still in a PoC state: a lot of debugging code is left in, SSE2 SIMD instructions are not supported, and the added -mavx2 compiler flag is hard-coded.

I've measured the size as well as the loading and lookup performance of each candidate data structure with two test cases, dense and sparse, using the test tool[1]. Here are the results:

* Case1 - Dense (simulating the case where there are 1000 consecutive pages, each of which has 100 dead tuples, at 100 page intervals.)

select prepare(
    1000000, -- max block
    100,     -- # of dead tuples per page
    1,       -- dead tuples interval within a page
    1000,    -- # of consecutive pages having dead tuples
    1100     -- page interval
);

 name       | size    | attach      | lookup
------------+---------+-------------+-------------
 array      | 520 MB  | 248.60 ms   | 89891.92 ms
 hash       | 3188 MB | 28029.59 ms | 50850.32 ms
 intset     | 85 MB   | 644.96 ms   | 39801.17 ms
 tbm        | 96 MB   | 474.06 ms   | 6641.38 ms
 radix      | 37 MB   | 173.03 ms   | 9145.97 ms
 radix_tree | 36 MB   | 184.51 ms   | 9729.94 ms

* Case2 - Sparse (simulating a case where there are pages that have 2 dead tuples every 1000 pages.)
select prepare(
    10000000, -- max block
    2,        -- # of dead tuples per page
    50,       -- dead tuples interval within a page
    1,        -- # of consecutive pages having dead tuples
    1000      -- page interval
);

 name       | size    | attach  | lookup
------------+---------+---------+-------------
 array      | 125 kB  | 0.53 ms | 82183.61 ms
 hash       | 1032 kB | 1.31 ms | 28128.33 ms
 intset     | 222 kB  | 0.51 ms | 87775.68 ms
 tbm        | 768 MB  | 1.24 ms | 98674.60 ms
 radix      | 1080 kB | 1.66 ms | 20698.07 ms
 radix_tree | 949 kB  | 1.50 ms | 21465.23 ms

Each test virtually generates TIDs, loads them into the data structure, and then searches for virtual index TIDs. 'array' is a sorted array, which is the current method; 'hash' is HTAB; 'intset' is IntegerSet; and 'tbm' is TIDBitmap. The last two results are radix tree implementations: 'radix' is Andres's radix tree implementation and 'radix_tree' is mine. In both radix tree tests, I converted each TID into an int64 and stored the lower 6 bits in the value part of the radix tree.

Overall, the radix tree implementations show good numbers. Once we get agreement on moving in this direction, I'll start a new thread and take the implementation further; there are many things to do and discuss: deletion, API design, SIMD support, more tests, etc.

Regards,

[1] https://github.com/MasahikoSawada/pgtools/tree/master/bdbench
[2] https://www.postgresql.org/message-id/CAFiTN-visUO9VTz2%2Bh224z5QeUjKhKNdSfjaCucPhYJdbzxx0g%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/