Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From:           Masahiko Sawada
Subject:        Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:
Msg-id:         CAD21AoDy2rmw7dGaGoFqkzxsd+kFg2ttz-bV1D8i=epCXrfWWA@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses:      Re: [PoC] Improve dead tuple storage for lazy vacuum
List:           pgsql-hackers
On Mon, Jul 26, 2021 at 1:07 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> Hi,
>
> I've dreamed of writing a more compact structure for vacuum for three
> years, but life didn't give me the time to.
>
> Let me join the friendly competition.
>
> I've bet on the HATM approach: popcount-ing bitmaps for non-empty
> elements.

Thank you for proposing the new idea!

> Novelties:
>
> - 32 consecutive pages are stored together in a single sparse array
>   (called a "chunk"). A chunk contains:
>   - its number,
>   - a 4-byte bitmap of non-empty pages,
>   - an array of non-empty page headers, 2 bytes each. A page header
>     contains the offset of the page's bitmap in the bitmaps container
>     (except when there is just one dead tuple in the page; then it is
>     written into the header itself),
>   - a container of concatenated bitmaps.
>
>   I.e., page metadata overhead varies from 2.4 bytes (32 pages in a
>   single chunk) to 18 bytes (1 page in a single chunk) per page.
>
> - If a page's bitmap is sparse, i.e. contains a lot of all-zero bytes,
>   it is compressed by removing the zero bytes and indexing it with a
>   two-level bitmap index (zero bytes in the first level are removed
>   using the second level). This is mostly done for 32kB pages, but let
>   it stay since it is almost free.
>
> - If a page's bitmap contains a lot of all-one bytes, it is inverted
>   and then encoded as sparse.
>
> - Chunks are allocated with a custom "allocator" that has no
>   per-allocation overhead. That is possible because there is no need
>   to perform "free": the allocator is freed as a whole at once.
>
> - The array of pointers to chunks is also bitmap-indexed. That saves
>   CPU time when not every group of 32 consecutive pages has at least
>   one dead tuple, but costs time otherwise, so an additional
>   optimization is added to quickly skip the lookup for the first
>   non-empty run of chunks.
>   (Ahhh, I believe this explanation is awful.)

It sounds better than my proposal.
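If I understand the chunk layout correctly, it would look roughly like
the sketch below. This is only to confirm my understanding: the struct
name, field names, and exact field widths are my guesses from your
description, not your code.

#include "postgres.h"
#include "storage/block.h"
#include "port/pg_bitutils.h"       /* pg_popcount32(), pg_popcount() */

/*
 * Rough sketch of one "chunk" covering 32 consecutive heap pages.
 * All names are mine; field widths are guessed from the description.
 */
typedef struct SVTChunk
{
    uint32      chunkno;    /* block number / 32 */
    uint32      pagemask;   /* bit N set => page N of chunk is non-empty */
    uint16      pagehdr[FLEXIBLE_ARRAY_MEMBER];
                            /* one entry per set bit in pagemask: offset
                             * of the page's bitmap in the bitmaps
                             * container, or, for a page with a single
                             * dead tuple, its offset number inline */
} SVTChunk;

/*
 * Find the page header for a block, or NULL if the page has no dead
 * tuples -- the "popcount-ing bitmaps for non-empty elements" part,
 * if I read it right.
 */
static uint16 *
svt_chunk_lookup(SVTChunk *chunk, BlockNumber blkno)
{
    uint32      slot = blkno % 32;

    if ((chunk->pagemask & ((uint32) 1 << slot)) == 0)
        return NULL;

    /* header index = number of non-empty pages before this one */
    return &chunk->pagehdr[pg_popcount32(chunk->pagemask &
                                         (((uint32) 1 << slot) - 1))];
}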
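And the sparse (zero-byte-removal) encoding seems testable with the same
popcount trick. Continuing the sketch above, here is a minimal one-level
version; it ignores the second level and the all-ones inversion you
describe, and again the names are mine:

/*
 * Test one bit of a page bitmap whose all-zero bytes were removed.
 * "present" marks which bytes of the original bitmap survived into
 * "packed".  One level only; the second level would compress "present"
 * itself the same way.
 */
static bool
svt_sparse_test(const uint8 *present, const uint8 *packed, int bitno)
{
    int         byteno = bitno / 8;
    int         idx;

    if ((present[byteno / 8] & (1 << (byteno % 8))) == 0)
        return false;       /* the whole byte was all-zero and elided */

    /* position among surviving bytes = popcount of earlier present bits */
    idx = (int) pg_popcount((const char *) present, byteno / 8);
    idx += pg_popcount32((uint32) (present[byteno / 8] &
                                   ((1 << (byteno % 8)) - 1)));

    return (packed[idx] & (1 << (bitno % 8))) != 0;
}

If I have MaxHeapTuplesPerPage right, for 8kB heap pages the per-page
bitmap is at most 37 bytes anyway, so the linear pg_popcount() over the
prefix should stay cheap; I can see why the two-level index matters
mostly for 32kB pages.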
> Andres Freund wrote 2021-07-20 02:49:
> > Hi,
> >
> > On 2021-07-19 15:20:54 +0900, Masahiko Sawada wrote:
> >> BTW is the implementation of the radix tree approach available
> >> somewhere? If so I'd like to experiment with that too.
> >>
> >> > I have toyed with implementing adaptively large radix nodes like
> >> > proposed in https://db.in.tum.de/~leis/papers/ART.pdf - but haven't
> >> > gotten it quite working.
> >>
> >> That seems a promising approach.
> >
> > I've since implemented some, but not all, of the ideas of that paper
> > (adaptive node sizes, but not the tree compression pieces).
> >
> > E.g. for
> >
> > select prepare(
> >     1000000, -- max block
> >     20,      -- # of dead tuples per page
> >     10,      -- dead tuples interval within a page
> >     1        -- page interval
> > );
> >
> >          attach   size    shuffled  ordered
> > array    69 ms    120 MB  84.87 s   8.66 s
> > intset   173 ms   65 MB   68.82 s   11.75 s
> > rtbm     201 ms   67 MB   11.54 s   1.35 s
> > tbm      232 ms   100 MB  8.33 s    1.26 s
> > vtbm     162 ms   58 MB   10.01 s   1.22 s
> > radix    88 ms    42 MB   11.49 s   1.67 s
> >
> > and for
> >
> > select prepare(
> >     1000000, -- max block
> >     10,      -- # of dead tuples per page
> >     1,       -- dead tuples interval within a page
> >     1        -- page interval
> > );
> >
> >          attach   size    shuffled  ordered
> > array    24 ms    60 MB   3.74 s    1.02 s
> > intset   97 ms    49 MB   3.14 s    0.75 s
> > rtbm     138 ms   36 MB   0.41 s    0.14 s
> > tbm      198 ms   101 MB  0.41 s    0.14 s
> > vtbm     118 ms   27 MB   0.39 s    0.12 s
> > radix    33 ms    10 MB   0.28 s    0.10 s
> >
> > (this is an almost unfairly good case for radix)
> >
> > Running out of time to format the results of the other testcases
> > before I have to run, unfortunately. radix uses 42 MB in both test
> > cases 3 and 4.
>
> My results (Ubuntu 20.04, Intel Core i7-1165G7):
>
> Test 1.
>
> select prepare(1000000, 10, 20, 1); -- original
>
>          attach   size    shuffled
> array    29 ms    60 MB   93.99 s
> intset   93 ms    49 MB   80.94 s
> rtbm     171 ms   67 MB   14.05 s
> tbm      238 ms   100 MB  8.36 s
> vtbm     148 ms   59 MB   9.12 s
> radix    100 ms   42 MB   11.81 s
> svtm     75 ms    29 MB   8.90 s
>
> select prepare(1000000, 20, 10, 1); -- Andres's variant
>
>          attach   size    shuffled
> array    61 ms    120 MB  111.91 s
> intset   163 ms   66 MB   85.00 s
> rtbm     236 ms   67 MB   10.72 s
> tbm      290 ms   100 MB  8.40 s
> vtbm     190 ms   59 MB   9.28 s
> radix    117 ms   42 MB   12.00 s
> svtm     98 ms    29 MB   8.77 s
>
> Test 2.
>
> select prepare(1000000, 10, 1, 1);
>
>          attach   size    shuffled
> array    31 ms    60 MB   4.68 s
> intset   97 ms    49 MB   4.03 s
> rtbm     163 ms   36 MB   0.42 s
> tbm      240 ms   100 MB  0.42 s
> vtbm     136 ms   27 MB   0.36 s
> radix    60 ms    10 MB   0.72 s
> svtm     39 ms    6 MB    0.19 s
>
> (The bad radix result is probably due to the smaller cache in the
> notebook's CPU?)
>
> Test 3.
>
> select prepare(1000000, 2, 100, 1);
>
>          attach   size    shuffled
> array    6 ms     12 MB   53.42 s
> intset   23 ms    16 MB   54.99 s
> rtbm     115 ms   38 MB   8.19 s
> tbm      186 ms   100 MB  8.37 s
> vtbm     105 ms   59 MB   9.08 s
> radix    64 ms    42 MB   10.41 s
> svtm     73 ms    10 MB   7.49 s
>
> Test 4.
>
> select prepare(1000000, 100, 1, 1);
>
>          attach   size    shuffled
> array    304 ms   600 MB  75.12 s
> intset   775 ms   98 MB   47.49 s
> rtbm     356 ms   38 MB   4.11 s
> tbm      539 ms   100 MB  4.20 s
> vtbm     493 ms   42 MB   4.44 s
> radix    263 ms   42 MB   6.05 s
> svtm     360 ms   8 MB    3.49 s
>
> Therefore the Specialized Vacuum Tid Map (svtm) always consumes the
> least memory and is usually faster.
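As a sanity check on the numbers: the array figures match
max_block * tuples_per_page * sizeof(ItemPointerData) exactly, e.g.
1,000,000 * 10 * 6 bytes = 60 MB in Test 2 and 1,000,000 * 100 * 6
bytes = 600 MB in Test 4. So svtm's 8 MB in Test 4 is roughly a 75x
reduction over the current dead tuple array.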
I'll experiment with the proposed ideas, including this one, in more
scenarios and share the results tomorrow.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/