Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoCiiKz_sxNFAvKY_--7-O6ioOwr6Rf0aL71cQvccipiPQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum |
List | pgsql-hackers |
On Mon, Nov 21, 2022 at 6:30 PM John Naylor <john.naylor@enterprisedb.com> wrote: > > On Mon, Nov 21, 2022 at 3:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Nov 21, 2022 at 4:20 PM John Naylor > > <john.naylor@enterprisedb.com> wrote: > > > > Assuming the smallest node is fixed size (i.e. fanout/capacity member not part of the common set, so only part of variable-sizednodes), 3 has a nice property: no wasted padding space: > > > > > > node4: 5 + 4+(7) + 4*8 = 48 bytes > > > node3: 5 + 3 + 3*8 = 32 > > > > IIUC if we store the fanout member only in variable-sized nodes, > > rt_node has only count, shift, and chunk, so 4 bytes in total. If so, > > the size of node3 (ie. fixed-sized node) is (4 + 3 + (1) + 3*8)? The > > size doesn't change but there is 1 byte padding space. > > I forgot to mention I'm assuming no pointer-tagging for this exercise. You've demonstrated it can be done in a small amountof code, and I hope we can demonstrate a speedup in search. Just in case there is some issue with portability, valgrind,or some other obstacle, I'm being pessimistic in my calculations. > > > Also, even if we have the node3 a variable-sized node, size class 1 > > for node3 could be a good choice since it also doesn't need padding > > space and could be a good alternative to path compression. > > > > node3 : 5 + 3 + 3*8 = 32 bytes > > size class 1 : 5 + 3 + 1*8 = 16 bytes > > Precisely! I have that scenario in my notes as well -- it's quite compelling. So it seems that there are two candidates of rt_node structure: (1) all nodes except for node256 are variable-size nodes and use pointer tagging, and (2) node32 and node128 are variable-sized nodes and do not use pointer tagging (fanout is in part of only these two nodes). rt_node can be 5 bytes in both cases. But before going to this step, I started to verify the idea of variable-size nodes by using 6-bytes rt_node. We can adjust the node kinds and node classes later. In this verification, I have all nodes except for node256 variable-sized nodes, and the sizes are: radix tree node 1 : 6 + 4 + (6) + 1*8 = 24 bytes radix tree node 4 : 6 + 4 + (6) + 4*8 = 48 radix tree node 15 : 6 + 32 + (2) + 15*8 = 160 radix tree node 32 : 6 + 32 + (2) + 32*8 = 296 radix tree node 61 : inner 6 + 256 + (2) + 61*8 = 752, leaf 6 + 256 + (2) + 16 + 61*8 = 768 radix tree node 128 : inner 6 + 256 + (2) + 128*8 = 1288, leaf 6 + 256 + (2) + 16 + 128*8 = 1304 radix tree node 256 : inner 6 + (2) + 256*8 = 2056, leaf 6 + (2) + 32 + 256*8 = 2088 I did some performance tests against two radix trees: a radix tree supporting only fixed-size nodes (i.e. applying up to 0003 patch), and a radix tree supporting variable-size nodes (i.e. applying all attached patches). Also, I changed bench_search_random_nodes() function so that we can specify the filter via a function argument. Here are results: Here are results: * Query select * from bench_seq_search(0, 1*1000*1000, false) * Fixed-size NOTICE: num_keys = 1000000, height = 2, n4 = 0, n32 = 31251, n128 = 1, n256 = 122 nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms ---------+------------------+---------------------+------------+---------------+--------------+----------------- 1000000 | 9871216 | | 67 | | 212 | (1 row) * Variable-size NOTICE: num_keys = 1000000, height = 2, n1 = 0, n4 = 0, n15 = 0, n32 = 31251, n61 = 0, n128 = 1, n256 = 122 nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms ---------+------------------+---------------------+------------+---------------+--------------+----------------- 1000000 | 9871280 | | 74 | | 212 | (1 row) --- * Query select * from bench_seq_search(0, 2*1000*1000, true) NOTICE: num_keys = 999654, height = 2, n4 = 1, n32 = 62499, n128 = 1, n256 = 245 * Fixed-size nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms --------+------------------+---------------------+------------+---------------+--------------+----------------- 999654 | 19680848 | | 74 | | 201 | (1 row) * Variable-size NOTICE: num_keys = 999654, height = 2, n1 = 0, n4 = 1, n15 = 26951, n32 = 35548, n61 = 1, n128 = 0, n256 = 245 nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms --------+------------------+---------------------+------------+---------------+--------------+----------------- 999654 | 16009040 | | 85 | | 201 | (1 row) --- * Query select * from bench_search_random_nodes(10 * 1000 * 1000, '0x7F07FF00FF') * Fixed-size NOTICE: num_keys = 9291812, height = 4, n4 = 262144, n32 = 79603, n128 = 182670, n256 = 1024 mem_allocated | search_ms ---------------+----------- 343001456 | 1151 (1 row) * Variable-size NOTICE: num_keys = 9291812, height = 4, n1 = 262144, n4 = 0, n15 = 138, n32 = 79465, n61 = 182665, n128 = 5, n256 = 1024 mem_allocated | search_ms ---------------+----------- 230504328 | 1077 (1 row) --- * Query select * from bench_search_random_nodes(10 * 1000 * 1000, '0xFFFF0000003F') * Fixed-size NOTICE: num_keys = 3807650, height = 5, n4 = 196608, n32 = 0, n128 = 65536, n256 = 257 mem_allocated | search_ms ---------------+----------- 99911920 | 632 (1 row) * Variable-size NOTICE: num_keys = 3807650, height = 5, n1 = 196608, n4 = 0, n15 = 0, n32 = 0, n61 = 61747, n128 = 3789, n256 = 257 mem_allocated | search_ms ---------------+----------- 64045688 | 554 (1 row) Overall, the idea of variable-sized nodes is good, smaller size without losing search performance. I'm going to check the load performance as well. I've attached the patches I used for the verification. I don't include patches for pointer tagging, DSA support, and vacuum integration since I'm investigating the issue on cfbot that Andres reported. Also, I've modified tests to improve the test coverage. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: