Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
| From | Masahiko Sawada |
|---|---|
| Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
| Date | |
| Msg-id | CAD21AoCiiKz_sxNFAvKY_--7-O6ioOwr6Rf0aL71cQvccipiPQ@mail.gmail.com Whole thread |
| In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>) |
| Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum |
| List | pgsql-hackers |
On Mon, Nov 21, 2022 at 6:30 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> On Mon, Nov 21, 2022 at 3:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Nov 21, 2022 at 4:20 PM John Naylor
> > <john.naylor@enterprisedb.com> wrote:
>
> > > Assuming the smallest node is fixed size (i.e. fanout/capacity member not part of the common set, so only part of
variable-sizednodes), 3 has a nice property: no wasted padding space:
> > >
> > > node4: 5 + 4+(7) + 4*8 = 48 bytes
> > > node3: 5 + 3 + 3*8 = 32
> >
> > IIUC if we store the fanout member only in variable-sized nodes,
> > rt_node has only count, shift, and chunk, so 4 bytes in total. If so,
> > the size of node3 (ie. fixed-sized node) is (4 + 3 + (1) + 3*8)? The
> > size doesn't change but there is 1 byte padding space.
>
> I forgot to mention I'm assuming no pointer-tagging for this exercise. You've demonstrated it can be done in a small
amountof code, and I hope we can demonstrate a speedup in search. Just in case there is some issue with portability,
valgrind,or some other obstacle, I'm being pessimistic in my calculations.
>
> > Also, even if we have the node3 a variable-sized node, size class 1
> > for node3 could be a good choice since it also doesn't need padding
> > space and could be a good alternative to path compression.
> >
> > node3 : 5 + 3 + 3*8 = 32 bytes
> > size class 1 : 5 + 3 + 1*8 = 16 bytes
>
> Precisely! I have that scenario in my notes as well -- it's quite compelling.
So it seems that there are two candidates of rt_node structure: (1)
all nodes except for node256 are variable-size nodes and use pointer
tagging, and (2) node32 and node128 are variable-sized nodes and do
not use pointer tagging (fanout is in part of only these two nodes).
rt_node can be 5 bytes in both cases. But before going to this step, I
started to verify the idea of variable-size nodes by using 6-bytes
rt_node. We can adjust the node kinds and node classes later.
In this verification, I have all nodes except for node256
variable-sized nodes, and the sizes are:
radix tree node 1 : 6 + 4 + (6) + 1*8 = 24 bytes
radix tree node 4 : 6 + 4 + (6) + 4*8 = 48
radix tree node 15 : 6 + 32 + (2) + 15*8 = 160
radix tree node 32 : 6 + 32 + (2) + 32*8 = 296
radix tree node 61 : inner 6 + 256 + (2) + 61*8 = 752, leaf 6 +
256 + (2) + 16 + 61*8 = 768
radix tree node 128 : inner 6 + 256 + (2) + 128*8 = 1288, leaf 6 +
256 + (2) + 16 + 128*8 = 1304
radix tree node 256 : inner 6 + (2) + 256*8 = 2056, leaf 6 + (2) + 32
+ 256*8 = 2088
I did some performance tests against two radix trees: a radix tree
supporting only fixed-size nodes (i.e. applying up to 0003 patch), and
a radix tree supporting variable-size nodes (i.e. applying all
attached patches). Also, I changed bench_search_random_nodes()
function so that we can specify the filter via a function argument.
Here are results:
Here are results:
* Query
select * from bench_seq_search(0, 1*1000*1000, false)
* Fixed-size
NOTICE: num_keys = 1000000, height = 2, n4 = 0, n32 = 31251, n128 =
1, n256 = 122
nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms |
array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
1000000 | 9871216 | | 67 |
| 212 |
(1 row)
* Variable-size
NOTICE: num_keys = 1000000, height = 2, n1 = 0, n4 = 0, n15 = 0, n32
= 31251, n61 = 0, n128 = 1, n256 = 122
nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms |
array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
1000000 | 9871280 | | 74 |
| 212 |
(1 row)
---
* Query
select * from bench_seq_search(0, 2*1000*1000, true)
NOTICE: num_keys = 999654, height = 2, n4 = 1, n32 = 62499, n128 = 1,
n256 = 245
* Fixed-size
nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms |
array_load_ms | rt_search_ms | array_serach_ms
--------+------------------+---------------------+------------+---------------+--------------+-----------------
999654 | 19680848 | | 74 |
| 201 |
(1 row)
* Variable-size
NOTICE: num_keys = 999654, height = 2, n1 = 0, n4 = 1, n15 = 26951,
n32 = 35548, n61 = 1, n128 = 0, n256 = 245
nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms |
array_load_ms | rt_search_ms | array_serach_ms
--------+------------------+---------------------+------------+---------------+--------------+-----------------
999654 | 16009040 | | 85 |
| 201 |
(1 row)
---
* Query
select * from bench_search_random_nodes(10 * 1000 * 1000, '0x7F07FF00FF')
* Fixed-size
NOTICE: num_keys = 9291812, height = 4, n4 = 262144, n32 = 79603,
n128 = 182670, n256 = 1024
mem_allocated | search_ms
---------------+-----------
343001456 | 1151
(1 row)
* Variable-size
NOTICE: num_keys = 9291812, height = 4, n1 = 262144, n4 = 0, n15 =
138, n32 = 79465, n61 = 182665, n128 = 5, n256 = 1024
mem_allocated | search_ms
---------------+-----------
230504328 | 1077
(1 row)
---
* Query
select * from bench_search_random_nodes(10 * 1000 * 1000, '0xFFFF0000003F')
* Fixed-size
NOTICE: num_keys = 3807650, height = 5, n4 = 196608, n32 = 0, n128 =
65536, n256 = 257
mem_allocated | search_ms
---------------+-----------
99911920 | 632
(1 row)
* Variable-size
NOTICE: num_keys = 3807650, height = 5, n1 = 196608, n4 = 0, n15 = 0,
n32 = 0, n61 = 61747, n128 = 3789, n256 = 257
mem_allocated | search_ms
---------------+-----------
64045688 | 554
(1 row)
Overall, the idea of variable-sized nodes is good, smaller size
without losing search performance. I'm going to check the load
performance as well.
I've attached the patches I used for the verification. I don't include
patches for pointer tagging, DSA support, and vacuum integration since
I'm investigating the issue on cfbot that Andres reported. Also, I've
modified tests to improve the test coverage.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: