
From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAFBsxsEnooJq8jsWg7ujQv8RCCwjZsV8+S0S-jb0nsoiWTAa1Q@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Fri, Sep 16, 2022 at 1:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> In addition to two patches, I've attached the third patch. It's not
> part of radix tree implementation but introduces a contrib module
> bench_radix_tree, a tool for radix tree performance benchmarking. It
> measures loading and lookup performance of both the radix tree and a
> flat array.

Hi Masahiko, I've been using these benchmarks, along with my own variations, to try various things that I've mentioned. I'm long overdue for an update, but the picture is not yet complete.

For now, I have two questions that I can't figure out on my own:

1. There seems to be some non-obvious limit on the number of keys that are loaded (or at least on what the numbers report), and it is independent of the number of tids per block. Example below:

john=# select * from bench_shuffle_search(0, 8*1000*1000);
NOTICE:  num_keys = 8000000, height = 3, n4 = 0, n16 = 1, n32 = 0, n128 = 250000, n256 = 981
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
 8000000 |        268435456 |            48000000 |        661 |            29 |          276 |             389

john=# select * from bench_shuffle_search(0, 9*1000*1000);
NOTICE:  num_keys = 8388608, height = 3, n4 = 0, n16 = 1, n32 = 0, n128 = 262144, n256 = 1028
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
 8388608 |        276824064 |            54000000 |        718 |            33 |          311 |             446

The array is the right size, but nkeys hasn't kept pace. Notably, 8,388,608 is exactly 2^23, which suggests the key count is being capped at a power of two somewhere. Can you reproduce this? Attached is the patch I'm using to show the stats when running the test. (Side note: the numbers look unfavorable for the radix tree because I'm using 1 tid per block here.)
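
To make my guess concrete, here is a minimal, hypothetical sketch (not the actual bench_radix_tree code) of how a hard-coded power-of-two buffer in the key generator would produce exactly this truncation. KEY_BUF_SIZE and generate_keys() are my own names for illustration only:

#include <stdint.h>
#include <stdio.h>

#define KEY_BUF_SIZE (UINT64_C(1) << 23)	/* hypothetical hard cap = 8388608 */

/* Pretend key generator: stops silently once the buffer is full. */
static uint64_t
generate_keys(uint64_t nrequested)
{
	uint64_t	nkeys = 0;

	for (uint64_t i = 0; i < nrequested; i++)
	{
		if (nkeys >= KEY_BUF_SIZE)
			break;				/* anything above the cap is silently dropped */
		nkeys++;				/* the real code would store key i here */
	}
	return nkeys;
}

int
main(void)
{
	/* 8 million fits under the cap, 9 million does not */
	printf("requested 8000000 -> loaded %llu\n",
		   (unsigned long long) generate_keys(UINT64_C(8000000)));
	printf("requested 9000000 -> loaded %llu\n",
		   (unsigned long long) generate_keys(UINT64_C(9000000)));
	return 0;
}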

2. I found that the traditional binary search on the flat array is much *faster* in bench_shuffle_search() than in bench_seq_search(). This has been true in every case I've tried. It seems counterintuitive to me -- any idea why? Example:

john=# select * from bench_seq_search(0, 1000000);
NOTICE:  num_keys = 1000000, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
 1000000 |         10199040 |           180000000 |        168 |           106 |          827 |            3348

john=# select * from bench_shuffle_search(0, 1000000);
NOTICE:  num_keys = 1000000, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
 1000000 |         10199040 |           180000000 |        171 |           107 |          827 |            1400
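
For reference, here is a self-contained sketch of the kind of flat-array lookup I assume the array case in the benchmark is doing; the type and function names are mine, with ItemPointerData simplified to a plain (block, offset) pair. A shuffled probe order should, if anything, hurt this loop through extra cache misses, which is why the numbers above surprise me:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
	uint32_t	block;
	uint16_t	offset;
} tid_t;

/* Compare two tids in (block, offset) order. */
static inline int
tid_cmp(const tid_t *a, const tid_t *b)
{
	if (a->block != b->block)
		return (a->block < b->block) ? -1 : 1;
	if (a->offset != b->offset)
		return (a->offset < b->offset) ? -1 : 1;
	return 0;
}

/* Classic binary search over a sorted tid array: O(log n) probes per lookup. */
static bool
tid_array_lookup(const tid_t *array, size_t nitems, const tid_t *key)
{
	size_t		lo = 0;
	size_t		hi = nitems;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;
		int			cmp = tid_cmp(&array[mid], key);

		if (cmp == 0)
			return true;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return false;
}

int
main(void)
{
	/* one tid per block, as in the runs above */
	tid_t		tids[1000];
	tid_t		probe = {500, 1};

	for (uint32_t i = 0; i < 1000; i++)
	{
		tids[i].block = i;
		tids[i].offset = 1;
	}
	return tid_array_lookup(tids, 1000, &probe) ? 0 : 1;
}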

--
John Naylor
EDB: http://www.enterprisedb.com