Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id b3406b45caa85f4b0ee5c18625f3250c@postgrespro.ru
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Masahiko Sawada писал 2021-07-27 07:06:
> On Mon, Jul 26, 2021 at 11:01 PM Masahiko Sawada 
> <sawada.mshk@gmail.com> wrote:
>> 
>> I'll experiment with the proposed ideas including this idea in more
>> scenarios and share the results tomorrow.
>> 
> 
> I've done some benchmarks for proposed data structures. In this trial,
> I've done with the scenario where dead tuples are concentrated on a
> particular range of table blocks (test 5-8), in addition to the
> scenarios I've done in the previous trial. Also, I've done benchmarks
> of each scenario while increasing table size. In the first test, the
> maximum block number of the table is 1,000,000 (i.g., 8GB table) and
> in the second test, it's 10,000,000 (80GB table). We can see how
> performance and memory consumption changes with a large-scale table.
> Here are the results:
> 
> * Test 1
> select prepare(
> 1000000, -- max block
> 10, -- # of dead tuples per page
> 1, -- dead tuples interval within  a page
> 1, -- # of consecutive pages having dead tuples
> 20 -- page interval
> );
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 57.23 MB  |  0.040 |   98.613 | 572.21 MB  |     0.387 |    
> 1521.981
>  intset | 46.88 MB  |  0.114 |   75.944 | 468.67 MB  |     0.961 |     
> 997.760
>  radix  | 40.26 MB  |  0.102 |   18.427 | 336.64 MB  |     0.797 |     
> 266.146
>  rtbm   | 64.02 MB  |  0.234 |   22.443 | 512.02 MB  |     2.230 |     
> 275.143
>  svtm   | 27.28 MB  |  0.060 |   13.568 | 274.07 MB  |     0.476 |     
> 211.073
>  tbm    | 96.01 MB  |  0.273 |   10.347 | 768.01 MB  |     2.882 |     
> 128.103
> 
> * Test 2
> select prepare(
> 1000000, -- max block
> 10, -- # of dead tuples per page
> 1, -- dead tuples interval within  a page
> 1, -- # of consecutive pages having dead tuples
> 1 -- page interval
> );
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 57.23 MB  |  0.041 |    4.757 | 572.21 MB  |     0.344 |      
> 71.228
>  intset | 46.88 MB  |  0.127 |    3.762 | 468.67 MB  |     1.093 |      
> 49.573
>  radix  | 9.95 MB   |  0.048 |    0.679 | 82.57 MB   |     0.371 |      
> 16.211
>  rtbm   | 34.02 MB  |  0.179 |    0.534 | 288.02 MB  |     2.092 |      
>  8.693
>  svtm   | 5.78 MB   |  0.043 |    0.239 | 54.60 MB   |     0.342 |      
>  7.759
>  tbm    | 96.01 MB  |  0.274 |    0.521 | 768.01 MB  |     2.685 |      
>  6.360
> 
> * Test 3
> select prepare(
> 1000000, -- max block
> 2, -- # of dead tuples per page
> 100, -- dead tuples interval within  a page
> 1, -- # of consecutive pages having dead tuples
> 1 -- page interval
> );
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 11.45 MB  |  0.009 |   57.698 | 114.45 MB  |     0.076 |    
> 1045.639
>  intset | 15.63 MB  |  0.031 |   46.083 | 156.23 MB  |     0.243 |     
> 848.525
>  radix  | 40.26 MB  |  0.063 |   13.755 | 336.64 MB  |     0.501 |     
> 223.413
>  rtbm   | 36.02 MB  |  0.123 |   11.527 | 320.02 MB  |     1.843 |     
> 180.977
>  svtm   | 9.28 MB   |  0.053 |    9.631 | 92.59 MB   |     0.438 |     
> 212.626
>  tbm    | 96.01 MB  |  0.228 |   10.381 | 768.01 MB  |     2.258 |     
> 126.630
> 
> * Test 4
> select prepare(
> 1000000, -- max block
> 100, -- # of dead tuples per page
> 1, -- dead tuples interval within  a page
> 1, -- # of consecutive pages having dead tuples
> 1 -- page interval
> );
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 572.21 MB |  0.367 |   78.047 | 5722.05 MB |     3.942 |    
> 1154.776
>  intset | 93.74 MB  |  0.777 |   45.146 | 937.34 MB  |     7.716 |     
> 643.708
>  radix  | 40.26 MB  |  0.203 |    9.015 | 336.64 MB  |     1.775 |     
> 133.294
>  rtbm   | 36.02 MB  |  0.369 |    5.639 | 320.02 MB  |     3.823 |      
> 88.832
>  svtm   | 7.28 MB   |  0.294 |    3.891 | 73.60 MB   |     2.690 |     
> 103.744
>  tbm    | 96.01 MB  |  0.534 |    5.223 | 768.01 MB  |     5.679 |      
> 60.632
> 
> 
> * Test 5
> select prepare(
> 1000000, -- max block
> 150, -- # of dead tuples per page
> 1, -- dead tuples interval within  a page
> 10000, -- # of consecutive pages having dead tuples
> 20000 -- page interval
> );
> 
> There are 10000 consecutive pages that have 150 dead tuples at every
> 20000 pages.
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 429.16 MB |  0.274 |   75.664 | 4291.54 MB |     3.067 |    
> 1259.501
>  intset | 46.88 MB  |  0.559 |   36.449 | 468.67 MB  |     4.565 |     
> 517.445
>  radix  | 20.26 MB  |  0.166 |    8.466 | 196.90 MB  |     1.273 |     
> 166.587
>  rtbm   | 18.02 MB  |  0.242 |    8.491 | 160.02 MB  |     2.407 |     
> 171.725
>  svtm   | 3.66 MB   |  0.243 |    3.635 | 37.10 MB   |     2.022 |      
> 86.165
>  tbm    | 48.01 MB  |  0.344 |    9.763 | 384.01 MB  |     3.327 |     
> 151.824
> 
> * Test 6
> select prepare(
> 1000000, -- max block
> 10, -- # of dead tuples per page
> 1, -- dead tuples interval within  a page
> 10000, -- # of consecutive pages having dead tuples
> 20000 -- page interval
> );
> 
> There are 10000 consecutive pages that have 10 dead tuples at every 
> 20000 pages.
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 28.62 MB  |  0.022 |    2.791 | 286.11 MB  |     0.170 |      
> 46.920
>  intset | 23.45 MB  |  0.061 |    2.156 | 234.34 MB  |     0.501 |      
> 32.577
>  radix  | 5.04 MB   |  0.026 |    0.433 | 48.57 MB   |     0.191 |      
> 11.060
>  rtbm   | 17.02 MB  |  0.074 |    0.533 | 144.02 MB  |     0.954 |      
> 11.502
>  svtm   | 3.16 MB   |  0.023 |    0.206 | 27.60 MB   |     0.175 |      
>  4.886
>  tbm    | 48.01 MB  |  0.132 |    0.656 | 384.01 MB  |     1.284 |      
> 10.231
> 
> * Test 7
> select prepare(
> 1000000, -- max block
> 150, -- # of dead tuples per page
> 1, -- dead tuples interval within  a page
> 1000, -- # of consecutive pages having dead tuples
> 999000 -- page interval
> );
> 
> There are pages that have 150 dead tuples at first 1000 blocks and
> last 1000 blocks.
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 1.72 MB   |  0.002 |    7.507 | 17.17 MB   |     0.011 |      
> 76.510
>  intset | 0.20 MB   |  0.003 |    6.742 | 1.89 MB    |     0.022 |      
> 52.122
>  radix  | 0.20 MB   |  0.001 |    1.023 | 1.07 MB    |     0.007 |      
> 12.023
>  rtbm   | 0.15 MB   |  0.001 |    2.637 | 0.65 MB    |     0.009 |      
> 34.528
>  svtm   | 0.52 MB   |  0.002 |    0.721 | 0.61 MB    |     0.010 |      
>  6.434
>  tbm    | 0.20 MB   |  0.002 |    2.733 | 1.51 MB    |     0.015 |      
> 38.538
> 
> * Test 8
> select prepare(
> 1000000, -- max block
> 100, -- # of dead tuples per page
> 1, -- dead tuples interval within  a page
> 50, -- # of consecutive pages having dead tuples
> 100 -- page interval
> );
> 
> There are 50 consecutive pages that have 100 dead tuples at every 100 
> pages.
> 
>   name  |  attach   | attach | shuffled |  size_x10  | attach_x10| 
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
>  array  | 286.11 MB |  0.184 |   67.233 | 2861.03 MB |     1.743 |     
> 979.070
>  intset | 46.88 MB  |  0.389 |   35.176 | 468.67 MB  |     3.698 |     
> 505.322
>  radix  | 21.82 MB  |  0.116 |    6.160 | 186.86 MB  |     0.891 |     
> 117.730
>  rtbm   | 18.02 MB  |  0.182 |    5.909 | 160.02 MB  |     1.870 |     
> 112.550
>  svtm   | 4.28 MB   |  0.152 |    3.213 | 37.60 MB   |     1.383 |      
> 79.073
>  tbm    | 48.01 MB  |  0.265 |    6.673 | 384.01 MB  |     2.586 |     
> 101.327
> 
> Overall, 'svtm' is faster and consumes less memory. 'radix' tree also
> has good performance and memory usage.
> 
> From these results, svtm is the best data structure among proposed
> ideas for dead tuple storage used during lazy vacuum in terms of
> performance and memory usage. I think it can support iteration by
> extracting the offset of dead tuples for each block while iterating
> chunks.
> 
> Apart from performance and memory usage points of view, we also need
> to consider the reusability of the code. When I started this thread, I
> thought the best data structure would be the one optimized for
> vacuum's dead tuple storage. However, if we can use a data structure
> that can also be used in general, we can use it also for other
> purposes. Moreover, if it's too optimized for the current TID system
> (32 bits block number, 16 bits offset number, maximum block/offset
> number, etc.) it may become a blocker for future changes.
> 
> In that sense, radix tree also seems good since it can also be used in
> gist vacuum as a replacement for intset, or a replacement for hash
> table for shared buffer as discussed before. Are there any other use
> cases? On the other hand, I’m concerned that radix tree would be an
> over-engineering in terms of vacuum's dead tuples storage since the
> dead tuple storage is static data and requires only lookup operation,
> so if we want to use radix tree as dead tuple storage, I'd like to see
> further use cases.

I can evolve svtm to transparent intset replacement certainly. Using
same trick from radix_to_key it will store tids efficiently:

   shift = pg_ceil_log2_32(MaxHeapTuplesPerPage);
   tid_i = ItemPointerGetOffsetNumber(tid);
   tid_i |= ItemPointerGetBlockNumber(tid) << shift;

Will do today's evening.

regards
Yura Sokolov aka funny_falcon



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Inaccurate error message when set fdw batch_size to 0
Next
From: Peter Smith
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions