Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAFBsxsFDhOSg4kGRLEbqqpdUANGHRREHd5EGZ+mDARe5wB5Zog@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Thu, Jul 13, 2023 at 3:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> 0007, 0008, 0010, and 0011 are straightforward and agree to merge them.

[Part 1 - clear the deck of earlier performance work etc]

Thanks for taking a look! I've merged 0007 and 0008. The others need a performance test to justify them -- an eyeball check is not enough. I've now made the time to do that.

==== sparse loads

v38 0001-0006 (still using node3 for this test only):

select avg(load_ms) from generate_series(1,100) x(x), lateral (select * from bench_load_random_int(100 * 1000 * (1+x-x))) a;
         avg        
---------------------
 27.1000000000000000

select avg(load_ms) from generate_series(1,30) x(x), lateral (select * from bench_load_random_int(500 * 1000 * (1+x-x))) a;
         avg          
----------------------
 165.6333333333333333

v38-0007-Optimize-RT_EXTEND_DOWN.patch

select avg(load_ms) from generate_series(1,100) x(x), lateral (select * from bench_load_random_int(100 * 1000 * (1+x-x))) a;
         avg        
---------------------
 25.0900000000000000

select avg(load_ms) from generate_series(1,30) x(x), lateral (select * from bench_load_random_int(500 * 1000 * (1+x-x))) a;
         avg          
----------------------
 157.3666666666666667

That seems worth doing.

v38-0008-Use-4-children-for-node-4-also-attempt-portable-.patch

This combines two things because I messed up a rebase: Use fanout of 4, and try some macros for shmem sizes, both 32- and 64-bit. Looking at this much, I no longer have a goal to have a separate set of size-classes for non-SIMD platforms, because that would cause global maintenance problems -- it's probably better to reduce worst-case search time where necessary. That would be much more localized.

> I have some questions on 0009 patch:

> According to the comment, this optimization is for only gcc?

No, not at all. That tells me the comment is misleading.

> I think this change reduces
> readability and maintainability.

Well, that much is obvious. What is not obvious is how much it gains us over the alternatives. I do have a simpler idea, though...

==== load mostly node4

select * from bench_search_random_nodes(250*1000, '0xFFFFFF');
n4 = 42626, n16 = 21492, n32 = 0, n64 = 0, n256 = 257
 mem_allocated | load_ms | search_ms
---------------+---------+-----------
       7352384 |      25 |         0

v38-0009-TEMP-take-out-search-time-from-bench.patch

This is just to allow LATERAL queries for better measurements.

select avg(load_ms) from generate_series(1,100) x(x), lateral (select * from bench_search_random_nodes(250*1000 * (1+x-x), '0xFFFFFF')) a;

         avg        
---------------------
 24.8333333333333333

v38-0010-Try-a-simpler-way-to-avoid-memmove.patch

This slightly rewrites the standard loop so that gcc doesn't turn it into a memmove(). Unlike the patch you didn't like, this *is* gcc-specific. (needs a comment, which I forgot)

         avg        
---------------------
 21.9600000000000000

So, that's not a trivial difference. I wasn't a big fan of Andres' __asm("") workaround, but that may be just my ignorance about it. We need something like either of the two.

v38-0011-Optimize-add_child_4-take-2.patch
         avg        
---------------------
 21.3500000000000000

This is possibly faster than v38-0010, but looking like not worth the complexity, assuming the other way avoids the bug going forward.

> According to the bugzilla ticket
> referred to in the comment, it's realized as a bug in the community,
> so once the gcc bug fixes, we might no longer need this trick, no?

No comment in two years...

v38-0013-Use-constant-for-initial-copy-of-chunks-and-chil.patch

This is the same as v37-0011. I wasn't quite satisfied with it since it still has two memcpy() calls, but it actually seems to regress:

         avg        
---------------------
 22.0900000000000000

v38-0012-Use-branch-free-coding-to-skip-new-element-index.patch

This patch uses a single loop for the copy.

         avg        
---------------------
 21.0300000000000000

Within noise level of v38-0011, but it's small and simple, so I like it, at least for small arrays.

v38-0014-node48-Remove-need-for-RIGHTMOST_ONE-in-radix-tr.patch
v38-0015-node48-Remove-dead-code-by-using-loop-local-var.patch

Just small cleanups.

v38-0016-Use-memcpy-for-children-when-growing-into-node48.patch

Makes sense, but untested.

===============
[Part 2]

Per off-list discussion with Masahiko, it makes sense to take some of the ideas I've used locally on tidbitmap, and start incorporating them into earlier vacuum work to get that out the door faster. With that in mind...

v38-0017-Make-tidstore-more-similar-to-tidbitmap.patch

This uses a simplified PagetableEntry (unimaginatively called BlocktableEntry just to avoid confusion), to be replaced with the real thing at a later date. This is still fixed size, to be replaced with a varlen type. 

Looking at the tidstore tests again after some months, I'm not particularly pleased with the amount of code required for how little it seems to be testing, nor the output when something fails. (I wonder how hard it would be to have SQL functions that add blocks/offsets to the tid store, and emit tuples of tids found in the store.)

I'm also concerned about the number of places that have to know if the store is using shared memory or not. Something to think about later.

v38-0018-Consolidate-inserting-updating-values.patch

This is something I coded up to get to an API more similar to one in simplehash, as used in tidbitmap.c. It seem worth doing on its own to reduce code duplication, and also simplifies coding of varlen types and "runtime-embeddable values".

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: pgbench with libevent?
Next
From: Dagfinn Ilmari Mannsåker
Date:
Subject: Re: Ignore 2PC transaction GIDs in query jumbling