Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoB8RY5aM_+PzWPEb5M=MZ9J4aVUH+xCj_D8uUx4JWcZDA@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
Hi,

On Mon, Aug 14, 2023 at 8:05 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> On Thu, Jul 13, 2023 at 3:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > 0007, 0008, 0010, and 0011 are straightforward and agree to merge them.

Thank you for updating the patch!

>
> [Part 1 - clear the deck of earlier performance work etc]
>
> Thanks for taking a look! I've merged 0007 and 0008. The others need a performance test to justify them -- an eyeball
checkis not enough. I've now made the time to do that. 
>
> ==== sparse loads
>
> v38 0001-0006 (still using node3 for this test only):
>
> select avg(load_ms) from generate_series(1,100) x(x), lateral (select * from bench_load_random_int(100 * 1000 *
(1+x-x)))a; 
>          avg
> ---------------------
>  27.1000000000000000
>
> select avg(load_ms) from generate_series(1,30) x(x), lateral (select * from bench_load_random_int(500 * 1000 *
(1+x-x)))a; 
>          avg
> ----------------------
>  165.6333333333333333
>
> v38-0007-Optimize-RT_EXTEND_DOWN.patch
>
> select avg(load_ms) from generate_series(1,100) x(x), lateral (select * from bench_load_random_int(100 * 1000 *
(1+x-x)))a; 
>          avg
> ---------------------
>  25.0900000000000000
>
> select avg(load_ms) from generate_series(1,30) x(x), lateral (select * from bench_load_random_int(500 * 1000 *
(1+x-x)))a; 
>          avg
> ----------------------
>  157.3666666666666667
>
> That seems worth doing.
>
> v38-0008-Use-4-children-for-node-4-also-attempt-portable-.patch
>
> This combines two things because I messed up a rebase: Use fanout of 4, and try some macros for shmem sizes, both 32-
and64-bit. Looking at this much, I no longer have a goal to have a separate set of size-classes for non-SIMD platforms,
becausethat would cause global maintenance problems -- it's probably better to reduce worst-case search time where
necessary.That would be much more localized. 
>
> > I have some questions on 0009 patch:
>
> > According to the comment, this optimization is for only gcc?
>
> No, not at all. That tells me the comment is misleading.
>
> > I think this change reduces
> > readability and maintainability.
>
> Well, that much is obvious. What is not obvious is how much it gains us over the alternatives. I do have a simpler
idea,though... 
>
> ==== load mostly node4
>
> select * from bench_search_random_nodes(250*1000, '0xFFFFFF');
> n4 = 42626, n16 = 21492, n32 = 0, n64 = 0, n256 = 257
>  mem_allocated | load_ms | search_ms
> ---------------+---------+-----------
>        7352384 |      25 |         0
>
> v38-0009-TEMP-take-out-search-time-from-bench.patch
>
> This is just to allow LATERAL queries for better measurements.
>
> select avg(load_ms) from generate_series(1,100) x(x), lateral (select * from bench_search_random_nodes(250*1000 *
(1+x-x),'0xFFFFFF')) a; 
>
>          avg
> ---------------------
>  24.8333333333333333

0007, 0008, and 0009 look good to me.

>
> v38-0010-Try-a-simpler-way-to-avoid-memmove.patch
>
> This slightly rewrites the standard loop so that gcc doesn't turn it into a memmove(). Unlike the patch you didn't
like,this *is* gcc-specific. (needs a comment, which I forgot) 
>
>          avg
> ---------------------
>  21.9600000000000000
>
> So, that's not a trivial difference. I wasn't a big fan of Andres' __asm("") workaround, but that may be just my
ignoranceabout it. We need something like either of the two. 
>
> v38-0011-Optimize-add_child_4-take-2.patch
>          avg
> ---------------------
>  21.3500000000000000
>
> This is possibly faster than v38-0010, but looking like not worth the complexity, assuming the other way avoids the
buggoing forward. 

I prefer 0010 but is it worth testing with other compilers such as clang?

>
> > According to the bugzilla ticket
> > referred to in the comment, it's realized as a bug in the community,
> > so once the gcc bug fixes, we might no longer need this trick, no?
>
> No comment in two years...
>
> v38-0013-Use-constant-for-initial-copy-of-chunks-and-chil.patch
>
> This is the same as v37-0011. I wasn't quite satisfied with it since it still has two memcpy() calls, but it actually
seemsto regress: 
>
>          avg
> ---------------------
>  22.0900000000000000
>
> v38-0012-Use-branch-free-coding-to-skip-new-element-index.patch
>
> This patch uses a single loop for the copy.
>
>          avg
> ---------------------
>  21.0300000000000000
>
> Within noise level of v38-0011, but it's small and simple, so I like it, at least for small arrays.

Agreed.

>
> v38-0014-node48-Remove-need-for-RIGHTMOST_ONE-in-radix-tr.patch
> v38-0015-node48-Remove-dead-code-by-using-loop-local-var.patch
>
> Just small cleanups.
>
> v38-0016-Use-memcpy-for-children-when-growing-into-node48.patch
>
> Makes sense, but untested.

Agreed.

BTW cfbot reported that some regression tests failed due to OOM. I've
attached the patch to fix it.

>
> ===============
> [Part 2]
>
> Per off-list discussion with Masahiko, it makes sense to take some of the ideas I've used locally on tidbitmap, and
startincorporating them into earlier vacuum work to get that out the door faster. With that in mind... 
>
> v38-0017-Make-tidstore-more-similar-to-tidbitmap.patch
>
> This uses a simplified PagetableEntry (unimaginatively called BlocktableEntry just to avoid confusion), to be
replacedwith the real thing at a later date. This is still fixed size, to be replaced with a varlen type. 

That's more readable.

>
> Looking at the tidstore tests again after some months, I'm not particularly pleased with the amount of code required
forhow little it seems to be testing, nor the output when something fails. (I wonder how hard it would be to have SQL
functionsthat add blocks/offsets to the tid store, and emit tuples of tids found in the store.) 

It would not be hard to have such SQL functions. I'll try it.

>
> I'm also concerned about the number of places that have to know if the store is using shared memory or not. Something
tothink about later. 
>
> v38-0018-Consolidate-inserting-updating-values.patch
>
> This is something I coded up to get to an API more similar to one in simplehash, as used in tidbitmap.c. It seem
worthdoing on its own to reduce code duplication, and also simplifies coding of varlen types and "runtime-embeddable
values".

Agreed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Xing Guo
Date:
Subject: Re: [RFC] Clang plugin for catching suspicious typedef casting
Next
From: jian he
Date:
Subject: Re: Do we want a hashset type?