Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoDePt8x14Vrg+xLFd2akRAKHS_8j2uVmyoRVMYOze-VSw@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
Hi,

On Tue, Feb 7, 2023 at 6:25 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
>
> On Tue, Jan 31, 2023 at 9:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I've attached v24 patches. The locking support patch is separated
> > (0005 patch). Also I kept the updates for TidStore and the vacuum
> > integration from v23 separate.
>
> Okay, that's a lot more simple, and closer to what I imagined. For v25, I squashed v24's additions and added a couple
ofmy own. I've kept the CF status at "needs review" because no specific action is required at the moment. 
>
> I did start to review the TID store some more, but that's on hold because something else came up: On a lark I decided
tore-run some benchmarks to see if anything got lost in converting to a template, and that led me down a rabbit hole --
somegood and bad news on that below. 
>
> 0001:
>
> I removed the uint64 case, as discussed. There is now a brief commit message, but needs to be fleshed out a bit. I
tookanother look at the Arm optimization that Nathan found some month ago, for forming the highbit mask, but that
doesn'tplay nicely with how node32 uses it, so I decided against it. I added a comment to describe the reasoning in
casesomeone else gets a similar idea. 
>
> I briefly looked into "separate-commit TODO: move non-SIMD fallbacks to their own header to clean up the #ifdef
maze.",but decided it wasn't such a clear win to justify starting the work now. It's still in the back of my mind, but
Iremoved the reminder from the commit message. 

The changes make sense to me.

>
> 0003:
>
> The template now requires the value to be passed as a pointer. That was a pretty trivial change, but affected
multipleother patches, so not sent separately. Also adds a forgotten RT_ prefix to the bitmap macros and adds a top
commentto the *_impl.h headers. There are some comment fixes. The changes were either trivial or discussed earlier, so
alsonot sent separately. 

Great.

>
> 0004/5: I wanted to measure the load time as well as search time in bench_search_random_nodes(). That's kept separate
tomake it easier to test other patch versions. 
>
> The bad news is that the speed of loading TIDs in bench_seq/shuffle_search() has regressed noticeably. I can't
reproducethis in any other bench function and was the reason for writing 0005 to begin with. More confusingly, my
effortsto fix this improved *other* functions, but the former didn't budge at all. First the patches: 
>
> 0006 adds and removes some "inline" declarations (where it made sense), and added some for "pg_noinline" based on
Andres'advice some months ago. 

Agreed.

>
> 0007 removes some dead code. RT_NODE_INSERT_INNER is only called during RT_SET_EXTEND, so it can't possibly find an
existingkey. This kind of change is much easier with the inner/node cases handled together in a template, as far as
beingsure of how those cases are different. I thought about trying the search in assert builds and verifying it doesn't
exist,but thought yet another #ifdef would be too messy. 

Agreed.

>
> v25-addendum-try-no-maintain-order.txt -- It makes optional keeping the key chunks in order for the linear-search
nodes.I believe the TID store no longer cares about the ordering, but this is a text file for now because I don't want
toclutter the CI with a behavior change. Also, the second ART paper (on concurrency) mentioned that some locking
schemesdon't allow these arrays to be shifted. So it might make sense to give up entirely on guaranteeing ordered
iteration,or at least make it optional as in the patch. 

I think it's still important for lazy vacuum that an iteration over a
TID store returns TIDs in ascending order, because otherwise a heap
vacuum does random writes. That being said, we can have
RT_ITERATE_NEXT() return key-value pairs in an order regardless of how
the key chunks are stored in a node.

> ========================================
> psql -c "select rt_load_ms, rt_search_ms from bench_seq_search(0, 1 * 1000 * 1000)"
> (min load time of three)
>
> v15:
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         113 |          455
>
> v25-0005:
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         135 |          456
>
> v25-0006 (inlining or not):
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         136 |          455
>
> v25-0007 (remove dead code):
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         135 |          455
>
> v25-addendum...txt (no ordering):
>  rt_load_ms | rt_search_ms
> ------------+--------------
>         134 |          455
>
> Note: The regression seems to have started in v17, which is the first with a full template.
>
> Nothing so far has helped here, and previous experience has shown that trying to profile 100ms will not be useful.
Insteadof putting more effort into diving deeper, it seems a better use of time to write a benchmark that calls the tid
storeitself. That's more realistic, since this function was intended to test load and search of tids, but the tid store
doesn'tquite operate so simply anymore. What do you think, Masahiko? 

Yeah, that's more realistic. TidStore now encodes TIDs slightly
differently from the benchmark test.

I've attached the patch that adds a simple benchmark test using
TidStore. With this test, I got similar trends of results to yours
with gcc, but I've not analyzed them in depth yet.

query: select * from bench_tidstore_load(0, 10 * 1000 * 1000)

v15:
 load_ms
---------
     816

v25-0007 (remove dead code):
load_ms
---------
     839

v25-addendum...txt (no ordering):
 load_ms
---------
     820

BTW it would be better to remove the RT_DEBUG macro from bench_radix_tree.c.

>
> I'm inclined to keep 0006, because it might give a slight boost, and 0007 because it's never a bad idea to remove
deadcode. 

Yeah, these two changes make sense to me too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Introduce a new view for checkpointer related stats
Next
From: Jim Jones
Date:
Subject: Re: [PATCH] Add pretty-printed XML output option