Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoDePt8x14Vrg+xLFd2akRAKHS_8j2uVmyoRVMYOze-VSw@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum Re: [PoC] Improve dead tuple storage for lazy vacuum |
List | pgsql-hackers |
Hi, On Tue, Feb 7, 2023 at 6:25 PM John Naylor <john.naylor@enterprisedb.com> wrote: > > > On Tue, Jan 31, 2023 at 9:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > I've attached v24 patches. The locking support patch is separated > > (0005 patch). Also I kept the updates for TidStore and the vacuum > > integration from v23 separate. > > Okay, that's a lot more simple, and closer to what I imagined. For v25, I squashed v24's additions and added a couple ofmy own. I've kept the CF status at "needs review" because no specific action is required at the moment. > > I did start to review the TID store some more, but that's on hold because something else came up: On a lark I decided tore-run some benchmarks to see if anything got lost in converting to a template, and that led me down a rabbit hole -- somegood and bad news on that below. > > 0001: > > I removed the uint64 case, as discussed. There is now a brief commit message, but needs to be fleshed out a bit. I tookanother look at the Arm optimization that Nathan found some month ago, for forming the highbit mask, but that doesn'tplay nicely with how node32 uses it, so I decided against it. I added a comment to describe the reasoning in casesomeone else gets a similar idea. > > I briefly looked into "separate-commit TODO: move non-SIMD fallbacks to their own header to clean up the #ifdef maze.",but decided it wasn't such a clear win to justify starting the work now. It's still in the back of my mind, but Iremoved the reminder from the commit message. The changes make sense to me. > > 0003: > > The template now requires the value to be passed as a pointer. That was a pretty trivial change, but affected multipleother patches, so not sent separately. Also adds a forgotten RT_ prefix to the bitmap macros and adds a top commentto the *_impl.h headers. There are some comment fixes. The changes were either trivial or discussed earlier, so alsonot sent separately. Great. > > 0004/5: I wanted to measure the load time as well as search time in bench_search_random_nodes(). That's kept separate tomake it easier to test other patch versions. > > The bad news is that the speed of loading TIDs in bench_seq/shuffle_search() has regressed noticeably. I can't reproducethis in any other bench function and was the reason for writing 0005 to begin with. More confusingly, my effortsto fix this improved *other* functions, but the former didn't budge at all. First the patches: > > 0006 adds and removes some "inline" declarations (where it made sense), and added some for "pg_noinline" based on Andres'advice some months ago. Agreed. > > 0007 removes some dead code. RT_NODE_INSERT_INNER is only called during RT_SET_EXTEND, so it can't possibly find an existingkey. This kind of change is much easier with the inner/node cases handled together in a template, as far as beingsure of how those cases are different. I thought about trying the search in assert builds and verifying it doesn't exist,but thought yet another #ifdef would be too messy. Agreed. > > v25-addendum-try-no-maintain-order.txt -- It makes optional keeping the key chunks in order for the linear-search nodes.I believe the TID store no longer cares about the ordering, but this is a text file for now because I don't want toclutter the CI with a behavior change. Also, the second ART paper (on concurrency) mentioned that some locking schemesdon't allow these arrays to be shifted. So it might make sense to give up entirely on guaranteeing ordered iteration,or at least make it optional as in the patch. I think it's still important for lazy vacuum that an iteration over a TID store returns TIDs in ascending order, because otherwise a heap vacuum does random writes. That being said, we can have RT_ITERATE_NEXT() return key-value pairs in an order regardless of how the key chunks are stored in a node. > ======================================== > psql -c "select rt_load_ms, rt_search_ms from bench_seq_search(0, 1 * 1000 * 1000)" > (min load time of three) > > v15: > rt_load_ms | rt_search_ms > ------------+-------------- > 113 | 455 > > v25-0005: > rt_load_ms | rt_search_ms > ------------+-------------- > 135 | 456 > > v25-0006 (inlining or not): > rt_load_ms | rt_search_ms > ------------+-------------- > 136 | 455 > > v25-0007 (remove dead code): > rt_load_ms | rt_search_ms > ------------+-------------- > 135 | 455 > > v25-addendum...txt (no ordering): > rt_load_ms | rt_search_ms > ------------+-------------- > 134 | 455 > > Note: The regression seems to have started in v17, which is the first with a full template. > > Nothing so far has helped here, and previous experience has shown that trying to profile 100ms will not be useful. Insteadof putting more effort into diving deeper, it seems a better use of time to write a benchmark that calls the tid storeitself. That's more realistic, since this function was intended to test load and search of tids, but the tid store doesn'tquite operate so simply anymore. What do you think, Masahiko? Yeah, that's more realistic. TidStore now encodes TIDs slightly differently from the benchmark test. I've attached the patch that adds a simple benchmark test using TidStore. With this test, I got similar trends of results to yours with gcc, but I've not analyzed them in depth yet. query: select * from bench_tidstore_load(0, 10 * 1000 * 1000) v15: load_ms --------- 816 v25-0007 (remove dead code): load_ms --------- 839 v25-addendum...txt (no ordering): load_ms --------- 820 BTW it would be better to remove the RT_DEBUG macro from bench_radix_tree.c. > > I'm inclined to keep 0006, because it might give a slight boost, and 0007 because it's never a bad idea to remove deadcode. Yeah, these two changes make sense to me too. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: