Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From: John Naylor
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CANWCAZYqWibTRCWs5mV57mLj1A0nbKX-eV5G+d-KmBOGHTVY-w@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum
List: pgsql-hackers
On Tue, Feb 6, 2024 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Feb 2, 2024 at 8:47 PM John Naylor <johncnaylorls@gmail.com> wrote:
> > My todo:
> > - benchmark tid store / vacuum again, since we haven't since varlen
> > types and removing unnecessary locks.

I ran a vacuum benchmark similar to the one in [1] (unlogged tables for reproducibility), but with smaller tables (100 million records), deleting only the last 20% of the table, and including a parallel vacuum test. Scripts attached.

monotonically ordered int column index:

master:
system usage: CPU: user: 4.27 s, system: 0.41 s, elapsed: 4.70 s
system usage: CPU: user: 4.23 s, system: 0.44 s, elapsed: 4.69 s
system usage: CPU: user: 4.26 s, system: 0.39 s, elapsed: 4.66 s

v59:
system usage: CPU: user: 3.10 s, system: 0.44 s, elapsed: 3.56 s
system usage: CPU: user: 3.07 s, system: 0.35 s, elapsed: 3.43 s
system usage: CPU: user: 3.07 s, system: 0.36 s, elapsed: 3.44 s

uuid column index:

master:
system usage: CPU: user: 18.22 s, system: 1.70 s, elapsed: 20.01 s
system usage: CPU: user: 17.70 s, system: 1.70 s, elapsed: 19.48 s
system usage: CPU: user: 18.48 s, system: 1.59 s, elapsed: 20.43 s

v59:
system usage: CPU: user: 5.18 s, system: 1.18 s, elapsed: 6.45 s
system usage: CPU: user: 6.56 s, system: 1.39 s, elapsed: 7.99 s
system usage: CPU: user: 6.51 s, system: 1.44 s, elapsed: 8.05 s

int & uuid indexes in parallel:

master:
system usage: CPU: user: 4.53 s, system: 1.22 s, elapsed: 20.43 s
system usage: CPU: user: 4.49 s, system: 1.29 s, elapsed: 20.98 s
system usage: CPU: user: 4.46 s, system: 1.33 s, elapsed: 20.50 s

v59:
system usage: CPU: user: 2.09 s, system: 0.32 s, elapsed: 4.86 s
system usage: CPU: user: 3.76 s, system: 0.51 s, elapsed: 8.92 s
system usage: CPU: user: 3.83 s, system: 0.54 s, elapsed: 9.09 s

Overall, I'm pleased with these results, although I'm confused why, with the patch, the first run sometimes reports running faster than the others.
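For readers who don't open the attachments, the general shape of such a benchmark is sketched below. The table, column, and index names here are my own invention for illustration; the actual attached scripts may differ in details:

```sql
-- unlogged table for run-to-run reproducibility, 100 million rows
CREATE UNLOGGED TABLE bench (a int, b uuid DEFAULT gen_random_uuid());
INSERT INTO bench (a) SELECT i FROM generate_series(1, 100000000) i;

CREATE INDEX bench_int_idx ON bench (a);   -- monotonically ordered int index
CREATE INDEX bench_uuid_idx ON bench (b);  -- uuid index

-- delete only the last 20% of the table
DELETE FROM bench WHERE a > 80000000;

-- the "system usage" lines above come from VACUUM (VERBOSE) output
VACUUM (VERBOSE) bench;
```

The parallel test would additionally need both indexes present and a nonzero `max_parallel_maintenance_workers` so the two index scans run in separate workers.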
I'm curious what others get. Traversing a tree that lives in DSA has some overhead, as expected, but it still comes out way ahead of master.

There are still some micro-benchmarks we could do on tidstore, and it would be good to find out the worst-case memory use (1 dead tuple each on spread-out pages), but this is a decent demonstration.

> > I'm not sure what the test_node_types_* functions are testing that
> > test_basic doesn't. They have a different, and confusing, way to stop
> > at every size class and check the keys/values. It seems we can replace
> > all that with two more calls (asc/desc) to test_basic, with the
> > maximum level.

v58-0008:

+    /* borrowed from RT_MAX_SHIFT */
+    const int max_shift = (pg_leftmost_one_pos64(UINT64_MAX) / BITS_PER_BYTE) * BITS_PER_BYTE;

This is harder to read than "64 - 8", and doesn't really help maintainability either. Maybe "(sizeof(uint64) - 1) * BITS_PER_BYTE" is a good compromise.

+    /* leaf nodes */
+    test_basic(test_info, 0);
+    /* internal nodes */
+    test_basic(test_info, 8);
+
+    /* max-level nodes */
+    test_basic(test_info, max_shift);

This three-way terminology is not very informative. How about:

+    /* a tree with one level, i.e. a single node under the root node */
...
+    /* a tree with two levels */
...
+    /* a tree with the maximum number of levels */

+static void
+test_basic(rt_node_class_test_elem *test_info, int shift)
+{
+    elog(NOTICE, "testing node %s with shift %d", test_info->class_name, shift);
+
+    /* Test nodes while changing the key insertion order */
+    do_test_basic(test_info->nkeys, shift, false);
+    do_test_basic(test_info->nkeys, shift, true);

Adding a level of indirection makes this harder to read, and do we still know whether a test failed with asc or desc keys?

> > My earlier opinion was that "handle" was a nicer variable name, but
> > this brings back the typedef and also keeps the variable name I didn't
> > like, but pushes it down into the function. I'm a bit confused, so
> > I've kept these not-squashed for now.
>
> I misunderstood your comment. I've changed to use a variable name
> rt_handle and removed the TidStoreHandle type. 0013 patch.

(diff against an earlier version)

-    pvs->shared->dead_items_handle = TidStoreGetHandle(dead_items);
+    pvs->shared->dead_items_dp = TidStoreGetHandle(dead_items);

Shall we use "handle" in vacuum_parallel.c as well?

> > I'm pretty sure there's an
> > accidental memset call that crept in there, but I'm running out of
> > steam today.

I have just a little bit of work to add for v59:

v59-0009 - set_offset_bitmap_at() will call memset if it needs to zero any bitmapwords. That can only happen if e.g. there is an offset > 128 and there are none between 64 and 128, so it's not a huge deal, but I think it's a bit nicer in this patch.

> > > * WIP: notes about traditional radix tree trading off span vs height...
> > >
> > > Are you going to write it?
> >
> > Yes, when I draft a rough commit message, (for next time).

I haven't gotten to the commit message yet, but:

v59-0004 - I did some rewriting of the top header comment to explain ART concepts for new readers, made small comment changes, and tidied up some indentation that pgindent won't touch.

v59-0005 - re-pgindent'ed

[1] https://www.postgresql.org/message-id/CAFBsxsHUxmXYy0y4RrhMcNe-R11Bm099Xe-wUdb78pOu0%2BPT2Q%40mail.gmail.com