Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CAD21AoAiwz2Ehu9wyNrRQw1ZjWZp1RHdcqG16f0o40iW36kDWA@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum
List: pgsql-hackers
On Sat, Feb 10, 2024 at 9:29 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Tue, Feb 6, 2024 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Feb 2, 2024 at 8:47 PM John Naylor <johncnaylorls@gmail.com> wrote:
> > >
> > > My todo:
> > > - benchmark tid store / vacuum again, since we haven't since varlen
> > >   types and removing unnecessary locks.
>
> I ran a vacuum benchmark similar to the one in [1] (unlogged tables
> for reproducibility), but with smaller tables (100 million records),
> deleting only the last 20% of the table, and including a parallel
> vacuum test. Scripts attached.
>
> monotonically ordered int column index:
>
> master:
> system usage: CPU: user: 4.27 s, system: 0.41 s, elapsed: 4.70 s
> system usage: CPU: user: 4.23 s, system: 0.44 s, elapsed: 4.69 s
> system usage: CPU: user: 4.26 s, system: 0.39 s, elapsed: 4.66 s
>
> v59:
> system usage: CPU: user: 3.10 s, system: 0.44 s, elapsed: 3.56 s
> system usage: CPU: user: 3.07 s, system: 0.35 s, elapsed: 3.43 s
> system usage: CPU: user: 3.07 s, system: 0.36 s, elapsed: 3.44 s
>
> uuid column index:
>
> master:
> system usage: CPU: user: 18.22 s, system: 1.70 s, elapsed: 20.01 s
> system usage: CPU: user: 17.70 s, system: 1.70 s, elapsed: 19.48 s
> system usage: CPU: user: 18.48 s, system: 1.59 s, elapsed: 20.43 s
>
> v59:
> system usage: CPU: user: 5.18 s, system: 1.18 s, elapsed: 6.45 s
> system usage: CPU: user: 6.56 s, system: 1.39 s, elapsed: 7.99 s
> system usage: CPU: user: 6.51 s, system: 1.44 s, elapsed: 8.05 s
>
> int & uuid indexes in parallel:
>
> master:
> system usage: CPU: user: 4.53 s, system: 1.22 s, elapsed: 20.43 s
> system usage: CPU: user: 4.49 s, system: 1.29 s, elapsed: 20.98 s
> system usage: CPU: user: 4.46 s, system: 1.33 s, elapsed: 20.50 s
>
> v59:
> system usage: CPU: user: 2.09 s, system: 0.32 s, elapsed: 4.86 s
> system usage: CPU: user: 3.76 s, system: 0.51 s, elapsed: 8.92 s
> system usage: CPU: user: 3.83 s, system: 0.54 s, elapsed: 9.09 s
>
> Overall, I'm pleased with these results, although I'm confused why
> sometimes with the patch the first run reports running faster than the
> others. I'm curious what others get. Traversing a tree that lives in
> DSA has some overhead, as expected, but it still comes out way ahead of
> master.

Thanks! That's a great improvement.
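For reference, the shape of the benchmark is roughly the following sketch
(the attached scripts are the authoritative version; the table and index
names here are illustrative):

CREATE UNLOGGED TABLE bench (a int, b uuid);
INSERT INTO bench
    SELECT i, gen_random_uuid()
    FROM generate_series(1, 100000000) i;    -- 100 million records
CREATE INDEX bench_int_idx ON bench (a);     -- monotonically ordered int
CREATE INDEX bench_uuid_idx ON bench (b);    -- random uuid
DELETE FROM bench WHERE a > 80000000;        -- only the last 20% of the table
VACUUM (VERBOSE) bench;                      -- VERBOSE prints the
                                             -- "system usage" lines

The single-index runs keep only one of the two indexes; the parallel run
keeps both, so vacuum can process them with parallel workers.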
I've also run the same scripts in my environment just in case, and got
similar results:

monotonically ordered int column index:

master:
system usage: CPU: user: 14.81 s, system: 0.90 s, elapsed: 15.74 s
system usage: CPU: user: 14.91 s, system: 0.80 s, elapsed: 15.73 s
system usage: CPU: user: 14.85 s, system: 0.70 s, elapsed: 15.57 s

v59:
system usage: CPU: user: 9.47 s, system: 1.04 s, elapsed: 10.53 s
system usage: CPU: user: 9.67 s, system: 0.81 s, elapsed: 10.50 s
system usage: CPU: user: 9.59 s, system: 0.86 s, elapsed: 10.47 s

uuid column index:

master:
system usage: CPU: user: 28.37 s, system: 1.38 s, elapsed: 29.81 s
system usage: CPU: user: 28.05 s, system: 1.37 s, elapsed: 29.47 s
system usage: CPU: user: 28.46 s, system: 1.36 s, elapsed: 29.88 s

v59:
system usage: CPU: user: 14.87 s, system: 1.13 s, elapsed: 16.02 s
system usage: CPU: user: 14.84 s, system: 1.31 s, elapsed: 16.18 s
system usage: CPU: user: 10.96 s, system: 1.24 s, elapsed: 12.22 s

int & uuid indexes in parallel:

master:
system usage: CPU: user: 15.81 s, system: 1.43 s, elapsed: 34.31 s
system usage: CPU: user: 15.84 s, system: 1.41 s, elapsed: 34.34 s
system usage: CPU: user: 15.92 s, system: 1.39 s, elapsed: 34.33 s

v59:
system usage: CPU: user: 10.93 s, system: 0.92 s, elapsed: 17.59 s
system usage: CPU: user: 10.92 s, system: 1.20 s, elapsed: 17.58 s
system usage: CPU: user: 10.90 s, system: 1.01 s, elapsed: 17.45 s

> There are still some micro-benchmarks we could do on tidstore, and
> it'd be good to find out worst-case memory use (1 dead tuple each on
> spread-out pages), but this is a decent demonstration.

I've tested a simple case where vacuum removes 33k dead tuples spread
about every 10 pages (a rough sketch of this kind of setup appears below,
after the test-module comments):

master: 198,000 bytes (= 33000 dead TIDs * 6 bytes per ItemPointerData)
system usage: CPU: user: 29.49 s, system: 0.88 s, elapsed: 30.40 s

v59: 2,834,432 bytes (reported by TidStoreMemoryUsage())
system usage: CPU: user: 15.96 s, system: 0.89 s, elapsed: 16.88 s

>
> > > I'm not sure what the test_node_types_* functions are testing that
> > > test_basic doesn't. They have a different, and confusing, way to stop
> > > at every size class and check the keys/values. It seems we can replace
> > > all that with two more calls (asc/desc) to test_basic, with the
> > > maximum level.
>
> v58-0008:
>
> + /* borrowed from RT_MAX_SHIFT */
> + const int max_shift = (pg_leftmost_one_pos64(UINT64_MAX) /
> BITS_PER_BYTE) * BITS_PER_BYTE;
>
> This is harder to read than "64 - 8", and doesn't really help
> maintainability either. Maybe "(sizeof(uint64) - 1) * BITS_PER_BYTE"
> is a good compromise.
>
> + /* leaf nodes */
> + test_basic(test_info, 0);
>
> + /* internal nodes */
> + test_basic(test_info, 8);
> +
> + /* max-level nodes */
> + test_basic(test_info, max_shift);
>
> This three-way terminology is not very informative. How about:
>
> + /* a tree with one level, i.e. a single node under the root node. */
> ...
> + /* a tree with two levels */
> ...
> + /* a tree with the maximum number of levels */

Agreed.

> +static void
> +test_basic(rt_node_class_test_elem *test_info, int shift)
> +{
> + elog(NOTICE, "testing node %s with shift %d", test_info->class_name, shift);
> +
> + /* Test nodes while changing the key insertion order */
> + do_test_basic(test_info->nkeys, shift, false);
> + do_test_basic(test_info->nkeys, shift, true);
>
> Adding a level of indirection makes this harder to read, and do we
> still know whether a test failed with asc or desc keys?

Agreed; it seems better to keep the previous logging style.
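The sparse-deletion case mentioned above can be reproduced with something
like the following sketch (illustrative only, not the exact script I used;
it assumes the 100-million-row table from the benchmark, which has roughly
150 heap tuples per page):

-- Delete the first tuple on every 10th heap page, leaving ~33k dead
-- tuples with about 10 pages between consecutive ones.
DELETE FROM bench
WHERE ctid = ANY (ARRAY(
    SELECT format('(%s,1)', blkno)::tid
    FROM generate_series(0, 329990, 10) AS blkno));
VACUUM (VERBOSE) bench;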
>
> > > My earlier opinion was that "handle" was a nicer variable name, but
> > > this brings back the typedef and also keeps the variable name I didn't
> > > like, but pushes it down into the function. I'm a bit confused, so
> > > I've kept these not-squashed for now.
> >
> > I misunderstood your comment. I've changed to use the variable name
> > rt_handle and removed the TidStoreHandle type. 0013 patch.
>
> (diff against an earlier version)
> - pvs->shared->dead_items_handle = TidStoreGetHandle(dead_items);
> + pvs->shared->dead_items_dp = TidStoreGetHandle(dead_items);
>
> Shall we use "handle" in vacuum_parallel.c as well?

Agreed.

> > > I'm pretty sure there's an accidental memset call that crept in
> > > there, but I'm running out of steam today.
>
> I have just a little bit of work to add for v59:
>
> v59-0009 - set_offset_bitmap_at() will call memset if it needs to zero
> any bitmapwords. That can only happen if e.g. there is an offset > 128
> and there are none between 64 and 128, so it's not a huge deal, but I
> think it's a bit nicer in this patch.

LGTM.

> > > > > * WIP: notes about traditional radix tree trading off span vs height...
> > > >
> > > > Are you going to write it?
> > >
> > > Yes, when I draft a rough commit message (for next time).
>
> I haven't gotten to the commit message, but:

I've drafted the commit message.

> v59-0004 - I did some rewriting of the top header comment to explain
> ART concepts for new readers, made small comment changes, and tidied
> up some indentation that pgindent won't touch.
> v59-0005 - re-pgindent'ed.

LGTM; I squashed all the changes. I've attached these updates from v59
as separate patches.

I've run the regression tests with valgrind and run a Coverity scan,
and I don't see any critical issues.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com