Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoDCTS573Tp5TnpgUDmMYeH=Xz19UabctYus_Eib0-jWQQ@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Sat, Jul 8, 2023 at 11:54 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
>
> On Fri, Jul 7, 2023 at 2:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 5, 2023 at 8:21 PM John Naylor <john.naylor@enterprisedb.com> wrote:
> > > Well, it's going to be a bit of a mess until I can demonstrate it working (and working well) with bitmap heap
> > > scan. Fixing that now is just going to create conflicts. I do have a couple small older patches laying around that
> > > were quick experiments -- I think at least some of them should give a performance boost in loading speed, but
> > > haven't had time to test. Would you like to take a look?
> >
> > Yes, I can experiment with these patches in the meantime.
>
> Okay, here it is in v36. 0001-6 are same as v35.
>
> 0007 removes a wasted extra computation newly introduced by refactoring growing nodes. 0008 just makes 0011 nicer.
> Not worth testing by themselves, but better to be tidy.
> 0009 is an experiment to get rid of slow memmoves in node4, addressing a long-standing inefficiency. It looks a bit
> tricky, but I think it's actually straightforward after drawing out the cases with pen and paper. It works if the
> fanout is either 4 or 5, so we have some wiggle room. This may give a noticeable boost if the input is reversed or
> random.
> 0010 allows RT_EXTEND_DOWN to reduce function calls, so should help with sparse trees.
> 0011 reduces function calls when growing the smaller nodes. Not sure about this one -- possibly worth it for node4
> only?
>
> If these help, it'll show up more easily in smaller inputs. Large inputs tend to be more dominated by RAM latency.

Thanks for sharing the patches!

0007, 0008, 0010, and 0011 are straightforward, and I agree to merge them.

I have some questions about the 0009 patch:

+       /* shift chunks and children
+
+               Unfortunately, gcc has gotten too aggressive in turning simple loops
+               into slow memmove's, so we have to be a bit more clever.
+               See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481
+
+               We take advantage of the fact that a good
+               compiler can turn a memmove of a small constant power-of-two
+               number of bytes into a single load/store.
+       */

According to the comment, is this optimization only for gcc? And does
this change have no negative impact when building with other compilers
such as clang?

I'm not sure it's a good approach to hand-optimize the code this much
just to get better instructions out of gcc. I think this change reduces
readability and maintainability. According to the bugzilla ticket
referred to in the comment, this is recognized as a bug by the gcc
community, so once the gcc bug is fixed, we might no longer need this
trick, no?
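
For reference, here is a minimal sketch of the kind of trick the quoted
comment describes, as far as it can be read from the comment alone. It is
not the code from 0009. The names sketch_node4 and sketch_insert_chunk,
and the padded array layout, are hypothetical and chosen only for
illustration. The idea is to replace a variable-length shift with a
memmove of a constant 4 bytes, which a compiler may be able to lower to a
single 4-byte load and store instead of a library call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FANOUT 4			/* hypothetical; stands in for node4's fanout */

/* Hypothetical stand-in for a small radix tree node; not the real layout. */
typedef struct sketch_node4
{
	uint8_t		count;
	/* Padded (assumption) so a 4-byte move from any valid index stays in bounds. */
	uint8_t		chunks[SKETCH_FANOUT * 2];
} sketch_node4;

/*
 * Insert 'chunk' at position 'idx', shifting later entries up by one.
 *
 * A plain loop or memmove(..., count - idx) has a variable length. Here the
 * length is always 4 bytes: entries past 'count' are moved too, but they
 * only land in unused slots or padding, so the result is the same.
 */
static inline void
sketch_insert_chunk(sketch_node4 *node, int idx, uint8_t chunk)
{
	assert(node->count < SKETCH_FANOUT);
	assert(idx >= 0 && idx <= node->count);

	/* constant power-of-two size: source and destination overlap, hence memmove */
	memmove(&node->chunks[idx + 1], &node->chunks[idx], 4);
	node->chunks[idx] = chunk;
	node->count++;
}
```

With a zero-initialized node, inserting 30 at index 0, 10 at index 0, 20
at index 1 and 40 at index 3 leaves the first four chunks sorted as 10,
20, 30, 40. The cost of this approach is the extra padding and the less
obvious invariant, which is the readability concern raised above.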

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


