Re: Optimising compactify_tuples() - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Optimising compactify_tuples()
Msg-id CA+hUKG+SkE2vtS_owMNJyM0FbEHOdaUcOi4wHvkoq09HwGjBcg@mail.gmail.com
In response to Re: Optimising compactify_tuples()  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Fri, Sep 11, 2020 at 1:45 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 10 Sep 2020 at 10:40, Thomas Munro <thomas.munro@gmail.com> wrote:
> > I wonder if we could also identify a range at the high end that is
> > already correctly sorted and maximally compacted so it doesn't even
> > need to be copied out.
>
> I've experimented quite a bit with this patch today. I think I've
> tested every idea you've mentioned here, so there's quite a lot of
> information to share.
>
> I did write code to skip the copy to the separate buffer for tuples
> that are already in the correct place, and with a version of the patch
> which keeps tuples in their traditional insert order (a later line
> item's tuple located earlier in the page) I generally see a very large
> number of tuples being skipped with this method.  See attached
> v4b_skipped_tuples.png. The vertical axis is the number of
> compactify_tuples() calls during the benchmark where we were able to
> skip that number of tuples. The average number of skipped tuples over
> all calls during recovery was 81, so on this benchmark we get to skip
> about half the tuples in the page this way.

Excellent.
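For anyone following along, here's a rough standalone sketch of that
skip-the-presorted-tail idea.  This is not the patch itself; the struct
and function names are invented for illustration, and a simplified
(offset, length) representation stands in for the real line pointer
array.  With tuples in traditional insert order, tuple data grows down
from the end of the page, so a prefix of the item array that already
sits flush against the page end needs no copying:

```c
#include <stddef.h>

/* Hypothetical stand-in for a line pointer: where the tuple's data
 * currently starts in the page, and how long it is. */
typedef struct
{
    size_t offset;
    size_t len;
} ItemStub;

/*
 * Given items in line-pointer order (traditional insert order, so the
 * first item's data sits highest in the page), count how many leading
 * tuples are already exactly where compaction would place them, i.e.
 * packed flush against page_end.  Those tuples can be skipped entirely
 * rather than copied out to the scratch buffer.
 */
static size_t
count_presorted_tail(const ItemStub *items, size_t nitems, size_t page_end)
{
    size_t expected = page_end;
    size_t skipped = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        expected -= items[i].len;
        if (items[i].offset != expected)
            break;              /* first hole found; copying starts here */
        skipped++;
    }
    return skipped;
}
```

On the benchmark numbers quoted above (average 81 tuples skipped per
call), a check like this would let compaction ignore roughly half of
each page.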

> > So one question is whether we want to do the order-reversing as part
> > of this patch, or wait for a more joined-up project to make lots of
> > code paths collude on making scan order match memory order
> > (correlation = 1).  Most or all of the gain from your patch would
> > presumably still apply if we did the exact opposite and forced offset
> > order to match reverse-item ID order (correlation = -1), which also
> > happens to be the initial state when you insert tuples today; you'd
> > still tend towards a state that allows nice big memmove/memcpy calls
> > during page compaction.  IIUC currently we start with correlation = -1
> > and then tend towards correlation = 0 after many random updates
> > because we can't change the order, so it gets scrambled over time.
> > I'm not sure what I think about that.
>
> So I did lots of benchmarking with both methods and my conclusion is
> that I think we should stick to the traditional INSERT order with this
> patch. But we should come back and revisit that more generally one
> day. The main reason that I'm put off flipping the tuple order is that
> it significantly reduces the number of times we hit the preordered
> case.  We go to all the trouble of reversing the order only to have it
> broken again when we add 1 more tuple to the page.  If we keep this
> the traditional way, then it's much more likely that we'll maintain
> that pre-order and hit the more optimal memmove code path.

Right, that makes sense.  Thanks for looking into it!
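To make the "nice big memmove" payoff concrete: when a run of tuples is
already physically contiguous and in an order matching the line
pointers, compacting that run collapses from one copy per tuple into a
single memmove of the whole region.  A minimal sketch, with invented
names and no relation to the actual patch:

```c
#include <string.h>

/*
 * Slide a contiguous run of tuple data so it ends flush at page_end.
 * memmove is required (not memcpy) because the source and destination
 * regions may overlap when the run only moves a short distance.
 */
static void
compact_contiguous_run(char *page, size_t run_start, size_t run_len,
                       size_t page_end)
{
    size_t new_start = page_end - run_len;

    if (new_start != run_start)
        memmove(page + new_start, page + run_start, run_len);
}
```

This is why preserving the preordered case matters: one large memmove is
far cheaper than many small per-tuple copies through a scratch buffer.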

> I've also attached another tiny patch that I think is pretty useful
> separate from this. It basically changes:
>
> LOG:  redo done at 0/D518FFD0
>
> into:
>
> LOG:  redo done at 0/D518FFD0 system usage: CPU: user: 58.93 s,
> system: 0.74 s, elapsed: 62.31 s

+1
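For illustration, a usage line in that shape can be produced with
getrusage(2) and gettimeofday(2).  This is a hand-rolled sketch, not the
code PostgreSQL uses; `format_usage` and `start_wall` are invented
names, with `start_wall` assumed to be the wall-clock time captured when
redo began:

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Format user CPU, system CPU, and elapsed wall time into buf, in the
 * style of "system usage: CPU: user: 58.93 s, system: 0.74 s,
 * elapsed: 62.31 s".  Returns the number of characters written. */
static int
format_usage(char *buf, size_t buflen, const struct timeval *start_wall)
{
    struct rusage ru;
    struct timeval now;
    long elapsed_us;

    getrusage(RUSAGE_SELF, &ru);
    gettimeofday(&now, NULL);
    elapsed_us = (now.tv_sec - start_wall->tv_sec) * 1000000L +
                 (now.tv_usec - start_wall->tv_usec);

    return snprintf(buf, buflen,
                    "system usage: CPU: user: %ld.%02ld s, "
                    "system: %ld.%02ld s, elapsed: %ld.%02ld s",
                    (long) ru.ru_utime.tv_sec,
                    (long) (ru.ru_utime.tv_usec / 10000),
                    (long) ru.ru_stime.tv_sec,
                    (long) (ru.ru_stime.tv_usec / 10000),
                    elapsed_us / 1000000L,
                    (elapsed_us % 1000000L) / 10000L);
}
```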


