Re: Optimising compactify_tuples() - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: Optimising compactify_tuples()
Msg-id: CA+hUKGLhcvXFMvAG6O1UsE2sN2CUkMBNwdOFAHGEv4Szco_yXQ@mail.gmail.com
In response to: Re: Optimising compactify_tuples()  (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Optimising compactify_tuples()  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
On Thu, Sep 10, 2020 at 2:34 AM David Rowley <dgrowleyml@gmail.com> wrote:
> I think you were adequately caffeinated.  You're right that this is
> fairly simple to do, but it looks even simpler than looping twice
> over the array.  I think it's just a matter of looping over the
> itemidbase backwards and putting the higher itemid tuples at the end
> of the page. I've done it this way in the attached patch.

Yeah, I was wondering how to make as much of the algorithm work in a
memory-forwards direction as possible (even the item pointer access),
but it was just a hunch.  Once you have the adjacent-tuple merging
thing so you're down to just a couple of big memcpy calls, it's
probably moot anyway.
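
Just so we're picturing the same loop, here's roughly the shape I have
in mind (a simplified sketch, not your actual patch: stand-in types, a
caller-supplied scratch copy of the page, and line pointer updates
punted to the caller via a hypothetical new_offsets array):

#include <stdint.h>
#include <string.h>

typedef struct
{
    uint16_t    offsetindex;    /* index into the line pointer array */
    uint16_t    itemoff;        /* current offset of the tuple data */
    uint16_t    alignedlen;     /* MAXALIGN'd length of the tuple data */
} itemIdCompact;                /* stand-in for the patch's sort struct */

/*
 * Walk itemidbase (in line pointer order) backwards, laying tuples out
 * from pd_special downwards, and collapse copies of source-adjacent
 * tuples into a single memcpy from a scratch copy of the page.
 */
static void
compactify_sketch(char *page, char *scratch, uint16_t pd_special,
                  const itemIdCompact *itemidbase, int nitems,
                  uint16_t *new_offsets)
{
    uint16_t    upper = pd_special;
    int         i = nitems - 1;

    memcpy(scratch, page, pd_special);  /* save sources before overwriting */

    while (i >= 0)
    {
        int         run_last = i;
        uint16_t    run_len = itemidbase[i].alignedlen;

        /* extend the run while the previous tuple sits just below this one */
        while (i > 0 &&
               itemidbase[i - 1].itemoff + itemidbase[i - 1].alignedlen ==
               itemidbase[i].itemoff)
        {
            i--;
            run_len += itemidbase[i].alignedlen;
        }

        upper -= run_len;
        memcpy(page + upper, scratch + itemidbase[i].itemoff, run_len);

        /* remember where each tuple in the run landed */
        for (int j = i, off = upper; j <= run_last; j++)
        {
            new_offsets[itemidbase[j].offsetindex] = (uint16_t) off;
            off += itemidbase[j].alignedlen;
        }

        i--;
    }
}

The nice property is that once a page has been laid out this way, later
compactions should find long adjacent runs and boil down to a handful
of memcpy calls.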

> I also added a presorted path which falls back on doing memmoves
> without the temp buffer when the itemidbase array indicates that
> higher lineitems all have higher offsets.  I'm doing the presort check
> in the calling function since that loops over the lineitems already.
> We can just memmove the tuples in reverse order without overwriting
> any yet-to-be-moved tuples when they're already in order.

Great.
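
If I'm reading the presorted path right, it's essentially this shape
(sketch only, reusing the stand-in struct from above; in the presorted
case a tuple can only move towards pd_special, so walking from the top
of the page downwards can't overwrite anything not yet moved):

static void
compactify_presorted_sketch(char *page, uint16_t pd_special,
                            const itemIdCompact *itemidbase, int nitems,
                            uint16_t *new_offsets)
{
    uint16_t    upper = pd_special;

    for (int i = nitems - 1; i >= 0; i--)
    {
        upper -= itemidbase[i].alignedlen;

        /* no temp buffer needed; skip tuples already in place */
        if (upper != itemidbase[i].itemoff)
            memmove(page + upper,
                    page + itemidbase[i].itemoff,
                    itemidbase[i].alignedlen);

        new_offsets[itemidbase[i].offsetindex] = upper;
    }
}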

I wonder if we could also identify a range at the high end that is
already correctly sorted and maximally compacted so it doesn't even
need to be copied out.
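
Something like this, perhaps, before deciding how much to copy out
(untested, same stand-in struct again):

/*
 * Count trailing tuples that are already exactly where compaction
 * would put them, so the copy loop can stop early and leave them be.
 */
static int
count_already_compacted(uint16_t pd_special,
                        const itemIdCompact *itemidbase, int nitems)
{
    uint16_t    expected = pd_special;
    int         i;

    for (i = nitems - 1; i >= 0; i--)
    {
        expected -= itemidbase[i].alignedlen;
        if (itemidbase[i].itemoff != expected)
            break;
    }

    return nitems - 1 - i;      /* tuples at the high end to skip */
}

Whether the extra pass pays for itself probably depends on how often
pruning leaves the top of the page untouched.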

+        * Do the tuple compactification.  Collapse memmove calls for adjacent
+        * tuples.

s/memmove/memcpy/

> With the attached v3, performance is better. The test now runs
> recovery in 65.6 seconds, vs master's 148.5 seconds. So about 2.2x
> faster.

On my machine I'm seeing 57s, down from 86s on unpatched master, for
the simple pgbench workload from
https://github.com/macdice/redo-bench/.  That's not quite what you're
reporting but it still blows the doors off the faster sorting patch,
which does it in 74s.

> We should probably consider what else can be done to try to write
> pages with tuples for earlier lineitems earlier in the page.  VACUUM
> FULLs and friends will switch back to the opposite order when
> rewriting the heap.

Yeah, and also bulk inserts/COPY.  Ultimately if we flipped our page
format on its head that'd come for free, but that'd be a bigger
project with more ramifications.

So one question is whether we want to do the order-reversing as part
of this patch, or wait for a more joined-up project to make lots of
code paths collude on making scan order match memory order
(correlation = 1).  Most or all of the gain from your patch would
presumably still apply if we did the exact opposite and forced offset
order to match reverse item ID order (correlation = -1), which also
happens to be the initial state when you insert tuples today; you'd
still tend towards a state that allows nice big memmove/memcpy calls
during page compaction.  IIUC currently we start with correlation -1
and then tend towards correlation = 0 after many random updates
because we can't change the order, so it gets scrambled over time.
I'm not sure what I think about that.
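
(To make the correlation terminology concrete with toy numbers: on a
freshly filled heap page today, with pd_special = 8192 and 104-byte
MAXALIGN'd tuples, item 1's data goes in just under pd_special and each
later insert lands below it:

    item 1 -> offset 8088
    item 2 -> offset 7984
    item 3 -> offset 7880

so offset order is the reverse of item ID order, i.e. correlation = -1;
correlation = 1 is the mirror image, with item 1 at the lowest offset.)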

PS You might as well post future patches with .patch endings so that
the cfbot tests them; it seems pretty clear now that a patch to
optimise sorting (as useful as it may be for future work) can't beat a
patch to skip it completely.  I took the liberty of switching the
author and reviewer names in the commitfest entry to reflect this.


