Re: Optimising compactify_tuples() - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Optimising compactify_tuples()
Date
Msg-id CA+hUKGJLbeBqYij8zL7LV1bY6yPycO7rGbBwqcwgTSceh4-yQg@mail.gmail.com
Whole thread Raw
In response to Re: Optimising compactify_tuples()  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Optimising compactify_tuples()  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Mon, Sep 7, 2020 at 7:48 PM David Rowley <dgrowleyml@gmail.com> wrote:
> I was wondering today if we could just get rid of the sort in
> compactify_tuples() completely. It seems to me that the existing sort
> is there just so that the memmove() is done in order of tuple at the
> end of the page first. We seem to be just shunting all the tuples to
> the end of the page so we need to sort the line items in reverse
> offset so as not to overwrite memory for other tuples during the copy.
>
> I wondered if we could get around that just by having another buffer
> somewhere and memcpy the tuples into that first then copy the tuples
> out that buffer back into the page. No need to worry about the order
> we do that in as there's no chance to overwrite memory belonging to
> other tuples.
>
> Doing that gives me 79.166 seconds in the same recovery test. Or about
> 46% faster, instead of 22% (I mistakenly wrote 28% yesterday)

Wow.

One thought is that if we're going to copy everything out and back in
again, we might want to consider doing it in a
memory-prefetcher-friendly order.  Would it be a good idea to
rearrange the tuples to match line pointer order, so that the copying
work and also later sequential scans are in a forward direction?  The
copying could also perhaps be done with single memcpy() for ranges of
adjacent tuples.  Another thought is that it might be possible to
identify some easy cases that it can handle with an alternative
in-place shifting algorithm without having to go to the
copy-out-and-back-in path.  For example, when the offset order already
matches line pointer order but some dead tuples just need to be
squeezed out by shifting ranges of adjacent tuples, and maybe some
slightly more complicated cases, but nothing requiring hard work like
sorting.

> (I don't want to derail the sort improvements here. I happen to think
> those are quite important improvements, so I'll continue to review
> that patch still.  Longer term, we might just end up with something
> slightly different for compactify_tuples)

Yeah.  Perhaps qsort specialisation needs to come back in a new thread
with a new use case.



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: default partition and concurrent attach partition
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation