Home > mailing lists

Re: Optimising compactify_tuples() - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Optimising compactify_tuples()
Date	September 7, 2020 07:47:59
Msg-id	CAApHDvq_iYSKdK4OR2iLGO1oSOnzJXu-=SFXqY7FQvgPkyv7sg@mail.gmail.com Whole thread Raw
In response to	Re: Optimising compactify_tuples() (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: Optimising compactify_tuples() Re: Optimising compactify_tuples()
List	pgsql-hackers

Tree view

On Sun, 6 Sep 2020 at 23:37, David Rowley <dgrowleyml@gmail.com> wrote:
> The test replayed ~2.2 GB of WAL. master took 148.581 seconds and
> master+0001+0002 took 115.588 seconds. That's about 28% faster. Pretty
> nice!

I was wondering today if we could just get rid of the sort in
compactify_tuples() completely. It seems to me that the existing sort
is there just so that the memmove() is done in order of tuple at the
end of the page first. We seem to be just shunting all the tuples to
the end of the page so we need to sort the line items in reverse
offset so as not to overwrite memory for other tuples during the copy.

I wondered if we could get around that just by having another buffer
somewhere and memcpy the tuples into that first then copy the tuples
out that buffer back into the page. No need to worry about the order
we do that in as there's no chance to overwrite memory belonging to
other tuples.

Doing that gives me 79.166 seconds in the same recovery test. Or about
46% faster, instead of 22% (I mistakenly wrote 28% yesterday)

The top of perf report says:

  24.19%  postgres  postgres            [.] PageRepairFragmentation
   8.37%  postgres  postgres            [.] hash_search_with_hash_value
   7.40%  postgres  libc-2.31.so        [.] 0x000000000018e74b
   5.59%  postgres  libc-2.31.so        [.] 0x000000000018e741
   5.49%  postgres  postgres            [.] XLogReadBufferExtended
   4.05%  postgres  postgres            [.] compactify_tuples
   3.27%  postgres  postgres            [.] PinBuffer
   2.88%  postgres  libc-2.31.so        [.] 0x000000000018e470
   2.02%  postgres  postgres            [.] hash_bytes

(I'll need to figure out why libc's debug symbols are not working)

I was thinking that there might be a crossover point to where this
method becomes faster than the sort method.  e.g sorting 1 tuple is
pretty cheap, but copying the memory for the entire tuple space might
be expensive as that includes the tuples we might be getting rid of.
So if we did go down that path we might need some heuristics to decide
which method is likely best. Maybe that's based on the number of
tuples, I'm not really sure. I've not made any attempt to try to give
it a worst-case workload to see if there is a crossover point that's
worth worrying about.

The attached patch is what I used to test this.  It kinda goes and
sticks a page-sized variable on the stack, which is not exactly ideal.
I think we'd likely want to figure some other way to do that, but I
just don't know what that would look like yet. I just put the attached
together quickly to test out the idea.

(I don't want to derail the sort improvements here. I happen to think
those are quite important improvements, so I'll continue to review
that patch still.  Longer term, we might just end up with something
slightly different for compactify_tuples)

David

Attachment

compactify_tuples_dgr.patch.txt

pgsql-hackers by date:

From: Michael Paquier
Date: 07 September 2020, 07:40:57
Subject: Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks

From: Surafel Temesgen
Date: 07 September 2020, 07:49:31
Subject: Re: Improvements in Copy From

Re: Optimising compactify_tuples() - Mailing list pgsql-hackers

Attachment

Previous

Next