Re: Optimising compactify_tuples() - Mailing list pgsql-hackers

From: David Rowley
Subject: Re: Optimising compactify_tuples()
Msg-id: CAApHDvrTqc0XvYDHuh6irteOy4e4t0JNjY6JGpsmXLnHksnyxg@mail.gmail.com
In response to: Re: Optimising compactify_tuples()  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: Optimising compactify_tuples()  (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers
On Wed, 9 Sep 2020 at 05:38, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Sep 9, 2020 at 3:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > On Tue, 8 Sep 2020 at 12:08, Thomas Munro <thomas.munro@gmail.com> wrote:
> > > One thought is that if we're going to copy everything out and back in
> > > again, we might want to consider doing it in a
> > > memory-prefetcher-friendly order.  Would it be a good idea to
> > > rearrange the tuples to match line pointer order, so that the copying
> > > work and also later sequential scans are in a forward direction?
> >
> > That's an interesting idea but wouldn't that require both the copy to
> > the separate buffer *and* a qsort? That's the worst of both
> > implementations. We'd need some other data structure too in order to
> > get the index of the sorted array by reverse lineitem pointer, which
> > might require an additional array and an additional sort.
>
> Well I may not have had enough coffee yet but I thought you'd just
> have to spin through the item IDs twice.  Once to compute sum(lp_len)
> so you can compute the new pd_upper, and the second time to copy the
> tuples from their random locations on the temporary page to new
> sequential locations, so that afterwards item ID order matches offset
> order.

I think you were adequately caffeinated.  You're right that this is
fairly simple to do, but it looks even simpler than looping over the
array twice.  I think it's just a matter of looping over the
itemidbase backwards and putting the higher itemid tuples at the end
of the page.  I've done it this way in the attached patch.
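
For illustration, here's roughly the shape of that backwards loop as a
standalone sketch.  The types and names (itemIdCompactData,
compactify_sketch, the pd_special parameter) are simplified stand-ins
for illustration, not the patch's actual code:

    #include <stdint.h>
    #include <string.h>

    /* stand-in for the per-tuple bookkeeping kept in itemidbase,
     * which is in line pointer (item ID) order */
    typedef struct
    {
        uint16_t itemoff;     /* tuple's current offset on the page */
        uint16_t alignedlen;  /* MAXALIGN'd tuple length */
    } itemIdCompactData;

    /*
     * Copy the tuples out to "scratch" (which must be at least
     * pd_special bytes), then walk the array backwards so that the
     * tuple for the highest line pointer lands at the end of the
     * page.  Afterwards, item ID order matches offset order.
     */
    static void
    compactify_sketch(char *page, char *scratch,
                      itemIdCompactData *itemidbase, int nitems,
                      uint16_t pd_special)
    {
        uint16_t    upper = pd_special;

        /* copy each tuple out to the same offset in the scratch buffer */
        for (int i = 0; i < nitems; i++)
            memcpy(scratch + itemidbase[i].itemoff,
                   page + itemidbase[i].itemoff,
                   itemidbase[i].alignedlen);

        /* copy back, highest line pointer first, packing from the top */
        for (int i = nitems - 1; i >= 0; i--)
        {
            itemIdCompactData *itemidptr = &itemidbase[i];

            upper -= itemidptr->alignedlen;
            memcpy(page + upper, scratch + itemidptr->itemoff,
                   itemidptr->alignedlen);
            itemidptr->itemoff = upper; /* new line pointer offset */
        }
        /* the caller would then set the page's pd_upper to "upper" */
    }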

I also added a presorted path which falls back on doing memmoves
without the temp buffer when the itemidbase array indicates that
higher lineitems all have higher offsets.  I'm doing the presort check
in the calling function since that loops over the lineitems already.
When the tuples are already in order, we can just memmove them in
reverse order without overwriting any tuples that are yet to be moved.
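
Under the same stand-in types as the sketch above, the presorted path
is roughly this (again illustrative, not the patch itself):

    /*
     * Presorted path: tuple offsets already increase with line pointer
     * number, so shift each tuple up in place, starting from the last
     * item.  Every destination is at or above its source, and the
     * tuples above it have already been moved, so nothing that is yet
     * to be moved gets overwritten.  memmove rather than memcpy, since
     * a tuple's source and destination ranges may overlap.
     */
    static void
    compactify_presorted_sketch(char *page,
                                itemIdCompactData *itemidbase,
                                int nitems, uint16_t pd_special)
    {
        uint16_t    upper = pd_special;

        for (int i = nitems - 1; i >= 0; i--)
        {
            itemIdCompactData *itemidptr = &itemidbase[i];

            upper -= itemidptr->alignedlen;
            if (upper != itemidptr->itemoff)
                memmove(page + upper, page + itemidptr->itemoff,
                        itemidptr->alignedlen);
            itemidptr->itemoff = upper;
        }
    }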

Also, I added code to collapse the memcpys and memmoves for adjacent
tuples so that we perform the minimal number of calls to those
functions (sketched after this paragraph).  Once we've previously
compacted a page, the code is able to reduce the number of calls
significantly.  I added some logging, reviewed it after a run of the
benchmark, and saw that for about 192 tuples we're mostly doing just
3-4 memcpys in the non-presorted path and just 2 memmoves in the
presorted code path.  I also found that in my test the presorted path
was only taken 12.39% of the time.  Trying with 120 million UPDATEs
instead of 12 million in the test reduced this to just 10.89%.  It
seems that it'll often just be 1 or 2 tuples spoiling it, since new
tuples will still be added earlier in the page (at lower offsets, but
higher line pointer numbers) after we free up space to add more.
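
To give an idea of the collapsing, here's a hedged sketch of merging
the memmoves in the presorted path (stand-in types as above; per the
description, the patch collapses the memcpys out of the temp buffer in
the non-presorted path along the same lines):

    /*
     * Defer each move and grow the pending chunk while the next tuple
     * is contiguous with it at the source (the destinations are
     * contiguous by construction), then flush one memmove per run.
     */
    static void
    compactify_presorted_collapsed_sketch(char *page,
                                          itemIdCompactData *itemidbase,
                                          int nitems, uint16_t pd_special)
    {
        uint16_t    upper = pd_special;
        uint16_t    copy_src = 0;
        uint16_t    copy_dst = 0;
        uint16_t    copy_len = 0;

        for (int i = nitems - 1; i >= 0; i--)
        {
            itemIdCompactData *itemidptr = &itemidbase[i];

            upper -= itemidptr->alignedlen;

            if (copy_len != 0 &&
                itemidptr->itemoff + itemidptr->alignedlen == copy_src)
            {
                /* contiguous with the pending chunk; extend it down */
                copy_src = itemidptr->itemoff;
                copy_dst = upper;
                copy_len += itemidptr->alignedlen;
            }
            else
            {
                /* flush the pending chunk (skipping no-op moves) */
                if (copy_len != 0 && copy_dst != copy_src)
                    memmove(page + copy_dst, page + copy_src, copy_len);
                copy_src = itemidptr->itemoff;
                copy_dst = upper;
                copy_len = itemidptr->alignedlen;
            }
            itemidptr->itemoff = upper;
        }
        /* flush the final chunk */
        if (copy_len != 0 && copy_dst != copy_src)
            memmove(page + copy_dst, page + copy_src, copy_len);
    }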

I also experimented with what would happen if I collapsed the memcpys
for copying to the temp buffer too.  The performance got a little
worse from doing that, so I left that code #ifdef'd out.

With the attached v3, performance is better.  The test now runs
recovery in 65.6 seconds, vs master's 148.5 seconds.  So about 2.2x
faster.

We should probably consider what else can be done to try to write
pages with tuples for earlier lineitems earlier in the page.  VACUUM
FULL and friends will switch back to the opposite order when
rewriting the heap.

I've also fixed my missing libc debug symbols, so the profile now shows:

  24.90%  postgres  postgres            [.] PageRepairFragmentation
  15.26%  postgres  libc-2.31.so        [.] __memmove_avx_unaligned_erms
   9.61%  postgres  postgres            [.] hash_search_with_hash_value
   8.03%  postgres  postgres            [.] compactify_tuples
   6.25%  postgres  postgres            [.] XLogReadBufferExtended
   3.74%  postgres  postgres            [.] PinBuffer
   2.25%  postgres  postgres            [.] hash_bytes
   1.79%  postgres  postgres            [.] heap_xlog_update
   1.47%  postgres  postgres            [.] LWLockRelease
   1.44%  postgres  postgres            [.] XLogReadRecord
   1.33%  postgres  postgres            [.] PageGetHeapFreeSpace
   1.16%  postgres  postgres            [.] DecodeXLogRecord
   1.13%  postgres  postgres            [.] pg_comp_crc32c_sse42
   1.12%  postgres  postgres            [.] LWLockAttemptLock
   1.09%  postgres  postgres            [.] StartupXLOG
   0.90%  postgres  postgres            [.] ReadBuffer_common
   0.84%  postgres  postgres            [.] SlruSelectLRUPage
   0.74%  postgres  libc-2.31.so        [.] __memcmp_avx2_movbe
   0.68%  postgres  [kernel.kallsyms]   [k] copy_user_generic_string
   0.66%  postgres  postgres            [.] PageAddItemExtended
   0.66%  postgres  postgres            [.] PageIndexTupleOverwrite
   0.62%  postgres  postgres            [.] smgropen
   0.60%  postgres  postgres            [.] ReadPageInternal
   0.57%  postgres  postgres            [.] GetPrivateRefCountEntry
   0.52%  postgres  postgres            [.] heap_redo
   0.51%  postgres  postgres            [.] AdvanceNextFullTransactionIdPastXid

David
