Re: Redundant tuple copy in tqueueReceiveSlot() - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Redundant tuple copy in tqueueReceiveSlot()
Date
Msg-id CA+hUKGL--+RFMbQXcoFpvf1Yx2qn_FjCHWM0jJEGrKBAG=pYZQ@mail.gmail.com
Whole thread Raw
In response to Redundant tuple copy in tqueueReceiveSlot()  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: Redundant tuple copy in tqueueReceiveSlot()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Sep 9, 2020 at 5:23 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> I went ahead and tried doing this. I chose an approach where we can
> return the pointer to the in-place minimal tuple data if it's a
> heap/buffer/minimal tuple slot. A new function
> ExecFetchSlotMinimalTupleData() returns in-place minimal tuple data.
> If it's neither heap, buffer or minimal tuple, it returns a copy as
> usual. The receiver should not assume the data is directly taken from
> MinimalTupleData, so it should set it's t_len to the number of bytes
> returned. Patch attached
> (0001-Avoid-redundant-tuple-copy-while-sending-tuples-to-G.patch)

+char *
+ExecFetchSlotMinimalTupleData(TupleTableSlot *slot, uint32 *len,
+                              bool *shouldFree)

Interesting approach.  It's a bit of a weird interface, returning a
pointer to a non-valid MinimalTuple that requires extra tweaking after
you copy it to make it a valid one and that you're not allowed to
tweak in-place.  I'd probably make that return "const void *" just for
a bit of extra documentation.  I wonder if there is a way we could
make "Minimal Tuples but with the length travelling separately (and
perhaps chopped off?)" into a first-class concept...  It's also a
shame to be schlepping a bunch of padding bytes around.

     tuple = (MinimalTuple) data;
-    Assert(tuple->t_len == nbytes);
+    tuple->t_len = nbytes;

Hmm, so you have to scribble on shared memory on the receiving side.
I wondered about a couple of different ways to share the length field
with the shm_mq envelope, but that all seems a bit too weird...

> Thomas, I guess you had a different approach in mind when you said
> about "returning either success or
> hey-that-buffer's-too-small-I-need-N-bytes".  But what looks clear to

Yeah I tried some things like that, but I wasn't satisfied with any of
them; basically the extra work involved in negotiating the size was a
bit too high.  On the other hand, because my interface was "please
write a MinimalTuple here!", it had the option to *form* a
MinimalTuple directly in place, whereas your approach can only avoid
creating and destroying a temporary tuple when the source is a heap
tuple.

> me is that avoiding the copy shows consistent improvement of 4 to 10%
> for simple parallel table scans. I tried my patch on both x86_64 and
> arm64, and observed this speedup on both.

I think that's a great validation of the goal but I hope we can figure
out a way that avoids the temporary tuple for more cases.  FWIW I saw
hash self joins running a couple of percent faster with one of my
abandoned patches.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_restore causing deadlocks on partitioned tables
Next
From: Ashutosh Sharma
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."