On Wed, Sep 9, 2020 at 5:23 PM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> I went ahead and tried doing this. I chose an approach where we can
> return the pointer to the in-place minimal tuple data if it's a
> heap/buffer/minimal tuple slot. A new function
> ExecFetchSlotMinimalTupleData() returns in-place minimal tuple data.
> If it's neither a heap, buffer, nor minimal tuple slot, it returns a
> copy as usual. The receiver should not assume the data is directly
> taken from a MinimalTupleData, so it should set its t_len to the
> number of bytes returned. Patch attached
> (0001-Avoid-redundant-tuple-copy-while-sending-tuples-to-G.patch)
+char *
+ExecFetchSlotMinimalTupleData(TupleTableSlot *slot, uint32 *len,
+ bool *shouldFree)
Interesting approach. It's a bit of a weird interface, returning a
pointer to a non-valid MinimalTuple that requires extra tweaking after
you copy it to make it a valid one and that you're not allowed to
tweak in-place. I'd probably make that return "const void *" just for
a bit of extra documentation. I wonder if there is a way we could
make "Minimal Tuples but with the length travelling separately (and
perhaps chopped off?)" into a first-class concept... It's also a
shame to be schlepping a bunch of padding bytes around.
tuple = (MinimalTuple) data;
- Assert(tuple->t_len == nbytes);
+ tuple->t_len = nbytes;
Hmm, so you have to scribble on shared memory on the receiving side.
I wondered about a couple of different ways to share the length field
with the shm_mq envelope, but that all seems a bit too weird...
> Thomas, I guess you had a different approach in mind when you said
> about "returning either success or
> hey-that-buffer's-too-small-I-need-N-bytes". But what looks clear to
Yeah, I tried some things like that, but I wasn't satisfied with any of
them; basically the extra work involved in negotiating the size was a
bit too high. On the other hand, because my interface was "please
write a MinimalTuple here!", it had the option to *form* a
MinimalTuple directly in place, whereas your approach can only avoid
creating and destroying a temporary tuple when the source is a heap
tuple.
> me is that avoiding the copy shows consistent improvement of 4 to 10%
> for simple parallel table scans. I tried my patch on both x86_64 and
> arm64, and observed this speedup on both.
I think that's a great validation of the goal but I hope we can figure
out a way that avoids the temporary tuple for more cases. FWIW I saw
hash self joins running a couple of percent faster with one of my
abandoned patches.