On Sun, Jul 12, 2020 at 7:25 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
> Do you mean that we should have an implementation for
> get_minimal_tuple() for the heap AM and have it return a pointer to the
> minimal tuple from the MINIMAL_TUPLE_OFFSET? And then a caller such as
> tqueueReceiveSlot() will ensure that the heap tuple from which it wants
> to extract the minimal tuple was allocated in the tuple queue in the
> first place? If we consider that the node directly below a gather is a
> SeqScan, then we could possibly, in ExecInitSeqScan(), set up the
> ss_ScanTupleSlot to point to memory in the shared tuple queue?
> Similarly, other ExecInit*() methods can do the same for other executor
> nodes that involve parallelism? Of course, things would be slightly
> different for the other use cases you mentioned (such as hash table
> population).
What I mean is that where ExecHashTableInsert() and
tqueueReceiveSlot() do ExecFetchSlotMinimalTuple(), you usually get a
freshly allocated copy, and then you copy that again, and free it.
There may be something similar going on in tuplestore and sort code.
Perhaps we could have something like
ExecFetchSlotMinimalTupleInPlace(slot, output_buffer,
output_buffer_size) that returns a value that indicates either success
or hey-that-buffer's-too-small-I-need-N-bytes, or something like that.
That silly extra copy is something Andres pointed out to me in some
perf results involving TPC-H hash joins, a couple of years ago.
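To make the calling convention concrete, here is a minimal standalone
sketch. It is not PostgreSQL code: DemoSlot and
demo_fetch_minimal_tuple_in_place() are made-up names, the slot is
reduced to a byte array, and reporting the required size through an
out-parameter is only one assumed way to express "too small, I need N
bytes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a slot; NOT PostgreSQL's TupleTableSlot. */
typedef struct DemoSlot
{
    const char *data;   /* tuple bytes in whatever form the slot holds */
    size_t      len;    /* bytes the minimal tuple would occupy */
} DemoSlot;

/*
 * Hypothetical in-place fetch: write the tuple straight into the
 * caller's buffer instead of returning a freshly allocated copy.
 *
 * Returns true on success.  *needed is always set to the required
 * size, so on false the caller can enlarge the buffer and retry.
 */
static bool
demo_fetch_minimal_tuple_in_place(const DemoSlot *slot,
                                  void *output_buffer,
                                  size_t output_buffer_size,
                                  size_t *needed)
{
    assert(slot != NULL && needed != NULL);

    *needed = slot->len;
    if (output_buffer == NULL || output_buffer_size < slot->len)
        return false;       /* buffer too small; nothing was written */

    memcpy(output_buffer, slot->data, slot->len);
    return true;
}
```

A caller such as a tuple queue or hash table would pass a pointer to
its own destination memory, and only fall back to enlarging or
allocating when the function returns false. That leaves one copy where
there are currently two.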
> All things considered, I think the patch in its current form should go
> in.
Thanks for the testing and review! Pushed.
> The in-place copy could be done as a separate patch? Do you
> agree?
Yeah.