Redundant tuple copy in tqueueReceiveSlot() - Mailing list pgsql-hackers
From: Amit Khandekar
Subject: Redundant tuple copy in tqueueReceiveSlot()
Msg-id: CAJ3gD9c7iLLPWQdPUgZGGyQfspHvPg857pjeOqfXwCNOwjvKRQ@mail.gmail.com
In response to: Re: Does TupleQueueReaderNext() really need to copy its result? (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: Redundant tuple copy in tqueueReceiveSlot()
List: pgsql-hackers
Hi, I am starting a new thread that continues with the following point
that was discussed in [1]:

On Fri, 17 Jul 2020 at 09:05, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sun, Jul 12, 2020 at 7:25 AM Soumyadeep Chakraborty
> <soumyadeep2007@gmail.com> wrote:
> > Do you mean that we should have an implementation for
> > get_minimal_tuple() for the heap AM and have it return a pointer to the
> > minimal tuple from the MINIMAL_TUPLE_OFFSET? And then a caller such as
> > tqueueReceiveSlot() will ensure that the heap tuple from which it wants
> > to extract the minimal tuple was allocated in the tuple queue in the
> > first place? If we consider that the node directly below a gather is a
> > SeqScan, then we could possibly, in ExecInitSeqScan() set up the
> > ss_ScanTupleSlot to point to memory in the shared tuple queue?
> > Similarly, other ExecInit*() methods can do the same for other executor
> > nodes that involve parallelism? Of course, things would be slightly
> > different for the other use cases you mentioned (such as hash table
> > population)
>
> What I mean is that where ExecHashTableInsert() and
> tqueueReceiveSlot() do ExecFetchSlotMinimalTuple(), you usually get a
> freshly allocated copy, and then you copy that again, and free it.
> There may be something similar going on in tuplestore and sort code.
> Perhaps we could have something like
> ExecFetchSlotMinimalTupleInPlace(slot, output_buffer,
> output_buffer_size) that returns a value that indicates either success
> or hey-that-buffer's-too-small-I-need-N-bytes, or something like that.
> That silly extra copy is something Andres pointed out to me in some
> perf results involving TPCH hash joins, a couple of years ago.

I went ahead and tried doing this. I chose an approach where we return a
pointer to the in-place minimal tuple data when the slot is a heap,
buffer, or minimal tuple slot. A new function
ExecFetchSlotMinimalTupleData() returns the in-place minimal tuple data;
for any other slot type it returns a copy, as before. The receiver
should not assume the data points at a complete MinimalTupleData, so it
should itself set the tuple's t_len to the number of bytes returned.
Patch attached
(0001-Avoid-redundant-tuple-copy-while-sending-tuples-to-G.patch).
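To make the intended usage concrete, here is a rough sketch of the
sender side with the new function. This is not the exact patch code;
the signature of ExecFetchSlotMinimalTupleData() and its output
parameters here are assumptions based on the description above.

#include "postgres.h"

#include "executor/tuptable.h"
#include "storage/shm_mq.h"

static bool
tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
{
	TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
	char	   *data;		/* minimal tuple data, in place if possible */
	uint32		len;		/* number of bytes of minimal tuple data */
	bool		shouldFree;	/* true only if a copy had to be made */
	shm_mq_result result;

	/*
	 * For heap, buffer, and minimal tuple slots this returns a pointer
	 * into the slot's existing tuple, avoiding the fresh copy that
	 * ExecFetchSlotMinimalTuple() usually allocates.
	 */
	data = ExecFetchSlotMinimalTupleData(slot, &len, &shouldFree);

	/* shm_mq_send() performs the one remaining copy, into the queue. */
	result = shm_mq_send(tqueue->queue, len, data, false);

	if (shouldFree)
		pfree(data);

	/*
	 * The reader must not rely on a valid t_len inside the received
	 * bytes; as described above, it sets t_len to the number of bytes
	 * it read from the queue.
	 */
	return (result == SHM_MQ_SUCCESS);
}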
Thomas, I guess you had a different approach in mind when you suggested
returning either success or hey-that-buffer's-too-small-I-need-N-bytes.
But what looks clear to me is that avoiding the copy shows a consistent
improvement of 4 to 10% for simple parallel table scans. I tried my
patch on both x86_64 and arm64, and observed the speedup on both:

create table tab as select generate_series(1, 20000000) id,
    'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz' v;
select pg_prewarm('tab'::regclass);
explain analyze select * from tab where id % 2 = 0;

Times in milliseconds:
HEAD    : 1833.119  1816.522  1875.648  1872.153  1834.383
Patched : 1763.786  1721.160  1714.665  1719.738  1704.478

This was with the default 2 parallel workers. With 3 or 4 workers, I
didn't see a noticeable difference for the above test case; I think the
difference would become noticeable if I doubled the number of rows. In
any case, the gain diminishes as the number of workers grows, because
the tuple copying also gets parallelized. In some scenarios,
parallel_leader_participation=off amplifies the difference. I haven't
had a chance to check whether this helps any of the TPC-H queries.

Also attached is a patch, guc_for_testing.patch, that I used for
measuring the gain. This patch is only for testing: without it,
comparing the performance figures requires a server restart, and the
figures anyway shift back and forth by 5-15 percent after each restart,
which creates a lot of noise when comparing figures with and without the
fix. Use the GUC enable_fix to enable or disable the fix.

[1] https://www.postgresql.org/message-id/CA%2BhUKGLrN2M18-hACEJbNoj2sn_WoUj9rkkBeoPK7SY427pAnA%40mail.gmail.com

--
Thanks,
-Amit Khandekar
Huawei Technologies