Re: Use virtual tuple slot for Unique node - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Use virtual tuple slot for Unique node |
Date | |
Msg-id | CAApHDvpyi8SqrJG6wU3fkHB-Kb0k5mdv88h7Y90ND9oQ9GbtnA@mail.gmail.com Whole thread Raw |
In response to | Re: Use virtual tuple slot for Unique node (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Use virtual tuple slot for Unique node
|
List | pgsql-hackers |
On Wed, 25 Oct 2023 at 22:48, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > We may save the size of data in VirtualTupleTableSlot, thus avoiding > the first loop. I assume that when allocating > VirtualTupleTableSlot->data, we always know what size we are > allocating so it should be just a matter of saving it in > VirtualTupleTableSlot->size. This should avoid the first loop in > tts_virtual_materialize() and give some speed up. We will need a loop > to repoint non-byval Datums. I imagine that the pointers to non-byval > Datums can be computed as dest_slot->tts_values[natts] = > dest_vslot->data + (src_slot->tts_values[natts] - src_vslot->data). > This would work as long as all the non-byval datums in the source slot > are all stored flattened in source slot's data. I am assuming that > that would be true in a materialized virtual slot. The byval datums > are copied as is. I think, this will avoid multiple memcpy calls, one > per non-byval attribute and hence some speedup. I may be wrong though. hmm, do you think it's common enough that we copy an already materialised virtual slot? I tried adding the following code totts_virtual_copyslot and didn't see the NOTICE message when running each of your test queries. "make check" also worked without anything failing after adjusting nodeUnique.c to always use a TTSOpsVirtual slot. + if (srcslot->tts_ops == &TTSOpsVirtual && TTS_SHOULDFREE(srcslot)) + elog(NOTICE, "We copied a materialized virtual slot!"); I did get a failure in postgres_fdw's tests: server loopback options (table_name 'tab_batch_sharded_p1_remote'); insert into tab_batch_sharded select * from tab_batch_local; +NOTICE: We copied a materialized virtual slot! +NOTICE: We copied a materialized virtual slot! so I think it's probably not that common that we'd gain from that optimisation. What might buy us a bit more would be to get rid of the for loop inside tts_virtual_copyslot() and copy the guts of tts_virtual_materialize() into tts_virtual_copyslot() and set the dstslot tts_isnull and tts_values arrays in the same loop that we use to calculate the size. I tried that in the attached patch and then tested it alongside the patch that changes the slot type. master = 74604a37f 1 = [1] 2 = optimize_tts_virtual_copyslot.patch Using the script from [2] and the setup from [3] but reduced to 10k tuples instead of 1 million. Times the average query time in milliseconds for a 60 second pgbench run. query master master+1 master+1+2 m/m+1 m/m+1+2 Q1 2.616 2.082 1.903 125.65% 137.47% Q2 9.479 10.593 10.361 89.48% 91.49% Q3 10.329 10.357 10.627 99.73% 97.20% Q4 7.272 7.306 6.941 99.53% 104.77% Q5 7.597 7.043 6.645 107.87% 114.33% Q6 162.177 161.037 162.807 100.71% 99.61% Q7 59.288 59.43 61.294 99.76% 96.73% 103.25% 105.94% I only expect Q2 and Q3 to gain from this. Q1 shouldn't improve but did, so the results might not be stable enough to warrant making any decisions from. I was uncertain if the old behaviour of when srcslot contains fewer attributes than dstslot was intended or not. What happens there is that we'd leave the additional old dstslot tts_values in place and only overwrite up to srcslot->natts but then we'd go on and materialize all the dstslot attributes. I think this might not be needed as we do dstslot->tts_nvalid = srcdesc->natts. I suspect we may be ok just to materialize the srcslot attributes and ignore the previous additional dstslot attributes. Changing it to that would make the attached patch more simple. David [1] https://www.postgresql.org/message-id/attachment/151110/use_subnode_slot_type_for_nodeunique.patch [2] https://www.postgresql.org/message-id/attachment/151342/uniquebench.sh.txt [3] https://www.postgresql.org/message-id/CAExHW5uhTMdkk26oJg9f2ZVufbi5J4Lquj79MdSO%2BipnGJ_muw%40mail.gmail.com
Attachment
pgsql-hackers by date: