Re: Use virtual tuple slot for Unique node - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Use virtual tuple slot for Unique node |
Date | |
Msg-id | CAExHW5uku=D2Dy6ST4JHO1sokZF8arecymk2RksioKk2BGb4UA@mail.gmail.com Whole thread Raw |
In response to | Re: Use virtual tuple slot for Unique node (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Use virtual tuple slot for Unique node
|
List | pgsql-hackers |
On Fri, Oct 27, 2023 at 8:48 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 25 Oct 2023 at 22:48, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > We may save the size of data in VirtualTupleTableSlot, thus avoiding > > the first loop. I assume that when allocating > > VirtualTupleTableSlot->data, we always know what size we are > > allocating so it should be just a matter of saving it in > > VirtualTupleTableSlot->size. This should avoid the first loop in > > tts_virtual_materialize() and give some speed up. We will need a loop > > to repoint non-byval Datums. I imagine that the pointers to non-byval > > Datums can be computed as dest_slot->tts_values[natts] = > > dest_vslot->data + (src_slot->tts_values[natts] - src_vslot->data). > > This would work as long as all the non-byval datums in the source slot > > are all stored flattened in source slot's data. I am assuming that > > that would be true in a materialized virtual slot. The byval datums > > are copied as is. I think, this will avoid multiple memcpy calls, one > > per non-byval attribute and hence some speedup. I may be wrong though. > > hmm, do you think it's common enough that we copy an already > materialised virtual slot? > > I tried adding the following code totts_virtual_copyslot and didn't > see the NOTICE message when running each of your test queries. "make > check" also worked without anything failing after adjusting > nodeUnique.c to always use a TTSOpsVirtual slot. > > + if (srcslot->tts_ops == &TTSOpsVirtual && TTS_SHOULDFREE(srcslot)) > + elog(NOTICE, "We copied a materialized virtual slot!"); > > I did get a failure in postgres_fdw's tests: > > server loopback options (table_name 'tab_batch_sharded_p1_remote'); > insert into tab_batch_sharded select * from tab_batch_local; > +NOTICE: We copied a materialized virtual slot! > +NOTICE: We copied a materialized virtual slot! > > so I think it's probably not that common that we'd gain from that optimisation. Thanks for this analysis. If we aren't copying a materialized virtual slot often, no point in adding that optimization. > > What might buy us a bit more would be to get rid of the for loop > inside tts_virtual_copyslot() and copy the guts of > tts_virtual_materialize() into tts_virtual_copyslot() and set the > dstslot tts_isnull and tts_values arrays in the same loop that we use > to calculate the size. > > I tried that in the attached patch and then tested it alongside the > patch that changes the slot type. > > master = 74604a37f > 1 = [1] > 2 = optimize_tts_virtual_copyslot.patch > > Using the script from [2] and the setup from [3] but reduced to 10k > tuples instead of 1 million. > > Times the average query time in milliseconds for a 60 second pgbench run. > > query master master+1 master+1+2 m/m+1 m/m+1+2 > Q1 2.616 2.082 1.903 125.65% > 137.47% > Q2 9.479 10.593 10.361 89.48% > 91.49% > Q3 10.329 10.357 10.627 99.73% > 97.20% > Q4 7.272 7.306 6.941 99.53% > 104.77% > Q5 7.597 7.043 6.645 107.87% > 114.33% > Q6 162.177 161.037 162.807 100.71% 99.61% > Q7 59.288 59.43 61.294 99.76% > 96.73% > > 103.25% 105.94% > > I only expect Q2 and Q3 to gain from this. Q1 shouldn't improve but > did, so the results might not be stable enough to warrant making any > decisions from. I am actually surprised to see that eliminating loop is showing improvements. There aren't hundreds of attributes involved in those queries. So I share your stability concerns. And even with these patches, Q2 and Q3 are still slower. Q1 is consistently giving performance > 25% for both of us. But other queries aren't showing a whole lot improvement. So I am wondering whether it's worth pursuing this change; similar to the opinion you expressed a few emails earlier. > > I was uncertain if the old behaviour of when srcslot contains fewer > attributes than dstslot was intended or not. What happens there is > that we'd leave the additional old dstslot tts_values in place and > only overwrite up to srcslot->natts but then we'd go on and > materialize all the dstslot attributes. I think this might not be > needed as we do dstslot->tts_nvalid = srcdesc->natts. I suspect we may > be ok just to materialize the srcslot attributes and ignore the > previous additional dstslot attributes. Changing it to that would > make the attached patch more simple. We seem to use both tt_nvalid and tts_tupleDescriptor->natts. I forgot what's the difference. If we do what you say, we might end up trying to access unmaterialized values beyond tts_nvalid. Better to investigate whether such a hazard exists. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: