Re: Use virtual tuple slot for Unique node - Mailing list pgsql-hackers

From David Rowley
Subject Re: Use virtual tuple slot for Unique node
Date
Msg-id CAApHDvpyi8SqrJG6wU3fkHB-Kb0k5mdv88h7Y90ND9oQ9GbtnA@mail.gmail.com
Whole thread Raw
In response to Re: Use virtual tuple slot for Unique node  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Use virtual tuple slot for Unique node
List pgsql-hackers
On Wed, 25 Oct 2023 at 22:48, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> We may save the size of data in VirtualTupleTableSlot, thus avoiding
> the first loop. I assume that when allocating
> VirtualTupleTableSlot->data, we always know what size we are
> allocating so it should be just a matter of saving it in
> VirtualTupleTableSlot->size. This should avoid the first loop in
> tts_virtual_materialize() and give some speed up. We will need a loop
> to repoint non-byval Datums. I imagine that the pointers to non-byval
> Datums can be computed as dest_slot->tts_values[natts] =
> dest_vslot->data + (src_slot->tts_values[natts] - src_vslot->data).
> This would work as long as all the non-byval datums in the source slot
> are all stored flattened in source slot's data. I am assuming that
> that would be true in a materialized virtual slot. The byval datums
> are copied as is. I think, this will avoid multiple memcpy calls, one
> per non-byval attribute and hence some speedup. I may be wrong though.

hmm, do you think it's common enough that we copy an already
materialised virtual slot?

I tried adding the following code totts_virtual_copyslot and didn't
see the NOTICE message when running each of your test queries. "make
check" also worked without anything failing after adjusting
nodeUnique.c to always use a TTSOpsVirtual slot.

+       if (srcslot->tts_ops == &TTSOpsVirtual && TTS_SHOULDFREE(srcslot))
+               elog(NOTICE, "We copied a materialized virtual slot!");

I did get a failure in postgres_fdw's tests:

   server loopback options (table_name 'tab_batch_sharded_p1_remote');
 insert into tab_batch_sharded select * from tab_batch_local;
+NOTICE:  We copied a materialized virtual slot!
+NOTICE:  We copied a materialized virtual slot!

so I think it's probably not that common that we'd gain from that optimisation.

What might buy us a bit more would be to get rid of the for loop
inside tts_virtual_copyslot() and copy the guts of
tts_virtual_materialize() into tts_virtual_copyslot() and set the
dstslot tts_isnull and tts_values arrays in the same loop that we use
to calculate the size.

I tried that in the attached patch and then tested it alongside the
patch that changes the slot type.

master = 74604a37f
1 = [1]
2 = optimize_tts_virtual_copyslot.patch

Using the script from [2] and the setup from [3] but reduced to 10k
tuples instead of 1 million.

Times the average query time in milliseconds for a 60 second pgbench run.

query    master      master+1        master+1+2    m/m+1         m/m+1+2
Q1        2.616         2.082                   1.903       125.65%
    137.47%
Q2        9.479        10.593                10.361         89.48%
     91.49%
Q3        10.329      10.357                10.627         99.73%
    97.20%
Q4        7.272          7.306                 6.941          99.53%
     104.77%
Q5        7.597          7.043                 6.645        107.87%
     114.33%
Q6        162.177  161.037              162.807        100.71%         99.61%
Q7        59.288      59.43                  61.294          99.76%
     96.73%

     103.25%        105.94%

I only expect Q2 and Q3 to gain from this. Q1 shouldn't improve but
did, so the results might not be stable enough to warrant making any
decisions from.

I was uncertain if the old behaviour of when srcslot contains fewer
attributes than dstslot was intended or not.  What happens there is
that we'd leave the additional old dstslot tts_values in place and
only overwrite up to srcslot->natts but then we'd go on and
materialize all the dstslot attributes. I think this might not be
needed as we do dstslot->tts_nvalid = srcdesc->natts. I suspect we may
be ok just to materialize the srcslot attributes and ignore the
previous additional dstslot attributes.  Changing it to that would
make the attached patch more simple.

David

[1] https://www.postgresql.org/message-id/attachment/151110/use_subnode_slot_type_for_nodeunique.patch
[2] https://www.postgresql.org/message-id/attachment/151342/uniquebench.sh.txt
[3] https://www.postgresql.org/message-id/CAExHW5uhTMdkk26oJg9f2ZVufbi5J4Lquj79MdSO%2BipnGJ_muw%40mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Next
From: Bharath Rupireddy
Date:
Subject: Re: A recent message added to pg_upgade