Re: Use virtual tuple slot for Unique node - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Use virtual tuple slot for Unique node |
Date | |
Msg-id | CAExHW5vLUg=OG-QWv+3t1cSwPj9_iDS9n3kyt7bQyvpj+M8oJQ@mail.gmail.com Whole thread Raw |
In response to | Re: Use virtual tuple slot for Unique node (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Use virtual tuple slot for Unique node
|
List | pgsql-hackers |
On Tue, Oct 24, 2023 at 4:30 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 20 Oct 2023 at 22:30, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > I ran my experiments again. It seems on my machine the execution times > > do vary a bit. I ran EXPLAIN ANALYZE on the query 5 times and took > > average of execution times. I did this three times. For each run the > > standard deviation was within 2%. Here are the numbers > > master: 13548.33, 13878.88, 14572.52 > > master + 0001: 13734.58, 14193.83, 14574.73 > > > > So for me, I would say, this particular query performs the same with > > or without patch. > > I'm not really sure which of the 7 queries you're referring to here. > The times you're quoting seem to align best to Q7 from your previous > results, so I'll assume you mean Q7. > > I'm not really concerned with Q7 as both patched and unpatched use > TTSOpsMinimalTuple. It's Q7. Yes. I was responding to your statement: " However, Q7 does seem to be above noise level faster and I'm not sure why.". Anyway, we can set that aside. > > I also think you need to shrink the size of your benchmark down. With > 1 million tuples, you're more likely to be also measuring the time it > takes to get cache lines from memory into the CPU. A smaller scale > test will make this less likely. Also, you'd be better timing SELECT > rather than the time it takes to EXPLAIN ANALYZE. They're not the same > thing. EXPLAIN ANALYZE has additional timing going on and we may end > up not de-toasting toasted Datums. I ran experiments with 10K rows and measured timing using \timing in psql. The measurements are much more flaky than a larger set of rows and EXPLAIN ANALYZE. But I think your observations are good enough. > > > On Thu, Oct 19, 2023 at 4:26 PM David Rowley <dgrowleyml@gmail.com> wrote: > > > We can see that Q2 and Q3 become a bit slower. This makes sense as > > > tts_virtual_materialize() is quite a bit more complex than > > > heap_copy_minimal_tuple() which is a simple palloc/memcpy. > > > > > > > If the source slot is a materialized virtual slot, > > tts_virtual_copyslot() could perform a memcpy of the materialized data > > itself rather than materialising from datums. That might be more > > efficient. > > I think you're talking about just performing a memcpy() of the > VirtualTupleTableSlot->data field. Unfortunately, you'd not be able > to do just that as you'd also need to repoint the non-byval Datums in > tts_values at the newly memcpy'd memory. If you skipped that part, > those would remain pointing to the original memory. If that memory > goes away, then bad things will happen. I think you'd still need to do > the 2nd loop in tts_virtual_materialize() Yes, we will need repoint non-byval Datums ofc. > > > May be we should fix the above said inefficiency in > > tt_virtual_copyslot()? > > I don't have any bright ideas on how to make tts_virtual_materialize() > itself faster. If there were some way to remember !attbyval > attributes for the 2nd loop, that might be good, but creating > somewhere to store that might result in further overheads. We may save the size of data in VirtualTupleTableSlot, thus avoiding the first loop. I assume that when allocating VirtualTupleTableSlot->data, we always know what size we are allocating so it should be just a matter of saving it in VirtualTupleTableSlot->size. This should avoid the first loop in tts_virtual_materialize() and give some speed up. We will need a loop to repoint non-byval Datums. I imagine that the pointers to non-byval Datums can be computed as dest_slot->tts_values[natts] = dest_vslot->data + (src_slot->tts_values[natts] - src_vslot->data). This would work as long as all the non-byval datums in the source slot are all stored flattened in source slot's data. I am assuming that that would be true in a materialized virtual slot. The byval datums are copied as is. I think, this will avoid multiple memcpy calls, one per non-byval attribute and hence some speedup. I may be wrong though. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: