Re: Use virtual tuple slot for Unique node - Mailing list pgsql-hackers

From: Ashutosh Bapat
Subject: Re: Use virtual tuple slot for Unique node
Msg-id: CAExHW5uhTMdkk26oJg9f2ZVufbi5J4Lquj79MdSO+ipnGJ_muw@mail.gmail.com
In response to: Re: Use virtual tuple slot for Unique node (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Use virtual tuple slot for Unique node
List: pgsql-hackers
On Tue, Oct 10, 2023 at 2:23 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 27 Sept 2023 at 20:01, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > On Sat, 23 Sept 2023 at 03:15, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > So not a win in this case. Could you peek at the outer slot type, and
> > > use the same kind of slot for the Unique's result? Or some more
> > > complicated logic, like use a virtual slot if all the values are
> > > pass-by-val? I'd also like to keep this simple, though...
> > >
> > > Would this kind of optimization make sense elsewhere?
> >
> > There are a few usages of ExecGetResultSlotOps(). e.g ExecInitHashJoin().
> >
> > If I adjust the patch to:
> >
> > - ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple);
> > + ExecInitResultTupleSlotTL(&uniquestate->ps,
> > +                           ExecGetResultSlotOps(outerPlanState(uniquestate),
> > +                                                NULL));
>
> Just to keep this from going cold, here's that in patch form for
> anyone who wants to test.

Thanks. I don't recollect why we chose a MinimalTupleSlot here - maybe
because we expected the underlying node to always produce a minimal
tuple. But the Unique node copies the tuple returned by the underlying
node, and this copy is carried out by the TupleTableSlot-specific copy
function copyslot. Every implementation of this function first converts
the source slot's tuple into the required form and then copies it.
Having both TupleTableSlots - the output slot of the underlying node and
the output slot of the Unique node - be of the same type avoids that
conversion step and just copies the slot, so it makes sense that it
performs better. (A rough sketch of this copy path is included below,
after the Q1-Q3 numbers.) The code looks fine to me.

> I spent a bit more time running some more benchmarks and I don't see
> any workload where it slows things down. I'd be happy if someone else
> had a go at finding a regression.

I built on your experiments and I might have found a minor regression.

Setup
=====
drop table if exists t_int;
create table t_int(a int, b int);
insert into t_int select x, x from generate_series(1,1000000)x;
create index on t_int (a,b);
vacuum analyze t_int;

drop table if exists t_text;
create table t_text(a text, b text);
insert into t_text select lpad(x::text, 1000, '0'), x::text from generate_series(1,1000000)x;
create index on t_text (a,b);
vacuum analyze t_text;

drop table if exists t_mixed; -- this one is new but it doesn't matter much
create table t_mixed(a text, b int);
insert into t_mixed select lpad(x::text, 1000, '0'), x from generate_series(1,1000000)x;
create index on t_mixed (a,b);
vacuum analyze t_mixed;

Queries and measurements (average execution time from 3 runs, on my Thinkpad T490)
===================================================================================

Q1: select distinct a,b from t_int;
HEAD: 544.45 ms
patched: 381.55 ms

Q2: select distinct a,b from t_text;
HEAD: 609.90 ms
patched: 513.42 ms

Q3: select distinct a,b from t_mixed;
HEAD: 626.80 ms
patched: 468.22 ms

The more pass-by-reference data there is, the more memory has to be
allocated for the copy, which seems to reduce the gain from this patch.
The nodes above use Buffer or Heap TupleTableSlots; let me also try some
different nodes which output a minimal or virtual TTS.
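For reference, here is roughly what the copy path discussed above looks
like. This is paraphrased from nodeUnique.c, tuptable.h and execTuples.c
rather than quoted verbatim, so treat it as an illustrative sketch of the
mechanism, not the exact source:

    /* In ExecUnique(), each distinct row is returned by copying the
     * child's output slot into the Unique node's own result slot. */
    slot = ExecProcNode(outerPlan);
    ...
    return ExecCopySlot(resultTupleSlot, slot);

    /* ExecCopySlot() dispatches on the destination slot's ops ... */
    static inline TupleTableSlot *
    ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
    {
        dstslot->tts_ops->copyslot(dstslot, srcslot);
        return dstslot;
    }

    /* ... so with a minimal-tuple result slot, copyslot first has to form
     * a MinimalTuple from whatever kind of slot the child produced, and
     * only then stores it; approximately: */
    static void
    tts_minimal_copyslot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
    {
        MemoryContext oldcontext = MemoryContextSwitchTo(dstslot->tts_mcxt);
        MinimalTuple  mintup = ExecCopySlotMinimalTuple(srcslot); /* conversion */

        MemoryContextSwitchTo(oldcontext);
        ExecStoreMinimalTuple(mintup, dstslot, true);
    }

With the patch, the Unique node's result slot uses the same tts_ops as
its child, so copyslot copies like for like and the conversion step
disappears.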
set enable_hashagg to off;

Q4: select distinct a,b from (select sum(a) over (order by a rows 2 preceding) a, b from t_int) q
HEAD: 2529.58 ms
patched: 2332.23 ms

Q5: select distinct a,b from (select sum(a) over (order by a rows 2 preceding) a, b from t_int order by a, b) q
HEAD: 2633.69 ms
patched: 2255.99 ms

Q6: select distinct a,b from (select string_agg(a, ', ') over (order by a rows 2 preceding) a, b from t_text) q
HEAD: 108589.85 ms
patched: 107226.82 ms

Q7: select distinct a,b from (select string_agg(left(a, 100), ', ') over (order by a rows 2 preceding) a, b from t_text) q
HEAD: 16070.62 ms
patched: 16182.16 ms

The last one (Q7) is surprising, though. Maybe the advantage of using the
same kind of tuple table slot becomes so narrow when large amounts of
data have to be copied that the execution times almost match. Either way,
the patched and unpatched execution times differ only by about the margin
of error.

--
Best Wishes,
Ashutosh Bapat