Re: Use virtual tuple slot for Unique node - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Use virtual tuple slot for Unique node
Date
Msg-id CAExHW5uhTMdkk26oJg9f2ZVufbi5J4Lquj79MdSO+ipnGJ_muw@mail.gmail.com
In response to Re: Use virtual tuple slot for Unique node  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Tue, Oct 10, 2023 at 2:23 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 27 Sept 2023 at 20:01, David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > On Sat, 23 Sept 2023 at 03:15, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > So not a win in this case. Could you peek at the outer slot type, and
> > > use the same kind of slot for the Unique's result? Or some more
> > > complicated logic, like use a virtual slot if all the values are
> > > pass-by-val? I'd also like to keep this simple, though...
> > >
> > > Would this kind of optimization make sense elsewhere?
> >
> > There are a few usages of ExecGetResultSlotOps(). e.g ExecInitHashJoin().
> >
> > If I adjust the patch to:
> >
> > -       ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple);
> > +       ExecInitResultTupleSlotTL(&uniquestate->ps,
> > +                                 ExecGetResultSlotOps(outerPlanState(uniquestate),
> > +                                                      NULL));
>
> Just to keep this from going cold, here's that in patch form for
> anyone who wants to test.

Thanks.

I don't recollect why we chose a MinimalTupleSlot here - maybe because
we expected the underlying node to always produce a minimal tuple. But
the Unique node copies the tuple returned by the underlying node. This
copy is carried out by the TupleTableSlot-specific copy function,
copyslot. Every implementation of this function first converts the
source slot's tuple into the required form and then copies it. Having
both TupleTableSlots - the output slot of the underlying node and the
output slot of the Unique node - be of the same type avoids the
conversion step and just copies the slot, so it makes sense that this
performs better. The code looks fine to me.
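
Roughly, a slot's copyslot callback does something like the sketch
below (my own simplified illustration; the real minimal-tuple version
is tts_minimal_copyslot() in execTuples.c):

#include "postgres.h"
#include "executor/tuptable.h"

/*
 * Simplified sketch of a minimal-tuple slot's copyslot callback:
 * convert the source slot's contents first, then store the copy.
 */
static void
minimal_copyslot_sketch(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
{
    MemoryContext oldcontext;
    MinimalTuple mintuple;

    oldcontext = MemoryContextSwitchTo(dstslot->tts_mcxt);

    /*
     * Step 1: obtain the source tuple as a MinimalTuple.  If srcslot is
     * a buffer/heap slot this has to build a new minimal tuple; if it
     * is already a minimal-tuple slot this is little more than a memcpy.
     */
    mintuple = ExecCopySlotMinimalTuple(srcslot);

    MemoryContextSwitchTo(oldcontext);

    /* Step 2: hand the copy over to the destination slot. */
    ExecStoreMinimalTuple(mintuple, dstslot, true);
}

With the patch the result slot uses the same ops as the outer slot, so
that conversion step turns into a straight copy of the tuple.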

>
> I spent a bit more time running some more benchmarks and I don't see
> any workload where it slows things down.  I'd be happy if someone else
> had a go at finding a regression.

I built on your experiments and I might have found a minor regression.

Setup
=====
drop table if exists t_int;
create table t_int(a int, b int);
insert into t_int select x, x from generate_series(1,1000000)x;
create index on t_int (a,b);
vacuum analyze t_int;

drop table if exists t_text;
create table t_text(a text, b text);
insert into t_text select lpad(x::text, 1000, '0'), x::text from
generate_series(1,1000000)x;
create index on t_text (a,b);
vacuum analyze t_text;

drop table if exists t_mixed; -- this one is new but it doesn't matter much
create table t_mixed(a text, b int);
insert into t_mixed select lpad(x::text, 1000, '0'), x from
generate_series(1,1000000)x;
create index on t_mixed (a,b);
vacuum analyze t_mixed;

Queries and measurements (average execution time from 3 runs - on my
Thinkpad T490)
======================
Q1 select distinct a,b from t_int
HEAD: 544.45 ms
patched: 381.55 ms

Q2 select distinct a,b from t_text
HEAD: 609.90 ms
patched: 513.42 ms

Q3 select distinct a,b from t_mixed
HEAD: 626.80 ms
patched: 468.22 ms

The more pass-by-ref data there is, the more memory has to be
allocated and copied, which seems to reduce the gain from this patch.
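
To put a finer point on that: copying a slot's contents into slot-owned
memory scales with the width of the pass-by-ref values. A rough
illustration (my own sketch, loosely modelled on what materializing a
virtual slot involves, not the actual execTuples.c code):

#include "postgres.h"
#include "executor/tuptable.h"
#include "utils/datum.h"

/* Illustration only: copy srcslot's attribute values into dstslot. */
static void
copy_attrs_sketch(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
{
    TupleDesc     tupdesc = srcslot->tts_tupleDescriptor;
    MemoryContext oldcontext;
    int           attno;

    /* make sure all attributes are deformed into tts_values/tts_isnull */
    slot_getallattrs(srcslot);

    oldcontext = MemoryContextSwitchTo(dstslot->tts_mcxt);
    for (attno = 0; attno < tupdesc->natts; attno++)
    {
        Form_pg_attribute att = TupleDescAttr(tupdesc, attno);

        dstslot->tts_isnull[attno] = srcslot->tts_isnull[attno];

        if (srcslot->tts_isnull[attno] || att->attbyval)
        {
            /* NULL or pass-by-value: copying is a plain Datum assignment */
            dstslot->tts_values[attno] = srcslot->tts_values[attno];
        }
        else
        {
            /* pass-by-reference: palloc + memcpy sized to the value */
            dstslot->tts_values[attno] =
                datumCopy(srcslot->tts_values[attno],
                          att->attbyval, att->attlen);
        }
    }
    dstslot->tts_nvalid = tupdesc->natts;
    MemoryContextSwitchTo(oldcontext);
}

With 1000-character text values the allocation and memcpy dominate the
per-tuple cost, so skipping the slot-type conversion buys relatively
little.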
The nodes above use a Buffer or Heap TupleTableSlot. Let's try some
different nodes which output a minimal or virtual TTS.

set enable_hashagg to off;
Q4 select distinct a,b from (select sum(a) over (order by a rows 2
preceding) a, b from t_int) q
HEAD: 2529.58 ms
patched: 2332.23 ms

Q5 select distinct a,b from (select sum(a) over (order by a rows 2
preceding) a, b from t_int order by a, b) q
HEAD: 2633.69 ms
patched: 2255.99 ms


Q6 select distinct a,b from (select string_agg(a, ', ') over (order by
a rows 2 preceding) a, b from t_text) q
HEAD: 108589.85 ms
patched: 107226.82 ms

Q7 select distinct a,b from (select string_agg(left(a, 100), ', ')
over (order by a rows 2 preceding) a, b from t_text) q
HEAD: 16070.62 ms
patched: 16182.16 ms

This one is surprising though. Maybe the advantage of using the same
tuple table slot type is so narrow when large amounts of data need to
be copied that the execution times almost match. Either way, the
patched and unpatched execution times differ only by the margin of
error.

--
Best Wishes,
Ashutosh Bapat


