Re: Planning aggregates which require sorted or distinct input - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Planning aggregates which require sorted or distinct input
Msg-id 17421.1169230347@sss.pgh.pa.us
In response to Planning aggregates which require sorted or distinct input  (Gavin Sherry <swm@alcove.com.au>)
Responses Re: Planning aggregates which require sorted or distinct  (Gavin Sherry <swm@alcove.com.au>)
List pgsql-hackers
Gavin Sherry <swm@alcove.com.au> writes:
> What we want to do is have a kind of 'sub plan' for each aggregate. In
> effect, the plan might start looking like a directed graph.  Here is part
> of the plan as a directed graph.

>                        GroupAggregate
>               /-----------------^---------------...
>               |                 |
>               |                 |
>               ^                 |
>               |               Unique
>               |                 ^
>               |                 |
>             Sort               Sort
>           (saledate)    (saledate,prodid)
>               ^                 ^
>               |                 |
>               -------------- Fan Out ------------...
>                                 ^
>                                 |
>                                Scan

> This idea was presented by Brian Hagenbuch at Greenplum. He calls it a
> 'Fan Out' plan. It is trivial to rejoin the data because all data input to
> the aggregates is sorted by the same primary key.

Er, what primary key would that be exactly?  And even if you had a key,
I wouldn't call joining on it trivial; I'd call it expensive ...

Still, it looks better than your "pipeline" idea, which is even more full
of handwaving --- the problem with that one is that you're either
duplicating the earlier aggregates' results a lot of times, or you've
got different numbers of rows for different columns at various steps of
the pipeline.

I'd stick with the fanout idea but work on some way to keep related rows
together that doesn't depend on untenable assumptions like having a
primary key.
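One way to read "keep related rows together" without a primary key is to rejoin the fan-out branches on the grouping column itself. Every branch in the diagram above emits at most one row per saledate, in saledate order. The Python below is a minimal sketch under that assumption. The table, the query it mimics (`SELECT saledate, count(*), count(DISTINCT prodid) ... GROUP BY saledate`), and all function names are hypothetical illustrations, not Gavin's or Greenplum's design. Each branch's output is fully materialized before the merge, which is the kind of cost Tom is pointing at.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical scan output: (saledate, prodid) rows.
Row = tuple[str, int]


def branch_count(rows: list[Row]) -> list[tuple[str, int]]:
    """Left branch: Sort(saledate), then count(*) per saledate."""
    ordered = sorted(rows, key=itemgetter(0))
    return [(k, sum(1 for _ in g)) for k, g in groupby(ordered, key=itemgetter(0))]


def branch_count_distinct(rows: list[Row]) -> list[tuple[str, int]]:
    """Right branch: Sort(saledate, prodid), Unique, then count per saledate."""
    ordered = sorted(rows)  # tuples compare as (saledate, prodid)
    unique = [r for i, r in enumerate(ordered) if i == 0 or r != ordered[i - 1]]
    return [(k, sum(1 for _ in g)) for k, g in groupby(unique, key=itemgetter(0))]


def fan_out(rows: list[Row]) -> list[tuple[str, int, int]]:
    """Feed one scan to both branches, then merge-join them on saledate."""
    left = branch_count(rows)
    right = branch_count_distinct(rows)
    out = []
    i = j = 0
    # Both branches are sorted by the grouping key, so a merge lines up
    # matching groups without needing any per-row primary key.
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk == rk:
            out.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif lk < rk:
            i += 1
        else:
            j += 1
    return out
```

For example, `fan_out([("2007-01-01", 1), ("2007-01-01", 1), ("2007-01-01", 2), ("2007-01-02", 3)])` yields one row per date carrying both aggregate results: `[("2007-01-01", 3, 2), ("2007-01-02", 1, 1)]`.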

When I've thought about this in the past, I had in mind leaving the plan
structure pretty much as it is, but making the planner concern itself
with the properties of individual aggregates more than it does now ---
e.g., mark DISTINCT aggregates as to whether they should use sorting or
hashing, or mark that they can assume pre-sorted input.  Perhaps this
is another way of describing what you call a fan-out plan.
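The alternative of keeping a single GroupAggregate but annotating each aggregate can be sketched as follows. This is a hypothetical illustration in Python, not PostgreSQL's planner or executor code. The `DistinctAgg` annotation and its `strategy` values (`"hash"`, `"sort"`, `"presorted"`) are invented names standing in for "use hashing", "use sorting", and "can assume pre-sorted input". The input is assumed to be sorted by the grouping column already.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter
from typing import Iterable


@dataclass(frozen=True)
class DistinctAgg:
    """Hypothetical planner mark for one count(DISTINCT col) aggregate."""
    col: int        # index of the aggregated column in each row
    strategy: str   # "hash", "sort", or "presorted"


def distinct_count(values: list, strategy: str) -> int:
    if strategy == "hash":
        # Deduplicate with a hash set; arrival order is irrelevant.
        return len(set(values))
    if strategy == "sort":
        # Sort this group's values so duplicates become adjacent.
        values = sorted(values)
    elif strategy != "presorted":
        raise ValueError(f"unknown strategy: {strategy}")
    # "sort" and "presorted" both count changes between neighbours;
    # "presorted" trusts the planner that values already arrive ordered.
    return sum(1 for i, v in enumerate(values) if i == 0 or v != values[i - 1])


def group_aggregate(rows: Iterable[tuple], key: int,
                    aggs: list[DistinctAgg]) -> list[tuple]:
    """One pass over rows already sorted by the group key column."""
    result = []
    for k, group in groupby(rows, key=itemgetter(key)):
        group = list(group)
        counts = [distinct_count([r[a.col] for r in group], a.strategy)
                  for a in aggs]
        result.append((k, *counts))
    return result
```

With rows sorted on (saledate, prodid), the prodid aggregate can be marked `"presorted"`, while an aggregate over some other column needs `"hash"` or `"sort"`; all three strategies agree on the answer and differ only in cost.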
        regards, tom lane

