Re: Planning aggregates which require sorted or distinct input - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Planning aggregates which require sorted or distinct input
Date	January 19, 2007 14:12:36
Msg-id	17421.1169230347@sss.pgh.pa.us
In response to	Planning aggregates which require sorted or distinct input (Gavin Sherry <swm@alcove.com.au>)
Responses	Re: Planning aggregates which require sorted or distinct
List	pgsql-hackers

Gavin Sherry <swm@alcove.com.au> writes:
> What we want to do is have a kind of 'sub plan' for each aggregate. In
> effect, the plan might start looking like a directed graph.  Here is part
> of the plan as a directed graph.

>                        GroupAggregate
>               /-----------------^---------------...
>               |                 |
>               |                 |
>               ^                 |
>               |               Unique
>               |                 ^
>               |                 |
>             Sort               Sort
>           (saledate)    (saledate,prodid)
>               ^                 ^
>               |                 |
>               -------------- Fan Out ------------...
>                                 ^
>                                 |
>                                Scan

> This idea was presented by Brian Hagenbuch at Greenplum. He calls it a
> 'Fan Out' plan. It is trivial to rejoin the data because all data input to
> the aggregates is sorted by the same primary key.

Er, what primary key would that be exactly?  And even if you had a key,
I wouldn't call joining on it trivial; I'd call it expensive ...

Still, it looks better than your "pipeline" idea which is even more full
of handwaving --- the problem with that one is that you're either
duplicating the earlier aggregates' results a lot of times, or you've
got different numbers of rows for different columns at various steps of
the pipeline.

I'd stick with the fanout idea but work on some way to keep related rows
together that doesn't depend on untenable assumptions like having a
primary key.
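
As a rough illustration of the fan-out shape in the quoted diagram, the sketch below feeds one scan into two sorted branches and recombines them on the grouping column rather than on any primary key. It is only a toy model under stated assumptions: the sample rows, the function names, and the choice of count(*) and count(DISTINCT prodid) as the two aggregates are hypothetical, and only saledate and prodid come from the diagram.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows standing in for the Scan output: (saledate, prodid).
ROWS = [
    ("2007-01-02", 7),
    ("2007-01-01", 3),
    ("2007-01-01", 3),
    ("2007-01-02", 9),
    ("2007-01-01", 5),
]


def unique(sorted_rows):
    """Drop adjacent duplicates, in the manner of a Unique node over sorted input."""
    previous = object()  # sentinel that compares unequal to every row
    for row in sorted_rows:
        if row != previous:
            yield row
        previous = row


def fan_out_aggregate(rows):
    """Return [(saledate, count(*), count(DISTINCT prodid)), ...]."""
    rows = list(rows)  # the "Fan Out" step: both branches read the same input

    # Branch 1: Sort (saledate), feeding the plain aggregate.
    plain = sorted(rows, key=itemgetter(0))
    # Branch 2: Sort (saledate, prodid), then Unique, feeding the DISTINCT aggregate.
    distinct = unique(sorted(rows))

    plain_groups = groupby(plain, key=itemgetter(0))
    distinct_groups = groupby(distinct, key=itemgetter(0))

    result = []
    # Both branches are ordered by saledate and each contains every saledate
    # at least once, so their groups arrive in lockstep.
    for (date_a, group_a), (date_b, group_b) in zip(plain_groups, distinct_groups):
        assert date_a == date_b
        result.append((date_a, sum(1 for _ in group_a), sum(1 for _ in group_b)))
    return result
```

The two branches stay aligned only because both are sorted with the grouping column leading; nothing here relies on a unique key per row. The sketch also ignores NULL handling and holds the whole input in memory, which a real executor could not assume.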

When I've thought about this in the past, I had in mind leaving the plan
structure pretty much as it is, but making the planner concern itself
with the properties of individual aggregates more than it does now ---
e.g., mark DISTINCT aggregates as to whether they should use sorting or
hashing, or mark that they can assume pre-sorted input.  Perhaps this
is another way of describing what you call a fan-out plan.
        regards, tom lane
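
The per-aggregate marking described in the last paragraph might be sketched as follows. This is a minimal illustration and not PostgreSQL planner code: `AggRef`, `DistinctStrategy`, `choose_strategy`, the `est_distinct` estimate, and the `hash_limit` threshold are all hypothetical names and rules invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class DistinctStrategy(Enum):
    PRESORTED = "presorted"  # input already arrives in a usable order
    SORT = "sort"            # the aggregate must sort its own input
    HASH = "hash"            # the aggregate de-duplicates with a hash table


@dataclass
class AggRef:
    """Hypothetical stand-in for one aggregate call in a query."""
    name: str
    distinct_cols: Tuple[str, ...] = ()  # empty for a non-DISTINCT aggregate
    strategy: Optional[DistinctStrategy] = None


def choose_strategy(agg, group_cols, input_order, est_distinct, hash_limit=10_000):
    """Pick a strategy for one aggregate; the cost rule is an assumed toy rule."""
    if not agg.distinct_cols:
        return None  # plain aggregates need no de-duplication
    needed = tuple(group_cols) + tuple(agg.distinct_cols)
    # Pre-sorted input: the existing sort order already covers the grouping
    # columns followed by the DISTINCT columns.
    if tuple(input_order[:len(needed)]) == needed:
        return DistinctStrategy.PRESORTED
    # Otherwise hash when the estimated number of distinct values is small.
    if est_distinct <= hash_limit:
        return DistinctStrategy.HASH
    return DistinctStrategy.SORT


def mark_aggregates(aggs, group_cols, input_order, est_distinct):
    """Annotate each aggregate in place, leaving the plan shape unchanged."""
    for agg in aggs:
        agg.strategy = choose_strategy(agg, group_cols, input_order, est_distinct)
    return aggs
```

For example, with input sorted on (saledate, prodid) and grouping on saledate, `count(DISTINCT prodid)` would be marked PRESORTED, while the same aggregate over input sorted only on saledate would fall back to HASH or SORT depending on the estimate.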
