Thread: agg/order-by question

agg/order-by question

From

Sailesh Krishnamurthy

Date:

12 July 2003, 04:40:43

Consider the explain for the following queries ..

sample=# explain select a, count(*) from foo group by a order by a;                              QUERY PLAN
-------------------------------------------------------------------------Aggregate  (cost=69.83..77.33 rows=100
width=4) ->  Group  (cost=69.83..74.83 rows=1000 width=4)        ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
     Sort Key: a              ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
 
(5 rows)
sample=# explain select a, count(*) from foo group by a order by a desc;                                 QUERY PLAN
-------------------------------------------------------------------------------Sort  (cost=80.65..80.90 rows=100
width=4) Sort Key: a  ->  Aggregate  (cost=69.83..77.33 rows=100 width=4)        ->  Group  (cost=69.83..74.83
rows=1000width=4)              ->  Sort  (cost=69.83..72.33 rows=1000 width=4)                    Sort Key: a
        ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
 
(7 rows)

In the first case pgsql doesn't have a Sort on top because the Sort
below the Group produces the right "interesting order" (using the
System-R term). In the second case however, since the order-by clause
demands "desc" there is a Sort tagged on on top. 

Now, instead of doing this, isn't it better to just have a similar
plan as in the first case, but just change the lower Sort to be
descending ? It doesn't affect the Group and the Agg in any way ..

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

Re: agg/order-by question

From

Bruno Wolff III

Date:

12 July 2003, 10:38:55

On Sat, Jul 12, 2003 at 00:39:06 -0700, Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> wrote:
> 
> Consider the explain for the following queries ..
> 
> sample=# explain select a, count(*) from foo group by a order by a;
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Aggregate  (cost=69.83..77.33 rows=100 width=4)
>    ->  Group  (cost=69.83..74.83 rows=1000 width=4)
>          ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
>                Sort Key: a
>                ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
> (5 rows)
>  
> sample=# explain select a, count(*) from foo group by a order by a desc;
>                                   QUERY PLAN
> -------------------------------------------------------------------------------
>  Sort  (cost=80.65..80.90 rows=100 width=4)
>    Sort Key: a
>    ->  Aggregate  (cost=69.83..77.33 rows=100 width=4)
>          ->  Group  (cost=69.83..74.83 rows=1000 width=4)
>                ->  Sort  (cost=69.83..72.33 rows=1000 width=4)
>                      Sort Key: a
>                      ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=4)
> (7 rows)
>  
> 
> In the first case pgsql doesn't have a Sort on top because the Sort
> below the Group produces the right "interesting order" (using the
> System-R term). In the second case however, since the order-by clause
> demands "desc" there is a Sort tagged on on top. 
> 
> Now, instead of doing this, isn't it better to just have a similar
> plan as in the first case, but just change the lower Sort to be
> descending ? It doesn't affect the Group and the Agg in any way ..

You might try this in 7.4. I am pretty sure a change was made a couple
of weeks ago to let group by work with either sort order. Also hash
aggragates have been available for quite a while in 7.4. This is a better
plan when there are only a small number of distinct values.

Re: agg/order-by question

From

Sailesh Krishnamurthy

Date:

12 July 2003, 17:15:10

>>>>> "Bruno" == Bruno Wolff, <Bruno> writes:
   Bruno> You might try this in 7.4. I am pretty sure a change was   Bruno> made a couple of weeks ago to let group by
workwith either   Bruno> sort order. Also hash aggragates have been available for   Bruno> quite a while in 7.4. This
isa better plan when there are   Bruno> only a small number of distinct values.
 

Gotcha ! Thanks. 

TelegraphCQ is still on the 7.3.2 code base .. after doing one hellish
merge in March, I'm not too eager to do another, although merging more
often is likely going to be less painful.

I knew about the hash-aggregates - we had set spilling of
hash-aggregates to disk for large number of distinct values (with a
crude form of recursive partitioning) as a course project for our
undergraduate database class at Berkeley. When I get some time, I want
to clean up my solution code and contribute it as a patch. I don't
think that will be before the end of summer though.

BTW, some systems prefer sorted grouped-aggregates to hashed
grouped-aggregates - even for small distinct values. How it works is
to just update the running aggregates in place in the sort tournament
tree. The only requirement is to be able to compute aggregates of
aggregates, so that partial aggs for the same distinct values across
different runs can be merged. The advantage is that you get sorted
grouped aggregation for the same cost of unsorted hash-grouped
agg. The disadvantage is that you lose the modularity of the sort.


-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh