Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Date
Msg-id 1071703480.3831.24.camel@fuji.krosing.net
Whole thread Raw
In response to Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane kirjutas N, 18.12.2003 kell 00:27:
> "Robert Bedell" <robert@friendlygenius.com> writes:
> > I'm curious if anyone has ever looked into adding OLAP functionality (per
> > the SQL99 specs) into PostGreSQL.

As a first project one could think of implementing NULLS FIRST/LAST
(from 4.14.9) for all ORDER BY operations.

> There was a fairly crude CUBE implementation submitted (and rejected) a
> few months ago, but there's not been any work I thought had a chance of
> getting committed.  This is an issue of implementation quality rather
> than whether we want the feature --- I think the community is interested
> in adding any and all features that are in SQL99.
> 
> > More specifically I would like to add grouping sets, and the CUBE and ROLLUP
> > operators, into postgresql.  Since modifying such the GROUP BY operation
> > would necessitate changing the query structure, wouldn't that affect the
> > query rewrites and genetic optimizer?
> 
> I don't think either the rewriter or GEQO would notice at all.  The
> regular optimizer definitely would though.

If it would mess up the optimiser, then could the extra aggregators not
be put in after optimisations, at least for hash aggregators ?

The implementation using hash aggregators should be just a SMOP
http://people.kldp.org/~eunjea/jargon/?idx=SMOP.html ;)

While ROLLUP could easily be implemented on the same scan over sorted
data as an ordinary GROUP BY by adding extra aggregators for partial
matches, CUBE and arbitrary grouping sets can't.

ie the query

SELECT SUM(a)
GROUP BY ROLLUP (b,c)

if run over a sorted set, would have a distinct aggregator for each of
(b,c), (b,) and (,) which can be reinitialised/reused every time b,c or
b changes. CUBE(b,c) would need hash aggregators for at least (,c).

> > These are not simple projects, I know.
> 
> Might be a tad ambitious for your first venture into backend hacking...

OTOH it may be quite easy to *test* implementation ideas by always
expand GROUP BY into ROLLUP or CUBE by adding the aggregators just
before the executor, without touching the parser and pre-optimisation
query tree at all.

--------------
Hannu






pgsql-hackers by date:

Previous
From: David Felstead
Date:
Subject: Re: TODO list
Next
From: Hannu Krosing
Date:
Subject: Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets