OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets - Mailing list pgsql-hackers

From Robert Bedell
Subject OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Date
Msg-id 200312171656870.SM00984@xavier
Whole thread Raw
Responses Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
List pgsql-hackers
I'm curious if anyone has ever looked into adding OLAP functionality (per
the SQL99 specs) into PostGreSQL.  I don't actually own the proper SQL99
specifications, and since the newer sql2003 ones are coming out I don't know
which to purchase.  Could someone point me in the right direction?  I've
looked in the mailing lists and the docs and found some interest in olap
like functionality, but not like what I found in other databases (such as
Oracle and DB2).

More specifically I would like to add grouping sets, and the CUBE and ROLLUP
operators, into postgresql.  Since modifying such the GROUP BY operation
would necessitate changing the query structure, wouldn't that affect the
query rewrites and genetic optimizer?  On a superficial level yes, but would
any existing query rewriting care about additional grouping being done in a
GROUP BY operation?  More specifically, what would changing the query
structure affect by making the GROUP BY clause a list of lists (the list of
the grouping sets) instead of an expression list as it currently is?

An example of a ROLLUP result might be (pseudoresults..):

CREATE TABLE SALES_SUMMARY (NAME TEXT, DEPARTMENT TEXT, SALES INTEGER);
-- populate with data

SELECT DEPARTMENT, NAME, SUM(SALES) FROM SALES_SUMMARY GROUP BY
ROLLUP(DEPARTMENT,NAME);

DEPARTMENT     NAME     SUM(SALES)
-------------- -------- ----------
Dept one       Bob      13
Dept one       Jeff     12
Dept one       NULL     25
Dept two       Jim      10
Dept two       Mary     11
Dept two       NULL     21
NULL           NULL     46

-- Where the rows with NULLs represent the subtotals and grandtotals,
respectively.

Any thoughts?

Along the same vein, window partitioning would be nice for aggregates.
Aggregate expressions like: RANK() OVER (PARTITION BY DEPARTMENT ORDER BY
SALARY DESC).  They get rid of a lot of subselect operations quite nicely.

These are not simple projects, I know.  There are a lot of features in high
databases I would like to have in open source tools, and I would like to
make a contribution towards getting them there ;)

PS - ...no, I won't even mention materialized views...

-----------------
Robert Bedell



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: time interval behaviour seems odd
Next
From: Andrew Dunstan
Date:
Subject: Re: TODO list