Thread: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
I'm curious if anyone has ever looked into adding OLAP functionality (per
the SQL99 specs) into PostgreSQL. I don't actually own the proper SQL99
specifications, and since the newer SQL2003 ones are coming out I don't
know which to purchase. Could someone point me in the right direction?

I've looked in the mailing lists and the docs and found some interest in
OLAP-like functionality, but not like what I found in other databases
(such as Oracle and DB2). More specifically, I would like to add grouping
sets and the CUBE and ROLLUP operators to PostgreSQL. Since modifying the
GROUP BY operation in this way would necessitate changing the query
structure, wouldn't that affect the query rewrites and genetic optimizer?
On a superficial level yes, but would any existing query rewriting care
about additional grouping being done in a GROUP BY operation? More
specifically, what would be affected by making the GROUP BY clause a list
of lists (the list of the grouping sets) instead of an expression list as
it currently is?

An example of a ROLLUP result might be (pseudoresults...):

CREATE TABLE SALES_SUMMARY (NAME TEXT, DEPARTMENT TEXT, SALES INTEGER);
-- populate with data
SELECT DEPARTMENT, NAME, SUM(SALES)
FROM SALES_SUMMARY
GROUP BY ROLLUP(DEPARTMENT, NAME);

DEPARTMENT     NAME     SUM(SALES)
-------------- -------- ----------
Dept one       Bob              13
Dept one       Jeff             12
Dept one       NULL             25
Dept two       Jim              10
Dept two       Mary             11
Dept two       NULL             21
NULL           NULL             46

where the rows with NULLs represent the subtotals and the grand total,
respectively. Any thoughts?

Along the same vein, window partitioning would be nice for aggregates,
with aggregate expressions like:

  RANK() OVER (PARTITION BY DEPARTMENT ORDER BY SALARY DESC)

They get rid of a lot of subselect operations quite nicely.

These are not simple projects, I know. There are a lot of features in
high-end databases I would like to have in open source tools, and I would
like to make a contribution towards getting them there ;)

PS - ...no, I won't even mention materialized views...

-----------------
Robert Bedell
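[Illustration: the "gets rid of subselects" point can be made concrete with
a sketch like the following; the EMPLOYEES table and its columns are
hypothetical and not part of the original message.]

  -- Hypothetical EMPLOYEES(NAME, DEPARTMENT, SALARY) table.

  -- SQL99/2003 window-function form:
  SELECT NAME, DEPARTMENT,
         RANK() OVER (PARTITION BY DEPARTMENT
                      ORDER BY SALARY DESC) AS SALARY_RANK
  FROM EMPLOYEES;

  -- Roughly equivalent form without window functions, using a correlated
  -- subselect (it assigns the same rank to ties, as RANK() does):
  SELECT E.NAME, E.DEPARTMENT,
         (SELECT COUNT(*) + 1
          FROM EMPLOYEES E2
          WHERE E2.DEPARTMENT = E.DEPARTMENT
            AND E2.SALARY > E.SALARY) AS SALARY_RANK
  FROM EMPLOYEES E;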
"Robert Bedell" <robert@friendlygenius.com> writes: > I'm curious if anyone has ever looked into adding OLAP functionality (per > the SQL99 specs) into PostGreSQL. There was a fairly crude CUBE implementation submitted (and rejected) a few months ago, but there's not been any work I thought had a chance of getting committed. This is an issue of implementation quality rather than whether we want the feature --- I think the community is interested in adding any and all features that are in SQL99. > More specifically I would like to add grouping sets, and the CUBE and ROLLUP > operators, into postgresql. Since modifying such the GROUP BY operation > would necessitate changing the query structure, wouldn't that affect the > query rewrites and genetic optimizer? I don't think either the rewriter or GEQO would notice at all. The regular optimizer definitely would though. > These are not simple projects, I know. Might be a tad ambitious for your first venture into backend hacking... regards, tom lane
Robert Bedell wrote:
> I'm curious if anyone has ever looked into adding OLAP functionality (per
> the SQL99 specs) into PostgreSQL. I don't actually own the proper SQL99
> specifications, and since the newer SQL2003 ones are coming out I don't
> know which to purchase. Could someone point me in the right direction?

For a recent draft of SQL2003 see:
http://www.wiscorp.com/sql/sql_2003_standard.zip

FWIW I've been thinking about working on some SQL 2003 OLAP features for
Postgres 7.5, but it isn't clear what I'll have time for yet. I'm hoping
to at least work on aggregates that accept two arguments (binary set
functions), e.g. REGR_*, and window frame aggregates (sections 4.15.4
and 10.9).

> PS - ...no, I won't even mention materialized views...

There has been a recent discussion about materialized views on one of
the lists. I think someone might already be working on it. Search the
archives.

Joe
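[Illustration: a minimal sketch of the two SQL2003 feature groups Joe
mentions; the MEASUREMENTS table and its columns are hypothetical.]

  -- Hypothetical MEASUREMENTS(TICK_DATE, X, Y) table.

  -- Binary set functions: aggregates that take two arguments, e.g. the
  -- linear-regression family.
  SELECT REGR_SLOPE(Y, X)     AS SLOPE,
         REGR_INTERCEPT(Y, X) AS INTERCEPT
  FROM MEASUREMENTS;

  -- Window frame aggregate: an ordinary aggregate evaluated over a
  -- sliding frame of rows.
  SELECT TICK_DATE, Y,
         AVG(Y) OVER (ORDER BY TICK_DATE
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MOVING_AVG
  FROM MEASUREMENTS;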
> > More specifically, I would like to add grouping sets and the CUBE and
> > ROLLUP operators to PostgreSQL. Since modifying the GROUP BY operation
> > in this way would necessitate changing the query structure, wouldn't
> > that affect the query rewrites and genetic optimizer?
>
> I don't think either the rewriter or GEQO would notice at all. The
> regular optimizer definitely would though.

Alright, I will focus on the regular optimizer more specifically then.
Thanks!

> > These are not simple projects, I know.
>
> Might be a tad ambitious for your first venture into backend hacking...

I agree completely. I'm not purporting to jump in quite that quickly, but
it is something I would like to see added, and am willing to put the time
towards it. If you have any suggestions for items on the TODO list that
would move me more quickly towards my goal, I would appreciate it. (It
would also be useful to avoid working on something someone else already
is!)

To Joe Conway: Thanks for the link! And if you have any specific ideas on
window frame aggregates I would like to discuss them with you out of
band ;)

Thanks for your time ;)

Cheers,
Robert
Tom Lane wrote on Thu, 18.12.2003 at 00:27:
> "Robert Bedell" <robert@friendlygenius.com> writes:
> > I'm curious if anyone has ever looked into adding OLAP functionality (per
> > the SQL99 specs) into PostgreSQL.

As a first project one could think of implementing NULLS FIRST/LAST (from
4.14.9) for all ORDER BY operations.

> There was a fairly crude CUBE implementation submitted (and rejected) a
> few months ago, but there's not been any work I thought had a chance of
> getting committed. This is an issue of implementation quality rather than
> whether we want the feature --- I think the community is interested in
> adding any and all features that are in SQL99.
>
> > More specifically, I would like to add grouping sets and the CUBE and
> > ROLLUP operators to PostgreSQL. Since modifying the GROUP BY operation
> > in this way would necessitate changing the query structure, wouldn't
> > that affect the query rewrites and genetic optimizer?
>
> I don't think either the rewriter or GEQO would notice at all. The
> regular optimizer definitely would though.

If it would mess up the optimiser, then could the extra aggregators not
be put in after optimisations, at least for hash aggregators? The
implementation using hash aggregators should be just a SMOP
(http://people.kldp.org/~eunjea/jargon/?idx=SMOP.html) ;)

While ROLLUP could easily be implemented on the same scan over sorted
data as an ordinary GROUP BY, by adding extra aggregators for partial
matches, CUBE and arbitrary grouping sets can't. I.e. the query

  SELECT SUM(a) GROUP BY ROLLUP (b,c)

if run over a sorted set, would have a distinct aggregator for each of
(b,c), (b,) and (,), which can be reinitialised/reused every time (b,c)
or b changes. CUBE(b,c) would need hash aggregators for at least (,c).

> > These are not simple projects, I know.
>
> Might be a tad ambitious for your first venture into backend hacking...

OTOH it may be quite easy to *test* implementation ideas by always
expanding GROUP BY into ROLLUP or CUBE, adding the aggregators just
before the executor, without touching the parser and pre-optimisation
query tree at all.

--------------
Hannu
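[Illustration: the expansions Hannu is describing, written out. The table
t(a, b, c) is hypothetical; the UNION ALL form is one way the same result
could be emulated with plain GROUP BY today, at the cost of multiple scans
(explicit casts on the NULL columns may be needed).]

  -- By definition:
  --   ROLLUP (b, c)  =  GROUPING SETS ((b, c), (b), ())
  --   CUBE (b, c)    =  GROUPING SETS ((b, c), (b), (c), ())

  -- The ROLLUP query
  --   SELECT b, c, SUM(a) FROM t GROUP BY ROLLUP (b, c);
  -- emulated with ordinary GROUP BYs:
  SELECT b,    c,    SUM(a) FROM t GROUP BY b, c
  UNION ALL
  SELECT b,    NULL, SUM(a) FROM t GROUP BY b
  UNION ALL
  SELECT NULL, NULL, SUM(a) FROM t;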
Robert Bedell wrote on Thu, 18.12.2003 at 01:02:
> > > These are not simple projects, I know.
> >
> > Might be a tad ambitious for your first venture into backend hacking...
>
> I agree completely. I'm not purporting to jump in quite that quickly, but
> it is something I would like to see added, and am willing to put the time
> towards it. If you have any suggestions for items on the TODO list that
> would move me more quickly towards my goal, I would appreciate it.

I guess that by adding hash aggregates Tom solved most problems of adding
ROLLUP, CUBE and GROUPING SETS.

OTOH, I'm not sure if hash aggregates can already spill to disk if not
enough memory is available for keeping them all. If not, then adding this
capability would be a great push towards their general use for GROUPING
SETS.

Also, a mix of scan-over-sorted-group-by + hash aggregates for
out-of-order extra groups would be a great way to use less memory for
hash aggregates.

----------------
Hannu
> I guess that by adding hash aggregates Tom solved most problems of adding
> ROLLUP, CUBE and GROUPING SETS.
>
> OTOH, I'm not sure if hash aggregates can already spill to disk if not
> enough memory is available for keeping them all. If not, then adding this
> capability would be a great push towards their general use for GROUPING
> SETS.
>
> Also, a mix of scan-over-sorted-group-by + hash aggregates for
> out-of-order extra groups would be a great way to use less memory for
> hash aggregates.

The other issue is that in a scan-over-sorted-group-by without
out-of-order grouping sets, you can return tuples as you reset the
aggregators. With out-of-order grouping sets you would have to wait until
the whole table was scanned - at least for those grouping sets - to
return the resulting tuples. Since this could get rather large, the
spill-to-disk functionality is necessary. It should probably mimic how
the sort does it...

Another point is selecting the best way to sort a given collection of
grouping sets for minimal memory usage.

Any ORDER BY in the query should really be applied after the grouping
operation.

The CUBE and ROLLUP operators should really be applied by expanding them
into the equivalent collections of grouping sets.

Cheers,
Robert
Robert Bedell wrote on Thu, 18.12.2003 at 01:55:
> > I guess that by adding hash aggregates Tom solved most problems of adding
> > ROLLUP, CUBE and GROUPING SETS.
> >
> > OTOH, I'm not sure if hash aggregates can already spill to disk if not
> > enough memory is available for keeping them all. If not, then adding this
> > capability would be a great push towards their general use for GROUPING
> > SETS.
> >
> > Also, a mix of scan-over-sorted-group-by + hash aggregates for
> > out-of-order extra groups would be a great way to use less memory for
> > hash aggregates.
>
> The other issue is that in a scan-over-sorted-group-by without
> out-of-order grouping sets, you can return tuples as you reset the
> aggregators. With out-of-order grouping sets you would have to wait until
> the whole table was scanned - at least for those grouping sets - to
> return the resulting tuples. Since this could get rather large, the
> spill-to-disk functionality is necessary. It should probably mimic how
> the sort does it...
>
> Another point is selecting the best way to sort a given collection of
> grouping sets for minimal memory usage.

It seems that the longest GROUPING SET and all its left-continuous
subsets could be collected from the sorted scan, and the rest from hash
aggregates.

GROUPING SET () will always need a "hash" ;)

To optimise any further would require use of statistics data, and is
probably not a good idea to do before having the simpler one implemented.

> Any ORDER BY in the query should really be applied after the grouping
> operation.
>
> The CUBE and ROLLUP operators should really be applied by expanding them
> into the equivalent collections of grouping sets.

For pure ROLLUP one could shortcut the split-into-groups and
put-together-again process, as ROLLUP is already doable from a single
sorted scan.

-------------
Hannu
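[Illustration: the split Hannu describes, applied to the CUBE example from
earlier in the thread; the table t(a, b, c) is still hypothetical.]

  -- CUBE (b, c) expands to GROUPING SETS ((b, c), (b), (c), ()).
  -- With the input sorted on (b, c):
  --   (b, c), (b)  - left-continuous prefixes of the sort order; their
  --                  aggregators are reset and emitted as b or (b, c)
  --                  changes during the scan.
  --   ()           - a single aggregator that never resets; emitted at
  --                  the end of the scan (the degenerate one-entry "hash").
  --   (c)          - out of order with respect to the sort, so it needs a
  --                  hash aggregate and can be emitted only after the scan.
  SELECT b, c, SUM(a)
  FROM t
  GROUP BY CUBE (b, c);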
> It seems that the longest GROUPING SET and all its left-continuous
> subsets could be collected from the sorted scan, and the rest from hash
> aggregates.
>
> GROUPING SET () will always need a "hash" ;)
>
> To optimise any further would require use of statistics data, and is
> probably not a good idea to do before having the simpler one implemented.

Absolutely right. That's a good starting point.

> > Any ORDER BY in the query should really be applied after the grouping
> > operation.
> >
> > The CUBE and ROLLUP operators should really be applied by expanding them
> > into the equivalent collections of grouping sets.
>
> For pure ROLLUP one could shortcut the split-into-groups and
> put-together-again process, as ROLLUP is already doable from a single
> sorted scan.

Actually, as long as the grouping sets are all left-continuous subsets of
the longest grouping set, it's doable from a single sorted scan. If done
with the right implementation, separating resettable aggregators and
out-of-order aggregators, you could get this optimization for free. This
avoids having to look for ROLLUP specifically.

Cheers,
Robert
> > For pure ROLLUP one could shortcut the split-into-groups and
> > put-together-again process, as ROLLUP is already doable from a single
> > sorted scan.
>
> Actually, as long as the grouping sets are all left-continuous subsets of
> the longest grouping set, it's doable from a single sorted scan. If done
> with the right implementation, separating resettable aggregators and
> out-of-order aggregators, you could get this optimization for free. This
> avoids having to look for ROLLUP specifically.

OK, putting it that way was kind of dumb - and you already said that,
mostly. What I meant to say is that you don't have to check for ROLLUP
specifically to get that optimization. This also allows you to do
"GROUP BY a, b, ROLLUP(c,d)" (resulting in grouping sets
{{a,b,c,d},{a,b,c},{a,b}}) in an optimized fashion, without having to
check for pure ROLLUP.

Cheers,
Robert
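[Illustration: the mixed form Robert mentions, spelled out; the sales table
and its columns are hypothetical.]

  -- Hypothetical table sales(a, b, c, d, amount).
  -- GROUP BY a, b, ROLLUP (c, d) is equivalent to
  --   GROUP BY GROUPING SETS ((a, b, c, d), (a, b, c), (a, b))
  -- and every one of those sets is a left-continuous prefix of the sort
  -- order (a, b, c, d), so a single sorted scan suffices.
  SELECT a, b, c, d, SUM(amount)
  FROM sales
  GROUP BY a, b, ROLLUP (c, d);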
Hannu Krosing <hannu@tm.ee> writes:
> OTOH, I'm not sure if hash aggregates can already spill to disk if not
> enough memory is available for keeping them all.

They do not, which is something it'd be good to fix, since if the planner
drastically underestimates the number of groups, you could end up with a
hashtable far bigger than the advertised SortMem. Even if this doesn't
end up with an "out of memory" error, it could still drive you into swap
hell.

(Maybe that should be Robert's get-your-feet-wet project.)

			regards, tom lane
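[Illustration: a hypothetical way to run into the situation Tom describes,
assuming a 7.4-era server; big_table and the numbers are made up, and the
plan actually chosen depends on the available statistics.]

  -- Hypothetical big_table(id, payload) with many millions of rows.
  SET sort_mem = 1024;   -- memory budget the planner assumes, in kB

  -- Grouping on an expression is a classic way for the planner to
  -- underestimate the number of distinct groups. If it then picks a
  -- hashed aggregate, the executor may build a hash table far larger
  -- than sort_mem, since it currently has no way to spill to disk.
  EXPLAIN
  SELECT id % 1000000 AS g, COUNT(*)
  FROM big_table
  GROUP BY id % 1000000;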
> Hannu Krosing <hannu@tm.ee> writes:
> > OTOH, I'm not sure if hash aggregates can already spill to disk if not
> > enough memory is available for keeping them all.
>
> They do not, which is something it'd be good to fix, since if the planner
> drastically underestimates the number of groups, you could end up with a
> hashtable far bigger than the advertised SortMem. Even if this doesn't
> end up with an "out of memory" error, it could still drive you into swap
> hell.
>
> (Maybe that should be Robert's get-your-feet-wet project.)

OK, I've been looking through the executor - nodeGroup.c, nodeAgg.c,
dynahash.c, execGrouping.c, etc. - and I've found the places where hashed
aggregation occurs. It looks like hashed groups simply read the entire
result into memory via a function chain through lookup_hash_entry() ->
LookupTupleHashEntry() -> hash_search(). I think that
LookupTupleHashEntry() is the best place to put the code to spill over
onto disk, since it is used only by the Group, Agg, and Subplan executor
nodes. Putting it there gives the added benefit of letting hashed
subselect results spill over onto disk as well (not sure if that's the
right thing to do).

I have a couple of questions:

1) When does the optimizer set the nodeAgg plan to HASHED? The current
implementation simply reads through the entire result before returning
anything, so obviously it's not always done. Sorry if I should RTFM on
this one...

2) What mechanism would be best to use for storing the data on disk? I
know there is a temporary table mechanism; I'll be hunting for that
shortly...

3) What should define the spillover point? The documentation points to
the 'sort_mem' parameter for this, but the code doesn't look to actually
implement that yet. Similarly for the hash_search() function - I didn't
see anything to store it to disk.

4) Should LookupTupleHashEntry() be worried about the pointers it
receives... similarly for hash_search()?

Obviously (2) is the really big question. What's the best way to do this?
I still have a bit more to go before I understand what's going on, but
I'm starting to grasp it. Any tips would be appreciated! :)

Cheers!
Robert
"Robert Bedell" <robert@friendlygenius.com> writes: > 1) When does the optimizer set the nodeAgg plan to HASHED? See grouping_planner() in src/backend/optimizer/plan/planner.c particularly the logic around use_hashed_grouping. > 2) What mechanism would be best to use for storing the data on disk? I know > there is a temporary table mechanism, I'll be hunting for that shortly.. Temp files, not temp tables. You could look at src/backend/utils/sort/tuplesort.c or src/backend/executor/nodeHash.c for examples. > 3) What should define the spillover point. sort_mem. > The documentation points to the > 'sort_mem' parameter for this, but the code doesn't look to actually > implement that yet. Well, yeah, that's sort of exactly the point ... it's considered during planning but the executor code has no fallback if the planner guesses wrong. > 4) Should LookupTupleHashEntry() be worried about the pointers it > receives...similarly for hash_search()? Eh? regards, tom lane
Thanks for the pointers!

> > The documentation points to the 'sort_mem' parameter for this, but the
> > code doesn't look to actually implement that yet.
>
> Well, yeah, that's sort of exactly the point ... it's considered during
> planning but the executor code has no fallback if the planner guesses
> wrong.

Yep. The doc is wishful thinking then ;) I was just verifying that.

> > 4) Should LookupTupleHashEntry() be worried about the pointers it
> > receives... similarly for hash_search()?

I was wondering how safe it is to write hashed data to disk... given your
suggestions (and the obvious necessity), pretty safe, but it requires
some handling. My bad.

Cheers!
Robert
Also ...

"Robert Bedell" <robert@friendlygenius.com> writes:
> ... I think that LookupTupleHashEntry() is the best place to put the
> code to spill over onto disk, since it is used only by the Group, Agg,
> and Subplan executor nodes.

That's the right area to be looking at, but I don't think you can expect
to do a decent job with localized hacking in LookupTupleHashEntry. That
function's API is predicated on the assumption that you have random
access to any entry in the hash table --- which stops being true as soon
as you spill some entries to disk. The key problem here is to figure out
how to reorganize the computation to cope efficiently with nonrandom
access to the hash table. You might want to look at the algorithm
embodied in nodeHash.c/nodeHashjoin.c, which partitions the hash table
and avoids cross-partition accesses. I don't think it's directly
applicable :-(, because it chooses the number of partitions in advance,
which is exactly the luxury we do not have here --- if the planner had
gotten the number of entries right then we'd not be needing to spill.
But it might give some ideas.

			regards, tom lane
> That's the right area to be looking at, but I don't think you can expect
> to do a decent job with localized hacking in LookupTupleHashEntry. That
> function's API is predicated on the assumption that you have random
> access to any entry in the hash table --- which stops being true as soon
> as you spill some entries to disk. The key problem here is to figure out
> how to reorganize the computation to cope efficiently with nonrandom
> access to the hash table. You might want to look at the algorithm
> embodied in nodeHash.c/nodeHashjoin.c, which partitions the hash table
> and avoids cross-partition accesses. I don't think it's directly
> applicable :-(, because it chooses the number of partitions in advance,
> which is exactly the luxury we do not have here --- if the planner had
> gotten the number of entries right then we'd not be needing to spill.
> But it might give some ideas.

Without detailed statistics on the distribution of the result set with
respect to the grouping set(s), and without sorting the data, I don't
think there is a way to efficiently chop it up.

One idea is this: if you scan the result set once for all grouping sets,
storing those uniquely in hash batches in temp files, you could do the
aggregation in multiple passes by rescanning the result set for each
batch and skipping the rows that didn't match anything in the current
hash batch. The problem is that you're rescanning the underlying result
all the time, and this is not going to scale. Hence I don't like this
idea.

I think a better idea is to actually leave the random access concept in
there and implement an LRU cache on the grouping sets in the memory space
provided by sort_mem, growing that cache as needed until it has to drop
entries out to disk. This is essentially what virtual memory would do for
us anyway, but it's nicer on the system since we cap the resources
instead of it doing so. I don't know how this would fit into the existing
hash table mechanism (maybe not at all), but it would allow grouping sets
to spill over onto disk. This, after all, is just an emergency brake for
a HASHED plan node that was not correctly planned in the first place.

It would also allow spillover of grouping sets later on rather
efficiently: if the strategy is AGG_HASHED, the entire thing should fit
into memory and we don't need to worry about sorting the input or
separating grouping sets into resettable aggregators and out-of-order
aggregators. If you DO need to worry about that, you:

- sort the input
- return the tuples resulting from resettable aggregators (those
  computable from the ordering of the sort) immediately
- use the hash table mechanism to store grouping results that are out of
  order (with its associated spillover onto disk). Since the hash table
  preserves the concept of random access, this is transparent.
- return the results from the hash table after you are done with the
  initial scan of the underlying table.

The main issues with an LRU cache of the hash table on disk are (save me
on this, I'm dumb... :)):

- nonuniform size of tuples
- storing buckets instead of single tuple entries
- avoiding thrashing

Anyway, those are just my two cents. Take them as they are - I haven't
finished looking at tuplestore.c and co. yet.

Cheers!
Robert
We have a rude hack that spills hashed aggs to disk to deal with the case
where there is not enough memory. I don't think it's an ideal solution,
but it certainly has the code to dump to file. I can post the patch later
in the day.

(This is some code for our undergrad db class assignment. I was planning
to clean it up and submit it properly but I never got the time.)

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh