Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets - Mailing list pgsql-hackers

From Robert Bedell
Subject Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
Date
Msg-id 200312172308104.SM00984@xavier
In response to Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: OLAP CUBE/ROLLUP Operators and GROUP BY grouping sets
List pgsql-hackers
> Hannu Krosing <hannu@tm.ee> writes:
> > OTOH, I'm not sure if hash aggregates can already spill to disk if not
> > enough memory is available for keeping them all.
> 
> They do not, which is something it'd be good to fix, since if the
> planner drastically underestimates the number of groups, you could end
> up with a hashtable far bigger than the advertised SortMem.  Even if
> this doesn't end up with an "out of memory" error, it could still drive
> you into swap hell.
> 
> (Maybe that should be Robert's get-your-feet-wet project.)

Ok, I've been looking through the executor - nodeGroup.c, nodeAgg.c,
dynahash.c, execGrouping.c, etc.  I've found the places where hashed
aggregation occurs.  It looks like hashed grouping simply reads the entire
result into memory via a function chain through lookup_hash_entry() ->
LookupTupleHashEntry() -> hash_search().  I think that LookupTupleHashEntry()
is the best place to put the code to spill over onto disk, since it is
used only by the Group, Agg, and Subplan executor nodes.  Putting it there
gives the added benefit of letting hashed subselect results spill over onto
disk as well (not sure if that's the right thing to do).  I have a couple of
questions:

1) When does the optimizer set the nodeAgg plan to HASHED?  The current
implementation simply reads through the entire result before returning
anything, so obviously it's not always done.  Sorry if I should RTFM on this
one....

2) What mechanism would be best to use for storing the data on disk?  I know
there is a temporary table mechanism, I'll be hunting for that shortly..

3) What should define the spillover point?  The documentation points to the
'sort_mem' parameter for this, but the code doesn't appear to actually
enforce it yet.  Similarly for the hash_search() function - I didn't see
anything there to store entries to disk.

4) Should LookupTupleHashEntry() be worried about the validity of the
pointers it receives once entries can move to disk?  Similarly for
hash_search()?

Obviously (2) is the really big question.  What's the best way to do this?
I still have a bit more to go before I understand what's going on, but I'm
starting to grasp it.  Any tips would be appreciated! :)

Cheers!

Robert


