Re: Large Scale Aggregation (HashAgg Enhancement) - Mailing list pgsql-hackers
From | Rod Taylor |
---|---|
Subject | Re: Large Scale Aggregation (HashAgg Enhancement) |
Date | |
Msg-id | 1137422526.15377.63.camel@home |
In response to | Re: Large Scale Aggregation (HashAgg Enhancement) (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: Large Scale Aggregation (HashAgg Enhancement) |
List | pgsql-hackers |
On Mon, 2006-01-16 at 08:32 +0000, Simon Riggs wrote:
> On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
> > A couple of days ago I found myself wanting to aggregate 3 Billion
> > tuples down to 100 Million tuples based on an integer key with six
> > integer values -- six sum()'s.
> >
> > PostgreSQL ran out of memory with its Hash Aggregator and doing an old
> > style Sort & Sum took a fair amount of time to complete (cancelled the
> > process after 24 hours -- small machine).
>
> > Spilling to disk would be nice but I suspect the obvious method would
> > thrash quite badly with non-sorted input.
>
> There is already hash table overflow (spill to disk) logic in HashJoins,
> so this should be possible by reusing that code for HashAggs. That's on
> my todo list, but I'd welcome any assistance.

> A question: Are the rows in your 3 B row table clumped together based
> upon the 100M row key? (or *mostly* so) We might also be able to

They are randomly distributed. Fully sorting the data is quite painful.

> pre-aggregate the rows using a plan like
>
>       HashAgg
>               SortedAgg
> or
>       SortedAgg
>               Sort
>                       SortedAgg
>
> The first SortedAgg seems superfluous, but would reduce the row volume
> considerably if incoming rows were frequently naturally adjacent, even
> if the values were not actually sorted. (This could also be done during
> sorting, but it's much easier to slot the extra executor step into the
> plan). That might then reduce the size of the later sort, or allow it to
> become a HashAgg.
>
> I could make that manually enabled using "enable_pre_agg" to allow us to
> measure the effectiveness of that technique and decide what cost model
> we'd use to make it automatic. Would that help?

I don't understand how this helps. The problem isn't the 3B data source
rows but rather the 100M destination keys that are being aggregated
against. The memory constraints of HashAgg are a result of the large
number of target keys, and should be the same whether the source is
100M rows or 10B rows.

I think I need something closer to:

        HashAgg
                -> HashSort (to disk)

HashSort would create a number of files on disk with "similar" data,
grouping all similar keys into a single temporary file which HashAgg
can deal with individually (100 loops of 1M target keys each instead of
1 loop over 100M target keys). The results would be the same as
partitioning by keyblock and running a HashAgg on each partition, but
it would be handled by the Executor rather than by client-side code.
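Roughly, the manual SQL equivalent of that idea, using a made-up source
table facts(k int, v1 .. v6 int) and 100 buckets, would be to spill the
rows into hash buckets once and then aggregate each bucket on its own:

    -- Hypothetical schema: facts(k int, v1 int, v2 int, v3 int,
    --                            v4 int, v5 int, v6 int)
    -- "HashSort" step: one pass over the source, routing each row to a
    -- bucket derived from the grouping key.
    CREATE TEMP TABLE spill AS
    SELECT abs(k) % 100 AS bucket, k, v1, v2, v3, v4, v5, v6
    FROM   facts;

    -- "HashAgg" step: aggregate one bucket at a time, so each hash
    -- table holds only ~1M of the ~100M target keys.  Repeat for
    -- bucket = 1 .. 99.
    SELECT k, sum(v1), sum(v2), sum(v3), sum(v4), sum(v5), sum(v6)
    FROM   spill
    WHERE  bucket = 0
    GROUP  BY k;

The difference is that an executor-level HashSort would write each
bucket to its own temp file during the single pass, so each per-bucket
aggregate would read only its own data rather than re-scanning the
spill table.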
> > I've written something similar using a client and COPY with temporary
> > tables. Even with the Export/Import copy I still beat the Sort & Sum
> > method PostgreSQL falls back to.
>
> You can get round this now by chopping the larger table into pieces with
> a WHERE clause and then putting them back together with a UNION. If the
> table is partitioned, then do this by partitions.

True, except this results in several sequential scans over the source
data. I can extract and sort in a single pass at the client side, but it
would be far better if I could get PostgreSQL to do the same. I could
probably write a plpgsql function to do that logic, but it would be
quite messy (a rough sketch is at the end of this mail).

> This should also help when it comes to recalculating the sums again in
> the future, since you'll only need to rescan the rows that have been
> added since the last summation.

We store the aggregated results and never do this type of calculation
on that dataset again. The original dataset comes from about 300
partitions (time and source) and they are removed upon completion.
While this calculation is being performed, additional partitions are
added. I suppose I could store the source data in 300 * 1000 partitions
(approx. 300 batches times 1000 segments), but that would probably run
into other problems; PostgreSQL likely has issues with that many tables.

--
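The plpgsql function mentioned above might look roughly like the sketch
below; the table and column names (facts, agg_results, k, v1 .. v6) are
invented, and the keys are assumed to spread evenly across buckets:

    -- Hypothetical helper: aggregate the source in nbuckets slices so
    -- that no single HashAgg has to hold all 100M grouping keys at once.
    -- Assumes agg_results(k int, s1 bigint, ..., s6 bigint) exists.
    CREATE OR REPLACE FUNCTION agg_in_buckets(nbuckets int)
    RETURNS void AS $$
    DECLARE
        b int;
    BEGIN
        FOR b IN 0 .. nbuckets - 1 LOOP
            -- Each pass scans the source but aggregates only the keys
            -- that fall into bucket b.
            INSERT INTO agg_results
            SELECT k, sum(v1), sum(v2), sum(v3), sum(v4), sum(v5), sum(v6)
            FROM   facts
            WHERE  abs(k) % nbuckets = b
            GROUP  BY k;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    -- e.g. SELECT agg_in_buckets(100);

As noted above, each iteration is another sequential scan of the source,
which is exactly the cost an executor-level HashAgg spill would avoid by
partitioning the data in a single pass.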