On Tue, 2006-01-17 at 21:43 +0000, Simon Riggs wrote:
> On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote:
> > I was thinking along the lines of having multiple temp files per hash
> > bucket. If you have a tuple that needs to migrate from bucket M to
> > bucket N, you know that it arrived before every tuple that was assigned
> > to bucket N originally, so put such tuples into a separate temp file
> > and process them before the main bucket-N temp file. This might get a
> > little tricky to manage after multiple hash resizings, but in principle
> > it seems doable.
> You can manage that with file naming. Rows moved from batch N to batch M
> would be renamed N.M, so you'd be able to use file ordering to retrieve
> all files for *.M.
> That scheme would work for multiple splits too, so that filenames could
> grow yet retain their sort order and final target batch properties.
This seems to lead to a super-geometric progression in the number of
files required, if we assume that the current batch could be
redistributed to all future batches, each of which could be similarly
redistributed:
batches   files
1         none
2         1
4         7
8         64
16        64,000
32        4 billion ish
So it does seem important whether we demand sorted input or not. Or at
least it requires us to provide the executor with a starting point for
the number of batches, so we could manage that.
Best Regards,
Simon Riggs