Re: Large Scale Aggregation (HashAgg Enhancement) - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: Large Scale Aggregation (HashAgg Enhancement)
Date	January 17, 2006 17:43:12
Msg-id	1137534189.3180.288.camel@localhost.localdomain
In response to	Re: Large Scale Aggregation (HashAgg Enhancement) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Large Scale Aggregation (HashAgg Enhancement) Re: Large Scale Aggregation (HashAgg Enhancement)
List	pgsql-hackers

On Tue, 2006-01-17 at 14:41 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
> >> The tricky part is to preserve the existing guarantee that tuples are
> >> merged into their aggregate in arrival order.
> 
> > You almost had me there... but there isn't any "arrival order".
> 
> The fact that it's not in the spec doesn't mean we don't support it.
> Here are a couple of threads on the subject:
> http://archives.postgresql.org/pgsql-general/2005-11/msg00304.php
> http://archives.postgresql.org/pgsql-sql/2003-06/msg00135.php
> 
> Per the second message, this has worked since 7.4, and it was requested
> fairly often before that.

OK.... My interest was in expanding the role of HashAgg, which as Rod
says can be used to avoid the sort, so the overlap between those ideas
was low anyway.

On Tue, 2006-01-17 at 09:52 -0500, Tom Lane wrote:
> I was thinking along the lines of having multiple temp files per hash
> bucket.  If you have a tuple that needs to migrate from bucket M to
> bucket N, you know that it arrived before every tuple that was assigned
> to bucket N originally, so put such tuples into a separate temp file
> and process them before the main bucket-N temp file.  This might get a
> little tricky to manage after multiple hash resizings, but in principle
> it seems doable.

OK, so we do need to do this when we have a defined arrival order: this
idea is the best one so far. I don't see any optimization to be gained by
ignoring the arrival order, so it seems best to preserve the ordering
this way in all cases.
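
To make the quoted idea concrete, here is a toy model of it, not
executor code. Everything in it is an assumption made for illustration:
batches are plain Python lists standing in for temp files, a tuple's
batch is key % nbatch, the batch count doubles exactly once after
split_at tuples, and the names replay_in_arrival_order, main and
migrated are hypothetical. It only shows that reading the "migrated"
file before the main file keeps each batch in arrival order.

```python
from collections import defaultdict


def replay_in_arrival_order(tuples, split_at, old_nbatch=2):
    """tuples is a list of (seq, key) pairs in arrival order.

    Returns {batch: [(seq, key), ...]} in the order each batch
    would feed its tuples to the aggregates.
    """
    new_nbatch = old_nbatch * 2
    main = defaultdict(list)      # tuples written directly to a batch
    migrated = defaultdict(list)  # tuples moved in from an older batch

    # Phase 1: spill every tuple; the batch count doubles part-way through.
    for i, (seq, key) in enumerate(tuples):
        nbatch = old_nbatch if i < split_at else new_nbatch
        main[key % nbatch].append((seq, key))

    # Phase 2: replay batches in increasing order. For each batch, the
    # migrated file is read before the main file.
    result = {}
    for b in range(new_nbatch):
        ordered = []
        for seq, key in migrated[b] + main[b]:
            target = key % new_nbatch
            if target == b:
                ordered.append((seq, key))
            else:
                # Written before the split: it predates everything that
                # was assigned to `target` directly.
                migrated[target].append((seq, key))
        result[b] = ordered
    return result


if __name__ == "__main__":
    arrivals = list(enumerate([0, 2, 1, 3, 2, 6, 3, 7, 0, 2]))
    for batch, rows in replay_in_arrival_order(arrivals, split_at=4).items():
        print(batch, rows)
```

In this model a migrating tuple always lands in a higher-numbered batch
that has not been replayed yet, which is what lets a single pass over
the batches work. Several resizings would need more than one migrated
file per batch, which is the "tricky to manage" part.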

You can manage that with file naming. Rows moved from batch N to batch M
would go into a file named N.M, so you'd be able to use file ordering to
retrieve all files matching *.M.
That scheme would work for multiple splits too, so that filenames could
grow yet retain their sort order and final target batch properties.
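
A minimal sketch of how such names might be handled, under assumptions
that go beyond what is stated above: names are dot-separated batch
numbers, the last component is the file's current target batch, rows
only ever move to a higher-numbered batch (as with batch doubling), and
"file ordering" means comparing the components numerically rather than
as strings. The helper names parse, migrated_name and files_for_batch
are hypothetical.

```python
def parse(name):
    """Turn a name such as '0.2.6' into the tuple (0, 2, 6)."""
    return tuple(int(part) for part in name.split("."))


def migrated_name(name, target):
    """Name for rows that move out of file `name` into batch `target`."""
    return f"{name}.{target}"


def files_for_batch(names, target):
    """All files whose final component is `target`, earliest rows first.

    Comparing the numeric tuples avoids the string-sort trap where
    '10' would sort before '2'.
    """
    matching = [n for n in names if parse(n)[-1] == target]
    return sorted(matching, key=parse)


if __name__ == "__main__":
    names = ["6", "2", "0.2", "2.6", "0.2.6", "10", "1.3"]
    print(files_for_batch(names, 6))
    print(migrated_name("0.2", 6))
```

Under those assumptions a file with a longer history starts with a
lower batch number, so it sorts ahead of the file written directly to
the target batch and is read first.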

Best Regards, Simon Riggs
