Re: 9.5: Memory-bounded HashAgg - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: 9.5: Memory-bounded HashAgg
Date
Msg-id 1408607390.2335.247.camel@jeff-desktop
Whole thread Raw
In response to Re: 9.5: Memory-bounded HashAgg  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: 9.5: Memory-bounded HashAgg
List pgsql-hackers
On Wed, 2014-08-20 at 14:32 -0400, Robert Haas wrote:
> Well, I think you're certainly right that a hash table lookup is more
> expensive than modulo on a 32-bit integer; so much is obvious.  But if
> the load factor is not too large, I think that it's not a *lot* more
> expensive, so it could be worth it if it gives us other advantages.
> As I see it, the advantage of Jeff's approach is that it doesn't
> really matter whether our estimates are accurate or not.  We don't
> have to decide at the beginning how many batches to do, and then
> possibly end up using too much or too little memory per batch if we're
> wrong; we can let the amount of memory actually used during execution
> determine the number of batches.  That seems good.  Of course, a hash
> join can increase the number of batches on the fly, but only by
> doubling it, so you might go from 4 batches to 8 when 5 would really
> have been enough.  And a hash join also can't *reduce* the number of
> batches on the fly, which might matter a lot.  Getting the number of
> batches right avoids I/O, which is a lot more expensive than CPU.

My approach uses partition counts that are powers-of-two also, so I
don't think that's a big differentiator. In principle my algorithm could
adapt to other partition counts, but I'm not sure how big of an
advantage there is.

I think the big benefit of my approach is that it doesn't needlessly
evict groups from the hashtable. Consider input like 0, 1, 0, 2, ..., 0,
N. For large N, if you evict group 0, you have to write out about N
tuples; but if you leave it in, you only have to write out about N/2
tuples. The hashjoin approach doesn't give you any control over
eviction, so you only have about 1/P chance of saving the skew group
(where P is the ultimate number of partitions). With my approach, we'd
always keep the skew group in memory (unless we're very unlucky, and the
hash table fills up before we even see the skew value).

Regards,Jeff Davis





pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: [TODO] Process pg_hba.conf keywords as case-insensitive
Next
From: Julien Rouhaud
Date:
Subject: [TODO] Track number of files ready to be archived in pg_stat_archiver