Re: 9.5: Memory-bounded HashAgg - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: 9.5: Memory-bounded HashAgg
Date
Msg-id 1408040474.2335.186.camel@jeff-desktop
Whole thread Raw
In response to Re: 9.5: Memory-bounded HashAgg  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 9.5: Memory-bounded HashAgg
List pgsql-hackers
On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
> Oh?  So if we have aggregates like array_agg whose memory footprint
> increases over time, the patch completely fails to avoid bloat?

Yes, in its current form.

> I might think a patch with such a limitation was useful, if it weren't
> for the fact that aggregates of that nature are a large part of the
> cases where the planner misestimates the table size in the first place.
> Any complication that we add to nodeAgg should be directed towards
> dealing with cases that the planner is likely to get wrong.

In my experience, the planner has a lot of difficulty estimating the
cardinality unless it's coming from a base table without any operators
above it (other than maybe a simple predicate). This is probably a lot
more common than array_agg problems, simply because array_agg is
relatively rare compared with GROUP BY in general.

Also, there are also cases where my patch should win against Sort even
when it does go to disk. For instance, high enough cardinality to exceed
work_mem, but also a large enough group size. Sort will have to deal
with all of the tuples before it can group any of them, whereas HashAgg
can group at least some of them along the way.

Consider the skew case where the cardinality is 2M, work_mem fits 1M
groups, and the input consists of the keys 1..1999999 mixed randomly
inside one billion zeros. (Aside: if the input is non-random, you may
not get the skew value before the hash table fills up, in which case
HashAgg is just as bad as Sort.)

That being said, we can hold out for an array_agg fix if desired. As I
pointed out in another email, my proposal is compatible with the idea of
dumping groups out of the hash table, and does take some steps in that
direction.

Regards,Jeff Davis





pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Function to know last log write timestamp
Next
From: Oleg Bartunov
Date:
Subject: Re: jsonb format is pessimal for toast compression