Re: 9.5: Better memory accounting, towards memory-bounded HashAgg - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date	August 6, 2014 18:49:26
Msg-id	53E27895.1010505@fuzzy.cz
In response to	9.5: Better memory accounting, towards memory-bounded HashAgg (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

On 2.8.2014 22:40, Jeff Davis wrote:
> Attached is a patch that explicitly tracks allocated memory (the blocks,
> not the chunks) for each memory context, as well as its children.
> 
> This is a prerequisite for memory-bounded HashAgg, which I intend to

Anyway, I'm really looking forward to the memory-bounded hashagg, and
I'm willing to spend some time testing it.

> submit for the next CF. Hashjoin tracks the tuple sizes that it adds to
> the hash table, which is a good estimate for Hashjoin. But I don't think

Actually, even for HashJoin the estimate is pretty bad, and varies a lot
depending on the tuple width. With "narrow" tuples (e.g. ~40B of data),
which is actually quite a common case, you easily get ~100% palloc overhead.

I managed to address that by using a custom allocator. See this:
  https://commitfest.postgresql.org/action/patch_view?id=1503

I wonder whether something like that would be possible for the hashagg?

That would make the memory accounting accurate with 0% overhead (because
it's not messing with the memory context at all), but it only works for
the one node (but maybe that's OK?).

> it's as easy for Hashagg, for which we need to track transition values,
> etc. (also, for HashAgg, I expect that the overhead will be more
> significant than for Hashjoin). If we track the space used by the memory
> contexts directly, it's easier and more accurate.

I don't think that's comparable - I can easily think about cases leading
to extreme palloc overhead with HashAgg (think of an aggregate
implementing COUNT DISTINCT - that effectively needs to store all the
values, and with short values the palloc overhead will be terrible).

Actually, I was looking at HashAgg (it's a somewhat natural direction
after messing with Hash Join), and my plan was to use a similar dense
allocation approach. The trickiest part would probably be making this
available to custom aggregates.

regards
Tomas
