Home > mailing lists

Re: explain HashAggregate to report bucket and memory stats - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: explain HashAggregate to report bucket and memory stats
Date	February 3, 2020 17:53:01
Msg-id	20200203145301.53mozz7gcdaklnjc@alap3.anarazel.de Whole thread Raw
In response to	explain HashAggregate to report bucket and memory stats (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: explain HashAggregate to report bucket and memory stats
List	pgsql-hackers

Tree view

Hi,

On 2020-01-03 10:19:26 -0600, Justin Pryzby wrote:
> On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > What would I find very useful is [...] if the HashAggregate node under
> > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > node would report...anything.

Yea, that'd be amazing. It probably should be something every
execGrouping.c using node can opt into.

Justin: As far as I can tell, you're trying to share one instrumentation
state between hashagg and hashjoins. I'm doubtful that's a good
idea. The cases are different enough that that's probably just going to
be complicated, without actually simplifying anything.

> Jeff: can you suggest what details Aggregate should show ?

Memory usage most importantly. Probably makes sense to differentiate
between the memory for the hashtable itself, and the tuples in it (since
they're allocated separately, and just having a overly large hashtable
doesn't hurt that much if it's not filled).

> diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
> index 3603c58..cf0fe3c 100644
> --- a/src/backend/executor/execGrouping.c
> +++ b/src/backend/executor/execGrouping.c
> @@ -203,6 +203,11 @@ BuildTupleHashTableExt(PlanState *parent,
>          hashtable->hash_iv = 0;
>  
>      hashtable->hashtab = tuplehash_create(metacxt, nbuckets, hashtable);
> +    hashtable->hinstrument.nbuckets_original = nbuckets;
> +    hashtable->hinstrument.nbuckets = nbuckets;
> +    hashtable->hinstrument.space_peak = entrysize * hashtable->hashtab->size;

That's not actually an accurate accounting of memory, because for filled
entries a lot of memory is used to store actual tuples:

static TupleHashEntryData *
lookup_hash_entry(AggState *aggstate)
...
    /* find or create the hashtable entry using the filtered tuple */
    entry = LookupTupleHashEntry(perhash->hashtable, hashslot, &isnew);

    if (isnew)
    {
        AggStatePerGroup pergroup;
        int            transno;

        pergroup = (AggStatePerGroup)
            MemoryContextAlloc(perhash->hashtable->tablecxt,
                               sizeof(AggStatePerGroupData) * aggstate->numtrans);
        entry->additional = pergroup;

since the memory doesn't actually shrink unless the hashtable is
destroyed or reset, it'd probably be sensible to compute the memory
usage either at reset, or at the end of the query.

Greetings,

Andres Freund

pgsql-hackers by date:

From: Tom Lane
Date: 03 February 2020, 17:40:01
Subject: Re: Add %x to PROMPT1 and PROMPT2

From: Tom Lane
Date: 03 February 2020, 18:56:09
Subject: Re: BUG #16171: Potential malformed JSON in explain output

Re: explain HashAggregate to report bucket and memory stats - Mailing list pgsql-hackers

Previous

Next