From: Andres Freund
Subject: Re: explain HashAggregate to report bucket and memory stats
Msg-id: 20200306175859.d56ohskarwldyrrw@alap3.anarazel.de
In response to: Re: explain HashAggregate to report bucket and memory stats (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers

Hi,

On 2020-03-01 09:45:40 -0600, Justin Pryzby wrote:
> +/*
> + * Show hash bucket stats and (optionally) memory.
> + */
> +static void
> +show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es)
> +{
> +    long    spacePeakKb_tuples = (inst->space_peak_tuples + 1023) / 1024,
> +        spacePeakKb_hash = (inst->space_peak_hash + 1023) / 1024;

Let's not add further uses of long. It's terrible because it has a
different width on 64bit Windows than on everything else. Specify the
widths explicitly, or use something like size_t.
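
For instance (a minimal sketch, keeping the field names from the quoted
patch, and using Size - our typedef for size_t - so the width is the
same everywhere):

    Size    spacePeakKb_tuples = (inst->space_peak_tuples + 1023) / 1024;
    Size    spacePeakKb_hash = (inst->space_peak_hash + 1023) / 1024;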


> +    if (es->format != EXPLAIN_FORMAT_TEXT)
> +    {
> +        ExplainPropertyInteger("Hash Buckets", NULL,
> +                               inst->nbuckets, es);
> +        ExplainPropertyInteger("Original Hash Buckets", NULL,
> +                               inst->nbuckets_original, es);
> +        ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
> +                               spacePeakKb_hash, es);
> +        ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",
> +                               spacePeakKb_tuples, es);

And then you're passing the long to ExplainPropertyInteger, which
accepts an int64, making the use of long above particularly suspicious.

I wonder if it would make sense to add an ExplainPropertyBytes() that
would do the rounding etc. automatically. It could then also switch
units as appropriate.
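
Something like the below, perhaps (just a sketch - the helper doesn't
exist; it rounds up to kB and switches to MB for larger values before
handing the number off to ExplainPropertyInteger()):

    static void
    ExplainPropertyBytes(const char *qlabel, uint64 bytes, ExplainState *es)
    {
        const char *unit = "kB";
        int64       value = (bytes + 1023) / 1024;  /* round up to kB */

        /* switch to MB once the kB figure gets unwieldy */
        if (value >= 10 * 1024)
        {
            unit = "MB";
            value = (value + 1023) / 1024;
        }

        ExplainPropertyInteger(qlabel, unit, value, es);
    }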


> +    }
> +    else if (!inst->nbuckets)
> +        ; /* Do nothing */
> +    else
> +    {
> +        if (inst->nbuckets_original != inst->nbuckets)
> +        {
> +            ExplainIndentText(es);
> +            appendStringInfo(es->str,
> +                        "Buckets: %ld (originally %ld)",
> +                        inst->nbuckets,
> +                        inst->nbuckets_original);
> +        }
> +        else
> +        {
> +            ExplainIndentText(es);
> +            appendStringInfo(es->str,
> +                        "Buckets: %ld",
> +                        inst->nbuckets);
> +        }
> +
> +        if (es->analyze)
> +            appendStringInfo(es->str,
> +                    "  Memory Usage: hashtable: %ldkB, tuples: %ldkB",
> +                    spacePeakKb_hash, spacePeakKb_tuples);
> +        appendStringInfoChar(es->str, '\n');

I'm not sure I like the alternative output formats here. All the other
fields are separated with a comma, but the original bucket count is in
parens. I'd probably just format it as "Buckets: %lld " and then add
", Original Buckets: %lld" when differing.

Also, %ld is problematic, because long is only 32bit wide on some
platforms (including 64bit Windows), but 64bit on others. The easiest
way to deal with that is to use %lld and cast the argument to long
long - yes, that's a sad workaround.
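
Concretely, something along these lines (a sketch, reusing the names
from the quoted patch):

    ExplainIndentText(es);
    appendStringInfo(es->str, "Buckets: %lld",
                     (long long) inst->nbuckets);
    if (inst->nbuckets_original != inst->nbuckets)
        appendStringInfo(es->str, ", Original Buckets: %lld",
                         (long long) inst->nbuckets_original);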


> +/* Update instrumentation stats */
> +void
> +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> +{
> +    hashtable->instrument.nbuckets = hashtable->hashtab->size;
> +    if (initial)
> +    {
> +        hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> +        hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> +            sizeof(TupleHashEntryData);
> +        hashtable->instrument.space_peak_tuples = 0;
> +    }
> +    else
> +    {
> +#define maxself(a,b) a=Max(a,b)
> +        /* hashtable->entrysize includes additionalsize */
> +        maxself(hashtable->instrument.space_peak_hash,
> +                hashtable->hashtab->size * sizeof(TupleHashEntryData));
> +        maxself(hashtable->instrument.space_peak_tuples,
> +                hashtable->hashtab->members * hashtable->entrysize);
> +#undef maxself
> +    }
> +}

Not a fan of this macro.
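
Spelled out without it, using the existing Max() from c.h, this
presumably would just be:

    /* track the peak size of the bucket array and of the tuple space */
    hashtable->instrument.space_peak_hash =
        Max(hashtable->instrument.space_peak_hash,
            hashtable->hashtab->size * sizeof(TupleHashEntryData));
    hashtable->instrument.space_peak_tuples =
        Max(hashtable->instrument.space_peak_tuples,
            hashtable->hashtab->members * hashtable->entrysize);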

I'm also not sure I understand what you're trying to do here?


> diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> index f457b5b150..b173b32cab 100644
> --- a/src/test/regress/expected/aggregates.out
> +++ b/src/test/regress/expected/aggregates.out
> @@ -517,10 +517,11 @@ order by 1, 2;
>           ->  HashAggregate
>                 Output: s2.s2, sum((s1.s1 + s2.s2))
>                 Group Key: s2.s2
> +               Buckets: 4
>                 ->  Function Scan on pg_catalog.generate_series s2
>                       Output: s2.s2
>                       Function Call: generate_series(1, 3)
> -(14 rows)
> +(15 rows)

These tests probably won't be portable. The number of hash buckets
calculated will e.g. depend on the size of the contained elements, and
that in turn will depend on whether pointers are 4 or 8 bytes.


Greetings,

Andres Freund


