Re: explain HashAggregate to report bucket and memory stats - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: explain HashAggregate to report bucket and memory stats
Msg-id: 20200306175859.d56ohskarwldyrrw@alap3.anarazel.de
In response to: Re: explain HashAggregate to report bucket and memory stats (Justin Pryzby <pryzby@telsasoft.com>)
Responses: Re: explain HashAggregate to report bucket and memory stats
           Re: explain HashAggregate to report bucket and memory stats
List: pgsql-hackers
Hi,

On 2020-03-01 09:45:40 -0600, Justin Pryzby wrote:
> +/*
> + * Show hash bucket stats and (optionally) memory.
> + */
> +static void
> +show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es)
> +{
> +	long	spacePeakKb_tuples = (inst->space_peak_tuples + 1023) / 1024,
> +		spacePeakKb_hash = (inst->space_peak_hash + 1023) / 1024;

Let's not add further uses of long. It's terrible because it has
different widths on 64bit windows and everything else. Specify the
widths explicitly, or use something like size_t.

> +	if (es->format != EXPLAIN_FORMAT_TEXT)
> +	{
> +		ExplainPropertyInteger("Hash Buckets", NULL,
> +							   inst->nbuckets, es);
> +		ExplainPropertyInteger("Original Hash Buckets", NULL,
> +							   inst->nbuckets_original, es);
> +		ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
> +							   spacePeakKb_hash, es);
> +		ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",
> +							   spacePeakKb_tuples, es);

And then you're passing the long to ExplainPropertyInteger, which
accepts an int64, making the use of long above particularly suspicious.

I wonder if it would make sense to add an ExplainPropertyBytes() that
would do the rounding etc. automatically. It could then also switch
units as appropriate.

> +	}
> +	else if (!inst->nbuckets)
> +		; /* Do nothing */
> +	else
> +	{
> +		if (inst->nbuckets_original != inst->nbuckets)
> +		{
> +			ExplainIndentText(es);
> +			appendStringInfo(es->str,
> +							 "Buckets: %ld (originally %ld)",
> +							 inst->nbuckets,
> +							 inst->nbuckets_original);
> +		}
> +		else
> +		{
> +			ExplainIndentText(es);
> +			appendStringInfo(es->str,
> +							 "Buckets: %ld",
> +							 inst->nbuckets);
> +		}
> +
> +		if (es->analyze)
> +			appendStringInfo(es->str,
> +							 " Memory Usage: hashtable: %ldkB, tuples: %ldkB",
> +							 spacePeakKb_hash, spacePeakKb_tuples);
> +		appendStringInfoChar(es->str, '\n');

I'm not sure I like the alternative output formats here. All the other
fields are separated with a comma, but the original size is in parens.
I'd probably just format it as "Buckets: %lld" and then add ", Original
Buckets: %lld" when differing.

Also, %ld is problematic, because it's only 32bit wide on some platforms
(including 64bit windows), but 64bit on others. The easiest way to deal
with that is to use %lld and cast the argument to long long - yes,
that's a sad workaround.

> +/* Update instrumentation stats */
> +void
> +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> +{
> +	hashtable->instrument.nbuckets = hashtable->hashtab->size;
> +	if (initial)
> +	{
> +		hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> +		hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> +			sizeof(TupleHashEntryData);
> +		hashtable->instrument.space_peak_tuples = 0;
> +	}
> +	else
> +	{
> +#define maxself(a,b) a=Max(a,b)
> +		/* hashtable->entrysize includes additionalsize */
> +		maxself(hashtable->instrument.space_peak_hash,
> +				hashtable->hashtab->size * sizeof(TupleHashEntryData));
> +		maxself(hashtable->instrument.space_peak_tuples,
> +				hashtable->hashtab->members * hashtable->entrysize);
> +#undef maxself
> +	}
> +}

Not a fan of this macro. I'm also not sure I understand what you're
trying to do here?

> diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> index f457b5b150..b173b32cab 100644
> --- a/src/test/regress/expected/aggregates.out
> +++ b/src/test/regress/expected/aggregates.out
> @@ -517,10 +517,11 @@ order by 1, 2;
>          ->  HashAggregate
>                Output: s2.s2, sum((s1.s1 + s2.s2))
>                Group Key: s2.s2
> +              Buckets: 4
>                ->  Function Scan on pg_catalog.generate_series s2
>                      Output: s2.s2
>                      Function Call: generate_series(1, 3)
> -(14 rows)
> +(15 rows)

These tests probably won't be portable. The number of hash buckets
calculated will e.g. depend on the size of the contained elements. And
that'll e.g. depend on whether pointers are 4 or 8 bytes.

Greetings,

Andres Freund