Thread: explain HashAggregate to report bucket and memory stats

explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> What would I find very useful is [...] if the HashAggregate node under
> "explain analyze" would report memory and bucket stats; and if the Aggregate
> node would report...anything.

Find attached my WIP attempt to implement this.

Jeff: can you suggest what details Aggregate should show ?

Justin

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Fri, Jan 03, 2020 at 10:19:25AM -0600, Justin Pryzby wrote:
> On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > What would I find very useful is [...] if the HashAggregate node under
> > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > node would report...anything.
> 
> Find attached my WIP attempt to implement this.
> 
> Jeff: can you suggest what details Aggregate should show ?

Rebased on top of 10013684970453a0ddc86050bba813c611114321
And added https://commitfest.postgresql.org/27/2428/



Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Sun, Jan 26, 2020 at 08:14:25AM -0600, Justin Pryzby wrote:
> On Fri, Jan 03, 2020 at 10:19:25AM -0600, Justin Pryzby wrote:
> > On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> > https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > > What would I find very useful is [...] if the HashAggregate node under
> > > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > > node would report...anything.
> > 
> > Find attached my WIP attempt to implement this.
> > 
> > Jeff: can you suggest what details Aggregate should show ?
> 
> Rebased on top of 10013684970453a0ddc86050bba813c611114321
> And added https://commitfest.postgresql.org/27/2428/

Attached for real.

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Andres Freund
Date:
Hi,


On 2020-01-03 10:19:26 -0600, Justin Pryzby wrote:
> On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > What would I find very useful is [...] if the HashAggregate node under
> > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > node would report...anything.

Yea, that'd be amazing. It probably should be something every
execGrouping.c using node can opt into.


Justin: As far as I can tell, you're trying to share one instrumentation
state between hashagg and hashjoins. I'm doubtful that's a good
idea. The cases are different enough that that's probably just going to
be complicated, without actually simplifying anything.


> Jeff: can you suggest what details Aggregate should show ?

Memory usage most importantly. Probably makes sense to differentiate
between the memory for the hashtable itself, and the tuples in it (since
they're allocated separately, and just having an overly large hashtable
doesn't hurt that much if it's not filled).


> diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
> index 3603c58..cf0fe3c 100644
> --- a/src/backend/executor/execGrouping.c
> +++ b/src/backend/executor/execGrouping.c
> @@ -203,6 +203,11 @@ BuildTupleHashTableExt(PlanState *parent,
>          hashtable->hash_iv = 0;
>  
>      hashtable->hashtab = tuplehash_create(metacxt, nbuckets, hashtable);
> +    hashtable->hinstrument.nbuckets_original = nbuckets;
> +    hashtable->hinstrument.nbuckets = nbuckets;
> +    hashtable->hinstrument.space_peak = entrysize * hashtable->hashtab->size;

That's not actually an accurate accounting of memory, because for filled
entries a lot of memory is used to store actual tuples:

static TupleHashEntryData *
lookup_hash_entry(AggState *aggstate)
...
    /* find or create the hashtable entry using the filtered tuple */
    entry = LookupTupleHashEntry(perhash->hashtable, hashslot, &isnew);

    if (isnew)
    {
        AggStatePerGroup pergroup;
        int            transno;

        pergroup = (AggStatePerGroup)
            MemoryContextAlloc(perhash->hashtable->tablecxt,
                               sizeof(AggStatePerGroupData) * aggstate->numtrans);
        entry->additional = pergroup;


Since the memory doesn't actually shrink unless the hashtable is
destroyed or reset, it'd probably be sensible to compute the memory
usage either at reset, or at the end of the query.


Greetings,

Andres Freund



Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Mon, Feb 03, 2020 at 06:53:01AM -0800, Andres Freund wrote:
> On 2020-01-03 10:19:26 -0600, Justin Pryzby wrote:
> > On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> > https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > > What would I find very useful is [...] if the HashAggregate node under
> > > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > > node would report...anything.
> 
> Yea, that'd be amazing. It probably should be something every
> execGrouping.c using node can opt into.

Do you think it should be implemented in execGrouping/TupleHashTableData (as I
did)?  I also did an experiment moving it into the higher-level nodes, but I
guess that's not actually desirable.  There's currently different test output
between the implementation using execGrouping.c and the one outside it, so
there's at least an issue with grouping sets.

> > +    hashtable->hinstrument.nbuckets_original = nbuckets;
> > +    hashtable->hinstrument.nbuckets = nbuckets;
> > +    hashtable->hinstrument.space_peak = entrysize * hashtable->hashtab->size;
> 
> That's not actually an accurate accounting of memory, because for filled
> entries a lot of memory is used to store actual tuples:

Thanks - I think I finally understood this.

I updated some existing tests to show the new output.  I imagine that's a
throwaway commit, and we should eventually add new tests for each of these node
types under explain analyze.

I've been testing the various nodes like:

--heapscan:
DROP TABLE t; CREATE TABLE t (i int unique) WITH(autovacuum_enabled=off);
INSERT INTO t SELECT generate_series(1,99999);
SET enable_seqscan=off; SET parallel_tuple_cost=0; SET parallel_setup_cost=0; SET enable_indexonlyscan=off;
explain analyze verbose SELECT * FROM t WHERE i BETWEEN 999 and 99999999;

--setop:
explain (analyze,verbose) SELECT * FROM generate_series(1,999) EXCEPT (SELECT NULL UNION ALL SELECT * FROM generate_series(1,99999));
   Buckets: 2048 (originally 256)  Memory Usage: hashtable: 48kB, tuples: 8Kb

--recursive union:
explain analyze verbose WITH RECURSIVE t(n) AS ( SELECT 'foo' UNION SELECT n || ' bar' FROM t WHERE length(n) < 9999)
SELECT n, n IS OF (text) AS is_text FROM t;

--subplan:
explain analyze verbose SELECT i FROM generate_series(1,999)i WHERE (i,i) NOT IN (SELECT 1,1 UNION ALL SELECT j,j FROM generate_series(1,99999)j);
   Buckets: 262144 (originally 131072)  Memory Usage: hashtable: 6144kB, tuples: 782Kb
explain analyze verbose select i FROM generate_series(1,999)i WHERE (1,i) NOT in (select i,null::int from t);

--Agg:
explain (analyze,verbose) SELECT A,COUNT(1) FROM generate_series(1,99999)a GROUP BY 1;
   Buckets: 262144 (originally 256)  Memory Usage: hashtable: 6144kB, tuples: 782Kb

explain (analyze, verbose) select i FROM generate_series(1,999)i WHERE (1,1) not in (select a,null from (SELECT generate_series(1,99999)a)x);

explain analyze verbose select * from (SELECT a FROM generate_series(1,99)a)v left join lateral (select v.a, four, ten, count(*) from (SELECT b four, 2 ten, b FROM generate_series(1,999)b)x group by cube(four,ten)) s on true order by v.a,four,ten;

--Grouping sets:
explain analyze verbose select unique1,
         count(two), count(four), count(ten),
         count(hundred), count(thousand), count(twothousand),
         count(*)
    from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);

-- 
Justin

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
Updated:

 . remove from explain analyze those tests which would display sort
   Memory/Disk.  Oops.
 . fix an issue with the first patch showing zero "tuples" memory for some
   grouping sets.
 . reclassify memory as "tuples" if it has to do with "members".  So the
   hashtable size is now redundant with nbuckets (if you know
   sizeof(TupleHashEntryData)).

-- 
Justin

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Sun, Feb 16, 2020 at 11:53:07AM -0600, Justin Pryzby wrote:
> Updated:
> 
>  . remove from explain analyze those tests which would display sort
>    Memory/Disk.  Oops.

 . Rebased on top of 5b618e1f48aecc66e3a9f60289491da520faae19
 . Updated to avoid sort's Disk output, for real this time.
 . And fixed a syntax error in an intermediate commit.

-- 
Justin

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Tomas Vondra
Date:
Hi,

I've started looking at this patch, because I've long been missing
information about the hashagg hash table, so I'm pleased someone is working
on this. In general I agree it may be useful to add similar information
to other nodes using a hashtable, but IMHO the hashagg bit is the most
useful, so maybe we should focus on it first.

A couple of comments after an initial review of the patches:


1) explain.c API

The functions in explain.c (even the static ones) follow the convention
that the last parameter is (ExplainState *es). I think we should stick
to this, so the new parameters should be added before it.

For example, instead of

   static void ExplainNode(PlanState *planstate, List *ancestors,
     const char *relationship, const char *plan_name,
     ExplainState *es, SubPlanState *subplanstate);

we should do e.g.

   static void ExplainNode(PlanState *planstate, SubPlanState *subplanstate,
     List *ancestors,
     const char *relationship, const char *plan_name,
     ExplainState *es);

Also, the first argument of show_grouping_sets should be renamed to aggstate
to make it consistent with the type change.


2) The hash_instrumentation name is a bit inconsistent with what we already
have. We have Instrumentation, JitInstrumentation, WorkerInstrumentation
and HashInstrumentation, so I suggest we follow the same pattern and call
this HashTableInstrumentation or something like that.


3) Almost all executor nodes that are modified to include this new
instrumentation struct also include TupleHashTable, and the data are
essentially about the hash table. So my question is why not include
this in TupleHashTable - that would mean we don't need to modify any
executor nodes, and it'd probably simplify the code in explain.c too because
we could simply pass the hashtable.

It'd also simplify e.g. SubPlanState where we have to add two new fields
and make sure to update the right one.
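
For concreteness, roughly what such a struct might look like (a sketch only -
the field names follow the patch, the name is the one suggested in (2) above):

typedef struct HashTableInstrumentation
{
    size_t      nbuckets;           /* number of buckets at end of execution */
    size_t      nbuckets_original;  /* planned number of buckets */
    size_t      space_peak_hash;    /* peak memory used by the bucket array */
    size_t      space_peak_tuples;  /* peak memory used by stored tuples */
} HashTableInstrumentation;

Embedding that directly in TupleHashTableData would then cover all the
execGrouping.c users at once.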


4) The one exception to (3) is BitmapHeapScanState, which does include
TIDBitmap and not TupleHashTable. And then we have tbm_instrumentation
which "fakes" the data based on the pagetable. Maybe this is a sign that
TIDBitmap needs a slightly different struct? Also, I'm not sure why we
actually need tbm_instrumentation()? It just copies the instrumentation
data from TIDBitmap into the node level, but why couldn't we just look
at the instrumentation data in TIDBitmap directly?


5) I think the explain output for grouping sets needs a rethink. The function
show_grouping_set_keys was originally meant to print just the keys, but
now it's also printing the hash table stats. IMO we need a new function
printing grouping set info - calling show_grouping_set_keys to print
the keys, but then also printing the extra hashtable info.


6) subplan explain

I probably agree the hashtable info should be included in the subplan,
not in the parent node. Otherwise it's confusing, particularly when the
node has multiple subplans. The one thing I find a bit strange is this:

explain (analyze, timing off, summary off, costs off)
select 'foo'::text in (select 'bar'::name union all select 'bar'::name);
                   QUERY PLAN
----------------------------------------------
  Result (actual rows=1 loops=1)
    SubPlan 1
      Buckets: 4 (originally 2)
      Null hashtable: Buckets: 2
      ->  Append (actual rows=2 loops=1)
            ->  Result (actual rows=1 loops=1)
            ->  Result (actual rows=1 loops=1)
(7 rows)

That is, there's no indication why this would use a hash table, because
the "hashed subplan" is included only in verbose mode:

explain (analyze, verbose, timing off, summary off, costs off)
select 'foo'::text in (select 'bar'::name union all select 'bar'::name);
                                     QUERY PLAN
----------------------------------------------------------------------------------
  Result (actual rows=1 loops=1)
    Output: (hashed SubPlan 1)
    SubPlan 1
      Buckets: 4 (originally 2)  Memory Usage Hash: 1kB  Memory Usage Tuples: 1kB
      Null hashtable: Buckets: 2  Memory Usage Hash: 1kB  Memory Usage Tuples: 0kB
      ->  Append (actual rows=2 loops=1)
            ->  Result (actual rows=1 loops=1)
                  Output: 'bar'::name
            ->  Result (actual rows=1 loops=1)
                  Output: 'bar'::name
(10 rows)

Not sure if this is an issue, maybe it's fine. But it's definitely
strange that we only print memory info in verbose mode - IMHO it's much
more useful info than the number of buckets etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Sat, Feb 22, 2020 at 10:53:35PM +0100, Tomas Vondra wrote:
> I've started looking at this patch, because I've been long missing the

Thanks for looking

I have brief, initial comments before I revisit the patch.

> 3) Almost all executor nodes that are modified to include this new
> instrumentation struct also include TupleHashTable, and the data are
> essentially about the hash table. So my question is why not to include
> this into TupleHashTable - that would mean we don't need to modify any
> executor nodes, and it'd probably simplify code in explain.c too because
> we could simply pass the hashtable.

I considered this.  From 0004 commit message:

|    Also, if instrumentation were implemented in simplehash.h, I think every
|    insertion or deletion would need to check ->members and ->size (which isn't
|    necessary for Agg, but is necessary in the general case, and specifically for
|    tidbitmap, since it actually DELETEs hashtable entries).  Or else simplehash
|    would need a new function like UpdateTupleHashStats, which the higher level nodes
|    would need to call after filling the hashtable or before deleting tuples, which
|    seems to defeat the purpose of implementing stats at a lower layer.

> 4) The one exception to (3) is BitmapHeapScanState, which does include
> TIDBitmap and not TupleHashTable. And then we have tbm_instrumentation
> which "fakes" the data based on the pagetable. Maybe this is a sign that
> TIDBitmap needs a slightly different struct?

Hm, I'd say that it "collects" data that's not immediately present, rather
than fakes it.  But maybe I did it poorly.  Also, maybe TIDBitmap shouldn't be
included in the patch...

> Also, I'm not sure why we
> actually need tbm_instrumentation()? It just copies the instrumentation
> data from TIDBitmap into the node level, but why couldn't we just look
> at the instrumentation data in TIDBitmap directly?

See 0004 commit message:

|    TIDBitmap is a private structure, so add an accessor function to return its
|    instrumentation, and duplicate instrumentation struct in BitmapHeapState.

Also, I don't know what anyone else thinks, but I think 0005 is a throwaway
commit.  It's implemented more nicely in execGrouping.c.

> But it's definitely strange that we only print memory info in verbose mode -
> IMHO it's much more useful info than the number of buckets etc.

Because I wanted to be able to put "explain analyze" into regression tests
(which can show: "Buckets: 4 (originally 2)").  But I cannot get stable output
for any plan which uses Sort without hacks like explain_sq_limit and
explain_parallel_sort_stats.

Actually, I wish there were a way to control Sort nodes' Memory/Disk output,
too.  I'm sure most regression tests were meant to be run without ANALYZE,
but it'd be much better if ANALYZE were reasonably easy in the general
case that might include Sort.  If someone seconds that, I will start a separate
thread.

-- 
Justin Pryzby



Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Sat, Feb 22, 2020 at 10:53:35PM +0100, Tomas Vondra wrote:
> 1) explain.c API
> 
> The functions in explain.c (even the static ones) follow the convention
> that the last parameter is (ExplainState *es). I think we should stick
> to this, so the new parameters should be added before it.

I found it weird to have the "constant" arguments at the end rather than at the
beginning.  (Also, these don't follow that convention: show_buffer_usage,
ExplainSaveGroup, ExplainRestoreGroup, ExplainOneQuery, ExplainPrintJIT.)

But done.

> Also, the first show_grouping_sets should be renamed to aggstate to make
> it consistent with the type change.

The prototype wasn't updated - fixed.

> 2) The hash_instrumentation is a bit inconsistent with what we already
> have ..HashTableInstrumentation..

Thanks for thinking of a better name.

> 5) I think the explain for grouping sets need a rething. Teh function
> show_grouping_set_keys was originally meant to print just the keys, but
> now it's also printing the hash table stats. IMO we need a new function
> printing a grouping set info - calling show_grouping_set_keys to print
> the keys, but then also printing the extra hashtable info.

I renamed it, and did the rest in a separate patch for now, since I'm only
partially convinced it's an improvement.

> 6) subplan explain
> 
> That is, there's no indication why would this use a hash table, because
> the "hashed subplan" is included only in verbose mode:

Need to think about that..

> Not sure if this is an issue, maybe it's fine. But it's definitely
> strange that we only print memory info in verbose mode - IMHO it's much
> more useful info than the number of buckets etc.

You're right that verbose isn't right for this.

I wrote patches creating new explain options to allow stable output of "explain
analyze", by avoiding Memory/Disk.  The only other way to handle it seems to be
to avoid "explain analyze" in regression tests, which is what's in common
practice anyway, so did that instead.

I also fixed wrong output and wrong non-text formatting for grouping sets,
tweaked output for subplan, and broke style rules less often.

-- 
Justin

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Andres Freund
Date:
Hi,

On 2020-03-01 09:45:40 -0600, Justin Pryzby wrote:
> +/*
> + * Show hash bucket stats and (optionally) memory.
> + */
> +static void
> +show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es)
> +{
> +    long    spacePeakKb_tuples = (inst->space_peak_tuples + 1023) / 1024,
> +        spacePeakKb_hash = (inst->space_peak_hash + 1023) / 1024;

Let's not add further uses of long. It's terrible because it has
different widths on 64bit Windows than on everything else. Specify the
widths explicitly, or use something like size_t.


> +    if (es->format != EXPLAIN_FORMAT_TEXT)
> +    {
> +        ExplainPropertyInteger("Hash Buckets", NULL,
> +                               inst->nbuckets, es);
> +        ExplainPropertyInteger("Original Hash Buckets", NULL,
> +                               inst->nbuckets_original, es);
> +        ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
> +                               spacePeakKb_hash, es);
> +        ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",
> +                               spacePeakKb_tuples, es);

And then you're passing the long to ExplainPropertyInteger, which accepts
an int64, making the use of long above particularly suspicious.

I wonder if it would make sense to add an ExplainPropertyBytes() that
would do the rounding etc. automatically. It could then also switch units
as appropriate.
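
Something along these lines, perhaps (just a sketch - the helper doesn't exist
yet, and the unit switching is an assumption about how it could behave;
ExplainPropertyFloat() is the existing explain.c function):

static void
ExplainPropertyBytes(const char *qlabel, uint64 bytes, ExplainState *es)
{
    const char *unit = "B";
    double      value = (double) bytes;

    /* scale up to the largest unit that keeps the value readable */
    if (value >= 1024)
    {
        value /= 1024;
        unit = "kB";
    }
    if (value >= 1024)
    {
        value /= 1024;
        unit = "MB";
    }

    ExplainPropertyFloat(qlabel, unit, value, 0, es);
}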


> +    }
> +    else if (!inst->nbuckets)
> +        ; /* Do nothing */
> +    else
> +    {
> +        if (inst->nbuckets_original != inst->nbuckets)
> +        {
> +            ExplainIndentText(es);
> +            appendStringInfo(es->str,
> +                        "Buckets: %ld (originally %ld)",
> +                        inst->nbuckets,
> +                        inst->nbuckets_original);
> +        }
> +        else
> +        {
> +            ExplainIndentText(es);
> +            appendStringInfo(es->str,
> +                        "Buckets: %ld",
> +                        inst->nbuckets);
> +        }
> +
> +        if (es->analyze)
> +            appendStringInfo(es->str,
> +                    "  Memory Usage: hashtable: %ldkB, tuples: %ldkB",
> +                    spacePeakKb_hash, spacePeakKb_tuples);
> +        appendStringInfoChar(es->str, '\n');

I'm not sure I like the alternative output formats here. All the other
fields are separated with a comma, but the original size is in
parens. I'd probably just format it as "Buckets: %lld " and then add
", Original Buckets: %lld" when differing.

Also, %ld is problematic, because it's only 32bit wide on some platforms
(including 64bit windows), but 64bit on others. The easiest way to deal
with that is to use %lld and cast the argument to long long - yes that's
a sad workaround.
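
For illustration, that workaround would look like (sketch):

    appendStringInfo(es->str, "Buckets: %lld",
                     (long long) inst->nbuckets);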


> +/* Update instrumentation stats */
> +void
> +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> +{
> +    hashtable->instrument.nbuckets = hashtable->hashtab->size;
> +    if (initial)
> +    {
> +        hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> +        hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> +            sizeof(TupleHashEntryData);
> +        hashtable->instrument.space_peak_tuples = 0;
> +    }
> +    else
> +    {
> +#define maxself(a,b) a=Max(a,b)
> +        /* hashtable->entrysize includes additionalsize */
> +        maxself(hashtable->instrument.space_peak_hash,
> +                hashtable->hashtab->size * sizeof(TupleHashEntryData));
> +        maxself(hashtable->instrument.space_peak_tuples,
> +                hashtable->hashtab->members * hashtable->entrysize);
> +#undef maxself
> +    }
> +}

Not a fan of this macro.

I'm also not sure I understand what you're trying to do here?


> diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> index f457b5b150..b173b32cab 100644
> --- a/src/test/regress/expected/aggregates.out
> +++ b/src/test/regress/expected/aggregates.out
> @@ -517,10 +517,11 @@ order by 1, 2;
>           ->  HashAggregate
>                 Output: s2.s2, sum((s1.s1 + s2.s2))
>                 Group Key: s2.s2
> +               Buckets: 4
>                 ->  Function Scan on pg_catalog.generate_series s2
>                       Output: s2.s2
>                       Function Call: generate_series(1, 3)
> -(14 rows)
> +(15 rows)

These tests probably won't be portable. The number of hash buckets
calculated will e.g. depend on the size of the contained elements, and
that will e.g. depend on whether pointers are 4 or 8 bytes.


Greetings,

Andres Freund



Re: explain HashAggregate to report bucket and memory stats

From
Tomas Vondra
Date:
On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> ...
>
>> +    }
>> +    else if (!inst->nbuckets)
>> +        ; /* Do nothing */
>> +    else
>> +    {
>> +        if (inst->nbuckets_original != inst->nbuckets)
>> +        {
>> +            ExplainIndentText(es);
>> +            appendStringInfo(es->str,
>> +                        "Buckets: %ld (originally %ld)",
>> +                        inst->nbuckets,
>> +                        inst->nbuckets_original);
>> +        }
>> +        else
>> +        {
>> +            ExplainIndentText(es);
>> +            appendStringInfo(es->str,
>> +                        "Buckets: %ld",
>> +                        inst->nbuckets);
>> +        }
>> +
>> +        if (es->analyze)
>> +            appendStringInfo(es->str,
>> +                    "  Memory Usage: hashtable: %ldkB, tuples: %ldkB",
>> +                    spacePeakKb_hash, spacePeakKb_tuples);
>> +        appendStringInfoChar(es->str, '\n');
>
>I'm not sure I like the alternative output formats here. All the other
>fields are separated with a comma, but the original size is in
>parens. I'd probably just format it as "Buckets: %lld " and then add
>", Original Buckets: %lld" when differing.
>

FWIW this copies hashjoin precedent, which does this:

     appendStringInfo(es->str,
                      "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
                      hinstrument.nbuckets,
             ...

I agree it's not ideal, but maybe let's not invent new ways to format
the same type of info.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > +            ExplainIndentText(es);
> > +            appendStringInfo(es->str,
> > +                        "Buckets: %ld (originally %ld)",
> > +                        inst->nbuckets,
> > +                        inst->nbuckets_original);
> 
> I'm not sure I like the alternative output formats here. All the other
> fields are separated with a comma, but the original size is in
> parens. I'd probably just format it as "Buckets: %lld " and then add
> ", Original Buckets: %lld" when differing.

It's done that way for consistency with hashJoin in show_hash_info().

> > +/* Update instrumentation stats */
> > +void
> > +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> > +{
> > +    hashtable->instrument.nbuckets = hashtable->hashtab->size;
> > +    if (initial)
> > +    {
> > +        hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> > +        hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> > +            sizeof(TupleHashEntryData);
> > +        hashtable->instrument.space_peak_tuples = 0;
> > +    }
> > +    else
> > +    {
> > +#define maxself(a,b) a=Max(a,b)
> > +        /* hashtable->entrysize includes additionalsize */
> > +        maxself(hashtable->instrument.space_peak_hash,
> > +                hashtable->hashtab->size * sizeof(TupleHashEntryData));
> > +        maxself(hashtable->instrument.space_peak_tuples,
> > +                hashtable->hashtab->members * hashtable->entrysize);
> > +#undef maxself
> > +    }
> > +}
> 
> Not a fan of this macro.
> 
> I'm also not sure I understand what you're trying to do here?

I have to call UpdateTupleHashTableStats() from the callers at deliberate
locations.  If the caller fills the hashtable all at once, I can populate the
stats immediately after that, but if it's populated incrementally, then I need
to update the stats right before it's destroyed or reset; otherwise we would
show the tuple size of the hashtable since its most recent reset, rather than
that of a larger, previous incarnation.
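
To illustrate the incremental case, the intended call pattern is roughly this
(sketch; the surrounding caller is illustrative, only
UpdateTupleHashTableStats() and ResetTupleHashTable() are the actual functions
involved):

    /* capture peak usage before the entries are thrown away */
    UpdateTupleHashTableStats(perhash->hashtable, false);
    ResetTupleHashTable(perhash->hashtable);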

> > diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> > index f457b5b150..b173b32cab 100644
> > --- a/src/test/regress/expected/aggregates.out
> > +++ b/src/test/regress/expected/aggregates.out
> > @@ -517,10 +517,11 @@ order by 1, 2;
> >           ->  HashAggregate
> >                 Output: s2.s2, sum((s1.s1 + s2.s2))
> >                 Group Key: s2.s2
> > +               Buckets: 4
> >                 ->  Function Scan on pg_catalog.generate_series s2
> >                       Output: s2.s2
> >                       Function Call: generate_series(1, 3)
> > -(14 rows)
> > +(15 rows)
> 
> These tests probably won't be portable. The number of hash buckets
> calculated will e.g. depend onthe size of the contained elements. And
> that'll e.g. will depend on whether pointers are 4 or 8 bytes.

I was aware and afraid of that.  Previously, I added this output only to
"explain analyze", and (as an quick, interim implementation) changed various
tests to use analyze, and memory only shown in "verbose" mode.  But as Tomas
pointed out, that's consistent with what's done elsewhere.

So is the solution to show stats only during explain ANALYZE ?

Or ... I have a patch to create a new explain(MACHINE) option to allow more
stable output, by avoiding Memory/Disk.  That doesn't attempt to make all
"explain analyze" output stable - there's other issues, I think mostly related
to parallel workers (see 4ea03f3f, 13e8b2ee).  But does allow retiring
explain_sq_limit and explain_parallel_sort_stats.  I'm including my patch to
show what I mean, but I didn't enable it for hashtable "Buckets:".  I guess in
either case, the tests shouldn't be included.

-- 
Justin

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Andres Freund
Date:
Hi,

On 2020-03-06 15:33:10 -0600, Justin Pryzby wrote:
> On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > > +            ExplainIndentText(es);
> > > +            appendStringInfo(es->str,
> > > +                        "Buckets: %ld (originally %ld)",
> > > +                        inst->nbuckets,
> > > +                        inst->nbuckets_original);
> > 
> > I'm not sure I like the alternative output formats here. All the other
> > fields are separated with a comma, but the original size is in
> > parens. I'd probably just format it as "Buckets: %lld " and then add
> > ", Original Buckets: %lld" when differing.
> 
> It's done that way for consistency with hashJoin in show_hash_info().

Fair. I don't like it, but it's not this patch's fault.


> > > diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> > > index f457b5b150..b173b32cab 100644
> > > --- a/src/test/regress/expected/aggregates.out
> > > +++ b/src/test/regress/expected/aggregates.out
> > > @@ -517,10 +517,11 @@ order by 1, 2;
> > >           ->  HashAggregate
> > >                 Output: s2.s2, sum((s1.s1 + s2.s2))
> > >                 Group Key: s2.s2
> > > +               Buckets: 4
> > >                 ->  Function Scan on pg_catalog.generate_series s2
> > >                       Output: s2.s2
> > >                       Function Call: generate_series(1, 3)
> > > -(14 rows)
> > > +(15 rows)
> > 
> > These tests probably won't be portable. The number of hash buckets
> > calculated will e.g. depend on the size of the contained elements, and
> > that will e.g. depend on whether pointers are 4 or 8 bytes.
> 
> I was aware and afraid of that.  Previously, I added this output only to
> "explain analyze", and (as an quick, interim implementation) changed various
> tests to use analyze, and memory only shown in "verbose" mode.  But as Tomas
> pointed out, that's consistent with what's done elsewhere.
> 
> So is the solution to show stats only during explain ANALYZE ?
> 
> Or ... I have a patch to create a new explain(MACHINE) option to allow more
> stable output, by avoiding Memory/Disk.  That doesn't attempt to make all
> "explain analyze" output stable - there's other issues, I think mostly related
> to parallel workers (see 4ea03f3f, 13e8b2ee).  But does allow retiring
> explain_sq_limit and explain_parallel_sort_stats.  I'm including my patch to
> show what I mean, but I didn't enable it for hashtable "Buckets:".  I guess in
> either case, the tests shouldn't be included.

Yea, there's been recent discussion about an argument like that. See
e.g.
https://www.postgresql.org/message-id/18494.1580079189%40sss.pgh.pa.us

Greetings,

Andres Freund



Re: explain HashAggregate to report bucket and memory stats

From
Jeff Davis
Date:
+        /* hashtable->entrysize includes additionalsize */
+        hashtable->instrument.space_peak_hash = Max(
+            hashtable->instrument.space_peak_hash,
+            hashtable->hashtab->size * sizeof(TupleHashEntryData));
+
+        hashtable->instrument.space_peak_tuples = Max(
+            hashtable->instrument.space_peak_tuples,
+            hashtable->hashtab->members * hashtable->entrysize);

I think, in general, we should avoid estimates/projections for
reporting and try to get at a real number, like
MemoryContextMemAllocated(). (Aside: I may want to tweak exactly what
that function reports so that it doesn't count the unused portion of
the last block.)
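
As a sketch of that direction (MemoryContextMemAllocated() and the
hashtable's tablecxt are existing; the instrument field is the one added by
the patch under discussion):

    /* measure what the tuple context actually allocated, rather than
     * estimating members * entrysize */
    Size        allocated = MemoryContextMemAllocated(hashtable->tablecxt, true);

    hashtable->instrument.space_peak_tuples =
        Max(hashtable->instrument.space_peak_tuples, allocated);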

For instance, the report is still not accurate, because it doesn't
account for pass-by-ref transition state values.

To use memory-context-based reporting, it's hard to make the stats a
part of the tuple hash table, because the tuple hash table doesn't own
the memory contexts (they are passed in). It's also hard to make it
per-hashtable (e.g. for grouping sets), unless we put each grouping set
in its own memory context.

Also, is there a reason you report two different memory values
(hashtable and tuples)? I don't object, but it seems like a little too
much detail.

Regards,
    Jeff Davis





Re: explain HashAggregate to report bucket and memory stats

From
Andres Freund
Date:
Hi,

On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> Also, is there a reason you report two different memory values
> (hashtable and tuples)? I don't object, but it seems like a little too
> much detail.

Seems useful to me - the hashtable is pre-allocated based on estimates,
whereas the tuples are allocated "on demand". So seeing the difference
makes it possible to investigate the more crucial issue...

Greetings,

Andres Freund



Re: explain HashAggregate to report bucket and memory stats

From
Jeff Davis
Date:
On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > Also, is there a reason you report two different memory values
> > (hashtable and tuples)? I don't object, but it seems like a little too
> > much detail.
> 
> Seems useful to me - the hashtable is pre-allocated based on estimates,
> whereas the tuples are allocated "on demand". So seeing the difference
> makes it possible to investigate the more crucial issue...

Then do we also want to report separately on the by-ref transition
values? That could be useful if you are using ARRAY_AGG and the states
grow larger than you might expect.

Regards,
    Jeff Davis





Re: explain HashAggregate to report bucket and memory stats

From
Andres Freund
Date:
Hi,

On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > Also, is there a reason you report two different memory values
> > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > much detail.
> > 
> > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > whereas the tuples are allocated "on demand". So seeing the difference
> > makes it possible to investigate the more crucial issue...
> 
> Then do we also want to report separately on the by-ref transition
> values? That could be useful if you are using ARRAY_AGG and the states
> grow larger than you might expect.

I can see that being valuable - I've had to debug cases with too much
memory being used due to aggregate transitions before. Right now it'd be
mixed in with tuples, I believe - and we'd need a separate context for
tracking the transition values? Due to that I'm inclined to not report
separately for now.

Greetings,

Andres Freund



Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Fri, Mar 13, 2020 at 10:57:43AM -0700, Andres Freund wrote:
> On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> > On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > > Also, is there a reason you report two different memory values
> > > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > > much detail.
> > > 
> > > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > > whereas the tuples are allocated "on demand". So seeing the difference
> > > will allow to investigate the more crucial issue...

> > Then do we also want to report separately on the by-ref transition
> > values? That could be useful if you are using ARRAY_AGG and the states
> > grow larger than you might expect.
> 
> I can see that being valuable - I've had to debug cases with too much
> memory being used due to aggregate transitions before. Right now it'd be
> mixed in with tuples, I believe - and we'd need a separate context for
> tracking the transition values? Due to that I'm inclined to not report
> separately for now.

I think that's already in a separate context indexed by grouping set:
src/include/nodes/execnodes.h:  ExprContext **aggcontexts;      /* econtexts for long-lived data (per GS) */

But the hashtable and tuples are combined.  I put them in separate contexts and
rebased on top of 1f39bce021540fde00990af55b4432c55ef4b3c7.

But I didn't do anything yet with the aggcontexts.

Now I can get output like:

|template1=# explain analyze SELECT i,COUNT(1) FROM t GROUP BY 1;
| HashAggregate  (cost=4769.99..6769.98 rows=199999 width=12) (actual time=266.465..27020.333 rows=199999 loops=1)
|   Group Key: i
|   Buckets: 524288 (originally 262144)
|   Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
|   Disk Usage: 192 kB
|   HashAgg Batches: 3874
|   ->  Seq Scan on t  (cost=0.00..3769.99 rows=199999 width=4) (actual time=13.043..64.017 rows=199999 loops=1)

It looks somewhat funny next to hash join, which puts everything on one line:

|template1=# explain  analyze SELECT i,COUNT(1) FROM t a JOIN t b USING(i) GROUP BY 1;
| HashAggregate  (cost=13789.95..15789.94 rows=199999 width=12) (actual time=657.733..27129.873 rows=199999 loops=1)
|   Group Key: a.i
|   Buckets: 524288 (originally 262144)
|   Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
|   Disk Usage: 192 kB
|   HashAgg Batches: 3874
|   ->  Hash Join  (cost=6269.98..12789.95 rows=199999 width=4) (actual time=135.932..426.071 rows=199999 loops=1)
|         Hash Cond: (a.i = b.i)
|         ->  Seq Scan on t a  (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.265..47.598 rows=199999 loops=1)
|         ->  Hash  (cost=3769.99..3769.99 rows=199999 width=4) (actual time=131.881..131.882 rows=199999 loops=1)
|               Buckets: 262144  Batches: 1  Memory Usage: 9080kB
|               ->  Seq Scan on t b  (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.273..40.163 rows=199999 loops=1)

-- 
Justin

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Justin Pryzby
Date:
On Fri, Mar 20, 2020 at 03:44:42AM -0500, Justin Pryzby wrote:
> On Fri, Mar 13, 2020 at 10:57:43AM -0700, Andres Freund wrote:
> > On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> > > On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > > > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > > > Also, is there a reason you report two different memory values
> > > > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > > > much detail.
> > > > 
> > > > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > > > whereas the tuples are allocated "on demand". So seeing the difference
> > > > will allow to investigate the more crucial issue...
> 
> > > Then do we also want to report separately on the by-ref transition
> > > values? That could be useful if you are using ARRAY_AGG and the states
> > > grow larger than you might expect.
> > 
> > I can see that being valuable - I've had to debug cases with too much
> > memory being used due to aggregate transitions before. Right now it'd be
> > mixed in with tuples, I believe - and we'd need a separate context for
> > tracking the transition values? Due to that I'm inclined to not report
> > separately for now.
> 
> I think that's already in a separate context indexed by grouping set:
> src/include/nodes/execnodes.h:  ExprContext **aggcontexts;      /* econtexts for long-lived data (per GS) */
> 
> But the hashtable and tuples are combined.  I put them in separate contexts and
> rebased on top of 1f39bce021540fde00990af55b4432c55ef4b3c7.

I forgot to say that I'd also switched to memory-context-based accounting.

90% of the initial goal of this patch was handled by instrumentation added by
"hash spill to disk" (1f39bce02), but this *also* adds:

 - separate accounting for tuples vs hashtable;
 - the number of hash buckets;
 - handling of other agg nodes, and bitmap scans;

Should I continue pursuing this patch?
Does it still serve any significant purpose?

template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1 ;
 HashAggregate (actual time=1070.713..2287.011 rows=999999 loops=1)
   Group Key: a
   Buckets: 32768 (originally 512)
   Peak Memory Usage: hashtable: 777kB, tuples: 4096kB
   Disk Usage: 22888 kB
   HashAgg Batches: 84
   ->  Function Scan on generate_series a (actual time=238.270..519.832 rows=999999 loops=1)

template1=# explain analyze SELECT * FROM t WHERE a BETWEEN 999 AND 99999;
 Bitmap Heap Scan on t  (cost=4213.01..8066.67 rows=197911 width=4) (actual time=26.803..84.693 rows=198002 loops=1)
   Recheck Cond: ((a >= 999) AND (a <= 99999))
   Heap Blocks: exact=878
   Buckets: 1024 (originally 256)
   Peak Memory Usage: hashtable: 48kB, tuples: 4kB

template1=# explain analyze SELECT generate_series(1,99999) EXCEPT SELECT generate_series(1,999);
 HashSetOp Except  (cost=0.00..2272.49 rows=99999 width=8) (actual time=135.986..174.656 rows=99000 loops=1)
   Buckets: 262144 (originally 131072)
   Peak Memory Usage: hashtable: 6177kB, tuples: 8192kB

@cfbot: rebased

Attachment

Re: explain HashAggregate to report bucket and memory stats

From
Jeff Davis
Date:
On Wed, 2020-04-08 at 16:00 -0500, Justin Pryzby wrote:
> 90% of the initial goal of this patch was handled by instrumentation
> added by
> "hash spill to disk" (1f39bce02), but this *also* adds:
> 
>  - separate accounting for tuples vs hashtable;
>  - number of hash buckets;
>  - handles other agg nodes, and bitmap scan;
> 
> Should I continue pursuing this patch?
> Does it still serve any significant purpose?

Those things would be useful for me trying to tune the performance and
cost model. I think we need to put some of these things under "VERBOSE"
or maybe invent a new explain option to provide this level of detail,
though.

Regards,
    Jeff Davis





Re: explain HashAggregate to report bucket and memory stats

From
Daniel Gustafsson
Date:
> On 9 Apr 2020, at 00:24, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2020-04-08 at 16:00 -0500, Justin Pryzby wrote:
>> 90% of the initial goal of this patch was handled by instrumentation
>> added by
>> "hash spill to disk" (1f39bce02), but this *also* adds:
>>
>> - separate accounting for tuples vs hashtable;
>> - number of hash buckets;
>> - handles other agg nodes, and bitmap scan;
>>
>> Should I continue pursuing this patch?
>> Does it still serve any significant purpose?
>
> Those things would be useful for me trying to tune the performance and
> cost model. I think we need to put some of these things under "VERBOSE"
> or maybe invent a new explain option to provide this level of detail,
> though.

This thread has stalled and the patch has been Waiting on Author since March,
and skimming the thread there seem to be questions raised over the value
proposition.  Is there progress happening behind the scenes or should we close
this entry for now, to re-open in case there is renewed activity/interest?

cheers ./daniel


Re: explain HashAggregate to report bucket and memory stats

From
Daniel Gustafsson
Date:
> On 12 Jul 2020, at 21:52, Daniel Gustafsson <daniel@yesql.se> wrote:

> This thread has stalled and the patch has been Waiting on Author since March,
> and skimming the thread there seem to be questions raised over the value
> proposition.  Is there progress happening behind the scenes or should we close
> this entry for now, to re-open in case there is renewed activity/interest?

With not too many days of the commitfest left, I'm closing this in 2020-07.
Please feel free to add a new entry if there is renewed interest in this patch.

cheers ./daniel