Re: Memory-Bounded Hash Aggregation - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Memory-Bounded Hash Aggregation
Date
Msg-id 20200312210146.GG29065@telsasoft.com
Whole thread Raw
In response to Re: Memory-Bounded Hash Aggregation  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Memory-Bounded Hash Aggregation  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, Mar 11, 2020 at 11:55:35PM -0700, Jeff Davis wrote:
>  * tweaked EXPLAIN output some more
> Unless I (or someone else) finds something significant, this is close
> to commit.

Thanks for working on this ; I finally made a pass over the patch.

+++ b/doc/src/sgml/config.sgml
+      <term><varname>enable_groupingsets_hash_disk</varname> (<type>boolean</type>)
+        Enables or disables the query planner's use of hashed aggregation for
+        grouping sets when the size of the hash tables is expected to exceed
+        <varname>work_mem</varname>.  See <xref
+        linkend="queries-grouping-sets"/>.  Note that this setting only
+        affects the chosen plan; execution time may still require using
+        disk-based hash aggregation.  ...
...
+      <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>)
+        ... This only affects the planner choice;
+        execution time may still require using disk-based hash
+        aggregation. The default is <literal>on</literal>.

I don't understand what's meant by "the chosen plan".
Should it say, "at execution ..." instead of "execution time" ?

+        Enables or disables the query planner's use of hashed aggregation plan
+        types when the memory usage is expected to exceed

Either remove "plan types" for consistency with enable_groupingsets_hash_disk,
Or add it there.  Maybe it should say "when the memory usage would OTHERWISE BE
expected to exceed.."

+show_hashagg_info(AggState *aggstate, ExplainState *es)
+{
+    Agg        *agg       = (Agg *)aggstate->ss.ps.plan;
+    long     memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;

I see this partially duplicates my patch [0] to show memory stats for (at
Andres' suggestion) all of execGrouping.c.  Perhaps you'd consider naming the
function something more generic in case my patch progresses ?  I'm using:
|show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es);

Mine also shows:
|ExplainPropertyInteger("Original Hash Buckets", NULL,
|ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
|ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",

[0] https://www.postgresql.org/message-id/20200306213310.GM684%40telsasoft.com

You added hash_mem_peak and hash_batches_used to struct AggState.
In my 0001 patch, I added instrumentation to struct TupleHashTable, and in my
0005 patch I move it into AggStatePerHashData and other State nodes.

+    if (from_tape)
+        partition_mem += HASHAGG_READ_BUFFER_SIZE;
+    partition_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;

=> That looks wrong ; should say += ?

+            gettext_noop("Enables the planner's use of hashed aggregation plans that are expected to exceed
work_mem."),

should say:
"when the memory usage is otherwise be expected to exceed.."

-- 
Justin



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library
Next
From: Alvaro Herrera
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)