Re: explain HashAggregate to report bucket and memory stats - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: explain HashAggregate to report bucket and memory stats |
Date | |
Msg-id | 20200222215335.bu4kky7el4nccls5@development Whole thread Raw |
In response to | Re: explain HashAggregate to report bucket and memory stats (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: explain HashAggregate to report bucket and memory stats
Re: explain HashAggregate to report bucket and memory stats |
List | pgsql-hackers |
Hi, I've started looking at this patch, because I've been long missing the information about hashagg hash table, so I'm pleased someone is working on this. In general I agree it may be useful to add simila information to other nodes using a hashtable, but IMHO the hashagg bit is the most useful, so maybe we should focus on it first. A couple of comments after an initial review of the patches: 1) explain.c API The functions in explain.c (even the static ones) follow the convention that the last parameter is (ExplainState *es). I think we should stick to this, so the new parameters should be added before it. For example instead of static void ExplainNode(PlanState *planstate, List *ancestors, const char *relationship, const char *plan_name, ExplainState *es, SubPlanState *subplanstate); we should do e.g. static void ExplainNode(PlanState *planstate, SubPlanState *subplanstate, List *ancestors, const char *relationship, const char *plan_name, ExplainState *es); Also, the first show_grouping_sets should be renamed to aggstate to make it consistent with the type change. 2) The hash_instrumentation is a bit inconsistent with what we already have. We have Instrumentation, JitInstrumentation, WorkerInstrumentation and HashInstrumentation, so I suggest we follow the same patter and call this HashTableInstrumentation or something like that. 3) Almost all executor nodes that are modified to include this new instrumentation struct also include TupleHashTable, and the data are essentially about the hash table. So my question is why not to include this into TupleHashTable - that would mean we don't need to modify any executor nodes, and it'd probably simplify code in explain.c too because we could simply pass the hashtable. It'd also simplify e.g. SubPlanState where we have to add two new fields and make sure to update the right one. 4) The one exception to (3) is BitmapHeapScanState, which does include TIDBitmap and not TupleHashTable. And then we have tbm_instrumentation which "fakes" the data based on the pagetable. Maybe this is a sign that TIDBitmap needs a slightly different struct? Also, I'm not sure why we actually need tbm_instrumentation()? It just copies the instrumentation data from TIDBitmap into the node level, but why couldn't we just look at the instrumentation data in TIDBitmap directly? 5) I think the explain for grouping sets need a rething. Teh function show_grouping_set_keys was originally meant to print just the keys, but now it's also printing the hash table stats. IMO we need a new function printing a grouping set info - calling show_grouping_set_keys to print the keys, but then also printing the extra hashtable info. 6) subplan explain I probably agree the hashtable info should be included in the subplan, not in the parent node. Otherwise it's confusing, particularly when the node has multiple subplans. The one thing I find a bit strange is this: explain (analyze, timing off, summary off, costs off) select 'foo'::text in (select 'bar'::name union all select 'bar'::name); QUERY PLAN ---------------------------------------------- Result (actual rows=1 loops=1) SubPlan 1 Buckets: 4 (originally 2) Null hashtable: Buckets: 2 -> Append (actual rows=2 loops=1) -> Result (actual rows=1 loops=1) -> Result (actual rows=1 loops=1) (7 rows) That is, there's no indication why would this use a hash table, because the "hashed subplan" is included only in verbose mode: explain (analyze, verbose, timing off, summary off, costs off) select 'foo'::text in (select 'bar'::name union all select 'bar'::name); QUERY PLAN ---------------------------------------------------------------------------------- Result (actual rows=1 loops=1) Output: (hashed SubPlan 1) SubPlan 1 Buckets: 4 (originally 2) Memory Usage Hash: 1kB Memory Usage Tuples: 1kB Null hashtable: Buckets: 2 Memory Usage Hash: 1kB Memory Usage Tuples: 0kB -> Append (actual rows=2 loops=1) -> Result (actual rows=1 loops=1) Output: 'bar'::name -> Result (actual rows=1 loops=1) Output: 'bar'::name (10 rows) Not sure if this is an issue, maybe it's fine. But it's definitely strange that we only print memory info in verbose mode - IMHO it's much more useful info than the number of buckets etc. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: