On Fri, Mar 13, 2020 at 10:57:43AM -0700, Andres Freund wrote:
> On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> > On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > > Also, is there a reason you report two different memory values
> > > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > > much detail.
> > >
> > > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > > whereas the tuples are allocated "on demand". So seeing the difference
> > > will allow to investigate the more crucial issue...
> > Then do we also want to report separately on the by-ref transition
> > values? That could be useful if you are using ARRAY_AGG and the states
> > grow larger than you might expect.
>
> I can see that being valuable - I've had to debug cases with too much
> memory being used due to aggregate transitions before. Right now it'd be
> mixed in with tuples, I believe - and we'd need a separate context for
> tracking the transition values? Due to that I'm inclined to not report
> separately for now.
I think that's already in a separate context indexed by grouping set:
src/include/nodes/execnodes.h: ExprContext **aggcontexts; /* econtexts for long-lived data (per GS) */
But the hashtable and tuples are combined. I put them in separate contexts and
rebased on top of 1f39bce021540fde00990af55b4432c55ef4b3c7.
But didn't do anything yet with the aggcontexts.
Now I can get output like:
|template1=# explain analyze SELECT i,COUNT(1) FROM t GROUP BY 1;
| HashAggregate (cost=4769.99..6769.98 rows=199999 width=12) (actual time=266.465..27020.333 rows=199999 loops=1)
| Group Key: i
| Buckets: 524288 (originally 262144)
| Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
| Disk Usage: 192 kB
| HashAgg Batches: 3874
| -> Seq Scan on t (cost=0.00..3769.99 rows=199999 width=4) (actual time=13.043..64.017 rows=199999 loops=1)
It looks somewhat funny next to hash join, which puts everything on one line:
|template1=# explain analyze SELECT i,COUNT(1) FROM t a JOIN t b USING(i) GROUP BY 1;
| HashAggregate (cost=13789.95..15789.94 rows=199999 width=12) (actual time=657.733..27129.873 rows=199999 loops=1)
| Group Key: a.i
| Buckets: 524288 (originally 262144)
| Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
| Disk Usage: 192 kB
| HashAgg Batches: 3874
| -> Hash Join (cost=6269.98..12789.95 rows=199999 width=4) (actual time=135.932..426.071 rows=199999 loops=1)
| Hash Cond: (a.i = b.i)
| -> Seq Scan on t a (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.265..47.598 rows=199999
loops=1)
| -> Hash (cost=3769.99..3769.99 rows=199999 width=4) (actual time=131.881..131.882 rows=199999 loops=1)
| Buckets: 262144 Batches: 1 Memory Usage: 9080kB
| -> Seq Scan on t b (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.273..40.163 rows=199999
loops=1)
--
Justin