From: Justin Pryzby
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-04-08 21:00:53
Msg-id: 20200408210053.GK2228@telsasoft.com
In response to: Re: explain HashAggregate to report bucket and memory stats (Justin Pryzby <pryzby@telsasoft.com>)
Responses: Re: explain HashAggregate to report bucket and memory stats (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Fri, Mar 20, 2020 at 03:44:42AM -0500, Justin Pryzby wrote:
> On Fri, Mar 13, 2020 at 10:57:43AM -0700, Andres Freund wrote:
> > On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> > > On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > > > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > > > Also, is there a reason you report two different memory values
> > > > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > > > much detail.
> > > > 
> > > > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > > > whereas the tuples are allocated "on demand". So seeing the difference
> > > > will allow us to investigate the more crucial issue...
> 
> > > Then do we also want to report separately on the by-ref transition
> > > values? That could be useful if you are using ARRAY_AGG and the states
> > > grow larger than you might expect.
> > 
> > I can see that being valuable - I've had to debug cases with too much
> > memory being used due to aggregate transitions before. Right now it'd be
> > mixed in with tuples, I believe - and we'd need a separate context for
> > tracking the transition values? Due to that I'm inclined to not report
> > separately for now.
> 
> I think that's already in a separate context indexed by grouping set:
> src/include/nodes/execnodes.h:  ExprContext **aggcontexts;      /* econtexts for long-lived data (per GS) */
> 
> But the hashtable and tuples are combined.  I put them in separate contexts and
> rebased on top of 1f39bce021540fde00990af55b4432c55ef4b3c7.

I forgot to say that I'd also switched to using memory-context-based
accounting.
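
By that I mean reading the allocator's own bookkeeping instead of summing
tuple sizes by hand.  A minimal sketch of the idea, assuming the v13
MemoryContextMemAllocated() API (the helper itself is hypothetical, not
code from the patch):

#include "postgres.h"
#include "utils/memutils.h"

/*
 * Hypothetical helper: track a node's peak memory by asking the
 * allocator itself.  MemoryContextMemAllocated() (new in v13) returns
 * the bytes allocated in a context, including its children when
 * recurse is true, so no per-tuple arithmetic is needed.
 */
static void
update_peak_mem(MemoryContext cxt, Size *peak)
{
    Size        allocated = MemoryContextMemAllocated(cxt, true);

    if (allocated > *peak)
        *peak = allocated;
}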

90% of the initial goal of this patch was handled by the instrumentation
added by "hash spill to disk" (1f39bce02), but this patch *also* adds:

 - separate accounting for the hashtable itself vs. the tuples stored in it;
 - the number of hash buckets;
 - coverage of other hash-using nodes: other agg nodes and bitmap scans
   (see the sketch below).
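
Roughly what the new reporting boils down to, sketched against the v13
headers (the function, its parameters, and the labels are illustrative,
not the patch verbatim; the peaks would be high-water marks tracked per
context with something like update_peak_mem() above):

#include "postgres.h"
#include "commands/explain.h"
#include "nodes/execnodes.h"

/*
 * Illustrative only: emit bucket and memory stats for a TupleHashTable.
 * hashtab->size is the simplehash bucket-array size; with the hashtable
 * and its tuples split into separate contexts, the two peaks can be
 * reported separately.
 */
static void
show_tuplehash_info(TupleHashTable hashtable, Size hash_peak,
                    Size tuple_peak, ExplainState *es)
{
    ExplainPropertyInteger("Hash Buckets", NULL,
                           (int64) hashtable->hashtab->size, es);
    ExplainPropertyInteger("Peak Hashtable Memory", "kB",
                           (int64) hash_peak / 1024, es);
    ExplainPropertyInteger("Peak Tuples Memory", "kB",
                           (int64) tuple_peak / 1024, es);
}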

Should I continue pursuing this patch?
Does it still serve any significant purpose?

template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1 ;
 HashAggregate (actual time=1070.713..2287.011 rows=999999 loops=1)
   Group Key: a
   Buckets: 32768 (originally 512)
   Peak Memory Usage: hashtable: 777kB, tuples: 4096kB
   Disk Usage: 22888 kB
   HashAgg Batches: 84
   ->  Function Scan on generate_series a (actual time=238.270..519.832 rows=999999 loops=1)
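
(Reading that: the table grew from the 512 buckets initially sized from the
row estimate to 32768; the hashtable/tuples split comes from the separate
contexts described above; Disk Usage and HashAgg Batches are the existing
spill instrumentation from 1f39bce02.)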

template1=# explain analyze SELECT * FROM t WHERE a BETWEEN 999 AND 99999;
 Bitmap Heap Scan on t  (cost=4213.01..8066.67 rows=197911 width=4) (actual time=26.803..84.693 rows=198002 loops=1)
   Recheck Cond: ((a >= 999) AND (a <= 99999))
   Heap Blocks: exact=878
   Buckets: 1024 (originally 256)
   Peak Memory Usage: hashtable: 48kB, tuples: 4kB

template1=# explain analyze SELECT generate_series(1,99999) EXCEPT SELECT generate_series(1,999);
 HashSetOp Except  (cost=0.00..2272.49 rows=99999 width=8) (actual time=135.986..174.656 rows=99000 loops=1)
   Buckets: 262144 (originally 131072)
   Peak Memory Usage: hashtable: 6177kB, tuples: 8192kB
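
(The last two plans show the same reporting applied beyond HashAggregate:
the hash table behind a bitmap heap scan, and HashSetOp, per the "other
nodes" item above.)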

@cfbot: rebased
