Tom Lane <tgl@sss.pgh.pa.us> writes:
> AFAICT there's no actual leak here; array_agg is just optimized for
> speed rather than space. It eats about 8K per hashtable entry.
> While the planner knows that, it's got no good idea how many groups
> will be produced by the query, so it underestimates the space needed
> --- and the HashAggregate code is not currently capable of spilling
> the hashtable to disk, so the table balloons well past the intended
> work_mem limit.
Aha, I found this "8K per hashtable entry" with ltrace and now understand
what is going on here:
=== array_agg malloc calls count === === myagg malloc calls count===
3 malloc(1024) 3 malloc(1024)
3 malloc(1048576) 3 malloc(1048576)
3 malloc(131072) 3 malloc(131072)
1 malloc(16056) 4 malloc(16384)
5 malloc(16384) 2 malloc(16440)
2 malloc(16440) 1 malloc(2048)
1 malloc(2048) 3 malloc(2097152)
3 malloc(2097152) 3 malloc(262144)
3 malloc(262144) 3 malloc(32768)
3 malloc(32768) 1 malloc(32824)
1 malloc(32824) 1 malloc(4096)
1 malloc(4096) 3 malloc(4194304)
3 malloc(4194304) 3 malloc(524288)
3 malloc(524288) 3 malloc(65536)
3 malloc(65536) 12 malloc(8192)
724151 malloc(8192) 1 malloc(8296)
1 malloc(8296) 29 malloc(8360)
44 malloc(8360) 16 malloc(8388608)
8 malloc(8388608)
Thank you for answer Tom!
> Although no real fix for this is within easy reach, it strikes me
> that we could possibly ameliorate things a bit by tweaking the
> memory context size parameters used by accumArrayResult().
> It would likely be reasonable to set the min size to 1K or so not 8K.
> This would make things a trifle slower when the actual space requirement
> exceeds 1K, but probably not by enough to notice.
Looks good.
> BTW, I don't believe your assertion that the handmade aggregate does
> this in 7MB. Even a very optimistic calculation puts the space needed
> for 700000 10-element integer arrays at forty-some MB, and when I try
> it I see more like 100MB consumed thanks to hashtable overhead.
Yes you are right, Tom. My mistake.
--
Sergey Burladyan