On Sat, Jul 25, 2020 at 10:07:37AM -0700, Peter Geoghegan wrote:
>On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan <pg@bowt.ie> wrote:
>> "Peak Memory Usage: 1605334kB"
>>
>> Hash agg avoids spilling entirely (so the planner gets it right this
>> time around). It even uses notably less memory.
>
>I guess that this is because the reported memory usage doesn't reflect
>the space used for transition state, which is presumably most of the
>total -- array_agg() is used in the query.
>
I'm not sure what you mean by "reported memory usage doesn't reflect the
space used for transition state"? Surely it does include that; we built
the memory accounting infrastructure pretty much exactly to do that.

I think it's pretty clear what's happening: in the sorted case there's
only a single group receiving new values at any moment, so when we
decide to spill we only add rows to that group and spill everything
else to disk.

In the unsorted case, however, we manage to initialize all groups in the
hash table, and at that point the groups are tiny and fit into work_mem.
As we process more and more data the groups grow, but we can't evict
them - at the moment we don't have that capability. So we end up
processing everything in memory, significantly exceeding work_mem.
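
To make the asymmetry concrete, here's a minimal Python sketch (not
actual PostgreSQL code; the accounting is deliberately crude) of the
behavior described above: once the memory limit is hit, tuples belonging
to *new* groups are spilled, but groups already in the hash table keep
accumulating in place, since there is no eviction:

```python
def hash_agg(tuples, work_mem):
    """Toy hash aggregation: spill tuples for new groups once the
    memory limit is reached; existing groups keep growing in place."""
    table = {}      # group key -> accumulated values (transition state)
    spilled = []    # tuples deferred to a later pass ("written to disk")
    used = 0        # crude accounting: each stored value costs 1 unit

    for key, value in tuples:
        if key in table or used < work_mem:
            # Either the group already exists (we can't evict it) or we
            # still have room to create a new one.
            table.setdefault(key, []).append(value)
            used += 1
        else:
            # New group, but no memory left: spill the tuple instead.
            spilled.append((key, value))
    return table, used, spilled

# Sorted input: one group at a time, so only a couple of groups exist
# when the limit hits; everything else goes to the spill file.
sorted_input = [(k, v) for k in range(10) for v in range(100)]
_, used_sorted, spill_sorted = hash_agg(sorted_input, work_mem=200)

# Unsorted (round-robin) input: all 10 groups are created while still
# tiny, then keep growing in memory; nothing is ever spilled, and the
# memory used blows well past work_mem.
unsorted_input = [(k, v) for v in range(100) for k in range(10)]
_, used_unsorted, spill_unsorted = hash_agg(unsorted_input, work_mem=200)
```

With the sorted input the sketch stays at the limit and spills the rest;
with the unsorted input it spills nothing and ends up holding every
value in memory, which mirrors the work_mem overshoot above.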
FWIW all my tests are done on the same TPC-H data set clustered by
l_shipdate (so probably random with respect to other columns).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services