Re: 9.5: Better memory accounting, towards memory-bounded HashAgg - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date
Msg-id 1419799025.24895.59.camel@jeff-desktop
In response to Re: 9.5: Better memory accounting, towards memory-bounded HashAgg  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
List pgsql-hackers
On Tue, 2014-12-23 at 01:16 -0800, Jeff Davis wrote:
> New patch attached (rebased, as well).
>
> I also see your other message about adding regression testing. I'm
> hesitant to slow down the tests for everyone to run through this code
> path though. Should I add regression tests, and then remove them later
> after we're more comfortable that it works?

Attached are some tests I ran. First, generate the data sets with
hashagg_test_data.sql. Then, do the following (I used work_mem at its
default of 4MB):

  set enable_hashagg=false;
  \o /tmp/sort.out
  \i /tmp/hashagg_test.sql
  \o
  set enable_hashagg=true;
  \o /tmp/hash.out
  \i /tmp/hashagg_test.sql

and then diffed the output to make sure the results are the same (except
for the plans, of course). The script loads the results into a temp
table, then sorts them before outputting, to make the test
order-independent. I didn't just add an ORDER BY to the test queries
themselves, because that would change the plan and it would never use
hashagg.
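
For reference, the order-independence pattern in hashagg_test.sql looks
roughly like this (just a sketch; the table and column names here are
made up, the real queries are in the attachment):

  create temp table result_q1 as        -- plan (hashagg or sort) is chosen here
      select grp, count(*) as cnt
        from test_input group by grp;
  select * from result_q1               -- sorted output, so the diff is stable
      order by grp, cnt;
  drop table result_q1;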

I think that has fairly good coverage of the hashagg code. I used 3
different input data sets, byval and byref types (for group key and
args), and a group aggregate query as well as DISTINCT. Let me know if I
missed something.
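
To give a rough idea of the query shapes (the exact queries are in the
attached hashagg_test.sql; the table and column names below are only
placeholders):

  -- q1-style: grouped aggregate over byval (int) and byref (text) columns
  select grp_int, grp_text, count(*), sum(val_int), max(val_text)
      from test_input group by grp_int, grp_text;

  -- q2-style: DISTINCT over the same kinds of columns
  select distinct grp_int, grp_text from test_input;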

I also did some performance comparisons between disk-based sort+group
and disk-based hashagg. The results are quite favorable for hashagg
given the data sets I provided. Simply create the data using
hashagg_test_data.sql (if not already done), set work_mem to the
value you want to test, and run hashagg_test_perf.sql. It uses EXPLAIN
ANALYZE for the timings.
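
A single timing run then boils down to something like this (table and
column names are again placeholders):

  set work_mem = '4MB';
  explain analyze
      select grp_int, count(*) from test_input group by grp_int;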

singleton: 10M groups of 1
even: 1M groups of 10
skew: wildly different group sizes; see data script

q1: group aggregate query
q2: distinct query
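
In case it helps to picture the data without opening the attachment, the
first two sets could be built along these lines (a rough sketch; the
real definitions, including the skewed set, are in
hashagg_test_data.sql):

  -- singleton: 10M groups of 1 row each (names/columns are assumptions)
  create table singleton as
      select g as grp_int, g::text as grp_text,
             g as val_int, g::text as val_text
        from generate_series(1, 10000000) g;

  -- even: 1M groups of 10 rows each
  create table even as
      select (g % 1000000) as grp_int, (g % 1000000)::text as grp_text,
             g as val_int, g::text as val_text
        from generate_series(1, 10000000) g;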

The total memory required for the test to run without going to disk
ranges from about 100MB (for "even") to about 1GB (for "singleton").
Regardless of work_mem, these all fit in memory on my machine, so they
aren't *really* going to disk. Also note that, because of how the memory
blocks are allocated, and because hashagg waits until memory is already
exceeded, hashagg might use about double work_mem when work_mem is small
(the effect is not important at higher values).

work_mem='1MB':
                  sort+group (s)    hashagg (s)
   singleton q1           12             10
   singleton q2            8              7
   even q1                14              7
   even q2                10              5
   skew q1                22              6
   skew q2                16              4

work_mem='4MB':
                  sort+group (s)    hashagg (s)
   singleton q1           12             11
   singleton q2            8              6
   even q1                12              7
   even q2                 9              5
   skew q1                19              6
   skew q2                13              3

work_mem='16MB':
                  sort+group (s)    hashagg (s)
   singleton q1           12             11
   singleton q2            8              7
   even q1                14              7
   even q2                10              5
   skew q1                15              6
   skew q2                12              4

work_mem='64MB':
                  sort+group (s)    hashagg (s)
   singleton q1           13             12
   singleton q2            9              8
   even q1                14              8
   even q2                10              5
   skew q1                17              6
   skew q2                13              4

work_mem='256MB':
                  sort+group (s)    hashagg (s)
   singleton q1           12             12
   singleton q2            9              8
   even q1                14              7
   even q2                11              4
   skew q1                16              6
   skew q2                13              4

work_mem='512MB':
                  sort+group (s)    hashagg (s)
   singleton q1           12             12
   singleton q2            9              7
   even q1                14              7
   even q2                10              4
   skew q1                16              6
   skew q2                12              4

work_mem='2GB':
                  sort+group (s)    hashagg (s)
   singleton q1            9             12
   singleton q2            6              6
   even q1                 8              7
   even q2                 6              4
   skew q1                 7              6
   skew q2                 5              4


These numbers are great news for disk-based hashagg. It seems to be the
same as or better than sort+group in nearly all cases (again, this
example doesn't actually hit the disk, so the numbers may come out
differently when it does). Also, the numbers are remarkably stable
across work_mem settings for both plans. That means it doesn't cost much
to keep a lower work_mem as long as your system has plenty of memory.

Do others have similar numbers? I'm quite surprised at how little
work_mem seems to matter for these plans (HashJoin might be a different
story though). I feel like I made a mistake -- can someone please do a
sanity check on my numbers?

Regards,
    Jeff Davis


