Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200724131852.nul7wihayw4uadp7@development
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Fri, Jul 24, 2020 at 10:40:47AM +0200, Tomas Vondra wrote:
>On Thu, Jul 23, 2020 at 07:33:45PM -0700, Peter Geoghegan wrote:
>>On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
>><tomas.vondra@2ndquadrant.com> wrote:
>>>So let me share some fresh I/O statistics collected on the current code
>>>using iosnoop. I've done the tests on two different machines using the
>>>"aggregate part" of TPC-H Q17, i.e. essentially this:
>>>
>>>     SELECT * FROM (
>>>        SELECT
>>>            l_partkey AS agg_partkey,
>>>            0.2 * avg(l_quantity) AS avg_quantity
>>>        FROM lineitem GROUP BY l_partkey OFFSET 1000000000
>>>     ) part_agg;
>>>
>>>The OFFSET is there just to ensure we don't need to send anything to
>>>the client, etc.
>>
>>Thanks for testing this.
>>
>>>So sort writes ~3.4GB of data, give or take. But hashagg/master writes
>>>almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
>>>original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
>>>much more than the 3.4GB of data written by sort (which has to spill
>>>everything, while hashagg only spills rows not covered by the groups
>>>that fit into work_mem).
>>
>>What I find when I run your query (with my own TPC-H DB that is
>>smaller than what you used here -- 59,986,052 lineitem tuples) is that
>>the sort required about 7x more memory than the hash agg to do
>>everything in memory: 4,384,711KB for the quicksort vs 630,801KB peak
>>hash agg memory usage. I'd be surprised if the ratio was very
>>different for you -- but can you check?
>>
>
>I can check, but it's not quite clear to me what are we looking for?
>Increase work_mem until there's no need to spill in either case?
>

FWIW the hashagg needs about 4775953kB and the sort 33677586kB. So yeah,
that's about 7x more. I think that's probably built into the TPC-H data
set. It'd be easy to construct cases with much higher/lower factors.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Amul Sul
Date:
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Next
From: Tom Lane
Date:
Subject: Re: renaming configure.in to configure.ac