
From: Tomas Vondra
Subject: Re: Default setting for enable_hashagg_disk
Msg-id: 20200724084047.hnhigkgagzifuitd@development
In response to: Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Default setting for enable_hashagg_disk (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
           Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>)
           Re: Default setting for enable_hashagg_disk (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers

On Thu, Jul 23, 2020 at 07:33:45PM -0700, Peter Geoghegan wrote:
>On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> So let me share some fresh I/O statistics collected on the current code
>> using iosnoop. I've done the tests on two different machines using the
>> "aggregate part" of TPC-H Q17, i.e. essentially this:
>>
>>      SELECT * FROM (
>>         SELECT
>>             l_partkey AS agg_partkey,
>>             0.2 * avg(l_quantity) AS avg_quantity
>>         FROM lineitem GROUP BY l_partkey OFFSET 1000000000
>>      ) part_agg;
>>
>> The OFFSET is there just to ensure we don't need to send anything to
>> the client, etc.
>
>Thanks for testing this.
>
>> So sort writes ~3.4GB of data, give or take. But hashagg/master writes
>> almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
>> original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
>> much more than the 3.4GB of data written by sort (which has to spill
>> everything, while hashagg only spills rows not covered by the groups
>> that fit into work_mem).
>
>What I find when I run your query (with my own TPC-H DB that is
>smaller than what you used here -- 59,986,052 lineitem tuples) is that
>the sort required about 7x more memory than the hash agg to do
>everything in memory: 4,384,711KB for the quicksort vs 630,801KB peak
>hash agg memory usage. I'd be surprised if the ratio was very
>different for you -- but can you check?
>

I can check, but it's not quite clear to me what we are looking for.
Increase work_mem until there's no need to spill in either case?
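
Something like this, I suppose (just a sketch, and the 8GB value is only
a placeholder meant to be large enough to avoid spilling):

    SET work_mem = '8GB';

    -- hashagg plan: note the memory usage reported for the
    -- HashAggregate node
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT l_partkey, 0.2 * avg(l_quantity)
      FROM lineitem GROUP BY l_partkey;

    -- force the sort-based plan: note the "Sort Method: quicksort
    -- Memory: ..." line of the Sort node
    SET enable_hashagg = off;
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT l_partkey, 0.2 * avg(l_quantity)
      FROM lineitem GROUP BY l_partkey;

and then simply compare the two reported memory values.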

>I think that there is something pathological about this spill
>behavior, because it sounds like the precise opposite of what you
>might expect when you make a rough extrapolation of what disk I/O will
>be based on the memory used in no-spill cases (as reported by EXPLAIN
>ANALYZE).
>

Maybe, but I'm not sure what exactly you think is pathological. The
trouble is that hashagg has to spill input tuples, while the memory used
in the no-spill case represents aggregated groups, so I'm not sure how
you could extrapolate from that ...
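
Just to illustrate what I mean, the two quantities scale with different
things, which a trivial query shows (nothing sophisticated, only counting
tuples vs. groups):

    -- the sort has to spill (roughly) all input tuples, while the
    -- in-memory hashagg footprint is driven by the number of groups
    -- and the size of the per-group state
    SELECT count(*)                  AS input_tuples,
           count(DISTINCT l_partkey) AS groups
      FROM lineitem;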

FWIW, one more suspicious thing I forgot to mention is how the number of
"planned partitions" depends on work_mem, which looks like this:

       2MB       Planned Partitions:  64    HashAgg Batches:  4160
       4MB       Planned Partitions: 128    HashAgg Batches: 16512
       8MB       Planned Partitions: 256    HashAgg Batches: 21488
      64MB       Planned Partitions:  32    HashAgg Batches:  2720
     256MB       Planned Partitions:   8    HashAgg Batches:     8

I'd expect the number of planned partitions to decrease (slowly) as
work_mem increases, but it seems to increase initially. Seems a bit
strange, but maybe it's expected.
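
Those numbers are simply what EXPLAIN ANALYZE reports for the
HashAggregate node at the various work_mem settings; something like this
would collect them (just a sketch, and it obviously re-runs the whole
aggregation once per setting):

    DO $$
    DECLARE
      wm  text;
      ln  text;
    BEGIN
      FOREACH wm IN ARRAY ARRAY['2MB', '4MB', '8MB', '64MB', '256MB']
      LOOP
        EXECUTE format('SET work_mem = %L', wm);
        RAISE NOTICE 'work_mem = %', wm;

        FOR ln IN EXECUTE
          'EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
           SELECT l_partkey, 0.2 * avg(l_quantity)
             FROM lineitem GROUP BY l_partkey'
        LOOP
          -- keep just the lines with the partition / batch counters
          IF ln LIKE '%Partitions%' OR ln LIKE '%Batches%' THEN
            RAISE NOTICE '%', ln;
          END IF;
        END LOOP;
      END LOOP;
    END
    $$;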

>> What I find really surprising is the costing - despite writing about
>> twice as much data, the hashagg cost is estimated to be much lower than
>> the sort. For example on the i5 machine, the hashagg cost is ~10M, while
>> sort cost is almost 42M. Despite using almost twice as much disk. And
>> the costing is exactly the same for master and the CP_SMALL_TLIST.
>
>That does make it sound like the costs of the hash agg aren't being
>represented. I suppose it isn't clear if this is a costing issue
>because it isn't clear if the execution time performance itself is
>pathological or is instead something that must be accepted as the cost
>of spilling the hash agg in a general kind of way.
>

Not sure, but I think we need to spill roughly as much data as the sort
does, so it seems a bit strange that we're spilling 2x as much and yet
the cost is so much lower.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


