Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Default setting for enable_hashagg_disk
Msg-id CAH2-Wz=QFBj=RhsqqC0-1a3f8BxLXP-x39vZ0dLjMyFmDXpDtw@mail.gmail.com
In response to Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> So let me share some fresh I/O statistics collected on the current code
> using iosnoop. I've done the tests on two different machines using the
> "aggregate part" of TPC-H Q17, i.e. essentially this:
>
>      SELECT * FROM (
>         SELECT
>             l_partkey AS agg_partkey,
>             0.2 * avg(l_quantity) AS avg_quantity
>         FROM lineitem GROUP BY l_partkey OFFSET 1000000000
>      ) part_agg;
>
> The OFFSET is there just to ensure we don't need to send anything to
> the client, etc.

Thanks for testing this.

> So sort writes ~3.4GB of data, give or take. But hashagg/master writes
> almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
> original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
> much more than the 3.4GB of data written by sort (which has to spill
> everything, while hashagg only spills rows not covered by the groups
> that fit into work_mem).

What I find when I run your query (with my own TPC-H DB that is
smaller than what you used here -- 59,986,052 lineitem tuples) is that
the sort required about 7x more memory than the hash agg to do
everything in memory: 4,384,711KB for the quicksort vs 630,801KB peak
hash agg memory usage. I'd be surprised if the ratio was very
different for you -- but can you check?
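As a quick check of the "about 7x" figure, here is a minimal sketch in
Python that divides the two memory numbers quoted above. The variable
names are made up for illustration only.

```python
# Peak memory figures from the no-spill runs quoted above, in KB.
quicksort_kb = 4_384_711   # memory used by the in-memory quicksort
hashagg_kb = 630_801       # peak hash agg memory usage

# How many times more memory the sort needed than the hash agg.
ratio = quicksort_kb / hashagg_kb
print(f"sort needs {ratio:.2f}x the memory of hash agg")
```

The result comes out just under 7, which is where the rounded figure
comes from.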

I think that there is something pathological about this spill
behavior, because it is the precise opposite of what you would expect
from a rough extrapolation of disk I/O based on the memory used in the
no-spill cases (as reported by EXPLAIN ANALYZE): the sort needs far
more memory than the hash agg, yet the hash agg writes far more data.
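To make the rough extrapolation concrete, here is a sketch under two
loudly stated assumptions: that spill volume scales with no-spill
memory use (a naive model, not something the executor guarantees), and
that the memory ratio measured on the smaller database carries over to
the database behind the quoted I/O numbers. All names are hypothetical.

```python
# Figures quoted in this thread. Combining them is an assumption, because
# the memory numbers and the I/O numbers come from different TPC-H databases.
sort_mem_kb = 4_384_711          # no-spill quicksort memory
hashagg_mem_kb = 630_801         # no-spill hash agg peak memory
sort_written_gb = 3.4            # data written by the spilling sort
hashagg_written_gb = (6.0, 7.0)  # range written by hashagg/master

# Naive model (assumption): spill volume is proportional to no-spill memory.
mem_ratio = sort_mem_kb / hashagg_mem_kb
naive_hashagg_gb = sort_written_gb / mem_ratio

print(f"naive expectation for hash agg: {naive_hashagg_gb:.2f} GB written")
for observed in hashagg_written_gb:
    factor = observed / naive_hashagg_gb
    print(f"observed {observed:.1f} GB is {factor:.0f}x the naive expectation")
```

Under that model the hash agg would write roughly half a gigabyte,
while the reported 6-7GB is more than ten times that. The model is
crude, but the gap is large enough to suggest something is off.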

> What I find really surprising is the costing - despite writing about
> twice as much data, the hashagg cost is estimated to be much lower than
> the sort. For example on the i5 machine, the hashagg cost is ~10M, while
> sort cost is almost 42M. Despite using almost twice as much disk. And
> the costing is exactly the same for master and the CP_SMALL_TLIST.

That does make it sound like the costs of the hash agg aren't being
represented. Still, it isn't clear that this is a costing issue,
because it isn't clear whether the execution-time performance is
itself pathological or is simply the price that must be paid for
spilling the hash agg at all.
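The mismatch between estimated cost and data written can be put side
by side with a small sketch. The cost figures are the ones quoted for
the i5 machine. Whether the quoted write volumes come from that same
machine is not stated in the excerpt, so pairing them here is an
assumption, and the variable names are illustrative only.

```python
# Planner cost estimates quoted for the i5 machine (approximate).
hashagg_cost = 10e6
sort_cost = 42e6

# Data written, as quoted earlier in the thread (assumed comparable).
sort_written_gb = 3.4
hashagg_written_gb = (6.0, 7.0)

# How much cheaper the planner thinks hash agg is than sort.
cost_ratio = sort_cost / hashagg_cost

# How much more data hash agg actually wrote than sort.
write_ratios = [w / sort_written_gb for w in hashagg_written_gb]

print(f"planner estimates hash agg as {cost_ratio:.1f}x cheaper than sort")
print(f"hash agg wrote {write_ratios[0]:.2f}x to {write_ratios[1]:.2f}x as much data")
```

So the plan that is estimated to be about four times cheaper is the
one that writes close to twice as much data.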

-- 
Peter Geoghegan


