Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Default setting for enable_hashagg_disk
Date	July 24, 2020 05:33:45
Msg-id	CAH2-Wz=QFBj=RhsqqC0-1a3f8BxLXP-x39vZ0dLjMyFmDXpDtw@mail.gmail.com
In response to	Re: Default setting for enable_hashagg_disk (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Default setting for enable_hashagg_disk (Tomas Vondra <tomas.vondra@2ndquadrant.com>) Re: Default setting for enable_hashagg_disk (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> So let me share some fresh I/O statistics collected on the current code
> using iosnoop. I've done the tests on two different machines using the
> "aggregate part" of TPC-H Q17, i.e. essentially this:
>
>      SELECT * FROM (
>         SELECT
>             l_partkey AS agg_partkey,
>             0.2 * avg(l_quantity) AS avg_quantity
>         FROM lineitem GROUP BY l_partkey OFFSET 1000000000
>      ) part_agg;
>
> The OFFSET is there just to ensure we don't need to send anything to
> the client, etc.

Thanks for testing this.

> So sort writes ~3.4GB of data, give or take. But hashagg/master writes
> almost 6-7GB of data, i.e. almost twice as much. Meanwhile, with the
> original CP_SMALL_TLIST we'd write "only" ~5GB of data. That's still
> much more than the 3.4GB of data written by sort (which has to spill
> everything, while hashagg only spills rows not covered by the groups
> that fit into work_mem).

What I find when I run your query (with my own TPC-H DB that is
smaller than what you used here -- 59,986,052 lineitem tuples) is that
the sort required about 7x more memory than the hash agg to do
everything in memory: 4,384,711KB for the quicksort vs 630,801KB peak
hash agg memory usage. I'd be surprised if the ratio was very
different for you -- but can you check?

I think that there is something pathological about this spill
behavior, because it sounds like the precise opposite of what you
might expect when you make a rough extrapolation of what disk I/O will
be based on the memory used in no-spill cases (as reported by EXPLAIN
ANALYZE).
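
The mismatch can be made concrete with a little arithmetic. The sketch below is only an illustration in Python: it uses the in-memory figures reported in this message and the spill volumes quoted from Tomas's message, which come from a different (larger) database, so only the direction of the two ratios is comparable, not their magnitudes. The variable and function names are made up for this example.

```python
# Peak memory for the no-spill case, as reported in this message (KB).
SORT_MEM_KB = 4_384_711
HASHAGG_MEM_KB = 630_801

# Spill volumes quoted from Tomas's message (GB). The hashagg figure is
# given as a range ("6-7GB"), so both ends are kept.
SORT_WRITTEN_GB = 3.4
HASHAGG_WRITTEN_GB = (6.0, 7.0)


def ratio(numerator, denominator):
    """Return numerator / denominator as a float."""
    return numerator / denominator


# How much more memory the sort needs than the hash agg when nothing spills.
mem_ratio = ratio(SORT_MEM_KB, HASHAGG_MEM_KB)

# How much more data the hash agg writes than the sort when both spill.
write_ratios = tuple(ratio(w, SORT_WRITTEN_GB) for w in HASHAGG_WRITTEN_GB)

print(f"sort / hashagg memory (no spill): {mem_ratio:.2f}x")
print(
    "hashagg / sort data written:      "
    f"{write_ratios[0]:.2f}x to {write_ratios[1]:.2f}x"
)
```

In memory the sort is the larger consumer by roughly 7x, yet on disk the hash agg is the larger writer by roughly 2x, which is the inversion described above.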

> What I find really surprising is the costing - despite writing about
> twice as much data, the hashagg cost is estimated to be much lower than
> the sort. For example on the i5 machine, the hashagg cost is ~10M, while
> sort cost is almost 42M. Despite using almost twice as much disk. And
> the costing is exactly the same for master and the CP_SMALL_TLIST.

That does make it sound like the costs of the hash agg aren't being
represented. I suppose it isn't clear that this is a costing issue,
though, because it isn't clear whether the execution time performance
itself is pathological, or is instead something that must be accepted
as the general cost of spilling a hash agg.
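
To put the quoted numbers side by side, here is a rough sketch in Python. The cost figures are the approximate ones Tomas gives for the i5 machine ("~10M" and "almost 42M"), and the write volumes are the approximate ones from earlier in his message. It is an assumption here that both sets of figures describe the same runs, and the names are invented for this example.

```python
# Approximate planner cost estimates quoted for the i5 machine.
HASHAGG_COST = 10e6  # "~10M"
SORT_COST = 42e6     # "almost 42M"

# Approximate data written, quoted earlier in the same message (GB).
SORT_WRITTEN_GB = 3.4
HASHAGG_WRITTEN_GB = (6.0, 7.0)  # quoted as a range, "6-7GB"

# How much cheaper the planner thinks the hash agg is than the sort.
estimated_advantage = SORT_COST / HASHAGG_COST

# How much more data the hash agg actually wrote than the sort.
observed_write_penalty = tuple(w / SORT_WRITTEN_GB for w in HASHAGG_WRITTEN_GB)

print(f"planner: hashagg ~{estimated_advantage:.1f}x cheaper than sort")
print(
    "observed: hashagg writes "
    f"{observed_write_penalty[0]:.1f}x to {observed_write_penalty[1]:.1f}x "
    "as much data as sort"
)
```

The two ratios point in opposite directions, but write volume alone does not determine execution time, so this comparison by itself cannot say whether the costing or the spill behavior is at fault.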

-- 
Peter Geoghegan
