Re: hashagg slowdown due to spill changes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: hashagg slowdown due to spill changes
Date
Msg-id 20200615133403.gt6x3musvjzvw24y@development
Whole thread Raw
In response to Re: hashagg slowdown due to spill changes  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: hashagg slowdown due to spill changes
List pgsql-hackers
On Sun, Jun 14, 2020 at 11:09:55PM -0700, Jeff Davis wrote:
>On Sun, 2020-06-14 at 11:14 -0700, Andres Freund wrote:
>> I'm somewhat inclined to think that we should revert 4cad2534da6 and
>> then look at how precisely to tackle this in 14.
>
>I'm fine with that.
>

I don't see how we could just revert 4cad2534d and leave this for v14.

The hashagg spilling is IMHO almost guaranteed to be a pain point for
some users, as it will force some queries to serialize large amounts of
data. Yes, some of this is a cost for hashagg enforcing work_mem at
runtime, I'm fine with that. We'd get reports about that too, but we can
justify that cost ...

But just reverting 4cad2534d will make this much worse, I think, as
illustrated by the benchmarks I did in [1]. And no, this is not really
fixable by tweaking the cost parameters - even with the current code
(i.e. 4cad2534d in place) I had to increase random_page_cost to 60 on
the temp tablespace (on SATA RAID) to get good plans with parallelism
enabled. I haven't tried, but I presume without 4cad2534d I'd have to
push r_p_c even further ...

[1] https://www.postgresql.org/message-id/20200519151202.u2p2gpiawoaznsv2%40development

>> It'd probably make sense to request small tlists when the number of
>> estimated groups is large, and not otherwise.
>
>That seems like a nice compromise that would be non-invasive, at least
>for create_agg_plan().
>

Maybe. It'd certainly better than nothing. It's not clear to me what
would a good threshold be, though. And it's not going to handle cases of
under-estimates.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: "李杰(慎追)"
Date:
Subject: 回复:回复:how to create index concurrently on partitioned table
Next
From: Masahiko Sawada
Date:
Subject: Re: Transactions involving multiple postgres foreign servers, take 2