Re: Disk-based hash aggregate's cost model - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Disk-based hash aggregate's cost model
Date
Msg-id 20200901091903.ggabq3t3axkc2wwx@development
Whole thread Raw
In response to Re: Disk-based hash aggregate's cost model  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Disk-based hash aggregate's cost model
Re: Disk-based hash aggregate's cost model
List pgsql-hackers
On Mon, Aug 31, 2020 at 11:34:34PM -0700, Jeff Davis wrote:
>On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
>> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
>> for the spilling case. In particular, the initial section computing
>> startup_cost seems to ignore that we may need to so some of the stuff
>> repeatedly - for example we'll repeat hash lookups for spilled
>> tuples,
>> and so on.
>
>To fix that, we'd also need to change the cost of in-memory HashAgg,
>right?
>

Why? I don't think we need to change costing of in-memory HashAgg. My
assumption was we'd only tweak startup_cost for cases with spilling by
adding something like (cpu_operator_cost * npartitions * ntuples).

>> The other thing is that sort seems to be doing only about half the
>> physical I/O (as measured by iosnoop) compared to hashagg, even
>> though
>> the estimates of pages / input_bytes are exactly the same. For
>> hashagg
>> the iosnoop shows 5921MB reads and 7185MB writes, while sort only
>> does
>> 2895MB reads and 3655MB writes. Which kinda matches the observed
>> sizes
>> of temp files in the two cases, so the input_bytes for sort seems to
>> be
>> a bit overestimated.
>
>Hmm, interesting.
>

FWIW I suspect some of this difference may be due to logical vs.
physical I/O. iosnoop only tracks physical I/O sent to the device, but
maybe we do much more logical I/O and it simply does not expire from
page cache for the sort. It might behave differently for larger data
set, longer query, ...

>How reasonable is it to be making these kinds of changes to the cost
>model right now? I think your analysis is solid, but I'm worried about
>making more intrusive changes very late in the cycle.
>
>I had originally tried to limit the cost model changes to the new plans
>I am introducing -- that is, HashAgg plans expected to require disk.
>That's why I came up with a rather arbitrary penalty.
>

I don't know. I certainly understand the desire not to change things
this late. OTOH I'm worried that we'll end up receiving a lot of poor
plans post release.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Next
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: [patch] demote