Re: [HACKERS] AGG_HASHED cost estimate - Mailing list pgsql-hackers

From: Ashutosh Bapat
Subject: Re: [HACKERS] AGG_HASHED cost estimate
Date:
Msg-id: CAFjFpResYhW8MpAtt1P=GZvdgdw9J7m=TvB+sUWWa9UMjoaQvw@mail.gmail.com
In response to: Re: [HACKERS] AGG_HASHED cost estimate (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Thu, Apr 20, 2017 at 11:35 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

> Regardless, it seems like something is
> getting overlooked.

I agree with this.

> The way final_cost_hashjoin charges for the actual data
> comparison is via pg_proc.procost, rather than just assuming 1.0.  I don't
> know if we would want to go to that effort in cost_agg or not;

But aren't we already doing that with (cpu_operator_cost * numGroupCols)
* input_tuples? Maybe we should use the procost of cur_eq_funcs instead
of a flat cpu_operator_cost.
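
A minimal sketch of what that could look like (illustration only; this
is not the current cost_agg code, and eq_func_oids is an assumed array
holding the grouping-key equality function OIDs):

Cost    per_tuple_cmp_cost = 0.0;
int     i;

/* sum the procost of each grouping column's equality function */
for (i = 0; i < numGroupCols; i++)
    per_tuple_cmp_cost += get_func_cost(eq_func_oids[i]) * cpu_operator_cost;

/* one comparison per grouping column per input row, as today */
startup_cost += per_tuple_cmp_cost * input_tuples;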

> I assume that
> there was a reason the code was put in final_cost_hashjoin rather than
> initial_cost_hashjoin.

I think this is part of final_cost_hashjoin because it may need a
pg_proc cache lookup to fetch procost. That lookup can be avoided when
the initial cost estimate is already higher than an existing path's cost.
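
For reference, roughly the pattern used for hash joins (paraphrased
from try_hashjoin_path(), with argument lists abbreviated):

initial_cost_hashjoin(root, &workspace, jointype, hashclauses, ...);

if (add_path_precheck(joinrel, workspace.startup_cost,
                      workspace.total_cost, pathkeys, required_outer))
{
    /* only now pay for the catalog lookups done in final_cost_hashjoin */
    add_path(joinrel, (Path *) create_hashjoin_path(...));
}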


>
> The initial_cost_hashjoin also throws in an addition of cpu_tuple_cost, "to
> model the costs of inserting the row into the hashtable". Based on the gprof
> and perf output of some very simple aggregates, I would say that
> cpu_tuple_cost is if anything an underestimate, and that it applies to all
> the hash table look ups, whether they end up inserting (about numGroups) or
> finding an existing one (approximately input_tuples - numGroups).  Currently
> in AGG_HASHED that is charged only for numGroups, although I don't know if
> that charge is for inserting into the hash table, or for walking the hash
> table at the end, projecting out tuples.   That it is charged to total_cost
> rather than startup_cost suggests it is meant to apply to walking the hash
> table at the end, rather than inserting into it.

Yes. It's for the final projection.

> Maybe it should be charged
> both on the way in and on the way out?

Hash lookup and insertion are costed as

startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;

Maybe that needs to change.
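
For example (an illustration only, not a benchmarked proposal), the
charge could be applied both on the way in and on the way out:

/* per input row: grouping-key comparisons plus a hash-table probe/insert */
startup_cost += (cpu_operator_cost * numGroupCols + cpu_tuple_cost) * input_tuples;

/* per output group: walking the hash table and projecting result rows */
total_cost += cpu_tuple_cost * numGroups;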


>
> Both gprof and perf agree that tuplehash_insert and ExecStoreMinimalTuple
> are quite a bit more expensive than either texteq or hash_any.  This is with
> large hash tables (25 million tuples hashed to 3 million aggregates) and I
> think a lot of the time goes to CPU cache misses, so they might not be so
> bad if the hash tables were smaller.  I don't know how to model this,
> though, if we need it to be accurate over both regimes.

I have not seen our cost model account for CPU cache behaviour; it
assumes optimal performance there. But maybe we want to start
modelling it.
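
A purely hypothetical sketch of how that could enter the formula
(cpu_cache_size and cache_miss_penalty are not existing parameters,
just stand-ins for illustration):

/* inflate the per-probe cost once the table outgrows the CPU cache */
hashtable_bytes = numGroups * hashentrysize;
probe_cost = cpu_operator_cost * numGroupCols;
if (hashtable_bytes > cpu_cache_size)
    probe_cost *= cache_miss_penalty;
startup_cost += probe_cost * input_tuples;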

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


