Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation - Mailing list pgsql-hackers
From: Ashutosh Bapat
Subject: Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Date:
Msg-id: CAFjFpRcex4Q4uSGtPMRkpxidg1QfSj7fChmGLyBLjj9A329uDw@mail.gmail.com
In response to: Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses: Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
List: pgsql-hackers
On Thu, Jul 26, 2018 at 10:30 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 26 July 2018 at 07:12, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> In the patch clauselist_selectivity() gets called repeatedly for every
>> new qual added to the clause list. Instead, if we try to combine the
>> cost/row estimation with order_qual_clauses() or
>> clauselist_selectivity(), we might be able to do what the current patch
>> does in O(n). clauselist_selectivity() calculates the combined selectivity
>> of all the given quals in O(n). We should do something similar to that
>> in this case.
>
> Duplicating the logic of clauselist_selectivity() seems like quite a
> lot of effort to go to just for improved filter cost estimates. Note
> also that clauselist_selectivity() might get a good deal more complex
> with multivariate stats.

I am not suggesting duplicating code in clauselist_selectivity().
Instead I am suggesting that we incorporate costing in that function. I
don't know how feasible that is.

> Perhaps a reasonable simplification would be to just treat the clauses
> as independent when estimating the filter cost, and so use the
> following as a "good enough" estimate for the filter cost:
>
>     cost(A) + sel(A) * cost(B) + sel(A) * sel(B) * cost(C) + ...
>
> That would probably be an improvement over what we currently have. It
> would be O(n) to compute, and it would probably use the cached
> selectivity estimates for the clauses.

That looks like a good trade-off. But looking at the following comment
in clause_selectivity(), I doubt whether this will work in all cases:

    /*
     * If possible, cache the result of the selectivity calculation for
     * the clause.  We can cache if varRelid is zero or the clause
     * contains only vars of that relid --- otherwise varRelid will affect
     * the result, so mustn't cache.  Outer join quals might be examined
     * with either their join's actual jointype or JOIN_INNER, so we need
     * two cache variables to remember both cases.  Note: we assume the
     * result won't change if we are switching the input relations or
     * considering a unique-ified case, so we only need one cache variable
     * for all non-JOIN_INNER cases.
     */

> Note also that with this simplified expression for the filter cost, it
> would be possible to improve order_qual_clauses() to take into account
> the clause selectivities when sorting the clauses, and minimise the
> cost of the filter by evaluating more selective clauses sooner, if
> they're not too expensive. I believe that amounts to sorting by
> cost/(1-sel) rather than just cost, except for clauses with sel==1,
> which it makes sense to move to the end, and just sort by cost.

+1, if we could do that.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
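To make the proposal concrete, here is a minimal sketch (in Python, with invented names; the actual planner code is C inside costsize.c/orderqual handling) of both ideas from the thread: the independent-clauses filter cost estimate cost(A) + sel(A)*cost(B) + sel(A)*sel(B)*cost(C) + ..., and ordering the quals by cost/(1-sel), pushing sel==1 clauses to the end sorted by cost alone. Clauses are modeled simply as (cost, selectivity) pairs.

```python
def filter_cost(clauses):
    """Estimate filter cost treating clauses as independent:
    cost(A) + sel(A)*cost(B) + sel(A)*sel(B)*cost(C) + ...
    Each clause is a (cost, selectivity) pair; a later clause is only
    evaluated for the rows that passed all earlier ones."""
    total = 0.0
    passing_fraction = 1.0          # fraction of rows reaching this clause
    for cost, sel in clauses:
        total += passing_fraction * cost
        passing_fraction *= sel
    return total

def order_quals(clauses):
    """Order clauses to minimise the estimated filter cost: sort by
    cost/(1 - sel), except clauses with sel == 1, which contribute
    nothing to filtering and go last, sorted by cost alone."""
    def sort_key(clause):
        cost, sel = clause
        if sel >= 1.0:
            return (1, cost)                 # move to the end
        return (0, cost / (1.0 - sel))
    return sorted(clauses, key=sort_key)

# Hypothetical clause list: (cost, selectivity) pairs.
clauses = [(10.0, 0.9), (1.0, 0.5), (5.0, 1.0)]
ordered = order_quals(clauses)
# The cheap, selective clause runs first; the sel==1 clause runs last.
print(ordered)                       # [(1.0, 0.5), (10.0, 0.9), (5.0, 1.0)]
print(filter_cost(clauses))          # 13.15 in the original order
print(filter_cost(ordered))          # 8.25 after reordering
```

Under these (assumed) numbers, reordering drops the estimated filter cost from 13.15 to 8.25, illustrating why sorting by cost/(1-sel) rather than raw cost can pay off.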