Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Use extended statistics to estimate (Var op Var) clauses |
Date | |
Msg-id | e16416cc-51a0-dda1-64d5-d07ae34f0ea4@enterprisedb.com Whole thread Raw |
In response to | Re: Use extended statistics to estimate (Var op Var) clauses (Mark Dilger <mark.dilger@enterprisedb.com>) |
Responses |
Re: Use extended statistics to estimate (Var op Var) clauses
Re: Use extended statistics to estimate (Var op Var) clauses |
List | pgsql-hackers |
On 8/28/21 6:30 PM, Mark Dilger wrote: > > >> On Aug 28, 2021, at 6:52 AM, Tomas Vondra >> <tomas.vondra@enterprisedb.com> wrote: >> >> Part 0003 fixes handling of those clauses so that we don't treat >> them as simple, but it does that by tweaking >> statext_is_compatible_clause(), as suggested by Dean. > > Function examine_opclause_args() doesn't set issimple to anything in > the IsA(rightop, Const) case, but assigns *issimplep = issimple at > the bottom. The compiler is not complaining about using a possibly > uninitialized variable, but if I change the "return true" on the very > next line to "return issimple", the compiler complains quite loudly. > Yeah, true. Thanks for noticing this was a bug - I forgot to set the issimple variable in the first branch. > > Some functions define bool *issimple, others bool *issimplep and bool > issimple. You might want to standardize the naming. > I think the naming is standard with respect to the surrounding code. If the other parameters use "p" to mark "pointer" then issimplep is used, but in other places it's just "issimple". IMHO this is appropriate. > It's difficult to know what "simple" means in extended_stats.c. > There is no file-global comment explaining the concept, and functions > like compare_scalars_simple don't have correlates named > compare_scalars_complex or such, so the reader cannot infer by > comparison what the difference might be between a "simple" case and > some non-"simple" case. The functions' issimple (or issimplep) > argument are undocumented. > > There is a comment: > > /* * statext_mcv_clauselist_selectivity * Estimate clauses using > the best multi-column statistics. .... * * - simple selectivity: > Computed without extended statistics, i.e. as if the * > columns/clauses were independent. * .... */ > > but it takes a while to find if you search for "issimple". > Yeah, true. This was added a while ago when Dean reworked the estimation (based on MCV), and it seemed clear back then. But now a comment explaining this concept (and how it affects the estimation) would be helpful. I'll try digging in the archives for the details. > > In both scalarineqsel_wrapper() and eqsel_internal(), the call to > matching_restriction_variables() should usually return false, since > comparing a variable to itself is an unusual case. The next call is > to get_restriction_variable(), which repeats the work of examining > the left and right variables. So in almost all cases, after throwing > away the results of: > > examine_variable(root, left, varRelid, &ldata); > examine_variable(root, right, varRelid, &rdata); > > performed in matching_restriction_variables(), we'll do exactly the > same work again (with one variable named differently) in > get_restriction_variable(): > > examine_variable(root, left, varRelid, vardata); > examine_variable(root, right, varRelid, &rdata); > > That'd be fine if example_variable() were a cheap function, but it > appears not to be. Do you think you could save the results rather > than recomputing them? It's a little messy, since these are the only > two functions out of about ten which follow this pattern, so you'd > have to pass NULLs into get_restriction_variable() from the other > eight callers, but it still looks like that would be a win. > I had similar concerns, although I don't think those functions are very expensive compared to the rest of the estimation code. I haven't done any measurements yet, though. But I don't think saving the results is the way to go - in a way, we already store the stats (which seems like the most expensive bit) in syscache. It seems better to just simplify examine_variable() so that it does not lookup the statistics, which we don't need here at all. The attached version of the patches fixes the other bugs reported here so far - most importantly it reworks how we set issimple while examining the clauses, so that it's never skips the initialization. Hopefully the added comments also explain it a bit more clearly. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: