Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Use extended statistics to estimate (Var op Var) clauses
Date
Msg-id CCD1B8BF-3376-454C-A2F3-9AB57A401A57@enterprisedb.com
Whole thread Raw
In response to Re: Use extended statistics to estimate (Var op Var) clauses  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Use extended statistics to estimate (Var op Var) clauses
Re: Use extended statistics to estimate (Var op Var) clauses
List pgsql-hackers

> On Aug 28, 2021, at 6:52 AM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Part 0003 fixes handling of those clauses so that we don't treat them as simple, but it does that by tweaking
statext_is_compatible_clause(),as suggested by Dean. 

Function examine_opclause_args() doesn't set issimple to anything in the IsA(rightop, Const) case, but assigns
*issimplep= issimple at the bottom.  The compiler is not complaining about using a possibly uninitialized variable, but
ifI change the "return true" on the very next line to "return issimple", the compiler complains quite loudly. 


Some functions define bool *issimple, others bool *issimplep and bool issimple.  You might want to standardize the
naming.

It's difficult to know what "simple" means in extended_stats.c.  There is no file-global comment explaining the
concept,and functions like compare_scalars_simple don't have correlates named compare_scalars_complex or such, so the
readercannot infer by comparison what the difference might be between a "simple" case and some non-"simple" case.  The
functions'issimple (or issimplep) argument are undocumented. 

There is a comment:

/*
 * statext_mcv_clauselist_selectivity
 *      Estimate clauses using the best multi-column statistics.
          ....
 *
 * - simple selectivity:  Computed without extended statistics, i.e. as if the
 * columns/clauses were independent.
 *
        ....
 */

but it takes a while to find if you search for "issimple".


In both scalarineqsel_wrapper() and eqsel_internal(), the call to matching_restriction_variables() should usually
returnfalse, since comparing a variable to itself is an unusual case.  The next call is to get_restriction_variable(),
whichrepeats the work of examining the left and right variables.  So in almost all cases, after throwing away the
resultsof: 

    examine_variable(root, left, varRelid, &ldata);
    examine_variable(root, right, varRelid, &rdata);

performed in matching_restriction_variables(), we'll do exactly the same work again (with one variable named
differently)in get_restriction_variable(): 

    examine_variable(root, left, varRelid, vardata);
    examine_variable(root, right, varRelid, &rdata);

That'd be fine if example_variable() were a cheap function, but it appears not to be.  Do you think you could save the
resultsrather than recomputing them?  It's a little messy, since these are the only two functions out of about ten
whichfollow this pattern, so you'd have to pass NULLs into get_restriction_variable() from the other eight callers, but
itstill looks like that would be a win.  

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Zhihong Yu
Date:
Subject: Re: Use extended statistics to estimate (Var op Var) clauses
Next
From: Zhihong Yu
Date:
Subject: Re: Use extended statistics to estimate (Var op Var) clauses