Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Use extended statistics to estimate (Var op Var) clauses
Date
Msg-id c6284c56-1a83-18f3-90a7-35259f64a01e@enterprisedb.com
Whole thread Raw
In response to Re: Use extended statistics to estimate (Var op Var) clauses  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Use extended statistics to estimate (Var op Var) clauses
Re: Use extended statistics to estimate (Var op Var) clauses
List pgsql-hackers
Hi,

On 7/5/21 2:46 PM, Dean Rasheed wrote:
> On Sun, 13 Jun 2021 at 21:28, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> Here is a slightly updated version of the patch
>>
> 
> Hi,
> 
> I have looked at this in some more detail, and it all looks pretty
> good, other than some mostly cosmetic stuff.
> 

Thanks for the review!

> The new code in statext_is_compatible_clause_internal() is a little
> hard to follow because some of the comments aren't right (e.g. when
> checking clause_expr2, it isn't an (Expr op Const) or (Const op Expr)
> as the comment says). Rather than trying to comment on each
> conditional branch, it might be simpler to just have a single
> catch-all comment at the top, and also remove the "return true" in the
> middle, to make it something like:
> 
>     /*
>      * Check Vars appearing on either side by recursing, and make a note of
>      * any expressions.
>      */
>     if (IsA(clause_expr, Var))
>     {
>         if (!statext_is_compatible_clause_internal(...))
>             return false;
>     }
>     else
>         *exprs = lappend(*exprs, clause_expr);
> 
>     if (clause_expr2)
>     {
>         if (IsA(clause_expr2, Var))
>         {
>             if (!statext_is_compatible_clause_internal(...))
>                 return false;
>         }
>         else
>             *exprs = lappend(*exprs, clause_expr2);
>     }
> 
>     return true;
> 

I ended up doing something slightly different - examine_opclause_args
now "returns" a list of expressions, instead of explicitly setting two
parameters. That means we can do a simple foreach() here, which seems
cleaner. It means we have to extract the expressions from the list in a
couple places, but that seems acceptable. Do you agree?

I also went through the comments and updated those that seemed wrong.

> Is the FIXME comment in examine_opclause_args() necessary? The check
> for a single relation has already been done in
> clause[list]_selectivity_ext(), and I'm not sure what
> examine_opclause_args() would do differently.
> 

Yeah, I came to the same conclusion.

> In mcv_get_match_bitmap(), perhaps do the RESULT_IS_FINAL() checks
> first in each loop.
> 

This is how master already does that now, and I wonder if it's done in
this order intentionally. It's not clear to me doing it in the other way
would be faster?

> Also in mcv_get_match_bitmap(), the 2 "First check whether the
> constant is below the lower boundary ..." comments don't make any
> sense to me. Were those perhaps copied and pasted from somewhere else?
> They should perhaps say "Otherwise, compare the MCVItem with the
> constant" and "Otherwise compare the values from the MCVItem using the
> clause operator", or something like that.
> 

Yeah, that's another bit that comes from current master - the patch just
makes a new copy of the comment. I agree it's bogus, Seems like a
remainder of the original code which did various "smart" things we
removed over time. Will fix.

> But other than such cosmetic things, I think the patch is good, and
> gives some nice estimate improvements.
> 

Thanks, sounds good. I guess the last thing is maybe mentioning this in
the docs, adding an example etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: something is wonky with pgbench pipelining
Next
From: Robert Haas
Date:
Subject: Re: .ready and .done files considered harmful