Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Use extended statistics to estimate (Var op Var) clauses
Msg-id CA+TgmobiQtcme20UH9TdYp5iE0Oc8M3nGMkz4HRMcKfgwfsRxQ@mail.gmail.com
In response to Re: Use extended statistics to estimate (Var op Var) clauses  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Use extended statistics to estimate (Var op Var) clauses
List pgsql-hackers
On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> After looking at this for a while, it's clear the main issue is handling
> of clauses referencing the same Var twice, like for example (a = a) or
> (a < a). But it's not clear to me if this is something worth fixing, or
> if extended statistics is the right place to do it.
>
> If those clauses are worth the effort, why not handle them better
> even without extended statistics? We can easily evaluate these clauses
> on per-column MCV, because they only reference a single Var.

+1.

It seems to me that what we ought to do is make "a < a", "a > a", and
"a != a" all have an estimate of zero, and make "a <= a", "a >= a",
and "a = a" estimate 1-nullfrac. The extended statistics mechanism can
just ignore the first three types of clauses; the zero estimate has to
be 100% correct. It can't necessarily ignore the second three cases,
though. If the query says "WHERE a = a AND b = 1", "b = 1" may be more
or less likely given that a is known to be not null, and extended
statistics can tell us that.
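
To make the arithmetic concrete, here is a rough sketch in Python rather
than planner C. Every name in it (same_var_selectivity, the toy rows, and
so on) is made up for illustration and none of it is PostgreSQL code. It
assumes ordinary btree-style operators, where comparing a value with
itself is never strictly less or greater, and where NULL never satisfies
the clause.

```python
# Hypothetical sketch only; these names do not exist in PostgreSQL.
from typing import List, Optional, Tuple

ALWAYS_FALSE_OPS = {"<", ">", "!=", "<>"}   # (a op a) can never be true
TRUE_UNLESS_NULL_OPS = {"<=", ">=", "="}    # (a op a) is true iff a IS NOT NULL


def same_var_selectivity(op: str, nullfrac: float) -> float:
    """Estimate the selectivity of (a op a) given the column's NULL fraction."""
    if not 0.0 <= nullfrac <= 1.0:
        raise ValueError("nullfrac must be between 0 and 1")
    if op in ALWAYS_FALSE_OPS:
        return 0.0
    if op in TRUE_UNLESS_NULL_OPS:
        return 1.0 - nullfrac
    raise ValueError(f"unsupported operator: {op}")


Row = Tuple[Optional[int], int]  # (a, b); a may be NULL (None)


def independent_estimate(rows: List[Row]) -> float:
    """Estimate 'a = a AND b = 1' by multiplying per-column selectivities."""
    nullfrac = sum(1 for a, _ in rows if a is None) / len(rows)
    sel_b = sum(1 for _, b in rows if b == 1) / len(rows)
    return same_var_selectivity("=", nullfrac) * sel_b


def actual_fraction(rows: List[Row]) -> float:
    """Fraction of rows that really satisfy 'a = a AND b = 1'."""
    return sum(1 for a, b in rows if a is not None and b == 1) / len(rows)


# Toy data in which b = 1 is more common among rows where a is NULL.
rows: List[Row] = [(None, 1), (None, 1), (10, 1), (20, 2)]

print(independent_estimate(rows))  # per-column stats, independence assumed
print(actual_fraction(rows))       # what multi-column stats could capture
```

On this toy data the independence-based estimate is 0.375 while the true
fraction is 0.25, which is the kind of gap extended statistics could
close for the "a = a AND b = 1" case.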

-- 
Robert Haas
EDB: http://www.enterprisedb.com