On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> After looking at this for a while, it's clear the main issue is handling
> of clauses referencing the same Var twice, like for example (a = a) or
> (a < a). But it's not clear to me if this is something worth fixing, or
> if extended statistics is the right place to do it.
>
> If those clauses are worth the effort, why not to handle them better
> even without extended statistics? We can easily evaluate these clauses
> on per-column MCV, because they only reference a single Var.

+1.

It seems to me that what we ought to do is make "a < a", "a > a", and
"a != a" all have an estimate of zero, and make "a <= a", "a >= a",
and "a = a" estimate 1-nullfrac. The extended statistics mechanism can
just ignore the first three types of clauses; the zero estimate has to
be 100% correct. It can't necessarily ignore the second three cases,
though. If the query says "WHERE a = a AND b = 1", "b = 1" may be more
or less likely given that a is known to be not null, and extended
statistics can tell us that.
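
To make the proposed rule concrete, here is a rough sketch in Python. It is
not PostgreSQL code: the function name and the operator spellings are made up
for illustration, and it assumes ordinary btree-style comparison semantics,
where any non-null value compares equal to itself.

```python
# Hypothetical sketch of the proposed estimates for clauses that compare
# a column with itself, e.g. "a < a" or "a = a". Not PostgreSQL's API.

# Operators that can never be true when both sides are the same Var.
ALWAYS_FALSE_OPS = {"<", ">", "!=", "<>"}

# Operators that are true for every non-null value of the Var.
TRUE_WHEN_NOT_NULL_OPS = {"<=", ">=", "="}


def self_comparison_selectivity(op: str, nullfrac: float) -> float:
    """Estimate the selectivity of "a <op> a" given the column's null fraction."""
    if not 0.0 <= nullfrac <= 1.0:
        raise ValueError("nullfrac must be between 0 and 1")
    if op in ALWAYS_FALSE_OPS:
        # NULL rows yield NULL, non-null rows yield false: nothing passes.
        return 0.0
    if op in TRUE_WHEN_NOT_NULL_OPS:
        # Equivalent to "a IS NOT NULL".
        return 1.0 - nullfrac
    raise ValueError(f"unsupported operator: {op}")
```

For a column with nullfrac = 0.25, this gives 0.0 for "a < a" and 0.75 for
"a = a". The second group is the one that still matters to extended
statistics, since it behaves like "a IS NOT NULL" and that can be correlated
with the other clauses in the query.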
--
Robert Haas
EDB: http://www.enterprisedb.com