Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Use extended statistics to estimate (Var op Var) clauses
Date
Msg-id 47d36334-a382-f6fb-f5e5-1dace7b3b3e8@enterprisedb.com
Whole thread Raw
In response to Re: Use extended statistics to estimate (Var op Var) clauses  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Use extended statistics to estimate (Var op Var) clauses  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On 8/20/21 8:56 PM, Robert Haas wrote:
> On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> After looking at this for a while, it's clear the main issue is handling
>> of clauses referencing the same Var twice, like for example (a = a) or
>> (a < a). But it's not clear to me if this is something worth fixing, or
>> if extended statistics is the right place to do it.
>>
>> If those clauses are worth the effort, why not to handle them better
>> even without extended statistics? We can easily evaluate these clauses
>> on per-column MCV, because they only reference a single Var.
> 
> +1.
> 
> It seems to me that what we ought to do is make "a < a", "a > a", and
> "a != 0" all have an estimate of zero, and make "a <= a", "a >= a",
> and "a = a" estimate 1-nullfrac. The extended statistics mechanism can
> just ignore the first three types of clauses; the zero estimate has to
> be 100% correct. It can't necessarily ignore the second three cases,
> though. If the query says "WHERE a = a AND b = 1", "b = 1" may be more
> or less likely given that a is known to be not null, and extended
> statistics can tell us that.
> 

Yeah, I agree this seems like the right approach (except I guess you 
meant "a != a" and not "a != 0"). Assuming we want to do something about 
these clauses at all - I'm still wondering if those clauses are common 
in practice or just synthetic.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead
Next
From: Cary Huang
Date:
Subject: Re: postgres_fdw: Handle boolean comparison predicates