Re: Using multiple extended statistics for estimates - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Using multiple extended statistics for estimates
Date
Msg-id 20191114214541.3ssauovlhh3vsasr@development
Whole thread Raw
In response to Re: Using multiple extended statistics for estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Nov 14, 2019 at 03:16:04PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> For the case with equal Const values that should be mostly obvious, i.e.
>> "a=1 AND a=1 AND a=1" has the same selectivity as "a=1".
>
>> The case with different Const values is harder, unfortunately. It might
>> seem obvious that "a=1 AND a=2" means there are no matching rows, but
>> that heavily relies on the semantics of the equality operator. And we
>> can't simply compare the Const values either, I'm afraid, because there
>> are cases with cross-type operators like
>>   a = 1::int AND a = 1.0::numeric
>> where the Consts are of different type, yet both conditions can be true.
>
>FWIW, there's code in predtest.c to handle exactly that, at least for
>types sharing a btree opfamily.  Whether it's worth applying that logic
>here is unclear, but note that we've had the ability to recognize
>redundant and contradictory clauses for a long time:
>
>regression=# explain select * from tenk1 where two = 1;
>                         QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1  (cost=0.00..470.00 rows=5000 width=244)
>   Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 1::bigint;
>                         QUERY PLAN
>------------------------------------------------------------
> Seq Scan on tenk1  (cost=0.00..470.00 rows=5000 width=244)
>   Filter: (two = 1)
>(2 rows)
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::bigint;
>                          QUERY PLAN
>---------------------------------------------------------------
> Result  (cost=0.00..470.00 rows=1 width=244)
>   One-Time Filter: false
>   ->  Seq Scan on tenk1  (cost=0.00..470.00 rows=1 width=244)
>         Filter: (two = 1)
>(4 rows)
>
>It falls down on
>
>regression=# explain select * from tenk1 where two = 1 and two = 2::numeric;
>                        QUERY PLAN
>-----------------------------------------------------------
> Seq Scan on tenk1  (cost=0.00..520.00 rows=25 width=244)
>   Filter: ((two = 1) AND ((two)::numeric = '2'::numeric))
>(2 rows)
>
>because numeric isn't in the same opfamily, so these clauses can't be
>compared easily.
>
>            regards, tom lane

Yeah, and this logic still works - the redundant clauses won't even get
to the selectivity estimation, I think. So maybe the comment is not
quite necessary, because the problem does not even exist ...

Maybe we could do something about the cases that predtest.c can't solve,
but it's not clear if we can be much smarter for types with different
opfamilies.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Missing dependency tracking for TableFunc nodes
Next
From: Tomas Vondra
Date:
Subject: Re: Using multiple extended statistics for estimates