Re: Using multiple extended statistics for estimates - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Using multiple extended statistics for estimates
Date
Msg-id 10082.1573762564@sss.pgh.pa.us
Whole thread Raw
In response to Re: Using multiple extended statistics for estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Using multiple extended statistics for estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> For the case with equal Const values that should be mostly obvious, i.e.
> "a=1 AND a=1 AND a=1" has the same selectivity as "a=1".

> The case with different Const values is harder, unfortunately. It might
> seem obvious that "a=1 AND a=2" means there are no matching rows, but
> that heavily relies on the semantics of the equality operator. And we
> can't simply compare the Const values either, I'm afraid, because there
> are cases with cross-type operators like
>   a = 1::int AND a = 1.0::numeric
> where the Consts are of different type, yet both conditions can be true.

FWIW, there's code in predtest.c to handle exactly that, at least for
types sharing a btree opfamily.  Whether it's worth applying that logic
here is unclear, but note that we've had the ability to recognize
redundant and contradictory clauses for a long time:

regression=# explain select * from tenk1 where two = 1;          
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..470.00 rows=5000 width=244)
   Filter: (two = 1)
(2 rows)

regression=# explain select * from tenk1 where two = 1 and two = 1::bigint; 
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..470.00 rows=5000 width=244)
   Filter: (two = 1)
(2 rows)

regression=# explain select * from tenk1 where two = 1 and two = 2::bigint;
                          QUERY PLAN                           
---------------------------------------------------------------
 Result  (cost=0.00..470.00 rows=1 width=244)
   One-Time Filter: false
   ->  Seq Scan on tenk1  (cost=0.00..470.00 rows=1 width=244)
         Filter: (two = 1)
(4 rows)

It falls down on

regression=# explain select * from tenk1 where two = 1 and two = 2::numeric;
                        QUERY PLAN                         
-----------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..520.00 rows=25 width=244)
   Filter: ((two = 1) AND ((two)::numeric = '2'::numeric))
(2 rows)

because numeric isn't in the same opfamily, so these clauses can't be
compared easily.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Using multiple extended statistics for estimates
Next
From: Andres Freund
Date:
Subject: Re: SKIP_LOCKED test causes random buildfarm failures