Re: [HACKERS] Making clausesel.c Smarter - Mailing list pgsql-hackers

From David Rowley
Subject Re: [HACKERS] Making clausesel.c Smarter
Date
Msg-id CAKJS1f_CC59STaKEqV4vsN4KUVRA6yYt8sspmm93R=+R28fnhA@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Making clausesel.c Smarter  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Making clausesel.c Smarter
List pgsql-hackers
On 4 April 2017 at 13:35, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> If you ask me, I'd just leave:
>
> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
> DEFAULT_INEQ_SEL)
> + {
> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
> implies we have no stats */
> +     s2 = DEFAULT_RANGE_INEQ_SEL;
> + }
> + else
> + {
> ...
> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
> +   notnullsel = 1.0 - nullsel;
> +
> +   /* Adjust for double-exclusion of NULLs */
> +   s2 += nullsel;
> +   if (s2 <= 0.0) {
> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
> +      {
> +           /* Most likely contradictory clauses found */
> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
> +      }
> +      else
> +      {
> +           /* Could be a rounding error */
> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
> +      }
> +   }
> + }
>
> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
> cautious) estimation of the amount of rounding error that could be
> there with 32-bit floats.
>
> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
> an estimation based on stats, maybe approximate, but one that makes
> sense (ie: preserves the monotonicity of the CPF), and as such
> negative values are either sign of a contradiction or rounding error.
> The previous code left the possibility of negatives coming out of
> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
> but that doesn't seem possible coming out of scalarineqsel.

I notice this does change the estimates for contradictory clauses such as:

create table a (value int);
insert into a select x/100 from generate_Series(1,10000) x;
analyze a;
explain analyze select * from a where value between 101 and -1;

We now get:

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual
time=1.904..1.904 rows=0 loops=1)
   Filter: ((value >= 101) AND (value <= '-1'::integer))
   Rows Removed by Filter: 10000
 Planning time: 0.671 ms
 Execution time: 1.950 ms
(5 rows)

where before we'd get:

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual
time=0.903..0.903 rows=0 loops=1)
   Filter: ((value >= 101) AND (value <= '-1'::integer))
   Rows Removed by Filter: 10000
 Planning time: 0.162 ms
 Execution time: 0.925 ms
(5 rows)

Which comes from the 10000 * 0.005 selectivity estimate from tuples *
DEFAULT_RANGE_INEQ_SEL

I've attached a patch to this effect.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Vitaly Burovoy
Date:
Subject: Re: [HACKERS] identity columns
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] No-op case in ExecEvalConvertRowtype