Re: [HACKERS] Making clausesel.c Smarter - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: [HACKERS] Making clausesel.c Smarter
Date
Msg-id CAGTBQpZ6ekdwgQ2p558ic0ma+SJiANboF0zdXmPDvanCQUePZg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Making clausesel.c Smarter  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Making clausesel.c Smarter
Re: [HACKERS] Making clausesel.c Smarter
List pgsql-hackers
On Fri, Apr 7, 2017 at 2:28 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>> DEFAULT_INEQ_SEL)
>> + {
>> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>> implies we have no stats */
>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +   notnullsel = 1.0 - nullsel;
>> +
>> +   /* Adjust for double-exclusion of NULLs */
>> +   s2 += nullsel;
>> +   if (s2 <= 0.0) {
>> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +      {
>> +           /* Most likely contradictory clauses found */
>> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +      }
>> +      else
>> +      {
>> +           /* Could be a rounding error */
>> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +      }
>> +   }
>> + }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
>
> I notice this does change the estimates for contradictory clauses such as:
>
> create table a (value int);
> insert into a select x/100 from generate_Series(1,10000) x;
> analyze a;
> explain analyze select * from a where value between 101 and -1;
>
> We now get:
>
>                                          QUERY PLAN
> ---------------------------------------------------------------------------------------------
>  Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual
> time=1.904..1.904 rows=0 loops=1)
>    Filter: ((value >= 101) AND (value <= '-1'::integer))
>    Rows Removed by Filter: 10000
>  Planning time: 0.671 ms
>  Execution time: 1.950 ms
> (5 rows)
>
> where before we'd get:
>
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------
>  Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual
> time=0.903..0.903 rows=0 loops=1)
>    Filter: ((value >= 101) AND (value <= '-1'::integer))
>    Rows Removed by Filter: 10000
>  Planning time: 0.162 ms
>  Execution time: 0.925 ms
> (5 rows)
>
> Which comes from the 10000 * 0.005 selectivity estimate from tuples *
> DEFAULT_RANGE_INEQ_SEL
>
> I've attached a patch to this effect.

+                    /*
+                     * A zero or slightly negative s2 should be converted into
+                     * a small positive value; we probably are dealing with a
+                     * very tight range and got a bogus result due to roundoff
+                     * errors. However, if s2 is very negative, then we
+                     * probably have default selectivity estimates on one or
+                     * both sides of the range that we failed to recognize
+                     * above for some reason.
+                     */
+                    if (s2 <= 0.0)

That comment seems outdated

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

Also, can you add a test case? Cost values could make the test
fragile, so if that gives you trouble I'll understand, but it'd be
best to try and test this if possible.



pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: [HACKERS] partitioned tables and contrib/sepgsql
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Compilation warning with MSVC in pg_depend.c