Re: [HACKERS] Making clausesel.c Smarter - Mailing list pgsql-hackers
From: Claudio Freire
Subject: Re: [HACKERS] Making clausesel.c Smarter
Date:
Msg-id: CAGTBQpZ6ekdwgQ2p558ic0ma+SJiANboF0zdXmPDvanCQUePZg@mail.gmail.com
In response to: Re: [HACKERS] Making clausesel.c Smarter (David Rowley <david.rowley@2ndquadrant.com>)
Responses: Re: [HACKERS] Making clausesel.c Smarter; Re: [HACKERS] Making clausesel.c Smarter
List: pgsql-hackers
On Fri, Apr 7, 2017 at 2:28 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> +        if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound == DEFAULT_INEQ_SEL)
>> +        {
>> +            /* No point in checking null selectivity, DEFAULT_INEQ_SEL implies we have no stats */
>> +            s2 = DEFAULT_RANGE_INEQ_SEL;
>> +        }
>> +        else
>> +        {
>> ...
>> +            s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +            nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +            notnullsel = 1.0 - nullsel;
>> +
>> +            /* Adjust for double-exclusion of NULLs */
>> +            s2 += nullsel;
>> +            if (s2 <= 0.0) {
>> +                if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +                {
>> +                    /* Most likely contradictory clauses found */
>> +                    s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +                }
>> +                else
>> +                {
>> +                    /* Could be a rounding error */
>> +                    s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +                }
>> +            }
>> +        }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
> I notice this does change the estimates for contradictory clauses such as:
>
> create table a (value int);
> insert into a select x/100 from generate_Series(1,10000) x;
> analyze a;
> explain analyze select * from a where value between 101 and -1;
>
> We now get:
>
>                                           QUERY PLAN
> ---------------------------------------------------------------------------------------------
>  Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual time=1.904..1.904 rows=0 loops=1)
>    Filter: ((value >= 101) AND (value <= '-1'::integer))
>    Rows Removed by Filter: 10000
>  Planning time: 0.671 ms
>  Execution time: 1.950 ms
> (5 rows)
>
> where before we'd get:
>
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------
>  Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual time=0.903..0.903 rows=0 loops=1)
>    Filter: ((value >= 101) AND (value <= '-1'::integer))
>    Rows Removed by Filter: 10000
>  Planning time: 0.162 ms
>  Execution time: 0.925 ms
> (5 rows)
>
> Which comes from the 10000 * 0.005 selectivity estimate from tuples *
> DEFAULT_RANGE_INEQ_SEL
>
> I've attached a patch to this effect.

+        /*
+         * A zero or slightly negative s2 should be converted into
+         * a small positive value; we probably are dealing with a
+         * very tight range and got a bogus result due to roundoff
+         * errors. However, if s2 is very negative, then we
+         * probably have default selectivity estimates on one or
+         * both sides of the range that we failed to recognize
+         * above for some reason.
+         */
+        if (s2 <= 0.0)

That comment seems outdated.

Otherwise, the patch LGTM, but I'd like to solve the quadratic behavior too... are you going to try? Otherwise I could take a stab at it myself. It doesn't seem very difficult.

Also, can you add a test case? Cost values could make the test fragile, so if that gives you trouble I'll understand, but it'd be best to try and test this if possible.