Thread: BUG #8571: Planner miss-estimates ' is not true' as not matching any rows

BUG #8571: Planner miss-estimates ' is not true' as not matching any rows

From
tgarnett@panjiva.com
Date:
The following bug has been logged on the website:

Bug reference:      8571
Logged by:          Timothy Garnett
Email address:      tgarnett@panjiva.com
PostgreSQL version: 9.2.4
Operating system:   Ubuntu Linux x86_64 12.04.3 LTS
Description:

The planner seems to be badly miss-estimating the selectivity of '<always
null col> IS NOT true/false', it does not seem to do so for equivalent
expressions such as '<col> is null or not <col>' or expressions where the
selectivity is difficult to determine.


Simplified scenario:


create temporary table test_tbl as (select generate_series as id,
null::boolean as val from generate_series(1,1000));
analyze verbose test_tbl;


select count(*) from test_tbl where val is not true;
=> 1000


explain select * from test_tbl where val is null; -- correctly estimates
1000 rows
=> Seq Scan on test_tbl  (cost=0.00..15.00 rows=1000 width=5)
   Filter: (val IS NULL)


-- problem case --
explain select * from test_tbl where val is not true; -- estimates only 1
row!
=>  Seq Scan on test_tbl  (cost=0.00..15.00 rows=1 width=5)
   Filter: (val IS NOT TRUE)


explain select * from test_tbl where NOT coalesce(val, false); -- estimates
500 rows (actual 1,000) but will still generally result in a reasonable
plan
=>  Seq Scan on test_tbl  (cost=0.00..15.00 rows=500 width=5)
   Filter: (NOT COALESCE(val, false))


explain select * from test_tbl where val is null or not val; -- correctly
estimates 1000 rows
=> Seq Scan on test_tbl  (cost=0.00..15.00 rows=1000 width=5)
   Filter: ((val IS NULL) OR (NOT val))




Estimating only one row for the selectivity of a where clause with
additional joins involved when many rows match can result in some very bad
plans which is what we ran into.
tgarnett@panjiva.com writes:
> The following bug has been logged on the website:
> Bug reference:      8571
> Logged by:          Timothy Garnett
> Email address:      tgarnett@panjiva.com
> PostgreSQL version: 9.2.4
> Operating system:   Ubuntu Linux x86_64 12.04.3 LTS
> Description:

> The planner seems to be badly miss-estimating the selectivity of '<always
> null col> IS NOT true/false', it does not seem to do so for equivalent
> expressions such as '<col> is null or not <col>' or expressions where the
> selectivity is difficult to determine.

This example works as desired for me in 9.2.5.  I think the fix was this:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [b32a25c3d] 2013-07-24 00:44:09 -0400
Branch: REL9_3_STABLE Release: REL9_3_0 [808d1f812] 2013-07-24 00:44:22 -0400
Branch: REL9_2_STABLE Release: REL9_2_5 [9f8254c18] 2013-07-24 00:44:36 -0400
Branch: REL9_1_STABLE Release: REL9_1_10 [13f11c8a8] 2013-07-24 00:44:46 -0400
Branch: REL9_0_STABLE Release: REL9_0_14 [8e992b018] 2013-07-24 00:44:59 -0400
Branch: REL8_4_STABLE Release: REL8_4_18 [0766904ad] 2013-07-24 00:45:15 -0400

    Fix booltestsel() for case where we have NULL stats but not MCV stats.

    In a boolean column that contains mostly nulls, ANALYZE might not find
    enough non-null values to populate the most-common-values stats,
    but it would still create a pg_statistic entry with stanullfrac set.
    The logic in booltestsel() for this situation did the wrong thing for
    "col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null
    values would satisfy these tests (so that the true selectivity would
    be close to one, not close to zero).  Per bug #8274.

    Fix by Andrew Gierth, some comment-smithing by me.


            regards, tom lane

Re: BUG #8571: Planner miss-estimates ' is not true' as not matching any rows

From
Timothy Garnett
Date:
Thanks for the update, glad to here it's fixed in the latest version (we're
planning on upgrading soon).

Tim


On Thu, Oct 31, 2013 at 11:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> tgarnett@panjiva.com writes:
> > The following bug has been logged on the website:
> > Bug reference:      8571
> > Logged by:          Timothy Garnett
> > Email address:      tgarnett@panjiva.com
> > PostgreSQL version: 9.2.4
> > Operating system:   Ubuntu Linux x86_64 12.04.3 LTS
> > Description:
>
> > The planner seems to be badly miss-estimating the selectivity of '<always
> > null col> IS NOT true/false', it does not seem to do so for equivalent
> > expressions such as '<col> is null or not <col>' or expressions where the
> > selectivity is difficult to determine.
>
> This example works as desired for me in 9.2.5.  I think the fix was this:
>
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Branch: master [b32a25c3d] 2013-07-24 00:44:09 -0400
> Branch: REL9_3_STABLE Release: REL9_3_0 [808d1f812] 2013-07-24 00:44:22
> -0400
> Branch: REL9_2_STABLE Release: REL9_2_5 [9f8254c18] 2013-07-24 00:44:36
> -0400
> Branch: REL9_1_STABLE Release: REL9_1_10 [13f11c8a8] 2013-07-24 00:44:46
> -0400
> Branch: REL9_0_STABLE Release: REL9_0_14 [8e992b018] 2013-07-24 00:44:59
> -0400
> Branch: REL8_4_STABLE Release: REL8_4_18 [0766904ad] 2013-07-24 00:45:15
> -0400
>
>     Fix booltestsel() for case where we have NULL stats but not MCV stats.
>
>     In a boolean column that contains mostly nulls, ANALYZE might not find
>     enough non-null values to populate the most-common-values stats,
>     but it would still create a pg_statistic entry with stanullfrac set.
>     The logic in booltestsel() for this situation did the wrong thing for
>     "col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null
>     values would satisfy these tests (so that the true selectivity would
>     be close to one, not close to zero).  Per bug #8274.
>
>     Fix by Andrew Gierth, some comment-smithing by me.
>
>
>                         regards, tom lane
>