Thread: [GENERAL] Curious planning decision

[GENERAL] Curious planning decision

From
Steve Rogerson
Date:
I have a query that behaves in an odd way, specifically it does different
things depending on if I use "=" or ">=" :

mydb=# explain analyze select count(*) from ctable c   where  c.date1 >=
'2017-08-10' and c.col2 = '637' and c.col3 = 'TEXT3' and c.col4 not in ('TEXT4');
                                                                       QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=104183.81..104183.82 rows=1 width=0) (actual
time=520.221..520.221 rows=1 loops=1)
   ->  Bitmap Heap Scan on ctable c  (cost=83843.79..104161.45 rows=8947
width=0) (actual time=515.833..520.192 rows=540 loops=1)
         Recheck Cond: ((date1 >= '2017-08-10'::date) AND (col2 = 637) AND
((col3)::text = 'TEXT3'::text))
         Rows Removed by Index Recheck: 4894
         Filter: ((col4)::text <> 'TEXT4'::text)
         Rows Removed by Filter: 360
         Heap Blocks: exact=1159
         ->  BitmapAnd  (cost=83843.79..83843.79 rows=10294 width=0) (actual
time=515.629..515.629 rows=0 loops=1)
               ->  Bitmap Index Scan on ctable_date1  (cost=0.00..22534.71
rows=598419 width=0) (actual time=5.204..5.204 rows=37536 loops=1)
                     Index Cond: (date1 >= '2017-08-10'::date)
               ->  Bitmap Index Scan on ctable_col2_col3  (cost=0.00..61304.36
rows=1158379 width=0) (actual time=505.479..505.479 rows=1213850 loops=1)
                     Index Cond: ((col2 = 637) AND ((col3)::text = 'TEXT3'::text))
 Planning time: 0.306 ms
 Execution time: 520.283 ms
(14 rows)

mydb=# explain analyze select count(*) from ctable c   where  c.date1 =
'2017-08-10' and c.col2 = '637' and c.col3 = 'TEXT3' and c.col4 not in ('TEXT4');

QUERY
PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9929.67..9929.68 rows=1 width=0) (actual time=10.500..10.501
rows=1 loops=1)
   ->  Index Scan using ctable_date1_col3_col4_in on ctable c
(cost=0.57..9929.05 rows=245 width=0) (actual time=0.062..10.456 rows=540 loops=1)
         Index Cond: ((date1 = '2017-08-10'::date) AND ((col3)::text =
'TEXT3'::text))
         Filter: (((col4)::text <> 'TEXT4'::text) AND (col2 = 637))
         Rows Removed by Filter: 9059
 Planning time: 0.499 ms
 Execution time: 10.557 ms
(7 rows)

The used indexes are:
Indexes:

"ctable_date1" btree (date1)
"ctable_col2_col3" btree (col2, col3)
"ctable_date1_col2_col3" btree (date1, col2, col3)

ctable has about 60,000,000 rows.

[ Caveat: I sanitized the query and I might not have this character perfect,
though I believe it's ok ]
The date is "today" at the time of writing. The table is analyzed, to pg
should "know" that date1 <= "today" hence there is no data beyond today.

On similar - and more relevant lines the same query with a date range so as
above but "c.date1 >= '2017-08-01' and c.date1 <= '2017-08-10'" uses the
date1_col2_col3 index and takes about 143ms, whereas "c.date1 >= '2017-07-31'
and c.date1 <= '2017-08-10'" uses the date1/col2_col3 indexes takes about 902ms.


Is there a planner option I can tweak that might help?

Steve



Re: [GENERAL] Curious planning decision

From
Tom Lane
Date:
Steve Rogerson <steve.pg@yewtc.demon.co.uk> writes:
> The date is "today" at the time of writing. The table is analyzed, to pg
> should "know" that date1 <= "today" hence there is no data beyond today.

It doesn't seem to, given this:

>                ->  Bitmap Index Scan on ctable_date1  (cost=0.00..22534.71
> rows=598419 width=0) (actual time=5.204..5.204 rows=37536 loops=1)
>                      Index Cond: (date1 >= '2017-08-10'::date)

You should try just "explain analyze select * from ctable c
where c.date1 >= '2017-08-10'" to confirm that that row estimate
is wrong in isolation, but it looks like it is off by more than 10X.

> ctable has about 60,000,000 rows.

So basically, the planner is estimating 1% selectivity for this condition,
when the true figure is more like 0.0625%.  I find this unsurprising if
you're using the default statistics target of 100 --- that means that the
accuracy of histogram-related predictions can't be expected to be any
better than 1%.  If you crank up the stats target (for this column, or
the whole table, or globally) and re-analyze, the estimate should get
better, and then you should get a better plan.

            regards, tom lane