Thread: [PERFORM] Regression from 9.4-9.6

[PERFORM] Regression from 9.4-9.6

From
Jim Nasby
Date:
I've got a query that's regressed from 9.4 to 9.6. I suspect it has 
something to do with the work done around bad plans from single-row 
estimates. There's a SeqScan happening even though the join is to the PK 
of bd_ident. Full plans are at [1,2,3], but here's the relevant bits...

9.4:
>    ->  Nested Loop Left Join  (cost=1.00..50816.55 rows=1 width=27) (actual time=979.406..3213.286 rows=508 loops=1)
>          ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..50807.51 rows=1 width=36) (actual
time=979.381..3207.777rows=508 loops=1)
 
>                Index Cond: ((filed_departuretime >= '2017-07-20 05:00:00'::timestamp without time zone) AND
(filed_departuretime<= '2017-07-30 04:59:59'::timestamp without time zone))
 
...
>          ->  Index Scan using bd_ident_pkey on bd_ident i  (cost=0.43..4.45 rows=1 width=11) (actual
time=0.006..0.006rows=1 loops=508)
 
>                Index Cond: (bdata_forks.ident_id = id)
>          SubPlan 1
>            ->  Index Scan using bd_airport_pkey on bd_airport  (cost=0.56..4.58 rows=1 width=20) (actual
time=0.003..0.003rows=1 loops=508)
 
>                  Index Cond: (id = bdata_forks.destination_id)

9.6:
>    ->  Nested Loop Left Join  (cost=0.57..14994960.40 rows=1 width=71) (actual time=931.479..327972.891 rows=508
loops=1)
>          Join Filter: (bdata_forks.ident_id = i.id)
>          Rows Removed by Join Filter: 1713127892
>          ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..14894236.06 rows=1 width=36)
(actualtime=892.664..3025.653 rows=508 loops=1)
 
...
>          ->  Seq Scan on bd_ident i  (cost=0.00..58566.00 rows=3372300 width=11) (actual time=0.002..280.966
rows=3372300loops=508)               ^^^^^^^^
 
>          SubPlan 1
>            ->  Index Scan using bd_airport_pkey on bd_airport  (cost=0.56..4.58 rows=1 width=20) (actual
time=0.009..0.009rows=1 loops=508)
 
>                  Index Cond: (id = bdata_forks.destination_id)

Altering the predicates somewhat (removing one of the timestamp 
conditions) results in the input to the outer part of the nested loop 
estimating at 326 rows instead of 1, which generates a good plan:

>    ->  Nested Loop Left Join  (cost=1.00..14535906.91 rows=326 width=71) (actual time=23.670..4558.273 rows=3543
loops=1)
>          ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..14532973.05 rows=326 width=36)
(actualtime=23.647..4522.428 rows=3543 loops=1)                          ^^^^^^^^
 
...
>          ->  Index Scan using bd_ident_pkey on bd_ident i  (cost=0.43..4.40 rows=1 width=11) (actual
time=0.005..0.006rows=1 loops=3543)
 
>                Index Cond: (bdata_forks.ident_id = id)
>          SubPlan 1
>            ->  Index Scan using bd_airport_pkey on bd_airport  (cost=0.56..4.58 rows=1 width=20) (actual
time=0.003..0.003rows=1 loops=3543)
 
>                  Index Cond: (id = bdata_forks.destination_id)

1: https://explain.depesz.com/s/2A90
2: https://explain.depesz.com/s/jKdr
3: https://explain.depesz.com/s/nFh
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Regression from 9.4-9.6

From
Tom Lane
Date:
Jim Nasby <jim@nasby.net> writes:
> I've got a query that's regressed from 9.4 to 9.6. I suspect it has
> something to do with the work done around bad plans from single-row
> estimates.

Why has this indexscan's cost estimate changed so much?

>> ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..50807.51 rows=1 width=36) (actual
time=979.381..3207.777rows=508 loops=1) 

>> ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..14894236.06 rows=1 width=36) (actual
time=892.664..3025.653rows=508 loops=1) 

I think the reason it's discarding the preferable plan is that, with this
huge increment in the estimated cost getting added to both alternatives,
the two nestloop plans have fuzzily the same total cost, and it's picking
the one you don't want on the basis of some secondary criterion.
        regards, tom lane


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Regression from 9.4-9.6

From
Jim Nasby
Date:
On 10/8/17 2:34 PM, Tom Lane wrote:
> Jim Nasby <jim@nasby.net> writes:
>> I've got a query that's regressed from 9.4 to 9.6. I suspect it has
>> something to do with the work done around bad plans from single-row
>> estimates.
> 
> Why has this indexscan's cost estimate changed so much?
> 
>>> ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..50807.51 rows=1 width=36) (actual
time=979.381..3207.777rows=508 loops=1)
 
> 
>>> ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..14894236.06 rows=1 width=36) (actual
time=892.664..3025.653rows=508 loops=1)
 
> 
> I think the reason it's discarding the preferable plan is that, with this
> huge increment in the estimated cost getting added to both alternatives,
> the two nestloop plans have fuzzily the same total cost, and it's picking
> the one you don't want on the basis of some secondary criterion.

Great question... the only thing that sticks out is the coalesce(). Let 
me see if an analyze with a higher stats target changes anything. FWIW, 
the 9.6 database is copied from the 9.4 one once a week and then 
pg_upgraded. I'm pretty sure an ANALYZE is part of that process.

9.4:
>          ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..50807.51 rows=1 width=36) (actual
time=979.381..3207.777rows=508 loops=1)
 
>                Index Cond: ((filed_departuretime >= '2017-07-20 05:00:00'::timestamp without time zone) AND
(filed_departuretime<= '2017-07-30 04:59:59'::timestamp without time zone))
 
>                Filter: (((view_www IS NULL) OR (view_www IS TRUE)) AND (sch_block_out IS NOT NULL) AND (diverted IS
NOTTRUE) AND (true_cancel IS NOT TRUE) AND (sch_block_out >= '2017-07-23 05:00:00'::timestamp without time zone) AND
(sch_block_out<= '2017-07-24 04:59:59'::timestamp without time zone) AND (COALESCE(actualarrivaltime, cancellation) >=
actualdeparturetime)AND ((act_block_out - sch_block_out) >= '00:15:00'::interval) AND (((SubPlan 2))::text =
'KORD'::text))
>                Rows Removed by Filter: 2696593
>                SubPlan 2
>                  ->  Index Scan using bd_airport_pkey on bd_airport bd_airport_1  (cost=0.56..4.58 rows=1 width=20)
(actualtime=0.003..0.003 rows=1 loops=21652)
 
>                        Index Cond: (id = bdata_forks.origin_id)

9.6:
>          ->  Index Scan using bdata_filed_departuretime on bdata_forks  (cost=0.57..14894236.06 rows=1 width=36)
(actualtime=892.664..3025.653 rows=508 loops=1)
 
>                Index Cond: ((filed_departuretime >= '2017-07-20 05:00:00'::timestamp without time zone) AND
(filed_departuretime<= '2017-07-30 04:59:59'::timestamp without time zone))
 
>                Filter: (((view_www IS NULL) OR (view_www IS TRUE)) AND (sch_block_out IS NOT NULL) AND (diverted IS
NOTTRUE) AND (true_cancel IS NOT TRUE) AND (sch_block_out >= '2017-07-23 05:00:00'::timestamp without time zone) AND
(sch_block_out<= '2017-07-24 04:59:59'::timestamp without time zone) AND (COALESCE(actualarrivaltime, cancellation) >=
actualdeparturetime)AND ((act_block_out - sch_block_out) >= '00:15:00'::interval) AND (((SubPlan 2))::text =
'KORD'::text))
>                Rows Removed by Filter: 2696592
>                SubPlan 2
>                  ->  Index Scan using bd_airport_pkey on bd_airport bd_airport_1  (cost=0.56..4.58 rows=1 width=20)
(actualtime=0.004..0.004 rows=1 loops=21652)
 
>                        Index Cond: (id = bdata_forks.origin_id)


-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Regression from 9.4-9.6

From
Jim Nasby
Date:
On 10/8/17 3:02 PM, Jim Nasby wrote:
>>
>>>> -> Index Scan using bdata_filed_departuretime on bdata_forks 
>>>> (cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 
>>>> rows=508 loops=1)
>>
>>>> -> Index Scan using bdata_filed_departuretime on bdata_forks 
>>>> (cost=0.57..14894236.06 rows=1 width=36) (actual 
>>>> time=892.664..3025.653 rows=508 loops=1)
>>
>> I think the reason it's discarding the preferable plan is that, with this
>> huge increment in the estimated cost getting added to both alternatives,
>> the two nestloop plans have fuzzily the same total cost, and it's picking
>> the one you don't want on the basis of some secondary criterion.
> 
> Great question... the only thing that sticks out is the coalesce(). Let 
> me see if an analyze with a higher stats target changes anything. FWIW, 
> the 9.6 database is copied from the 9.4 one once a week and then 
> pg_upgraded. I'm pretty sure an ANALYZE is part of that process.

Turns out that analyze is the 'problem'. On the 9.4 database, pg_stats 
shows that the newest date in filed_departuretime is 3/18/2017, while 
the 9.6 database is up-to-date. If I change the query to use 2/9/2018 
instead of 7/20/2017 I get the same results.

So, the larger cost estimate is theoretically more correct. If I set 
random_page_cost = 1 I end up with a good plan.
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Regression from 9.4-9.6

From
Tom Lane
Date:
Jim Nasby <jim@nasby.net> writes:
> On 10/8/17 2:34 PM, Tom Lane wrote:
>> Why has this indexscan's cost estimate changed so much?

> Great question... the only thing that sticks out is the coalesce(). Let 
> me see if an analyze with a higher stats target changes anything. FWIW, 
> the 9.6 database is copied from the 9.4 one once a week and then 
> pg_upgraded. I'm pretty sure an ANALYZE is part of that process.

Hm, now that I see the SubPlan in there, I wonder whether 9.6 is
accounting more conservatively for the cost of the subplan.  It
probably is assuming that the subplan gets run for each row fetched
from the index, although the loops and rows-removed counts show
that the previous filter conditions reject 99% of the fetched rows.

But that code looks the same in 9.4, so I don't understand why
the 9.4 estimate isn't equally large ...
        regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Regression from 9.4-9.6

From
Jim Nasby
Date:
On 10/8/17 3:37 PM, Tom Lane wrote:
> Jim Nasby <jim@nasby.net> writes:
>> On 10/8/17 2:34 PM, Tom Lane wrote:
>>> Why has this indexscan's cost estimate changed so much?
> 
>> Great question... the only thing that sticks out is the coalesce(). Let
>> me see if an analyze with a higher stats target changes anything. FWIW,
>> the 9.6 database is copied from the 9.4 one once a week and then
>> pg_upgraded. I'm pretty sure an ANALYZE is part of that process.
> 
> Hm, now that I see the SubPlan in there, I wonder whether 9.6 is
> accounting more conservatively for the cost of the subplan.  It
> probably is assuming that the subplan gets run for each row fetched
> from the index, although the loops and rows-removed counts show
> that the previous filter conditions reject 99% of the fetched rows.
> 
> But that code looks the same in 9.4, so I don't understand why
> the 9.4 estimate isn't equally large ...

Besides the analyze issue, the other part of this is

asdidata@graceful.hou/20106> select 
pg_size_pretty(pg_relation_size('bdata_forks')); pg_size_pretty
---------------- 106 GB
(1 row)

asdidata@graceful.hou/20106> select relpages::bigint*8192/reltuples from 
pg_class where relname='bdata_forks';     ?column?
------------------ 185.559397863791
(1 row)

With an effective_cache_size of 200GB that's not really helping things. 
But it's also another example of the planner's reluctance towards index 
scans.
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance