Thread: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

From
"Erdmann, Markus @ Bellevue"
Date:
Hello,

We’re trying to debug a performance issue affecting our staging database, and we’ve narrowed it down to a difference in the query optimizer in 9.5.2. Upgrading to 9.5 is important for us because we need the ability to import foreign schemas.

This is the query we’re running:

CREATE TEMP TABLE tmp_joined_transactions_9gkgptn5xcp9 ( transaction_id integer PRIMARY KEY );

INSERT INTO tmp_joined_transactions_9gkgptn5xcp9 (transaction_id)
SELECT DISTINCT ON ("transactions_transaction"."id") "transactions_transaction"."id"
FROM "transactions_transaction" , "tmp_joined_transactions_75chlsokrsev"
WHERE (NOT ("transactions_transaction"."id"
IN (SELECT U0."id" AS Col1
FROM "transactions_transaction" U0
LEFT OUTER JOIN "transactions_commission" U1
ON ( U0."id" = U1."transaction_id" )
WHERE U1."id" IS NULL)) 
AND "transactions_transaction"."date_created" >= '2010-01-01'::date
AND "transactions_transaction"."date_created" <= '2015-12-31'::date
AND "transactions_transaction"."deal_status" IN (1)
AND (transactions_transaction.id = tmp_joined_transactions_75chlsokrsev.transaction_id))
ORDER BY "transactions_transaction"."id" ASC;

And this is the EXPLAIN ANALYZE output in 9.5.2:

Insert on tmp_joined_transactions_9gkgptn5xcp9  (cost=42111.29..42991.60 rows=1 width=4) (actual time=470236.029..470236.029 rows=0 loops=1)
  ->  Unique  (cost=42111.29..42991.59 rows=1 width=4) (actual time=1109.636..470222.609 rows=4652 loops=1)
        ->  Nested Loop  (cost=42111.29..42991.59 rows=1 width=4) (actual time=1109.635..470219.259 rows=4652 loops=1)
              Join Filter: (transactions_transaction.id = tmp_joined_transactions_75chlsokrsev.transaction_id)
              Rows Removed by Join Filter: 1153472704
              ->  Index Only Scan using tmp_joined_transactions_75chlsokrsev_pkey on tmp_joined_transactions_75chlsokrsev  (cost=0.28..224.61 rows=5355 width=4) (actual time=0.016..4.829 rows=4652 loops=1)
                    Heap Fetches: 4652
              ->  Materialize  (cost=42111.01..42686.66 rows=1 width=4) (actual time=0.240..51.639 rows=247953 loops=4652)
                    ->  Bitmap Heap Scan on transactions_transaction  (cost=42111.01..42686.65 rows=1 width=4) (actual time=1109.580..1238.034 rows=247953 loops=1)
                          Recheck Cond: ((date_created >= '2010-01-01'::date) AND (date_created <= '2015-12-31'::date))
                          Filter: ((NOT (hashed SubPlan 1)) AND (deal_status = 1))
                          Rows Removed by Filter: 21068
                          Heap Blocks: exact=8073
                          ->  Bitmap Index Scan on transactions_transaction_c69e55a4  (cost=0.00..18.02 rows=161 width=0) (actual time=20.153..20.153 rows=269021 loops=1)
                                Index Cond: ((date_created >= '2010-01-01'::date) AND (date_created <= '2015-12-31'::date))
                          SubPlan 1
                            ->  Hash Right Join  (cost=8799.57..42092.98 rows=1 width=4) (actual time=852.012..1086.671 rows=3395 loops=1)
                                  Hash Cond: (u1.transaction_id = u0.id)
                                  Filter: (u1.id IS NULL)
                                  Rows Removed by Filter: 995248
                                  ->  Seq Scan on transactions_commission u1  (cost=0.00..27020.63 rows=1039763 width=8) (actual time=0.003..221.750 rows=1039763 loops=1)
                                  ->  Hash  (cost=8395.92..8395.92 rows=32292 width=4) (actual time=153.984..153.984 rows=272663 loops=1)
                                        Buckets: 131072 (originally 32768)  Batches: 4 (originally 1)  Memory Usage: 3425kB
                                        ->  Seq Scan on transactions_transaction u0  (cost=0.00..8395.92 rows=32292 width=4) (actual time=0.003..71.988 rows=272663 loops=1)
Planning time: 0.410 ms
Execution time: 470237.400 ms

And this is the EXPLAIN ANALYZE output in 9.4.5:

Insert on tmp_joined_transactions_9gkgptn5xcp9  (cost=88544.31..88576.62 rows=2154 width=4) (actual time=1356.065..1356.065 rows=0 loops=1)
  ->  Unique  (cost=88544.31..88555.08 rows=2154 width=4) (actual time=1347.480..1350.548 rows=4715 loops=1)
        ->  Sort  (cost=88544.31..88549.70 rows=2154 width=4) (actual time=1347.477..1348.432 rows=4715 loops=1)
              Sort Key: transactions_transaction.id
              Sort Method: quicksort  Memory: 414kB
              ->  Hash Join  (cost=72700.01..88425.06 rows=2154 width=4) (actual time=1107.077..1345.650 rows=4715 loops=1)
                    Hash Cond: (transactions_transaction.id = tmp_joined_transactions_75chlsokrsev.transaction_id)
                    ->  Seq Scan on transactions_transaction  (cost=72565.61..87199.11 rows=122287 width=4) (actual time=1104.855..1269.783 rows=251736 loops=1)
                          Filter: ((NOT (hashed SubPlan 1)) AND (date_created >= '2010-01-01'::date) AND (date_created <= '2015-12-31'::date) AND (deal_status = 1))
                          Rows Removed by Filter: 34441
                          SubPlan 1
                            ->  Hash Right Join  (cost=16466.94..72565.61 rows=1 width=4) (actual time=887.763..1103.391 rows=3551 loops=1)
                                  Hash Cond: (u1.transaction_id = u0.id)
                                  Filter: (u1.id IS NULL)
                                  Rows Removed by Filter: 1047414
                                  ->  Seq Scan on transactions_commission u1  (cost=0.00..28019.85 rows=1073685 width=8) (actual time=0.004..360.549 rows=1072675 loops=1)
                                  ->  Hash  (cost=11771.75..11771.75 rows=286175 width=4) (actual time=167.675..167.675 rows=286177 loops=1)
                                        Buckets: 16384  Batches: 4  Memory Usage: 2528kB
                                        ->  Seq Scan on transactions_transaction u0  (cost=0.00..11771.75 rows=286175 width=4) (actual time=0.002..84.517 rows=286177 loops=1)
                    ->  Hash  (cost=71.40..71.40 rows=5040 width=4) (actual time=2.202..2.202 rows=4715 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 166kB
                          ->  Seq Scan on tmp_joined_transactions_75chlsokrsev  (cost=0.00..71.40 rows=5040 width=4) (actual time=0.009..0.983 rows=4715 loops=1)
Planning time: 0.448 ms
Execution time: 1356.648 ms

The query is more than two orders of magnitude slower in 9.5.2 (roughly 1,300 ms in 9.4.5 vs 470,000 ms in 9.5.2).

Does anyone know if there is a way to modify our query to guide the query optimizer back in the right direction?

Thank you for your assistance,
Markus Erdmann

Re: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

From
"David G. Johnston"
Date:
On Thu, Jun 16, 2016 at 8:40 PM, Erdmann, Markus @ Bellevue <Markus.Erdmann@cbre.com> wrote:
Hello,

We’re trying to debug a performance issue affecting our staging database, and we’ve narrowed it down to a difference in the query optimizer in 9.5.2. Upgrading to 9.5 is important for us because we need the ability to import foreign schemas.

This is the query we’re running:

CREATE TEMP TABLE tmp_joined_transactions_9gkgptn5xcp9 ( transaction_id integer PRIMARY KEY );

INSERT INTO tmp_joined_transactions_9gkgptn5xcp9 (transaction_id)

This...
SELECT DISTINCT ON ("transactions_transaction"."id") "transactions_transaction"."id"

DISTINCT is a code smell.  DISTINCT ON less so - it helps to avoid self-joins - but your inclusion of ON here is pointless since the only output column is "id".

As written there should be no way to get duplicate "id"s into the output result.  And if the tmp_joined_transactions relationship is 1-to-many, you should use a semi-join rather than an inner join.

FROM "transactions_transaction" , "tmp_joined_transactions_75chlsokrsev"
WHERE
 
Here...

(NOT ("transactions_transaction"."id"
IN (SELECT U0."id" AS Col1
FROM "transactions_transaction" U0
LEFT OUTER JOIN "transactions_commission" U1
ON ( U0."id" = U1."transaction_id" )
WHERE U1."id" IS NULL)) 

Making this an anti-join (NOT EXISTS instead of NOT IN) stands out as an easy improvement to try.

It also makes the logic clearer: you have a double negative here, which means you really want a semi-join (which I wrote below):

WHERE EXISTS (SELECT 1 FROM transactions_commission WHERE transactions_transaction.id = transactions_commission.transaction_id)

I won't promise this gives the same answer...I don't have enough spare brain power or the ability to test it...but it seems correct.

AND "transactions_transaction"."date_created" >= '2010-01-01'::date
AND "transactions_transaction"."date_created" <= '2015-12-31'::date
AND "transactions_transaction"."deal_status" IN (1)

Also...
 
AND (transactions_transaction.id = tmp_joined_transactions_75chlsokrsev.transaction_id))

This is a style point, but I'm a fan of explicit ANSI JOIN syntax: turn the above into a <JOIN tmp_* ON ...>.

ORDER BY "transactions_transaction"."id" ASC;
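Putting the two suggestions together, the statement might look like the sketch below.  This is untested - verify it returns the same rows before relying on it - and since the tmp_* table has a primary key on transaction_id, the join cannot produce duplicates, so DISTINCT ON is dropped entirely:

```sql
INSERT INTO tmp_joined_transactions_9gkgptn5xcp9 (transaction_id)
SELECT t.id
FROM transactions_transaction AS t
JOIN tmp_joined_transactions_75chlsokrsev AS tmp
  ON tmp.transaction_id = t.id           -- ANSI join instead of a comma join
WHERE EXISTS (                           -- semi-join replaces NOT (id IN (...))
        SELECT 1
        FROM transactions_commission AS c
        WHERE c.transaction_id = t.id)
  AND t.date_created >= '2010-01-01'::date
  AND t.date_created <= '2015-12-31'::date
  AND t.deal_status = 1
ORDER BY t.id;
```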


The regression itself someone else would need to comment on.

David J.

Re: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

From
Tom Lane
Date:
"Erdmann, Markus @ Bellevue" <Markus.Erdmann@cbre.com> writes:
> We're trying to debug a performance issue affecting our staging database,
> and we've narrowed it down to a difference in the query optimizer in 9.5.2.
> Upgrading to 9.5 is important for us because we need the ability to import
> foreign schemas.

I think the core of the problem is the large rowcount estimation error
here:

>                           ->  Bitmap Index Scan on transactions_transaction_c69e55a4  (cost=0.00..18.02 rows=161 width=0) (actual time=20.153..20.153 rows=269021 loops=1)
>                                 Index Cond: ((date_created >= '2010-01-01'::date) AND (date_created <= '2015-12-31'::date))

That's a pretty simple condition and it's hard to believe that 9.5 does it
much differently than 9.4 did.  Perhaps you forgot to ANALYZE, or were
using a larger statistics target in the 9.4 installation?
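If it is a statistics-target difference, something along these lines would re-establish parity (1000 below is just an illustrative value; the default target is 100):

```sql
-- Refresh the planner statistics for the table
ANALYZE transactions_transaction;

-- Or first raise the per-column sample size, then re-analyze
ALTER TABLE transactions_transaction
  ALTER COLUMN date_created SET STATISTICS 1000;
ANALYZE transactions_transaction;
```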

            regards, tom lane


Re: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

From
"Erdmann, Markus @ Bellevue"
Date:
Thank you Tom and David for your very helpful replies. We dumped and
restored the RDS staging database on a local installation of pg and were
not able to reproduce the issue in 9.5.2, which led us to try running a
VACUUM ANALYZE and recreating indexes. After this we no longer saw a
discrepancy between the query plan in the separate environments. Is this
what you meant, Tom, by making sure to ANALYZE? Or did you mean including
ANALYZE in EXPLAIN ANALYZE?


On 6/17/16, 11:17 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>"Erdmann, Markus @ Bellevue" <Markus.Erdmann@cbre.com> writes:
>> We're trying to debug a performance issue affecting our staging
>>database, and we've narrowed it down to a difference in the query
>>optimizer in 9.5.2. Upgrading to 9.5 is important for us because we need
>>the ability to import foreign schemas.
>
>I think the core of the problem is the large rowcount estimation error
>here:
>
>>                           ->  Bitmap Index Scan on
>>transactions_transaction_c69e55a4  (cost=0.00..18.02 rows=161 width=0)
>>(actual time=20.153..20.153 rows=269021 loops=1)
>>                                 Index Cond: ((date_created >=
>>'2010-01-01'::date) AND (date_created <= '2015-12-31'::date))
>
>That's a pretty simple condition and it's hard to believe that 9.5 does it
>much differently than 9.4 did.  Perhaps you forgot to ANALYZE, or were
>using a larger statistics target in the 9.4 installation?
>
>            regards, tom lane



Re: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

From
"David G. Johnston"
Date:
On Fri, Jun 17, 2016 at 5:05 PM, Erdmann, Markus @ Bellevue <Markus.Erdmann@cbre.com> wrote:
Thank you Tom and David for your very helpful replies. We dumped and
restored the RDS staging database on a local installation of pg and were
not able to reproduce the issue in 9.5.2, which led us to try running a
VACUUM ANALYZE and recreating indexes. After this we no longer saw a
discrepancy between the query plan in the separate environments. Is this
what you meant, Tom, by making sure to ANALYZE? Or did you mean including
ANALYZE in EXPLAIN ANALYZE?


Please don't top-post.

Yes, "[VACUUM] ANALYZE" is what was meant (the VACUUM doesn't hurt - and possibly helps - but wasn't required).  ANALYZE recomputes the statistics for your database.  The apparent problem was that those statistics were stale, which caused the planner to choose a bad plan.

EXPLAIN ANALYZE doesn't help in that scenario - the planner ends up using the same bad statistics.  The ANALYZE option of EXPLAIN simply tells the system not only to plan your query but to execute it as well (discarding the results).  Aside from sharing the same seven letters, the two uses of the word have nothing in common.

I'm not sure how one would "...includ[e] ANALYZE in EXPLAIN ANALYZE" - you only get to explain one statement at a time.
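To make the distinction concrete (illustrative statements, not your exact workload):

```sql
-- ANALYZE as a maintenance command: recomputes planner statistics
ANALYZE transactions_transaction;

-- ANALYZE as an EXPLAIN option: plans AND executes the statement,
-- reporting actual rows and timings, then discards the result set
EXPLAIN ANALYZE
SELECT count(*)
FROM transactions_transaction
WHERE date_created >= '2010-01-01'::date;
```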

David J.

Re: Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

From
"Erdmann, Markus @ Bellevue"
Date:

On Jun 20, 2016, at 1:36 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

Please don't top-post.

Yes, "[VACUUM] ANALYZE" is what was meant (the VACUUM doesn't hurt - and possibly helps - but wasn't required).  ANALYZE recomputes the statistics for your database.  The apparent problem was that those statistics were stale, which caused the planner to choose a bad plan.

EXPLAIN ANALYZE doesn't help in that scenario - the planner ends up using the same bad statistics.  The ANALYZE option of EXPLAIN simply tells the system not only to plan your query but to execute it as well (discarding the results).  Aside from sharing the same seven letters, the two uses of the word have nothing in common.

I'm not sure how one would "...includ[e] ANALYZE in EXPLAIN ANALYZE" - you only get to explain one statement at a time.

David J.


Thank you, David. My confusion originated from a lack of familiarity with the ANALYZE command. Your responses and Tom’s response have been very enlightening.

The head-scratcher for us is that our statistics became so out of date even though we have the autovacuum daemon enabled in RDS, and according to the docs it does run ANALYZE periodically. Now we know (thanks to your help) to check for this immediately whenever a plan shows a large disparity between estimated and actual row counts.
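For anyone finding this thread later, this is the check we'll be running first: pg_stat_user_tables records when each table was last analyzed, either manually or by autovacuum.

```sql
-- When were planner statistics last refreshed for this table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'transactions_transaction';
```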

Markus E.