9.5 regression with unwanted nested loop left join - Mailing list pgsql-general

From Greg Sabino Mullane
Subject 9.5 regression with unwanted nested loop left join
Date
Msg-id 20160524143357.GK8843@localhost.localdomain
Responses Re: 9.5 regression with unwanted nested loop left join  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
We are trying to upgrade a client app to Postgres 9.5, but are running
into some performance regressions (even though the current db is 8.x!).
One in particular that is puzzling me involves a query that keeps slipping
into a nested loop left join, rather than a much preferred hash join.
The nested loop is many orders of magnitude slower than the hash. My
testing showed that the 8.x series handles the query fine, but as of 9.0
and up it slips into the nested loop when certain criteria are met, namely
when the WHERE clause covers less than a certain number of days. The
threshold varies, but in general anything under about 130 days goes to the
slow nested loop.
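One quick diagnostic (not a fix) is to disable nested loops for the session and see whether the planner then falls back to the hash join for the same query:

```sql
-- Diagnostic only: forbid nested loops for this session and re-run
-- the slow query (shown below) to see which plan the planner picks instead.
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT t.* FROM gregtest t ...;  -- full query as below
RESET enable_nestloop;
```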

I made a test case that illustrates the problem. Yes, it could be written
better to alleviate the issue, but I'd rather find some way of tuning the
db (yes I said the T word) rather than rewriting queries one by one.

Setup:

DROP TABLE IF EXISTS gregtest, gregtest_status;
CREATE TABLE gregtest(id INT, order_date TIMESTAMPTZ, status TEXT);
INSERT INTO gregtest SELECT (123*RANDOM())::INT, NOW() - '1 DAY'::INTERVAL * 1234*RANDOM(),
  ((10*RANDOM())::INT)::TEXT FROM GENERATE_SERIES(1,500000);
CREATE TABLE gregtest_status(id INT, update_time TIMESTAMPTZ, status TEXT);
INSERT INTO gregtest_status SELECT (123*RANDOM())::INT, NOW() - '1 DAY'::INTERVAL * 1234*RANDOM(),
  ((10*RANDOM())::INT)::TEXT FROM GENERATE_SERIES(1,1000000);
ANALYZE;

Fast and slow examples:

EXPLAIN ANALYZE SELECT t.* FROM gregtest t
LEFT JOIN (
           SELECT gs.id, gs.status
           FROM gregtest_status gs
           JOIN (
                 SELECT id, min(update_time) AS update_time
                 FROM gregtest_status
                 WHERE update_time >= '2015-01-01'::date
                 GROUP BY id
           ) gs2 ON gs.id = gs2.id AND gs.update_time = gs2.update_time
      ) foo ON t.id = foo.id
WHERE t.order_date > '2016-01-20';
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (C=30990.39..54869.49 R=50070 W=14) (AT=842.020..1261.843 R=50503 L=1)
   Hash Cond: (gs.id = t.id)
   ->  Hash Join  (C=20929.52..44799.53 R=1 W=4) (AT=624.288..1016.711 R=124 L=1)
         Hash Cond: ((gs.id = gregtest_status.id) AND (gs.update_time = (min(gregtest_status.update_time))))
         ->  Seq Scan on gregtest_status gs  (C=0.00..16370.00 R=1000000 W=12) (AT=0.012..155.630 R=1000000 L=1)
         ->  Hash  (C=20928.74..20928.74 R=52 W=12) (AT=614.657..614.657 R=124 L=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  HashAggregate  (C=20927.70..20928.22 R=52 W=12) (AT=614.568..614.603 R=124 L=1)
                     Group Key: gregtest_status.id
                     ->  Seq Scan on gregtest_status  (C=0.00..18870.00 R=411540 W=12) (AT=0.025..417.488 R=412301 L=1)
                           Filter: (update_time >= '2015-01-01'::date)
                           Rows Removed by Filter: 587699
   ->  Hash  (C=9435.00..9435.00 R=50070 W=14) (AT=217.594..217.594 R=50503 L=1)
         Buckets: 65536  Batches: 1  Memory Usage: 2981kB
         ->  Seq Scan on gregtest t  (C=0.00..9435.00 R=50070 W=14) (AT=0.028..196.038 R=50503 L=1)
               Filter: (order_date > '2016-01-20 00:00:00-05'::timestamp with time zone)
               Rows Removed by Filter: 449497
 Planning time: 0.359 ms
 Execution time: 1265.709 ms

EXPLAIN ANALYZE SELECT t.* FROM gregtest t
LEFT JOIN (
           SELECT gs.id, gs.status
           FROM gregtest_status gs
           JOIN (
                 SELECT id, min(update_time) AS update_time
                 FROM gregtest_status
                 WHERE update_time >= '2015-01-01'::date
                 GROUP BY id
           ) gs2 ON gs.id = gs2.id AND gs.update_time = gs2.update_time
      ) foo ON t.id = foo.id
WHERE t.order_date > '2016-01-24';
                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (C=20929.52..54961.99 R=48497 W=14) (AT=948.436..2871.727 R=48856 L=1)
   Join Filter: (t.id = gs.id)
   Rows Removed by Join Filter: 6009288
   ->  Seq Scan on gregtest t  (C=0.00..9435.00 R=48497 W=14) (AT=0.023..113.814 R=48856 L=1)
         Filter: (order_date > '2016-01-24 00:00:00-05'::timestamp with time zone)
         Rows Removed by Filter: 451144
   ->  Materialize  (C=20929.52..44799.54 R=1 W=4) (AT=0.013..0.033 R=124 L=48856)
         ->  Hash Join  (C=20929.52..44799.53 R=1 W=4) (AT=626.631..1009.945 R=124 L=1)
               Hash Cond: ((gs.id = gregtest_status.id) AND (gs.update_time = (min(gregtest_status.update_time))))
               ->  Seq Scan on gregtest_status gs  (C=0.00..16370.00 R=1000000 W=12) (AT=0.010..150.791 R=1000000 L=1)
               ->  Hash  (C=20928.74..20928.74 R=52 W=12) (AT=617.172..617.172 R=124 L=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 14kB
                     ->  HashAggregate  (C=20927.70..20928.22 R=52 W=12) (AT=617.085..617.113 R=124 L=1)
                           Group Key: gregtest_status.id
                            ->  Seq Scan on gregtest_status  (C=0.00..18870.00 R=411540 W=12) (AT=0.013..423.964 R=412301 L=1)
                                 Filter: (update_time >= '2015-01-01'::date)
                                 Rows Removed by Filter: 587699
 Planning time: 0.970 ms
 Execution time: 2875.785 ms


Sometimes the final date in the where clause must be adjusted to see the nested loop.

The actual production query is > 200x as slow.

Yes, the 2015 date is hardcoded in there. All that changes is the final date in the WHERE clause.

Adding indexes does not seem to help, so this example left them out.
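For anyone trying it, the obvious candidate indexes would be something along these lines (consistent with the above, they did not help here):

```sql
-- Candidate indexes one might try (no effect on the plan choice here):
CREATE INDEX ON gregtest (order_date);
CREATE INDEX ON gregtest_status (id, update_time);
```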

Bumping default_statistics_target to 5000 merely moves the first appearance of the nested loop a few days *earlier*.

Changing cpu_tuple_cost moves it forward a bit, but not enough to cover the actual use case (now() - 1 month).
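For concreteness, these were session-level experiments roughly like the following (the exact values here are illustrative, not the precise ones used):

```sql
-- Illustrative session-level tweaks; values are examples only.
SET default_statistics_target = 5000;
ANALYZE gregtest_status;               -- stats target only matters after re-analyzing
SET cpu_tuple_cost = 0.05;             -- default is 0.01
-- re-run the EXPLAIN ANALYZE above to see where the crossover date moves
RESET cpu_tuple_cost;
RESET default_statistics_target;
```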

Leaving out the 'analyze' results in a much better plan, but having the tables stay unanalyzed does not seem
like a great solution. :) I'm hoping I am overlooking something obvious.
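For completeness, one per-query rewrite that avoids the min()/self-join pattern entirely (not the global fix I'm after, and note it keeps exactly one row per id where the self-join could keep several on ties) would be something like:

```sql
-- Possible per-query rewrite: use DISTINCT ON to take each id's
-- earliest post-2015 row directly, instead of joining back against
-- min(update_time).
SELECT t.* FROM gregtest t
LEFT JOIN (
    SELECT DISTINCT ON (id) id, status
    FROM gregtest_status
    WHERE update_time >= '2015-01-01'::date
    ORDER BY id, update_time
) foo ON t.id = foo.id
WHERE t.order_date > '2016-01-24';
```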


--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
