Re: [GENERAL] Analyzing performance regression from 9.2 to 9.6 - Mailing list pgsql-general

From Dave Peticolas
Subject Re: [GENERAL] Analyzing performance regression from 9.2 to 9.6
Date
Msg-id CAPRbp07yGdroT_1Fv5R4rsrgW4QD7QmV-DQvUcFjEhfDsFy2Kw@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] Analyzing performance regression from 9.2 to 9.6  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Am I right in guessing that report_submission.id is a
> declared-not-null column, so that the join  FROM "report_skilltype"
> should be understood as an anti-join?  

You are exactly right.

> Any chance of whacking your ORM upside the head to the point where it would emit that?

In this case it turns out to be pretty simple to do and it makes all the difference.
Thank you so much, this has been very enlightening. New plans for 9.6 and 9.2 below.

Limit  (cost=203873.66..203873.87 rows=86 width=66) (actual time=2776.352..2776.352 rows=0 loops=1)
   ->  Sort  (cost=203873.66..204380.31 rows=202660 width=66) (actual time=2776.350..2776.350 rows=0 loops=1)
         Sort Key: report_typereport.id DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Hash Right Join  (cost=157720.28..196348.63 rows=202660 width=66) (actual time=2776.343..2776.343 rows=0 loops=1)
               Hash Cond: (report_typereport.skill_type_id = report_skilltype.uuid)
               ->  Seq Scan on report_typereport  (cost=0.00..34513.92 rows=824892 width=20) (never executed)
               ->  Hash  (cost=155187.03..155187.03 rows=202660 width=62) (actual time=2775.983..2775.983 rows=0 loops=1)
                     Buckets: 262144  Batches: 1  Memory Usage: 2048kB
                     ->  Hash Join  (cost=81562.64..155187.03 rows=202660 width=62) (actual time=2775.982..2775.982 rows=0 loops=1)
                           Hash Cond: (report_type.id = report_skilltype.type_id)
                           ->  Seq Scan on report_type  (cost=0.00..55456.48 rows=843731 width=4) (actual time=0.013..496.357 rows=847827 loops=1)
                                 Filter: ((NOT is_current) AND (category <> ALL ('{7,8}'::integer[])))
                                 Rows Removed by Filter: 809331
                           ->  Hash  (cost=72311.12..72311.12 rows=398041 width=62) (actual time=1998.763..1998.763 rows=463108 loops=1)
                                 Buckets: 524288  Batches: 2  Memory Usage: 25322kB
                                 ->  Hash Anti Join  (cost=16689.46..72311.12 rows=398041 width=62) (actual time=252.111..1815.212 rows=463108 loops=1)
                                       Hash Cond: (report_skilltype.uuid = report_submission.skill_type_id)
                                       ->  Hash Join  (cost=75.89..49200.31 rows=566266 width=62) (actual time=1.342..1164.145 rows=950561 loops=1)
                                             Hash Cond: (report_skilltype.skill_id = kbdata_skill.id)
                                             ->  Seq Scan on report_skilltype  (cost=13.51..40399.71 rows=818608 width=62) (actual time=0.232..661.263 rows=1637183 loops=1)
                                                   Filter: (NOT (hashed SubPlan 1))
                                                   Rows Removed by Filter: 33
                                                   SubPlan 1
                                                     ->  Hash Join  (cost=1.05..13.07 rows=176 width=4) (actual time=0.064..0.206 rows=26 loops=1)
                                                           Hash Cond: (u2.product_id = u3.id)
                                                           ->  Seq Scan on product_kbdata_skills u2  (cost=0.00..8.28 rows=528 width=8) (actual time=0.003..0.083 rows=528 loops=1)
                                                           ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
                                                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                                 ->  Seq Scan on product u3  (cost=0.00..1.04 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=1)
                                                                       Filter: ((slug)::text = 'product1'::text)
                                                                       Rows Removed by Filter: 2
                                             ->  Hash  (cost=57.68..57.68 rows=376 width=12) (actual time=1.104..1.104 rows=374 loops=1)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 25kB
                                                   ->  Hash Join  (cost=32.00..57.68 rows=376 width=12) (actual time=0.600..0.977 rows=374 loops=1)
                                                         Hash Cond: (kbdata_skill.id = product_kbdata_skills.skill_id)
                                                         ->  Seq Scan on kbdata_skill  (cost=0.00..19.79 rows=542 width=4) (actual time=0.008..0.198 rows=542 loops=1)
                                                               Filter: (external_uuid <> 'CE9EC8A7-80D8-42A2-B506-0C1B7248CF8C'::uuid)
                                                               Rows Removed by Filter: 1
                                                         ->  Hash  (cost=27.16..27.16 rows=387 width=8) (actual time=0.585..0.585 rows=374 loops=1)
                                                               Buckets: 1024  Batches: 1  Memory Usage: 23kB
                                                               ->  Hash Join  (cost=11.71..27.16 rows=387 width=8) (actual time=0.193..0.483 rows=374 loops=1)
                                                                     Hash Cond: (product_kbdata_skills.skill_id = kbdata_skill_aspects.skill_id)
                                                                     ->  Seq Scan on product_kbdata_skills  (cost=0.00..8.28 rows=528 width=4) (actual time=0.003..0.107 rows=528 loops=1)
                                                                           Filter: (product_id IS NOT NULL)
                                                                     ->  Hash  (cost=6.87..6.87 rows=387 width=4) (actual time=0.183..0.183 rows=387 loops=1)
                                                                           Buckets: 1024  Batches: 1  Memory Usage: 22kB
                                                                           ->  Seq Scan on kbdata_skill_aspects  (cost=0.00..6.87 rows=387 width=4) (actual time=0.002..0.096 rows=387 loops=1)
                                                                                 Filter: (aspect_id IS NOT NULL)
                                       ->  Hash  (cost=10488.81..10488.81 rows=489981 width=16) (actual time=250.096..250.096 rows=489981 loops=1)
                                             Buckets: 524288  Batches: 1  Memory Usage: 27064kB
                                             ->  Seq Scan on report_submission  (cost=0.00..10488.81 rows=489981 width=16) (actual time=0.004..128.478 rows=489981 loops=1)
 Planning time: 13.808 ms
 Execution time: 2776.598 ms
(54 rows)

 Limit  (cost=220583.50..220583.71 rows=86 width=66) (actual time=5012.934..5012.934 rows=0 loops=1)
   ->  Sort  (cost=220583.50..221311.03 rows=291012 width=66) (actual time=5012.933..5012.933 rows=0 loops=1)
         Sort Key: report_typereport.id
         Sort Method: quicksort  Memory: 25kB
         ->  Hash Join  (cost=168290.94..209777.84 rows=291012 width=66) (actual time=5012.927..5012.927 rows=0 loops=1)
               Hash Cond: (report_skilltype.skill_id = kbdata_skill.id)
               ->  Hash Right Join  (cost=168228.56..206606.96 rows=291208 width=66) (actual time=3772.672..4954.449 rows=351045 loops=1)
                     Hash Cond: (report_typereport.skill_type_id = report_skilltype.uuid)
                     ->  Seq Scan on report_typereport  (cost=0.00..33918.40 rows=803840 width=20) (actual time=0.008..364.048 rows=805350 loops=1)
                     ->  Hash  (cost=164588.46..164588.46 rows=291208 width=62) (actual time=3772.568..3772.568 rows=349748 loops=1)
                           Buckets: 32768  Batches: 1  Memory Usage: 30074kB
                           ->  Hash Anti Join  (cost=85026.50..164588.46 rows=291208 width=62) (actual time=953.490..3608.328 rows=349748 loops=1)
                                 Hash Cond: (report_skilltype.uuid = report_submission.skill_type_id)
                                 ->  Hash Join  (cost=68845.95..138820.85 rows=411043 width=62) (actual time=709.782..2667.964 rows=826198 loops=1)
                                       Hash Cond: (report_skilltype.type_id = report_type.id)
                                       ->  Seq Scan on report_skilltype  (cost=13.51..40175.69 rows=809647 width=62) (actual time=0.217..668.918 rows=1619274 loops=1)
                                             Filter: (NOT (hashed SubPlan 1))
                                             Rows Removed by Filter: 33
                                             SubPlan 1
                                               ->  Hash Join  (cost=1.05..13.07 rows=176 width=4) (actual time=0.056..0.194 rows=26 loops=1)
                                                     Hash Cond: (u2.product_id = u3.id)
                                                     ->  Seq Scan on product_kbdata_skills u2  (cost=0.00..8.28 rows=528 width=8) (actual time=0.001..0.083 rows=528 loops=1)
                                                     ->  Hash  (cost=1.04..1.04 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
                                                           Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                                           ->  Seq Scan on product u3  (cost=0.00..1.04 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
                                                                 Filter: ((slug)::text = 'product1'::text)
                                                                 Rows Removed by Filter: 2
                                       ->  Hash  (cost=55200.19..55200.19 rows=830900 width=4) (actual time=709.334..709.334 rows=838440 loops=1)
                                             Buckets: 131072  Batches: 2  Memory Usage: 14741kB
                                             ->  Seq Scan on report_type  (cost=0.00..55200.19 rows=830900 width=4) (actual time=0.005..485.451 rows=838440 loops=1)
                                                   Filter: ((NOT is_current) AND (category <> ALL ('{7,8}'::integer[])))
                                                   Rows Removed by Filter: 801846
                                 ->  Hash  (cost=10236.91..10236.91 rows=475491 width=16) (actual time=243.534..243.534 rows=479828 loops=1)
                                       Buckets: 65536  Batches: 1  Memory Usage: 22492kB
                                       ->  Seq Scan on report_submission  (cost=0.00..10236.91 rows=475491 width=16) (actual time=0.003..115.188 rows=479828 loops=1)
               ->  Hash  (cost=57.68..57.68 rows=376 width=12) (actual time=1.189..1.189 rows=374 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 17kB
                     ->  Hash Join  (cost=32.00..57.68 rows=376 width=12) (actual time=0.627..1.085 rows=374 loops=1)
                           Hash Cond: (kbdata_skill.id = product_kbdata_skills.skill_id)
                           ->  Seq Scan on kbdata_skill  (cost=0.00..19.79 rows=542 width=4) (actual time=0.007..0.260 rows=542 loops=1)
                                 Filter: (external_uuid <> 'CE9EC8A7-80D8-42A2-B506-0C1B7248CF8C'::uuid)
                                 Rows Removed by Filter: 1
                           ->  Hash  (cost=27.16..27.16 rows=387 width=8) (actual time=0.616..0.616 rows=374 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 15kB
                                 ->  Hash Join  (cost=11.71..27.16 rows=387 width=8) (actual time=0.204..0.530 rows=374 loops=1)
                                       Hash Cond: (product_kbdata_skills.skill_id = kbdata_skill_aspects.skill_id)
                                       ->  Seq Scan on product_kbdata_skills  (cost=0.00..8.28 rows=528 width=4) (actual time=0.002..0.121 rows=528 loops=1)
                                             Filter: (product_id IS NOT NULL)
                                       ->  Hash  (cost=6.87..6.87 rows=387 width=4) (actual time=0.195..0.195 rows=387 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 14kB
                                             ->  Seq Scan on kbdata_skill_aspects  (cost=0.00..6.87 rows=387 width=4) (actual time=0.001..0.085 rows=387 loops=1)
                                                   Filter: (aspect_id IS NOT NULL)
 Total runtime: 5013.070 ms
(53 rows)


On Sat, Sep 9, 2017 at 8:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Peticolas <dave@krondo.com> writes:
> Hi, I am trying to analyze a performance regression from 9.2.21 to 9.6.3.
> The query and execution plans are below with 9.6.3 first.

Hm.  Neither version is exactly covering itself with glory.  I'm not sure
why 9.6 doesn't pick the same plan as 9.2, but given the planner's
estimate that the report_submission/report_skilltype join will produce
only one row, the difference in the estimated costs would be negligible.
Since that join is actually producing over a million rows, your chances
of getting a good plan hinge on improving that estimate.

> I realize the query itself is probably not great and would benefit from a
> different approach, but I'd like to know if there are 9.6 settings I should
> look into to get a better plan without changing the query if possible.

Doubt it.  Am I right in guessing that report_submission.id is a
declared-not-null column, so that the join

 FROM "report_skilltype"
   LEFT OUTER JOIN "report_submission"
     ON ("report_skilltype"."uuid" = "report_submission"."skill_type_id")
   WHERE ... "report_submission"."id" IS NULL

should be understood as an anti-join?  The planner doesn't get that
at the moment, for implementation reasons that needn't concern us here.
But it would get it if you said

   WHERE ... "report_submission"."skill_type_id" IS NULL

i.e. constrain the join column to be null.  Any chance of whacking your
ORM upside the head to the point where it would emit that?

                        regards, tom lane


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: [GENERAL] Analyzing performance regression from 9.2 to 9.6
Next
From: Magnus Hagander
Date:
Subject: Re: [GENERAL] pg_ident mapping Kerberos Usernames