Re: complex query performance assistance request

From: John Mendenhall
Subject: Re: complex query performance assistance request
Date: ,
Msg-id: 20050823190525.GA27623@calvin.surfutopia.net
(view: Whole thread, Raw)
In response to: Re: complex query performance assistance request  (Tom Lane)
List: pgsql-performance

Tree view

complex query performance assistance request  (John Mendenhall, )
 Re: complex query performance assistance request  (John Mendenhall, )
  Re: complex query performance assistance request  (Tom Lane, )
   Re: complex query performance assistance request  (John Mendenhall, )
    Re: complex query performance assistance request  (Tom Lane, )
     Re: complex query performance assistance request  (John Mendenhall, )

Tom,

> > Would it be best to attempt to rewrite it for IN?
> > Or, should we try to tie it in with a join?
>
> Couldn't say without a deeper understanding of what you're trying to
> accomplish.

Here are the results of each SQL rewrite.

The first pass, I rewrote it as c.id IN ():
-----
LOG:  duration: 2669.682 ms  statement: explain analyze
SELECT
 c.id AS contact_id,
 sr.id AS sales_rep_id,
 p.id AS partner_id,
 coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
 co.name AS contact_country,
 c.master_key_token
FROM
 sales_reps sr
 JOIN partners p ON (sr.id = p.sales_rep_id)
 JOIN contacts c ON (p.id = c.partner_id)
 JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
 JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
WHERE
 c.lead_deleted IS NULL
 AND c.id IN
 (
  SELECT
   lr.contact_id
  FROM
   lead_requests lr,
   lead_request_status lrs
  WHERE
   lr.status_id = lrs.id AND
   lrs.is_closed = 0
 )
ORDER BY
 contact_company, contact_id
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4413.35..4416.16 rows=1123 width=102) (actual time=2617.069..2617.719 rows=1071 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) ||
(c.lastname)::text)))),c.id 
   ->  Merge Join  (cost=4311.31..4356.45 rows=1123 width=102) (actual time=2549.717..2589.398 rows=1071 loops=1)
         Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
         ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.765..9.966 rows=240 loops=1)
               Sort Key: lower((co.code)::text)
               ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.142..5.118 rows=242
loops=1)
         ->  Sort  (cost=4297.31..4299.63 rows=928 width=95) (actual time=2539.685..2540.913 rows=1071 loops=1)
               Sort Key: lower((c.country)::text)
               ->  Merge IN Join  (cost=4163.02..4251.57 rows=928 width=95) (actual time=2377.539..2524.844 rows=1071
loops=1)
                     Merge Cond: ("outer".id = "inner".contact_id)
                     ->  Sort  (cost=1835.53..1851.27 rows=6296 width=95) (actual time=1843.866..1853.193 rows=6349
loops=1)
                           Sort Key: c.id
                           ->  Merge Join  (cost=75.65..1438.24 rows=6296 width=95) (actual time=51.713..1505.633
rows=6349loops=1) 
                                 Merge Cond: ("outer".partner_id = "inner".id)
                                 ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..5303.84
rows=40243width=85) (actual time=0.077..584.736 rows=40267 loops=1) 
                                       Filter: (lead_deleted IS NULL)
                                 ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual time=51.508..62.288 rows=6462
loops=1)
                                       Sort Key: p.id
                                       ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual
time=30.152..38.281rows=395 loops=1) 
                                             Merge Cond: ("outer".id = "inner".sales_rep_id)
                                             ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.390..1.505
rows=39loops=1) 
                                                   Sort Key: sr.id
                                                   ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10)
(actualtime=0.026..0.380 rows=39 loops=1) 
                                             ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual
time=28.558..29.120rows=395 loops=1) 
                                                   Sort Key: p.sales_rep_id
                                                   ->  Nested Loop  (cost=24.35..44.96 rows=290 width=20) (actual
time=0.191..21.408rows=395 loops=1) 
                                                         Join Filter: ("inner".classification_id = "outer".id)
                                                         ->  Seq Scan on partner_classification pc  (cost=0.00..1.04
rows=2width=10) (actual time=0.068..0.121 rows=2 loops=1) 
                                                               Filter: ((classification)::text <> 'Sales Rep'::text)
                                                         ->  Materialize  (cost=24.35..28.70 rows=435 width=30) (actual
time=0.029..5.380rows=435 loops=2) 
                                                               ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435
width=30)(actual time=0.038..8.161 rows=435 loops=1) 
                     ->  Sort  (cost=2327.50..2351.43 rows=9573 width=11) (actual time=533.508..535.629 rows=1742
loops=1)
                           Sort Key: lr.contact_id
                           ->  Merge Join  (cost=1520.94..1694.49 rows=9573 width=11) (actual time=302.932..461.644
rows=1745loops=1) 
                                 Merge Cond: ("outer".id = "inner".status_id)
                                 ->  Sort  (cost=1.28..1.30 rows=8 width=10) (actual time=0.392..0.404 rows=7 loops=1)
                                       Sort Key: lrs.id
                                       ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10)
(actualtime=0.117..0.280 rows=7 loops=1) 
                                             Filter: (is_closed = 0::numeric)
                                 ->  Sort  (cost=1519.66..1558.55 rows=15556 width=21) (actual time=302.423..321.939
rows=15387loops=1) 
                                       Sort Key: lr.status_id
                                       ->  Seq Scan on lead_requests lr  (cost=0.00..436.56 rows=15556 width=21)
(actualtime=0.029..164.708 rows=15559 loops=1) 
 Total runtime: 2632.987 ms
(44 rows)
-----

The second pass, I rewrote it to tie in with a JOIN, adding
a DISTINCT at the top to get rid of the duplicates:
-----
LOG:  duration: 3285.645 ms  statement: explain analyze
SELECT DISTINCT
 c.id AS contact_id,
 sr.id AS sales_rep_id,
 p.id AS partner_id,
 coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
 co.name AS contact_country,
 c.master_key_token
FROM
 sales_reps sr
 JOIN partners p ON (sr.id = p.sales_rep_id)
 JOIN contacts c ON (p.id = c.partner_id)
 JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
 JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
 JOIN lead_requests lr ON (c.id = lr.contact_id)
 JOIN lead_request_status lrs ON (lr.status_id = lrs.id AND lrs.is_closed = 0)
WHERE
 c.lead_deleted IS NULL
ORDER BY
 contact_company, contact_id
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=3039.78..3071.46 rows=1810 width=102) (actual time=3219.707..3228.637 rows=1071 loops=1)
   ->  Sort  (cost=3039.78..3044.31 rows=1810 width=102) (actual time=3219.695..3220.560 rows=1118 loops=1)
         Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) ||
(c.lastname)::text)))),c.id, sr.id, p.id, co.name, c.master_key_token 
         ->  Merge Join  (cost=2870.92..2941.85 rows=1810 width=102) (actual time=3156.788..3188.338 rows=1118 loops=1)
               Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
               ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.196..9.445 rows=240 loops=1)
                     Sort Key: lower((co.code)::text)
                     ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.128..3.914
rows=242loops=1) 
               ->  Sort  (cost=2856.92..2860.66 rows=1496 width=95) (actual time=3147.340..3148.477 rows=1118 loops=1)
                     Sort Key: lower((c.country)::text)
                     ->  Merge Join  (cost=2750.88..2778.03 rows=1496 width=95) (actual time=3008.933..3132.122
rows=1118loops=1) 
                           Merge Cond: ("outer".id = "inner".status_id)
                           ->  Sort  (cost=1.28..1.30 rows=8 width=10) (actual time=0.366..0.379 rows=7 loops=1)
                                 Sort Key: lrs.id
                                 ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual
time=0.094..0.254rows=7 loops=1) 
                                       Filter: (is_closed = 0::numeric)
                           ->  Sort  (cost=2749.60..2755.67 rows=2430 width=105) (actual time=3008.396..3023.502
rows=9992loops=1) 
                                 Sort Key: lr.status_id
                                 ->  Merge Join  (cost=1835.53..2612.95 rows=2430 width=105) (actual
time=1975.714..2912.632rows=10089 loops=1) 
                                       Merge Cond: ("outer".contact_id = "inner".id)
                                       ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr
(cost=0.00..683.87rows=15556 width=21) (actual time=0.073..247.148 rows=15556 loops=1) 
                                       ->  Sort  (cost=1835.53..1851.27 rows=6296 width=95) (actual
time=1975.273..1988.664rows=10089 loops=1) 
                                             Sort Key: c.id
                                             ->  Merge Join  (cost=75.65..1438.24 rows=6296 width=95) (actual
time=56.107..1625.186rows=6349 loops=1) 
                                                   Merge Cond: ("outer".partner_id = "inner".id)
                                                   ->  Index Scan using contacts_partner_id_idx on contacts c
(cost=0.00..5303.84rows=40243 width=85) (actual time=0.047..580.311 rows=40267 loops=1) 
                                                         Filter: (lead_deleted IS NULL)
                                                   ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual
time=55.935..65.502rows=6462 loops=1) 
                                                         Sort Key: p.id
                                                         ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual
time=31.765..39.925rows=395 loops=1) 
                                                               Merge Cond: ("outer".id = "inner".sales_rep_id)
                                                               ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual
time=1.072..1.117rows=39 loops=1) 
                                                                     Sort Key: sr.id
                                                                     ->  Seq Scan on sales_reps sr  (cost=0.00..1.39
rows=39width=10) (actual time=0.022..0.312 rows=39 loops=1) 
                                                               ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual
time=30.489..30.893rows=395 loops=1) 
                                                                     Sort Key: p.sales_rep_id
                                                                     ->  Nested Loop  (cost=24.35..44.96 rows=290
width=20)(actual time=0.159..23.356 rows=395 loops=1) 
                                                                           Join Filter: ("inner".classification_id =
"outer".id)
                                                                           ->  Seq Scan on partner_classification pc
(cost=0.00..1.04rows=2 width=10) (actual time=0.047..0.086 rows=2 loops=1) 
                                                                                 Filter: ((classification)::text <>
'SalesRep'::text) 
                                                                           ->  Materialize  (cost=24.35..28.70 rows=435
width=30)(actual time=0.028..6.124 rows=435 loops=2) 
                                                                                 ->  Seq Scan on partners p
(cost=0.00..24.35rows=435 width=30) (actual time=0.039..9.383 rows=435 loops=1) 
 Total runtime: 3241.139 ms
(43 rows)
-----

The DISTINCT ON condition was about the same amount of time,
statistically.  Removing the DISTINCT entirely only gave a
very slight improvement in performance.

So, the bottom line is, unless there are other ideas to
improve the performance, I will most likely rewrite our
application to use the c.id IN () option.

Thank you very much for your input and suggestions.

JohnM

--
John Mendenhall

surf utopia
internet services


pgsql-performance by date:

From: "Jeffrey W. Baker"
Date:
Subject: Re: Read/Write block sizes (Was: Caching by Postgres)
From: Josh Berkus
Date:
Subject: Re: Caching by Postgres