Thread: complex query performance assistance request

complex query performance assistance request

From
John Mendenhall
Date:
I need to improve the performance for the following
query.

Soon after I reboot my server, the following query takes
20 seconds the first time I run it.
When I run it after that, it takes approximately 2 seconds.
I understand the caching taking place (at the os or db
level, it doesn't matter here).

Here are the results of the explain analyze run:

-----
LOG:  duration: 6259.632 ms  statement: explain analyze
SELECT
c.id AS contact_id,
sr.id AS sales_rep_id,
LTRIM(RTRIM(sr.firstname || ' ' || sr.lastname)) AS sales_rep_name,
p.id AS partner_id,
p.company AS partner_company,
coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname)))
AS contact_company,
LTRIM(RTRIM(c.city || ' ' || c.state || ' ' || c.postalcode || ' ' || c.country))
AS contact_location,
c.phone AS contact_phone,
c.email AS contact_email,
co.name AS contact_country,
TO_CHAR(c.request_status_last_modified, 'mm/dd/yy hh12:mi pm')
AS request_status_last_modified,
TO_CHAR(c.request_status_last_modified, 'yyyymmddhh24miss')
AS rqst_stat_last_mdfd_sortable,
c.token_id,
c.master_key_token AS token
FROM
sales_reps sr
JOIN partners p ON (sr.id = p.sales_rep_id)
JOIN contacts c ON (p.id = c.partner_id)
JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
WHERE
c.lead_deleted IS NULL
AND EXISTS
(
SELECT
lr.id
FROM
lead_requests lr,
lead_request_status lrs
WHERE
c.id = lr.contact_id AND
lr.status_id = lrs.id  AND
lrs.is_closed = 0
)
ORDER BY
contact_company, contact_id;
                                                                                        QUERY PLAN
        


--------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------
 Sort  (cost=39093.16..39102.80 rows=3856 width=238) (actual time=6220.481..6221.188 rows=1071 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) ||
(c.lastname)::text)))),c.id 
   ->  Merge Join  (cost=38580.89..38863.48 rows=3856 width=238) (actual time=6015.751..6184.199 rows=1071 loops=1)
         Merge Cond: ("outer"."?column3?" = "inner"."?column19?")
         ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.250..9.500 rows=240 loops=1)
               Sort Key: lower((co.code)::text)
               ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.132..4.498 rows=242
loops=1)
         ->  Sort  (cost=38566.89..38574.86 rows=3186 width=225) (actual time=6005.644..6006.954 rows=1071 loops=1)
               Sort Key: lower((c.country)::text)
               ->  Merge Join  (cost=75.65..38381.50 rows=3186 width=225) (actual time=58.086..5979.287 rows=1071
loops=1)
                     Merge Cond: ("outer".partner_id = "inner".id)
                     ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..160907.39 rows=20106
width=171)(actual time=2.569..5816.985 rows=1547 loops=1) 
                           Filter: ((lead_deleted IS NULL) AND (subplan))
                           SubPlan
                             ->  Nested Loop  (cost=1.16..6.56 rows=2 width=10) (actual time=0.119..0.119 rows=0
loops=40261)
                                   Join Filter: ("outer".status_id = "inner".id)
                                   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr
(cost=0.00..4.86rows=3 width=20) (actual time=0.079..0.083 rows=0 loops=40261) 
                                         Index Cond: ($0 = contact_id)
                                   ->  Materialize  (cost=1.16..1.24 rows=8 width=10) (actual time=0.002..0.011 rows=6
loops=12592)
                                         ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10)
(actualtime=0.083..0.270 rows=7 loops=1) 
                                               Filter: (is_closed = 0::numeric)
                     ->  Sort  (cost=75.65..76.37 rows=290 width=64) (actual time=55.073..56.990 rows=1334 loops=1)
                           Sort Key: p.id
                           ->  Merge Join  (cost=59.24..63.79 rows=290 width=64) (actual time=31.720..41.096 rows=395
loops=1)
                                 Merge Cond: ("outer".id = "inner".sales_rep_id)
                                 ->  Sort  (cost=2.42..2.52 rows=39 width=31) (actual time=1.565..1.616 rows=39
loops=1)
                                       Sort Key: sr.id
                                       ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=31) (actual
time=0.043..0.581rows=39 loops=1) 
                                 ->  Sort  (cost=56.82..57.55 rows=290 width=43) (actual time=29.921..30.310 rows=395
loops=1)
                                       Sort Key: p.sales_rep_id
                                       ->  Nested Loop  (cost=24.35..44.96 rows=290 width=43) (actual
time=0.169..22.566rows=395 loops=1) 
                                             Join Filter: ("inner".classification_id = "outer".id)
                                             ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2
width=10)(actual time=0.059..0.102 rows=2 loops=1) 
                                                   Filter: ((classification)::text <> 'Sales Rep'::text)
                                             ->  Materialize  (cost=24.35..28.70 rows=435 width=53) (actual
time=0.023..5.880rows=435 loops=2) 
                                                   ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435 width=53)
(actualtime=0.034..8.937 rows=435 loops=1) 
 Total runtime: 6225.791 ms
(37 rows)

-----

My first question is, what is the Materialize query plan element?
It happens twice, and usually when I see it, my query is slow.

My second and more important question is, does anyone have
any ideas or suggestions as to how I can increase the speed
for this query?

Things I have already done are, modify the joins and conditions
so it starts with smaller tables, thus the join set is smaller,
modify the configuration of the server to ensure index scans
are used as they should be, ran vacuumdb and analyze on the
database.

Thank you very much in advance for any pointers for additional
places I can look.

Thanks.

JohnM

--
John Mendenhall
john@surfutopia.net
surf utopia
internet services

Re: complex query performance assistance request

From
John Mendenhall
Date:
On Sat, 20 Aug 2005, John Mendenhall wrote:

> I need to improve the performance for the following
> query.

I have run the same query in the same database under
different schemas.  Each schema is pretty much the same
tables and indices.  One has an extra backup table and
an extra index which are not used in either of the explain
analyze plans.

The first schema is a development schema, which I used
to performance tune the server so everything was great.

Here are the current results of the sql run in the development
environment:

-----
LOG:  duration: 852.275 ms  statement: explain analyze
SELECT
 c.id AS contact_id,
 sr.id AS sales_rep_id,
 p.id AS partner_id,
 coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
 co.name AS contact_country,
 c.master_key_token
FROM
 sales_reps sr
 JOIN partners p ON (sr.id = p.sales_rep_id)
 JOIN contacts c ON (p.id = c.partner_id)
 JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
 JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
WHERE
 c.lead_deleted IS NULL
 AND EXISTS
 (
  SELECT
   lr.id
  FROM
   lead_requests lr,
   lead_request_status lrs
  WHERE
   c.id = lr.contact_id AND
   lr.status_id = lrs.id  AND
   lrs.is_closed = 0
 )
ORDER BY
 contact_company, contact_id
                                                                                                   QUERY PLAN
        


--------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------
 Sort  (cost=18238.25..18238.27 rows=11 width=102) (actual time=823.721..823.915 rows=247 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) ||
(c.lastname)::text)))),c.id 
   ->  Hash Join  (cost=18230.34..18238.06 rows=11 width=102) (actual time=808.042..818.427 rows=247 loops=1)
         Hash Cond: (lower(("outer".code)::text) = lower(("inner".country)::text))
         ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.032..1.208 rows=242 loops=1)
         ->  Hash  (cost=18230.31..18230.31 rows=9 width=95) (actual time=807.554..807.554 rows=0 loops=1)
               ->  Merge Join  (cost=18229.98..18230.31 rows=9 width=95) (actual time=794.413..804.855 rows=247
loops=1)
                     Merge Cond: ("outer".sales_rep_id = "inner".id)
                     ->  Sort  (cost=18227.56..18227.59 rows=9 width=95) (actual time=793.132..793.502 rows=250
loops=1)
                           Sort Key: p.sales_rep_id
                           ->  Merge Join  (cost=18227.26..18227.42 rows=9 width=95) (actual time=782.832..789.205
rows=250loops=1) 
                                 Merge Cond: ("outer".id = "inner".classification_id)
                                 ->  Sort  (cost=1.05..1.05 rows=2 width=10) (actual time=0.189..0.194 rows=2 loops=1)
                                       Sort Key: pc.id
                                       ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2 width=10)
(actualtime=0.089..0.127 rows=2 loops=1) 
                                             Filter: ((classification)::text <> 'Sales Rep'::text)
                                 ->  Sort  (cost=18226.21..18226.24 rows=13 width=105) (actual time=782.525..782.818
rows=251loops=1) 
                                       Sort Key: p.classification_id
                                       ->  Merge Join  (cost=0.00..18225.97 rows=13 width=105) (actual
time=54.135..776.299rows=449 loops=1) 
                                             Merge Cond: ("outer".id = "inner".partner_id)
                                             ->  Index Scan using partners_pkey on partners p  (cost=0.00..30.80
rows=395width=30) (actual time=0.073..6.873 rows=395 loops=1) 
                                             ->  Index Scan using contacts_partner_id_idx on contacts c
(cost=0.00..130157.20rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1) 
                                                   Filter: ((lead_deleted IS NULL) AND (subplan))
                                                   SubPlan
                                                     ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual
time=0.103..0.103rows=0 loops=5576) 
                                                           Join Filter: ("outer".status_id = "inner".id)
                                                           ->  Index Scan using lead_requests_contact_id_idx on
lead_requestslr (cost=0.00..4.23 rows=2 width=20) (actual time=0.075..0.075 rows=0 loops=5576) 
                                                                 Index Cond: ($0 = contact_id)
                                                           ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16
rows=8width=10) (actual time=0.028..0.098 rows=4 loops=522) 
                                                                 Filter: (is_closed = 0::numeric)
                     ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.183..1.569 rows=268 loops=1)
                           Sort Key: sr.id
                           ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10) (actual time=0.056..0.353
rows=39loops=1) 
 Total runtime: 826.425 ms
(34 rows)
-----

Here is the current run in the production environment,
which I need to figure out how to get to the performance
level of the development environment:

-----
LOG:  duration: 6447.934 ms  statement: explain analyze
SELECT
 c.id AS contact_id,
 sr.id AS sales_rep_id,
 p.id AS partner_id,
 coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
 co.name AS contact_country,
 c.master_key_token
FROM
 sales_reps sr
 JOIN partners p ON (sr.id = p.sales_rep_id)
 JOIN contacts c ON (p.id = c.partner_id)
 JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
 JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
WHERE
 c.lead_deleted IS NULL
 AND EXISTS
 (
  SELECT
   lr.id
  FROM
   lead_requests lr,
   lead_request_status lrs
  WHERE
   c.id = lr.contact_id AND
   lr.status_id = lrs.id  AND
   lrs.is_closed = 0
 )
ORDER BY
 contact_company, contact_id
                                                                                        QUERY PLAN
        


--------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------
 Sort  (cost=40838.98..40849.08 rows=4042 width=102) (actual time=6418.732..6419.536 rows=1071 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) ||
(c.lastname)::text)))),c.id 
   ->  Merge Join  (cost=40442.25..40596.85 rows=4042 width=102) (actual time=6357.161..6389.616 rows=1071 loops=1)
         Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
         ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.753..10.018 rows=240 loops=1)
               Sort Key: lower((co.code)::text)
               ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.126..3.950 rows=242
loops=1)
         ->  Sort  (cost=40428.24..40436.59 rows=3340 width=95) (actual time=6347.154..6348.429 rows=1071 loops=1)
               Sort Key: lower((c.country)::text)
               ->  Merge Join  (cost=75.65..40232.76 rows=3340 width=95) (actual time=60.308..6331.266 rows=1071
loops=1)
                     Merge Cond: ("outer".partner_id = "inner".id)
                     ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120
width=85)(actual time=2.769..6188.886 rows=1548 loops=1) 
                           Filter: ((lead_deleted IS NULL) AND (subplan))
                           SubPlan
                             ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0
loops=40262)
                                   Join Filter: ("outer".status_id = "inner".id)
                                   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr
(cost=0.00..4.86rows=3 width=20) (actual time=0.086..0.092 rows=0 loops=40262) 
                                         Index Cond: ($0 = contact_id)
                                   ->  Materialize  (cost=1.16..1.24 rows=8 width=10) (actual time=0.002..0.013 rows=6
loops=12593)
                                         ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10)
(actualtime=0.078..0.243 rows=7 loops=1) 
                                               Filter: (is_closed = 0::numeric)
                     ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual time=57.243..59.574 rows=1334 loops=1)
                           Sort Key: p.id
                           ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual time=33.975..42.215 rows=395
loops=1)
                                 Merge Cond: ("outer".id = "inner".sales_rep_id)
                                 ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.206..1.285 rows=39
loops=1)
                                       Sort Key: sr.id
                                       ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10) (actual
time=0.028..0.365rows=39 loops=1) 
                                 ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual time=32.566..33.254 rows=395
loops=1)
                                       Sort Key: p.sales_rep_id
                                       ->  Nested Loop  (cost=24.35..44.96 rows=290 width=20) (actual
time=0.158..25.227rows=395 loops=1) 
                                             Join Filter: ("inner".classification_id = "outer".id)
                                             ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2
width=10)(actual time=0.050..0.096 rows=2 loops=1) 
                                                   Filter: ((classification)::text <> 'Sales Rep'::text)
                                             ->  Materialize  (cost=24.35..28.70 rows=435 width=30) (actual
time=0.028..6.617rows=435 loops=2) 
                                                   ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435 width=30)
(actualtime=0.042..9.941 rows=435 loops=1) 
 Total runtime: 6423.683 ms
(37 rows)
-----

The SQL is exactly the same.

The issue is the query plan is different, and thus,
not up to the performance we need.

We have 256meg in the machine.  Would it help if
we threw some more memory in?

Please let me know if you have *any* pointers as to
the reason for the difference.

Thank you very much in advance for any pointers or
suggestions.

JohnM

--
John Mendenhall
john@surfutopia.net
surf utopia
internet services

Re: complex query performance assistance request

From
Tom Lane
Date:
John Mendenhall <john@surfutopia.net> writes:
> The issue is the query plan is different, and thus,
> not up to the performance we need.

No, the issue is that you've got eight times as much data in the
production server; so it's hardly surprising that it takes about
eight times longer.

The production query is spending most of its time on the subplan
attached to the contacts table:

>                      ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120
width=85)(actual time=2.769..6188.886 rows=1548 loops=1) 
>                            Filter: ((lead_deleted IS NULL) AND (subplan))
>                            SubPlan
>                              ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0
loops=40262)

0.129 * 40262 = 5193.798, so about five seconds in the subplan and
another one second in the indexscan proper.  The problem is that the
subplan (the EXISTS clause) is iterated for each of 40262 rows of
contacts --- basically, every contacts row that has null lead_deleted.

On the dev server the same scan shows these numbers:

>                                              ->  Index Scan using contacts_partner_id_idx on contacts c
(cost=0.00..130157.20rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1) 
>                                                    Filter: ((lead_deleted IS NULL) AND (subplan))
>                                                    SubPlan
>                                                      ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual
time=0.103..0.103rows=0 loops=5576) 

Here the subplan is iterated only 5576 times for 574 total msec.  It's
still the bulk of the runtime though; the fact that the upper levels
of the plan are a bit different has got little to do with where the time
is going.

I'd suggest trying to get rid of the EXISTS clause --- can you refactor
that into something that joins at the top query level?

Or, if this is 7.4 or later (and you should ALWAYS mention which version
you are using in a performance question, because it matters), try to
convert the EXISTS into an IN.  "x IN (subselect)" is planned much better
than "EXISTS(subselect-using-x)" these days.

            regards, tom lane

Re: complex query performance assistance request

From
John Mendenhall
Date:
Tom,

> No, the issue is that you've got eight times as much data in the
> production server; so it's hardly surprising that it takes about
> eight times longer.
>
> The production query is spending most of its time on the subplan
> attached to the contacts table:
>
> >                      ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120
width=85)(actual time=2.769..6188.886 rows=1548 loops=1) 
> >                            Filter: ((lead_deleted IS NULL) AND (subplan))
> >                            SubPlan
> >                              ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0
loops=40262)
>
> 0.129 * 40262 = 5193.798, so about five seconds in the subplan and
> another one second in the indexscan proper.  The problem is that the
> subplan (the EXISTS clause) is iterated for each of 40262 rows of
> contacts --- basically, every contacts row that has null lead_deleted.
>
> On the dev server the same scan shows these numbers:
>
> >                                              ->  Index Scan using contacts_partner_id_idx on contacts c
(cost=0.00..130157.20rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1) 
> >                                                    Filter: ((lead_deleted IS NULL) AND (subplan))
> >                                                    SubPlan
> >                                                      ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual
time=0.103..0.103rows=0 loops=5576) 
>
> I'd suggest trying to get rid of the EXISTS clause --- can you refactor
> that into something that joins at the top query level?
>
> Or, if this is 7.4 or later (and you should ALWAYS mention which version
> you are using in a performance question, because it matters), try to
> convert the EXISTS into an IN.  "x IN (subselect)" is planned much better
> than "EXISTS(subselect-using-x)" these days.

We are using version 7.4.6.

The number of contacts in the dev env is 37080.
The number of contacts in the production env is 40307.
The amount of data is statistically about the same.

However, the number of lead_requests are much different.
The dev env has 1438 lead_requests, the production env
has 15554 lead_requests.  Each contacts row can have
multiple lead_requests, each lead_requests entry can
have an open or closed status.  We are trying to select
the contacts with an open lead_request.

Would it be best to attempt to rewrite it for IN?
Or, should we try to tie it in with a join?  I would
probably need to GROUP so I can just get a count of those
contacts with open lead_requests.  Unless you know of a
better way?

Thanks for your assistance.  This is helping a lot.
BTW, what does the Materialize query plan element mean?

Thanks again.

JohnM

--
John Mendenhall
john@surfutopia.net
surf utopia
internet services

Re: complex query performance assistance request

From
Tom Lane
Date:
John Mendenhall <john@surfutopia.net> writes:
> Would it be best to attempt to rewrite it for IN?
> Or, should we try to tie it in with a join?

Couldn't say without a deeper understanding of what you're trying to
accomplish.

> BTW, what does the Materialize query plan element mean?

Means "run the contained subplan once, and save the results aside in a
buffer; on subsequent loops, just pass back the buffer contents instead
of re-running the subplan".

            regards, tom lane

Re: complex query performance assistance request

From
John Mendenhall
Date:
Tom,

> > Would it be best to attempt to rewrite it for IN?
> > Or, should we try to tie it in with a join?
>
> Couldn't say without a deeper understanding of what you're trying to
> accomplish.

Here are the results of each SQL rewrite.

The first pass, I rewrote it as c.id IN ():
-----
LOG:  duration: 2669.682 ms  statement: explain analyze
SELECT
 c.id AS contact_id,
 sr.id AS sales_rep_id,
 p.id AS partner_id,
 coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
 co.name AS contact_country,
 c.master_key_token
FROM
 sales_reps sr
 JOIN partners p ON (sr.id = p.sales_rep_id)
 JOIN contacts c ON (p.id = c.partner_id)
 JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
 JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
WHERE
 c.lead_deleted IS NULL
 AND c.id IN
 (
  SELECT
   lr.contact_id
  FROM
   lead_requests lr,
   lead_request_status lrs
  WHERE
   lr.status_id = lrs.id AND
   lrs.is_closed = 0
 )
ORDER BY
 contact_company, contact_id
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4413.35..4416.16 rows=1123 width=102) (actual time=2617.069..2617.719 rows=1071 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) ||
(c.lastname)::text)))),c.id 
   ->  Merge Join  (cost=4311.31..4356.45 rows=1123 width=102) (actual time=2549.717..2589.398 rows=1071 loops=1)
         Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
         ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.765..9.966 rows=240 loops=1)
               Sort Key: lower((co.code)::text)
               ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.142..5.118 rows=242
loops=1)
         ->  Sort  (cost=4297.31..4299.63 rows=928 width=95) (actual time=2539.685..2540.913 rows=1071 loops=1)
               Sort Key: lower((c.country)::text)
               ->  Merge IN Join  (cost=4163.02..4251.57 rows=928 width=95) (actual time=2377.539..2524.844 rows=1071
loops=1)
                     Merge Cond: ("outer".id = "inner".contact_id)
                     ->  Sort  (cost=1835.53..1851.27 rows=6296 width=95) (actual time=1843.866..1853.193 rows=6349
loops=1)
                           Sort Key: c.id
                           ->  Merge Join  (cost=75.65..1438.24 rows=6296 width=95) (actual time=51.713..1505.633
rows=6349loops=1) 
                                 Merge Cond: ("outer".partner_id = "inner".id)
                                 ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..5303.84
rows=40243width=85) (actual time=0.077..584.736 rows=40267 loops=1) 
                                       Filter: (lead_deleted IS NULL)
                                 ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual time=51.508..62.288 rows=6462
loops=1)
                                       Sort Key: p.id
                                       ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual
time=30.152..38.281rows=395 loops=1) 
                                             Merge Cond: ("outer".id = "inner".sales_rep_id)
                                             ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.390..1.505
rows=39loops=1) 
                                                   Sort Key: sr.id
                                                   ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10)
(actualtime=0.026..0.380 rows=39 loops=1) 
                                             ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual
time=28.558..29.120rows=395 loops=1) 
                                                   Sort Key: p.sales_rep_id
                                                   ->  Nested Loop  (cost=24.35..44.96 rows=290 width=20) (actual
time=0.191..21.408rows=395 loops=1) 
                                                         Join Filter: ("inner".classification_id = "outer".id)
                                                         ->  Seq Scan on partner_classification pc  (cost=0.00..1.04
rows=2width=10) (actual time=0.068..0.121 rows=2 loops=1) 
                                                               Filter: ((classification)::text <> 'Sales Rep'::text)
                                                         ->  Materialize  (cost=24.35..28.70 rows=435 width=30) (actual
time=0.029..5.380rows=435 loops=2) 
                                                               ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435
width=30)(actual time=0.038..8.161 rows=435 loops=1) 
                     ->  Sort  (cost=2327.50..2351.43 rows=9573 width=11) (actual time=533.508..535.629 rows=1742
loops=1)
                           Sort Key: lr.contact_id
                           ->  Merge Join  (cost=1520.94..1694.49 rows=9573 width=11) (actual time=302.932..461.644
rows=1745loops=1) 
                                 Merge Cond: ("outer".id = "inner".status_id)
                                 ->  Sort  (cost=1.28..1.30 rows=8 width=10) (actual time=0.392..0.404 rows=7 loops=1)
                                       Sort Key: lrs.id
                                       ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10)
(actualtime=0.117..0.280 rows=7 loops=1) 
                                             Filter: (is_closed = 0::numeric)
                                 ->  Sort  (cost=1519.66..1558.55 rows=15556 width=21) (actual time=302.423..321.939
rows=15387loops=1) 
                                       Sort Key: lr.status_id
                                       ->  Seq Scan on lead_requests lr  (cost=0.00..436.56 rows=15556 width=21)
(actualtime=0.029..164.708 rows=15559 loops=1) 
 Total runtime: 2632.987 ms
(44 rows)
-----

The second pass, I rewrote it to tie in with a JOIN, adding
a DISTINCT at the top to get rid of the duplicates:
-----
LOG:  duration: 3285.645 ms  statement: explain analyze
SELECT DISTINCT
 c.id AS contact_id,
 sr.id AS sales_rep_id,
 p.id AS partner_id,
 coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
 co.name AS contact_country,
 c.master_key_token
FROM
 sales_reps sr
 JOIN partners p ON (sr.id = p.sales_rep_id)
 JOIN contacts c ON (p.id = c.partner_id)
 JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
 JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
 JOIN lead_requests lr ON (c.id = lr.contact_id)
 JOIN lead_request_status lrs ON (lr.status_id = lrs.id AND lrs.is_closed = 0)
WHERE
 c.lead_deleted IS NULL
ORDER BY
 contact_company, contact_id
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=3039.78..3071.46 rows=1810 width=102) (actual time=3219.707..3228.637 rows=1071 loops=1)
   ->  Sort  (cost=3039.78..3044.31 rows=1810 width=102) (actual time=3219.695..3220.560 rows=1118 loops=1)
         Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) ||
(c.lastname)::text)))),c.id, sr.id, p.id, co.name, c.master_key_token 
         ->  Merge Join  (cost=2870.92..2941.85 rows=1810 width=102) (actual time=3156.788..3188.338 rows=1118 loops=1)
               Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
               ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.196..9.445 rows=240 loops=1)
                     Sort Key: lower((co.code)::text)
                     ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.128..3.914
rows=242loops=1) 
               ->  Sort  (cost=2856.92..2860.66 rows=1496 width=95) (actual time=3147.340..3148.477 rows=1118 loops=1)
                     Sort Key: lower((c.country)::text)
                     ->  Merge Join  (cost=2750.88..2778.03 rows=1496 width=95) (actual time=3008.933..3132.122
rows=1118loops=1) 
                           Merge Cond: ("outer".id = "inner".status_id)
                           ->  Sort  (cost=1.28..1.30 rows=8 width=10) (actual time=0.366..0.379 rows=7 loops=1)
                                 Sort Key: lrs.id
                                 ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual
time=0.094..0.254rows=7 loops=1) 
                                       Filter: (is_closed = 0::numeric)
                           ->  Sort  (cost=2749.60..2755.67 rows=2430 width=105) (actual time=3008.396..3023.502
rows=9992loops=1) 
                                 Sort Key: lr.status_id
                                 ->  Merge Join  (cost=1835.53..2612.95 rows=2430 width=105) (actual
time=1975.714..2912.632rows=10089 loops=1) 
                                       Merge Cond: ("outer".contact_id = "inner".id)
                                       ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr
(cost=0.00..683.87rows=15556 width=21) (actual time=0.073..247.148 rows=15556 loops=1) 
                                       ->  Sort  (cost=1835.53..1851.27 rows=6296 width=95) (actual
time=1975.273..1988.664rows=10089 loops=1) 
                                             Sort Key: c.id
                                             ->  Merge Join  (cost=75.65..1438.24 rows=6296 width=95) (actual
time=56.107..1625.186rows=6349 loops=1) 
                                                   Merge Cond: ("outer".partner_id = "inner".id)
                                                   ->  Index Scan using contacts_partner_id_idx on contacts c
(cost=0.00..5303.84rows=40243 width=85) (actual time=0.047..580.311 rows=40267 loops=1) 
                                                         Filter: (lead_deleted IS NULL)
                                                   ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual
time=55.935..65.502rows=6462 loops=1) 
                                                         Sort Key: p.id
                                                         ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual
time=31.765..39.925rows=395 loops=1) 
                                                               Merge Cond: ("outer".id = "inner".sales_rep_id)
                                                               ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual
time=1.072..1.117rows=39 loops=1) 
                                                                     Sort Key: sr.id
                                                                     ->  Seq Scan on sales_reps sr  (cost=0.00..1.39
rows=39width=10) (actual time=0.022..0.312 rows=39 loops=1) 
                                                               ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual
time=30.489..30.893rows=395 loops=1) 
                                                                     Sort Key: p.sales_rep_id
                                                                     ->  Nested Loop  (cost=24.35..44.96 rows=290
width=20)(actual time=0.159..23.356 rows=395 loops=1) 
                                                                           Join Filter: ("inner".classification_id =
"outer".id)
                                                                           ->  Seq Scan on partner_classification pc
(cost=0.00..1.04rows=2 width=10) (actual time=0.047..0.086 rows=2 loops=1) 
                                                                                 Filter: ((classification)::text <>
'SalesRep'::text) 
                                                                           ->  Materialize  (cost=24.35..28.70 rows=435
width=30)(actual time=0.028..6.124 rows=435 loops=2) 
                                                                                 ->  Seq Scan on partners p
(cost=0.00..24.35rows=435 width=30) (actual time=0.039..9.383 rows=435 loops=1) 
 Total runtime: 3241.139 ms
(43 rows)
-----

The DISTINCT ON condition was about the same amount of time,
statistically.  Removing the DISTINCT entirely only gave a
very slight improvement in performance.

So, the bottom line is, unless there are other ideas to
improve the performance, I will most likely rewrite our
application to use the c.id IN () option.

Thank you very much for your input and suggestions.

JohnM

--
John Mendenhall
john@surfutopia.net
surf utopia
internet services