Thread: complex query performance assistance request
I need to improve the performance of the following query. Soon after I reboot my server, it takes 20 seconds the first time I run it. When I run it after that, it takes approximately 2 seconds. I understand that caching is taking place (at the OS or DB level; it doesn't matter which here). Here are the results of the explain analyze run:

-----
LOG:  duration: 6259.632 ms  statement: explain analyze
SELECT c.id AS contact_id,
       sr.id AS sales_rep_id,
       LTRIM(RTRIM(sr.firstname || ' ' || sr.lastname)) AS sales_rep_name,
       p.id AS partner_id,
       p.company AS partner_company,
       coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
       LTRIM(RTRIM(c.city || ' ' || c.state || ' ' || c.postalcode || ' ' || c.country)) AS contact_location,
       c.phone AS contact_phone,
       c.email AS contact_email,
       co.name AS contact_country,
       TO_CHAR(c.request_status_last_modified, 'mm/dd/yy hh12:mi pm') AS request_status_last_modified,
       TO_CHAR(c.request_status_last_modified, 'yyyymmddhh24miss') AS rqst_stat_last_mdfd_sortable,
       c.token_id,
       c.master_key_token AS token
  FROM sales_reps sr
  JOIN partners p ON (sr.id = p.sales_rep_id)
  JOIN contacts c ON (p.id = c.partner_id)
  JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
  JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
 WHERE c.lead_deleted IS NULL
   AND EXISTS (
         SELECT lr.id
           FROM lead_requests lr, lead_request_status lrs
          WHERE c.id = lr.contact_id
            AND lr.status_id = lrs.id
            AND lrs.is_closed = 0
       )
 ORDER BY contact_company, contact_id;

                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=39093.16..39102.80 rows=3856 width=238) (actual time=6220.481..6221.188 rows=1071 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id
   ->  Merge Join  (cost=38580.89..38863.48 rows=3856 width=238) (actual time=6015.751..6184.199 rows=1071 loops=1)
         Merge Cond: ("outer"."?column3?" = "inner"."?column19?")
         ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.250..9.500 rows=240 loops=1)
               Sort Key: lower((co.code)::text)
               ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.132..4.498 rows=242 loops=1)
         ->  Sort  (cost=38566.89..38574.86 rows=3186 width=225) (actual time=6005.644..6006.954 rows=1071 loops=1)
               Sort Key: lower((c.country)::text)
               ->  Merge Join  (cost=75.65..38381.50 rows=3186 width=225) (actual time=58.086..5979.287 rows=1071 loops=1)
                     Merge Cond: ("outer".partner_id = "inner".id)
                     ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..160907.39 rows=20106 width=171) (actual time=2.569..5816.985 rows=1547 loops=1)
                           Filter: ((lead_deleted IS NULL) AND (subplan))
                           SubPlan
                             ->  Nested Loop  (cost=1.16..6.56 rows=2 width=10) (actual time=0.119..0.119 rows=0 loops=40261)
                                   Join Filter: ("outer".status_id = "inner".id)
                                   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..4.86 rows=3 width=20) (actual time=0.079..0.083 rows=0 loops=40261)
                                         Index Cond: ($0 = contact_id)
                                   ->  Materialize  (cost=1.16..1.24 rows=8 width=10) (actual time=0.002..0.011 rows=6 loops=12592)
                                         ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual time=0.083..0.270 rows=7 loops=1)
                                               Filter: (is_closed = 0::numeric)
                     ->  Sort  (cost=75.65..76.37 rows=290 width=64) (actual time=55.073..56.990 rows=1334 loops=1)
                           Sort Key: p.id
                           ->  Merge Join  (cost=59.24..63.79 rows=290 width=64) (actual time=31.720..41.096 rows=395 loops=1)
                                 Merge Cond: ("outer".id = "inner".sales_rep_id)
                                 ->  Sort  (cost=2.42..2.52 rows=39 width=31) (actual time=1.565..1.616 rows=39 loops=1)
                                       Sort Key: sr.id
                                       ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=31) (actual time=0.043..0.581 rows=39 loops=1)
                                 ->  Sort  (cost=56.82..57.55 rows=290 width=43) (actual time=29.921..30.310 rows=395 loops=1)
                                       Sort Key: p.sales_rep_id
                                       ->  Nested Loop  (cost=24.35..44.96 rows=290 width=43) (actual time=0.169..22.566 rows=395 loops=1)
                                             Join Filter: ("inner".classification_id = "outer".id)
                                             ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2 width=10) (actual time=0.059..0.102 rows=2 loops=1)
                                                   Filter: ((classification)::text <> 'Sales Rep'::text)
                                             ->  Materialize  (cost=24.35..28.70 rows=435 width=53) (actual time=0.023..5.880 rows=435 loops=2)
                                                   ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435 width=53) (actual time=0.034..8.937 rows=435 loops=1)
 Total runtime: 6225.791 ms
(37 rows)
-----

My first question is: what is the Materialize query plan element? It appears twice in this plan, and usually when I see it, my query is slow.

My second and more important question is: does anyone have any ideas or suggestions as to how I can increase the speed of this query? Things I have already done: modified the joins and conditions so the query starts with the smaller tables, keeping the join set smaller; tuned the server configuration to ensure index scans are used as they should be; and ran vacuumdb and analyze on the database.

Thank you very much in advance for any pointers to additional places I can look.

JohnM

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services
On Sat, 20 Aug 2005, John Mendenhall wrote:

> I need to improve the performance for the following
> query.

I have run the same query in the same database under different schemas. Each schema has pretty much the same tables and indices; one has an extra backup table and an extra index, neither of which is used in either of the explain analyze plans. The first schema is a development schema, which I used when performance-tuning the server, so everything there was great. Here are the current results of the SQL run in the development environment:

-----
LOG:  duration: 852.275 ms  statement: explain analyze
SELECT c.id AS contact_id,
       sr.id AS sales_rep_id,
       p.id AS partner_id,
       coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
       co.name AS contact_country,
       c.master_key_token
  FROM sales_reps sr
  JOIN partners p ON (sr.id = p.sales_rep_id)
  JOIN contacts c ON (p.id = c.partner_id)
  JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
  JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
 WHERE c.lead_deleted IS NULL
   AND EXISTS (
         SELECT lr.id
           FROM lead_requests lr, lead_request_status lrs
          WHERE c.id = lr.contact_id
            AND lr.status_id = lrs.id
            AND lrs.is_closed = 0
       )
 ORDER BY contact_company, contact_id

                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18238.25..18238.27 rows=11 width=102) (actual time=823.721..823.915 rows=247 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id
   ->  Hash Join  (cost=18230.34..18238.06 rows=11 width=102) (actual time=808.042..818.427 rows=247 loops=1)
         Hash Cond: (lower(("outer".code)::text) = lower(("inner".country)::text))
         ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.032..1.208 rows=242 loops=1)
         ->  Hash  (cost=18230.31..18230.31 rows=9 width=95) (actual time=807.554..807.554 rows=0 loops=1)
               ->  Merge Join  (cost=18229.98..18230.31 rows=9 width=95) (actual time=794.413..804.855 rows=247 loops=1)
                     Merge Cond: ("outer".sales_rep_id = "inner".id)
                     ->  Sort  (cost=18227.56..18227.59 rows=9 width=95) (actual time=793.132..793.502 rows=250 loops=1)
                           Sort Key: p.sales_rep_id
                           ->  Merge Join  (cost=18227.26..18227.42 rows=9 width=95) (actual time=782.832..789.205 rows=250 loops=1)
                                 Merge Cond: ("outer".id = "inner".classification_id)
                                 ->  Sort  (cost=1.05..1.05 rows=2 width=10) (actual time=0.189..0.194 rows=2 loops=1)
                                       Sort Key: pc.id
                                       ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2 width=10) (actual time=0.089..0.127 rows=2 loops=1)
                                             Filter: ((classification)::text <> 'Sales Rep'::text)
                                 ->  Sort  (cost=18226.21..18226.24 rows=13 width=105) (actual time=782.525..782.818 rows=251 loops=1)
                                       Sort Key: p.classification_id
                                       ->  Merge Join  (cost=0.00..18225.97 rows=13 width=105) (actual time=54.135..776.299 rows=449 loops=1)
                                             Merge Cond: ("outer".id = "inner".partner_id)
                                             ->  Index Scan using partners_pkey on partners p  (cost=0.00..30.80 rows=395 width=30) (actual time=0.073..6.873 rows=395 loops=1)
                                             ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..130157.20 rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1)
                                                   Filter: ((lead_deleted IS NULL) AND (subplan))
                                                   SubPlan
                                                     ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual time=0.103..0.103 rows=0 loops=5576)
                                                           Join Filter: ("outer".status_id = "inner".id)
                                                           ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..4.23 rows=2 width=20) (actual time=0.075..0.075 rows=0 loops=5576)
                                                                 Index Cond: ($0 = contact_id)
                                                           ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual time=0.028..0.098 rows=4 loops=522)
                                                                 Filter: (is_closed = 0::numeric)
                     ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.183..1.569 rows=268 loops=1)
                           Sort Key: sr.id
                           ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10) (actual time=0.056..0.353 rows=39 loops=1)
 Total runtime: 826.425 ms
(34 rows)
-----

Here is the current run in the production environment, which I need to bring up to the performance level of the development environment:

-----
LOG:  duration: 6447.934 ms  statement: explain analyze
SELECT c.id AS contact_id,
       sr.id AS sales_rep_id,
       p.id AS partner_id,
       coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
       co.name AS contact_country,
       c.master_key_token
  FROM sales_reps sr
  JOIN partners p ON (sr.id = p.sales_rep_id)
  JOIN contacts c ON (p.id = c.partner_id)
  JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
  JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
 WHERE c.lead_deleted IS NULL
   AND EXISTS (
         SELECT lr.id
           FROM lead_requests lr, lead_request_status lrs
          WHERE c.id = lr.contact_id
            AND lr.status_id = lrs.id
            AND lrs.is_closed = 0
       )
 ORDER BY contact_company, contact_id

                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=40838.98..40849.08 rows=4042 width=102) (actual time=6418.732..6419.536 rows=1071 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id
   ->  Merge Join  (cost=40442.25..40596.85 rows=4042 width=102) (actual time=6357.161..6389.616 rows=1071 loops=1)
         Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
         ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.753..10.018 rows=240 loops=1)
               Sort Key: lower((co.code)::text)
               ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.126..3.950 rows=242 loops=1)
         ->  Sort  (cost=40428.24..40436.59 rows=3340 width=95) (actual time=6347.154..6348.429 rows=1071 loops=1)
               Sort Key: lower((c.country)::text)
               ->  Merge Join  (cost=75.65..40232.76 rows=3340 width=95) (actual time=60.308..6331.266 rows=1071 loops=1)
                     Merge Cond: ("outer".partner_id = "inner".id)
                     ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120 width=85) (actual time=2.769..6188.886 rows=1548 loops=1)
                           Filter: ((lead_deleted IS NULL) AND (subplan))
                           SubPlan
                             ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0 loops=40262)
                                   Join Filter: ("outer".status_id = "inner".id)
                                   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..4.86 rows=3 width=20) (actual time=0.086..0.092 rows=0 loops=40262)
                                         Index Cond: ($0 = contact_id)
                                   ->  Materialize  (cost=1.16..1.24 rows=8 width=10) (actual time=0.002..0.013 rows=6 loops=12593)
                                         ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual time=0.078..0.243 rows=7 loops=1)
                                               Filter: (is_closed = 0::numeric)
                     ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual time=57.243..59.574 rows=1334 loops=1)
                           Sort Key: p.id
                           ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual time=33.975..42.215 rows=395 loops=1)
                                 Merge Cond: ("outer".id = "inner".sales_rep_id)
                                 ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.206..1.285 rows=39 loops=1)
                                       Sort Key: sr.id
                                       ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10) (actual time=0.028..0.365 rows=39 loops=1)
                                 ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual time=32.566..33.254 rows=395 loops=1)
                                       Sort Key: p.sales_rep_id
                                       ->  Nested Loop  (cost=24.35..44.96 rows=290 width=20) (actual time=0.158..25.227 rows=395 loops=1)
                                             Join Filter: ("inner".classification_id = "outer".id)
                                             ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2 width=10) (actual time=0.050..0.096 rows=2 loops=1)
                                                   Filter: ((classification)::text <> 'Sales Rep'::text)
                                             ->  Materialize  (cost=24.35..28.70 rows=435 width=30) (actual time=0.028..6.617 rows=435 loops=2)
                                                   ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435 width=30) (actual time=0.042..9.941 rows=435 loops=1)
 Total runtime: 6423.683 ms
(37 rows)
-----

The SQL is exactly the same. The issue is that the query plan is different, and thus not up to the performance we need.

We have 256MB of memory in the machine. Would it help if we threw some more memory in?

Please let me know if you have *any* pointers as to the reason for the difference. Thank you very much in advance for any pointers or suggestions.

JohnM

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services
John Mendenhall <john@surfutopia.net> writes:

> The issue is the query plan is different, and thus,
> not up to the performance we need.

No, the issue is that you've got eight times as much data in the production server; so it's hardly surprising that it takes about eight times longer.

The production query is spending most of its time on the subplan attached to the contacts table:

>  ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120 width=85) (actual time=2.769..6188.886 rows=1548 loops=1)
>        Filter: ((lead_deleted IS NULL) AND (subplan))
>        SubPlan
>          ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0 loops=40262)

0.129 * 40262 = 5193.798, so about five seconds in the subplan and another one second in the indexscan proper. The problem is that the subplan (the EXISTS clause) is iterated for each of 40262 rows of contacts --- basically, every contacts row that has null lead_deleted.

On the dev server the same scan shows these numbers:

>  ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..130157.20 rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1)
>        Filter: ((lead_deleted IS NULL) AND (subplan))
>        SubPlan
>          ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual time=0.103..0.103 rows=0 loops=5576)

Here the subplan is iterated only 5576 times, for 574 msec total. It's still the bulk of the runtime though; the fact that the upper levels of the plan are a bit different has got little to do with where the time is going.

I'd suggest trying to get rid of the EXISTS clause --- can you refactor that into something that joins at the top query level?

Or, if this is 7.4 or later (and you should ALWAYS mention which version you are using in a performance question, because it matters), try to convert the EXISTS into an IN. "x IN (subselect)" is planned much better than "EXISTS(subselect-using-x)" these days.

			regards, tom lane
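The rewrite Tom suggests touches only the WHERE clause. As a sketch of the two shapes (John posts the full rewritten query and its plan further down the thread):

-----
-- Correlated EXISTS: the subselect references the outer c.id, so it
-- is re-run once per contacts row.
WHERE c.lead_deleted IS NULL
  AND EXISTS (SELECT lr.id
                FROM lead_requests lr, lead_request_status lrs
               WHERE c.id = lr.contact_id
                 AND lr.status_id = lrs.id
                 AND lrs.is_closed = 0)

-- Uncorrelated IN: the subselect no longer mentions c, so 7.4+ can
-- evaluate it once and join its result set against contacts.
WHERE c.lead_deleted IS NULL
  AND c.id IN (SELECT lr.contact_id
                 FROM lead_requests lr, lead_request_status lrs
                WHERE lr.status_id = lrs.id
                  AND lrs.is_closed = 0)
-----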
Tom,

> No, the issue is that you've got eight times as much data in the
> production server; so it's hardly surprising that it takes about
> eight times longer.
>
> The production query is spending most of its time on the subplan
> attached to the contacts table:
>
> >  ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..161018.18 rows=20120 width=85) (actual time=2.769..6188.886 rows=1548 loops=1)
> >        Filter: ((lead_deleted IS NULL) AND (subplan))
> >        SubPlan
> >          ->  Nested Loop  (cost=1.16..6.57 rows=2 width=10) (actual time=0.129..0.129 rows=0 loops=40262)
>
> 0.129 * 40262 = 5193.798, so about five seconds in the subplan and
> another one second in the indexscan proper. The problem is that the
> subplan (the EXISTS clause) is iterated for each of 40262 rows of
> contacts --- basically, every contacts row that has null lead_deleted.
>
> On the dev server the same scan shows these numbers:
>
> >  ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..130157.20 rows=93 width=85) (actual time=0.366..739.783 rows=453 loops=1)
> >        Filter: ((lead_deleted IS NULL) AND (subplan))
> >        SubPlan
> >          ->  Nested Loop  (cost=0.00..6.75 rows=2 width=10) (actual time=0.103..0.103 rows=0 loops=5576)
>
> I'd suggest trying to get rid of the EXISTS clause --- can you refactor
> that into something that joins at the top query level?
>
> Or, if this is 7.4 or later (and you should ALWAYS mention which version
> you are using in a performance question, because it matters), try to
> convert the EXISTS into an IN. "x IN (subselect)" is planned much better
> than "EXISTS(subselect-using-x)" these days.

We are using version 7.4.6.

The number of contacts in the dev env is 37080; the number of contacts in the production env is 40307. That amount of data is statistically about the same. However, the number of lead_requests is much different: the dev env has 1438 lead_requests, while the production env has 15554. Each contacts row can have multiple lead_requests, and each lead_requests entry has an open or closed status. We are trying to select the contacts with an open lead_request.

Would it be best to attempt to rewrite it for IN? Or should we try to tie it in with a join? I would probably need to GROUP so I can just get a count of those contacts with open lead_requests. Unless you know of a better way?

Thanks for your assistance. This is helping a lot.

BTW, what does the Materialize query plan element mean?

Thanks again.

JohnM

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services
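For the join variant John mentions, a hedged sketch of the GROUP BY shape he alludes to (same tables as above; this exact query is not from the thread and is untested against his data):

-----
-- Hypothetical join-and-group form: one row per contact that has at
-- least one open lead_request, plus a count of its open requests.
SELECT c.id AS contact_id, COUNT(lr.id) AS open_lead_requests
  FROM contacts c
  JOIN lead_requests lr ON (c.id = lr.contact_id)
  JOIN lead_request_status lrs ON (lr.status_id = lrs.id)
 WHERE c.lead_deleted IS NULL
   AND lrs.is_closed = 0
 GROUP BY c.id;
-----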
John Mendenhall <john@surfutopia.net> writes:

> Would it be best to attempt to rewrite it for IN?
> Or, should we try to tie it in with a join?

Couldn't say without a deeper understanding of what you're trying to accomplish.

> BTW, what does the Materialize query plan element mean?

Means "run the contained subplan once, and save the results aside in a buffer; on subsequent loops, just pass back the buffer contents instead of re-running the subplan".

			regards, tom lane
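The first plan in this thread shows that behavior directly: the Seq Scan beneath the Materialize node ran once (loops=1), while the buffered result above it was read 12592 times:

-----
->  Materialize  (cost=1.16..1.24 rows=8 width=10) (actual time=0.002..0.011 rows=6 loops=12592)
      ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual time=0.083..0.270 rows=7 loops=1)
            Filter: (is_closed = 0::numeric)
-----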
Tom,

> > Would it be best to attempt to rewrite it for IN?
> > Or, should we try to tie it in with a join?
>
> Couldn't say without a deeper understanding of what you're trying to
> accomplish.

Here are the results of each SQL rewrite.

The first pass, I rewrote it as c.id IN ():

-----
LOG:  duration: 2669.682 ms  statement: explain analyze
SELECT c.id AS contact_id,
       sr.id AS sales_rep_id,
       p.id AS partner_id,
       coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
       co.name AS contact_country,
       c.master_key_token
  FROM sales_reps sr
  JOIN partners p ON (sr.id = p.sales_rep_id)
  JOIN contacts c ON (p.id = c.partner_id)
  JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
  JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
 WHERE c.lead_deleted IS NULL
   AND c.id IN (
         SELECT lr.contact_id
           FROM lead_requests lr, lead_request_status lrs
          WHERE lr.status_id = lrs.id
            AND lrs.is_closed = 0
       )
 ORDER BY contact_company, contact_id

                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4413.35..4416.16 rows=1123 width=102) (actual time=2617.069..2617.719 rows=1071 loops=1)
   Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id
   ->  Merge Join  (cost=4311.31..4356.45 rows=1123 width=102) (actual time=2549.717..2589.398 rows=1071 loops=1)
         Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
         ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.765..9.966 rows=240 loops=1)
               Sort Key: lower((co.code)::text)
               ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.142..5.118 rows=242 loops=1)
         ->  Sort  (cost=4297.31..4299.63 rows=928 width=95) (actual time=2539.685..2540.913 rows=1071 loops=1)
               Sort Key: lower((c.country)::text)
               ->  Merge IN Join  (cost=4163.02..4251.57 rows=928 width=95) (actual time=2377.539..2524.844 rows=1071 loops=1)
                     Merge Cond: ("outer".id = "inner".contact_id)
                     ->  Sort  (cost=1835.53..1851.27 rows=6296 width=95) (actual time=1843.866..1853.193 rows=6349 loops=1)
                           Sort Key: c.id
                           ->  Merge Join  (cost=75.65..1438.24 rows=6296 width=95) (actual time=51.713..1505.633 rows=6349 loops=1)
                                 Merge Cond: ("outer".partner_id = "inner".id)
                                 ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..5303.84 rows=40243 width=85) (actual time=0.077..584.736 rows=40267 loops=1)
                                       Filter: (lead_deleted IS NULL)
                                 ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual time=51.508..62.288 rows=6462 loops=1)
                                       Sort Key: p.id
                                       ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual time=30.152..38.281 rows=395 loops=1)
                                             Merge Cond: ("outer".id = "inner".sales_rep_id)
                                             ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.390..1.505 rows=39 loops=1)
                                                   Sort Key: sr.id
                                                   ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10) (actual time=0.026..0.380 rows=39 loops=1)
                                             ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual time=28.558..29.120 rows=395 loops=1)
                                                   Sort Key: p.sales_rep_id
                                                   ->  Nested Loop  (cost=24.35..44.96 rows=290 width=20) (actual time=0.191..21.408 rows=395 loops=1)
                                                         Join Filter: ("inner".classification_id = "outer".id)
                                                         ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2 width=10) (actual time=0.068..0.121 rows=2 loops=1)
                                                               Filter: ((classification)::text <> 'Sales Rep'::text)
                                                         ->  Materialize  (cost=24.35..28.70 rows=435 width=30) (actual time=0.029..5.380 rows=435 loops=2)
                                                               ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435 width=30) (actual time=0.038..8.161 rows=435 loops=1)
                     ->  Sort  (cost=2327.50..2351.43 rows=9573 width=11) (actual time=533.508..535.629 rows=1742 loops=1)
                           Sort Key: lr.contact_id
                           ->  Merge Join  (cost=1520.94..1694.49 rows=9573 width=11) (actual time=302.932..461.644 rows=1745 loops=1)
                                 Merge Cond: ("outer".id = "inner".status_id)
                                 ->  Sort  (cost=1.28..1.30 rows=8 width=10) (actual time=0.392..0.404 rows=7 loops=1)
                                       Sort Key: lrs.id
                                       ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual time=0.117..0.280 rows=7 loops=1)
                                             Filter: (is_closed = 0::numeric)
                                 ->  Sort  (cost=1519.66..1558.55 rows=15556 width=21) (actual time=302.423..321.939 rows=15387 loops=1)
                                       Sort Key: lr.status_id
                                       ->  Seq Scan on lead_requests lr  (cost=0.00..436.56 rows=15556 width=21) (actual time=0.029..164.708 rows=15559 loops=1)
 Total runtime: 2632.987 ms
(44 rows)
-----

The second pass, I rewrote it to tie in with a JOIN, adding a DISTINCT at the top to get rid of the duplicates:

-----
LOG:  duration: 3285.645 ms  statement: explain analyze
SELECT DISTINCT
       c.id AS contact_id,
       sr.id AS sales_rep_id,
       p.id AS partner_id,
       coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
       co.name AS contact_country,
       c.master_key_token
  FROM sales_reps sr
  JOIN partners p ON (sr.id = p.sales_rep_id)
  JOIN contacts c ON (p.id = c.partner_id)
  JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
  JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
  JOIN lead_requests lr ON (c.id = lr.contact_id)
  JOIN lead_request_status lrs ON (lr.status_id = lrs.id AND lrs.is_closed = 0)
 WHERE c.lead_deleted IS NULL
 ORDER BY contact_company, contact_id

                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=3039.78..3071.46 rows=1810 width=102) (actual time=3219.707..3228.637 rows=1071 loops=1)
   ->  Sort  (cost=3039.78..3044.31 rows=1810 width=102) (actual time=3219.695..3220.560 rows=1118 loops=1)
         Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id, sr.id, p.id, co.name, c.master_key_token
         ->  Merge Join  (cost=2870.92..2941.85 rows=1810 width=102) (actual time=3156.788..3188.338 rows=1118 loops=1)
               Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
               ->  Sort  (cost=14.00..14.61 rows=242 width=19) (actual time=9.196..9.445 rows=240 loops=1)
                     Sort Key: lower((co.code)::text)
                     ->  Seq Scan on countries co  (cost=0.00..4.42 rows=242 width=19) (actual time=0.128..3.914 rows=242 loops=1)
               ->  Sort  (cost=2856.92..2860.66 rows=1496 width=95) (actual time=3147.340..3148.477 rows=1118 loops=1)
                     Sort Key: lower((c.country)::text)
                     ->  Merge Join  (cost=2750.88..2778.03 rows=1496 width=95) (actual time=3008.933..3132.122 rows=1118 loops=1)
                           Merge Cond: ("outer".id = "inner".status_id)
                           ->  Sort  (cost=1.28..1.30 rows=8 width=10) (actual time=0.366..0.379 rows=7 loops=1)
                                 Sort Key: lrs.id
                                 ->  Seq Scan on lead_request_status lrs  (cost=0.00..1.16 rows=8 width=10) (actual time=0.094..0.254 rows=7 loops=1)
                                       Filter: (is_closed = 0::numeric)
                           ->  Sort  (cost=2749.60..2755.67 rows=2430 width=105) (actual time=3008.396..3023.502 rows=9992 loops=1)
                                 Sort Key: lr.status_id
                                 ->  Merge Join  (cost=1835.53..2612.95 rows=2430 width=105) (actual time=1975.714..2912.632 rows=10089 loops=1)
                                       Merge Cond: ("outer".contact_id = "inner".id)
                                       ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..683.87 rows=15556 width=21) (actual time=0.073..247.148 rows=15556 loops=1)
                                       ->  Sort  (cost=1835.53..1851.27 rows=6296 width=95) (actual time=1975.273..1988.664 rows=10089 loops=1)
                                             Sort Key: c.id
                                             ->  Merge Join  (cost=75.65..1438.24 rows=6296 width=95) (actual time=56.107..1625.186 rows=6349 loops=1)
                                                   Merge Cond: ("outer".partner_id = "inner".id)
                                                   ->  Index Scan using contacts_partner_id_idx on contacts c  (cost=0.00..5303.84 rows=40243 width=85) (actual time=0.047..580.311 rows=40267 loops=1)
                                                         Filter: (lead_deleted IS NULL)
                                                   ->  Sort  (cost=75.65..76.37 rows=290 width=20) (actual time=55.935..65.502 rows=6462 loops=1)
                                                         Sort Key: p.id
                                                         ->  Merge Join  (cost=59.24..63.79 rows=290 width=20) (actual time=31.765..39.925 rows=395 loops=1)
                                                               Merge Cond: ("outer".id = "inner".sales_rep_id)
                                                               ->  Sort  (cost=2.42..2.52 rows=39 width=10) (actual time=1.072..1.117 rows=39 loops=1)
                                                                     Sort Key: sr.id
                                                                     ->  Seq Scan on sales_reps sr  (cost=0.00..1.39 rows=39 width=10) (actual time=0.022..0.312 rows=39 loops=1)
                                                               ->  Sort  (cost=56.82..57.55 rows=290 width=20) (actual time=30.489..30.893 rows=395 loops=1)
                                                                     Sort Key: p.sales_rep_id
                                                                     ->  Nested Loop  (cost=24.35..44.96 rows=290 width=20) (actual time=0.159..23.356 rows=395 loops=1)
                                                                           Join Filter: ("inner".classification_id = "outer".id)
                                                                           ->  Seq Scan on partner_classification pc  (cost=0.00..1.04 rows=2 width=10) (actual time=0.047..0.086 rows=2 loops=1)
                                                                                 Filter: ((classification)::text <> 'Sales Rep'::text)
                                                                           ->  Materialize  (cost=24.35..28.70 rows=435 width=30) (actual time=0.028..6.124 rows=435 loops=2)
                                                                                 ->  Seq Scan on partners p  (cost=0.00..24.35 rows=435 width=30) (actual time=0.039..9.383 rows=435 loops=1)
 Total runtime: 3241.139 ms
(43 rows)
-----

A DISTINCT ON variant took statistically about the same amount of time, and removing the DISTINCT entirely gave only a very slight improvement in performance.

So, the bottom line is: unless there are other ideas to improve the performance, I will most likely rewrite our application to use the c.id IN () option.

Thank you very much for your input and suggestions.

JohnM

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services