Thread: Simple but slow
from lists l, timezone tz
where l.area_code = tz.area_code
and l.client_id = 8
and tz.greenwich = '-5'
Limit  (cost=96929.63..97822.45 rows=35713 width=36) (actual time=42477.82..43491.69 rows=100000 loops=1)
  ->  Unique  (cost=96929.63..97822.46 rows=35713 width=36) (actual time=42477.82..43325.87 rows=100001 loops=1)
        ->  Sort  (cost=96929.63..96929.63 rows=357132 width=36) (actual time=42477.80..42735.18 rows=102151 loops=1)
              ->  Hash Join  (cost=9.59..41831.14 rows=357132 width=36) (actual time=25.29..16456.26 rows=352194 loops=1)
                    ->  Seq Scan on lists l  (cost=0.00..32881.18 rows=895244 width=29) (actual time=0.10..11342.50 rows=878098 loops=1)
                    ->  Hash  (cost=9.22..9.22 rows=148 width=7) (actual time=23.80..23.80 rows=0 loops=1)
                          ->  Seq Scan on timezone tz  (cost=0.00..9.22 rows=148 width=7) (actual time=21.72..23.45 rows=148 loops=1)
Total runtime: 46247.79 msec
Chad,

If you take a look at the Explain content, you'll see where the slow-down is:

> Limit  (cost=96929.63..97822.45 rows=35713 width=36) (actual time=42477.82..43491.69 rows=100000 loops=1)
>   ->  Unique  (cost=96929.63..97822.46 rows=35713 width=36) (actual time=42477.82..43325.87 rows=100001 loops=1)
>     ->  Sort  (cost=96929.63..96929.63 rows=357132 width=36) (actual time=42477.80..42735.18 rows=102151 loops=1)

These three lines are the selection of unique values, the sort, and the truncation of the result set. This is most of your computing time; see the "actual time=42477".

> ->  Hash Join  (cost=9.59..41831.14 rows=357132 width=36) (actual time=25.29..16456.26 rows=352194 loops=1)
>   ->  Seq Scan on lists l  (cost=0.00..32881.18 rows=895244 width=29) (actual time=0.10..11342.50 rows=878098 loops=1)
>   ->  Hash  (cost=9.22..9.22 rows=148 width=7) (actual time=23.80..23.80 rows=0 loops=1)
>     ->  Seq Scan on timezone tz  (cost=0.00..9.22 rows=148 width=7) (actual time=21.72..23.45 rows=148 loops=1)

This is your join to the area codes. It's ignoring the indexes because the number of records in timezone is so small compared to the number in lists. This isn't a problem, though: as you can see, the join finishes by about 16.5 seconds, well under half of the total runtime.

> Total runtime: 46247.79 msec

The way I read this, well over half of the time is being spent on the DISTINCT. Tom, am I reading this right?

Try:
1) Indexing lists.full_phone.
2) Checking the speed without the DISTINCT as a benchmark.
3) Increasing the amount of memory available to your queries by altering the postgresql.conf settings, and possibly adding more RAM or improving your disk access speed.
(A rough sketch of these suggestions follows below.)

--
-Josh Berkus
Aglio Database Solutions
San Francisco
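Concretely, those suggestions might look something like this. This is an untested sketch: the index name is invented, the SELECT list is a placeholder (the original post's column list wasn't shown), and sort_mem is the pre-8.0 name of the per-sort memory setting, measured in KB.

-- 1) Index the column being de-duplicated:
CREATE INDEX idx_lists_full_phone ON lists (full_phone);

-- 2) Benchmark the same query without the DISTINCT:
EXPLAIN ANALYZE
SELECT l.full_phone              -- placeholder column list
FROM lists l, timezone tz
WHERE l.area_code = tz.area_code
  AND l.client_id = 8
  AND tz.greenwich = '-5';

-- 3) Give big sorts more memory, per session or in postgresql.conf:
SET sort_mem = 16384;            -- value in KB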
Thanks for your reply Josh; as usual I learn from you whenever you write.

I've been having a hard time understanding what EXPLAIN is telling me. I was able to get the query down to 19 secs without the DISTINCT, so I think I'll move the DISTINCT to one of my faster queries.

If it's not too much trouble, I'd like you to look at another. This one is really being a beast.

select l.id, l.full_phone into "8_21_2002"
from "temp_list_tier" l LEFT JOIN
    (select phonenum
     from call_results_fixed
     where client_id = 8) as cr
ON l.full_phone = cr.phonenum
where cr.phonenum Is Null

I have indexes on cr.phonenum and l.full_phone, but it doesn't seem to use them with the subselect. cr is about 1 million records; l varies, usually never more than about 30-40K records.

Merge Join  (cost=265368.44..8176071.25 rows=3161144 width=44) (actual time=121889.56..126948.26 rows=11384 loops=1)
  ->  Index Scan using temp_list_tier_idx on temp_list_tier l  (cost=0.00..4431.97 rows=99997 width=25) (actual time=0.38..808.34 rows=99997 loops=1)
  ->  Sort  (cost=256113.44..256113.44 rows=1264495 width=14) (actual time=121887.71..122732.11 rows=422624 loops=1)
        ->  Subquery Scan cr  (cost=0.00..41331.00 rows=1264495 width=14) (actual time=0.10..12941.66 rows=1274987 loops=1)
              ->  Seq Scan on call_results_fixed  (cost=0.00..41331.00 rows=1264495 width=14) (actual time=0.09..9047.89 rows=1274987 loops=1)
Total runtime: 127273.03 msec

Thanks for your help. I have also enjoyed your "The Joy of Index"; I look forward to the next issue.

Thanks,
Chad
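One thing this plan makes clear: the client_id = 8 filter is evaluated by a seq scan over all ~1.27 million call_results_fixed rows, which are then sorted before the merge join even starts, so the index on phonenum never gets a chance. If this is PostgreSQL 7.2 or later, a partial index that bakes the client filter into the index itself might help; an untested sketch, with an invented index name:

-- Index only client 8's phone numbers, so the anti-join
-- probes a much smaller structure:
CREATE INDEX cr_client8_phonenum_idx
    ON call_results_fixed (phonenum)
    WHERE client_id = 8;

This pairs naturally with the NOT EXISTS rewrite suggested in the next message.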
> -----Original Message-----
> From: Chad Thompson [mailto:chad@weblinkservices.com]
>
> select l.id, l.full_phone into "8_21_2002"
> from "temp_list_tier" l LEFT JOIN
> (select phonenum
> from call_results_fixed
> where client_id = 8) as cr
> ON l.full_phone = cr.phonenum
> where cr.phonenum Is Null
>
I'm guessing you want all the rows in "temp_list_tier" whose phone number does not appear in call_results_fixed for client 8.
Does this (untested) run better for you:
SELECT l.id, l.full_phone INTO "8_21_2002"
FROM "temp_list_tier" AS l
WHERE NOT EXISTS
    (SELECT * FROM call_results_fixed AS cr
     WHERE cr.client_id = 8 AND l.full_phone = cr.phonenum);
Indexes needed (I think):
cr.client_id
cr.phonenum
Maybe (I don't think so):
l.full_phone
Not sure if this is what you wanted, but hope it's helpful.
- Stuart
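In DDL form, the indexes Stuart lists would be something like this (untested; the names are invented):

CREATE INDEX cr_client_id_idx ON call_results_fixed (client_id);
CREATE INDEX cr_phonenum_idx  ON call_results_fixed (phonenum);

A single two-column index on (phonenum, client_id) could also satisfy the inner WHERE clause of the NOT EXISTS on its own.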
Chad,

> Thanks for your reply Josh; as usual I learn from you whenever you
> write.

You're quite welcome!

> I've been having a hard time understanding what EXPLAIN is telling me.
> I was able to get the query down to 19 secs without the DISTINCT, so I
> think I'll move the DISTINCT to one of my faster queries.

DISTINCT on large result sets can be quite brutal. Here's why your query was slow with DISTINCT:

1. First the query has to sort by the DISTINCT field.
2. Then it has to "roll up" all the non-distinct entries.
3. Then it has to re-sort by your output sort.

This isn't much of a problem on small tables, but with 2 million records that's three scans of the whole table, which requires either a lot of patience or a server with 2GB of RAM and a really fast RAID array.

> If it's not too much trouble, I'd like you to look at another. This one
> is really being a beast.

I think somebody already posted a solution for this.

> Thanks for your help. I have also enjoyed your "The Joy of Index"; I
> look forward to the next issue.

You're welcome again. According to Tom and Bruno, I need to post some corrections ... look for them early next week.

-Josh Berkus
"Standing on the shoulders of giants."
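Chad's plan of moving the DISTINCT to a faster query follows directly from this: do the cheap, selective filtering first, then de-duplicate only the small result. A hypothetical sketch against the "8_21_2002" table built earlier in the thread (untested):

-- Dedupe tens of thousands of rows instead of ~900K:
SELECT DISTINCT full_phone
FROM "8_21_2002";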