Thread: speed of query?
I must be missing something important, because I am just not seeing why this query is slower on a 4-processor, 8 gig machine running Red Hat AS4.

The SQL:

explain analyze
SELECT a.clientnum, a.associateid, a.associatenum, a.lastname, a.firstname,
       jt.value AS jobtitle, l.name AS "location", l.locationid AS mainlocationid,
       l.divisionid, l.regionid, l.districtid,
       (a.lastname::text || ', '::text) || a.firstname::text AS assocname,
       a.isactive, a.isdeleted
  FROM tblassociate a
  LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id
       AND jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid
  JOIN tbllocation l ON a.locationid = l.locationid
       AND l.clientnum::text = a.clientnum::text
 WHERE a.clientnum = 'SAKS';

Machine 1, my desktop:

"Merge Join  (cost=74970.51..75975.46 rows=8244 width=113) (actual time=5141.000..6363.000 rows=160593 loops=1)"
"  Merge Cond: ("outer".locationid = "inner".locationid)"
"  ->  Sort  (cost=656.22..657.11 rows=354 width=49) (actual time=16.000..16.000 rows=441 loops=1)"
"        Sort Key: l.locationid"
"        ->  Index Scan using ix_location on tbllocation l  (cost=0.00..641.23 rows=354 width=49) (actual time=0.000..0.000 rows=441 loops=1)"
"              Index Cond: ('SAKS'::text = (clientnum)::text)"
"  ->  Sort  (cost=74314.29..74791.06 rows=190710 width=75) (actual time=5125.000..5316.000 rows=160594 loops=1)"
"        Sort Key: a.locationid"
"        ->  Merge Right Join  (cost=0.00..52366.50 rows=190710 width=75) (actual time=16.000..1973.000 rows=177041 loops=1)"
"              Merge Cond: ((("outer".clientnum)::text = ("inner".clientnum)::text) AND ("outer".id = "inner".jobtitleid))"
"              ->  Index Scan using ix_tbljobtitle_id on tbljobtitle jt  (cost=0.00..244.75 rows=6622 width=37) (actual time=0.000..16.000 rows=5690 loops=1)"
"                    Filter: (1 = presentationid)"
"              ->  Index Scan using ix_tblassoc_jobtitleid on tblassociate a  (cost=0.00..50523.83 rows=190710 width=53) (actual time=0.000..643.000 rows=177041 loops=1)"
"                    Index Cond: ((clientnum)::text = 'SAKS'::text)"
"Total runtime: 6719.000 ms"

Test Linux machine:

"Merge Join  (cost=48126.04..49173.57 rows=15409 width=113) (actual time=11832.165..12678.025 rows=160593 loops=1)"
"  Merge Cond: ("outer".locationid = "inner".locationid)"
"  ->  Sort  (cost=807.64..808.75 rows=443 width=49) (actual time=2.418..2.692 rows=441 loops=1)"
"        Sort Key: l.locationid"
"        ->  Index Scan using ix_location on tbllocation l  (cost=0.00..788.17 rows=443 width=49) (actual time=0.036..1.677 rows=441 loops=1)"
"              Index Cond: ('SAKS'::text = (clientnum)::text)"
"  ->  Sort  (cost=47318.40..47758.44 rows=176015 width=75) (actual time=11829.660..12002.746 rows=160594 loops=1)"
"        Sort Key: a.locationid"
"        ->  Merge Right Join  (cost=24825.80..27512.71 rows=176015 width=75) (actual time=8743.848..9750.775 rows=177041 loops=1)"
"              Merge Cond: ((("outer".clientnum)::text = "inner"."?column10?") AND ("outer".id = "inner".jobtitleid))"
"              ->  Index Scan using ix_tbljobtitle_id on tbljobtitle jt  (cost=0.00..239.76 rows=6604 width=37) (actual time=0.016..11.323 rows=5690 loops=1)"
"                    Filter: (1 = presentationid)"
"              ->  Sort  (cost=24825.80..25265.84 rows=176015 width=53) (actual time=8729.320..8945.292 rows=177041 loops=1)"
"                    Sort Key: (a.clientnum)::text, a.jobtitleid"
"                    ->  Index Scan using ix_associate_clientnum on tblassociate a  (cost=0.00..9490.20 rows=176015 width=53) (actual time=0.036..1071.867 rows=177041 loops=1)"
"                          Index Cond: ((clientnum)::text = 'SAKS'::text)"
"Total runtime: 12802.019 ms"

I tried removing the left outer join, thinking it would speed things up, but the planner switched to a seq scan on tblassociate and the query ran twice as slow.
Joel Fradkin
Wazagua, Inc.
jfradkin@wazagua.com
Joel Fradkin wrote:
> I must be missing something important, because I am just not seeing why this
> query is slower on a 4-processor, 8 gig machine running Red Hat AS4.

Well, the 4 processors aren't going to help with a single query. However, assuming the configurations for both machines are comparable, you shouldn't be seeing a doubling in query time.

I have, however, spotted something very strange towards the bottom of each explain:

> Machine 1, my desktop:
> "        ->  Merge Right Join  (cost=0.00..52366.50 rows=190710 width=75) (actual time=16.000..1973.000 rows=177041 loops=1)"
> "              Merge Cond: ((("outer".clientnum)::text = ("inner".clientnum)::text) AND ("outer".id = "inner".jobtitleid))"

> Test Linux machine:
> "        ->  Merge Right Join  (cost=24825.80..27512.71 rows=176015 width=75) (actual time=8743.848..9750.775 rows=177041 loops=1)"
> "              Merge Cond: ((("outer".clientnum)::text = "inner"."?column10?") AND ("outer".id = "inner".jobtitleid))"

In the first, we match outer.clientnum to inner.clientnum; in the second it's "?column10?" -- are you sure the query was identical in each case? I'm guessing the unidentified column in query 2 is the reason for the sort a couple of lines below it, which seems to take up a large chunk of time.

--
  Richard Huxton
  Archonet Ltd
> are you sure the query was identical in each case?

I just ran it a second time, same results, ensuring that the query is the same. Not sure why it is doing a column10 thing. Any ideas what to look for?

Both databases are a restore from the same backup file. One is running Red Hat, the other XP. I believe both are the same version of postgres apart from the different platform (8.0.1, I am pretty sure).

I just spent the morning with Dell hoping for some explanation from them. They said I had to have the database on the same type of OS and hardware for them to consider the issue hardware-related. They are escalating to the software group. I did a default Red Hat install, so it very well may be an issue with my lack of knowledge of Linux.

He did mention that by default the PERC4 controllers do cache, so I may need to visit the data center to set the PERCs to not cache.
Richard Huxton <dev@archonet.com> writes:
> In the first, we match outer.clientnum to inner.clientnum; in the second
> it's "?column10?" -- are you sure the query was identical in each case?
> I'm guessing the unidentified column in query 2 is the reason for the
> sort a couple of lines below it, which seems to take up a large chunk of
> time.

The "?column10?" is because EXPLAIN isn't excessively bright about reporting references to outputs of lower plan nodes. (Gotta fix that sometime.)

The real point here is that the planner thought that a scan plus sort would be faster than scanning an index that exactly matched the sort order the Merge Join needed ... and it was wrong :-(

So this is just the usual sort of question of "are your stats up to date, maybe you need to increase stats targets, or else play with random_page_cost, etc" ...

			regards, tom lane
I have done a vacuum and a vacuum analyze. I can try again for kicks, but it is not in production, so no new records are added, and vacuum analyze is run after any mods to the indexes.

I am still pursuing Dell on why the monster box is so much slower than the desktop as well.

Joel Fradkin
Wazagua, Inc.

Tom Lane wrote:
> So this is just the usual sort of question of "are your stats up to date, maybe
> you need to increase stats targets, or else play with random_page_cost, etc" ...
On 4/14/05, Joel Fradkin <jfradkin@wazagua.com> wrote:
> I have done a vacuum and a vacuum analyze. I can try again for kicks, but it
> is not in production, so no new records are added, and vacuum analyze is run
> after any mods to the indexes.
>
> I am still pursuing Dell on why the monster box is so much slower than the
> desktop as well.

First thing: do something like:

ALTER TABLE tbljobtitle ALTER COLUMN clientnum SET STATISTICS 50;

Do it for each column used, and make it even higher than 50 for columns with many distinct values. Then run VACUUM ANALYZE.

Then run the query a couple of times (EXPLAIN ANALYZE also :)), then do:

SET enable_seqscan = off;

and rerun the query -- if it was significantly faster, you will want to do:

SET enable_seqscan = on;

and tweak:

SET random_page_cost = 2.1;

...and play with values. When you reach the random_page_cost which suits your data, you will want to put it into postgresql.conf.

I am sorry if this is already known to you. :) Also, it is a rather simplistic approach to tuning PostgreSQL, but it is worth doing. Especially the statistics part. :)

Regards,
Dawid
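Put together, the whole experiment might look something like this in psql. This is only a sketch: the statistics targets and the random_page_cost value are just starting points to play with, the column list is taken from the tables in Joel's query, and viwassoclist stands for the view his query selects from (shown later in the thread).

  -- raise per-column statistics for the columns used in the joins and filter
  ALTER TABLE tblassociate ALTER COLUMN clientnum  SET STATISTICS 50;
  ALTER TABLE tblassociate ALTER COLUMN jobtitleid SET STATISTICS 50;
  ALTER TABLE tblassociate ALTER COLUMN locationid SET STATISTICS 50;
  ALTER TABLE tbljobtitle  ALTER COLUMN clientnum  SET STATISTICS 50;
  ALTER TABLE tbljobtitle  ALTER COLUMN id         SET STATISTICS 50;
  ALTER TABLE tbllocation  ALTER COLUMN clientnum  SET STATISTICS 50;
  VACUUM ANALYZE;

  -- see what plan the planner picks when sequential scans are discouraged
  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT * FROM viwassoclist WHERE clientnum = 'SAKS';

  -- if that was clearly faster, re-enable seqscans and lower random_page_cost instead
  SET enable_seqscan = on;
  SET random_page_cost = 2.1;
  EXPLAIN ANALYZE SELECT * FROM viwassoclist WHERE clientnum = 'SAKS';

Once a random_page_cost that produces the right plan is found, it goes into postgresql.conf so every session gets it.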
Josh from commandprompt.com had me alter the config to have default_statistics_target = 250.

Is this somehow related to what you're asking me to do? I did do an analyze, but have only run the view a few times.

Joel Fradkin

Dawid Kuroczko wrote:
> First thing: do something like:
> ALTER TABLE tbljobtitle ALTER COLUMN clientnum SET STATISTICS 50;
> Do it for each column used, and make it even higher than 50 for columns with
> many distinct values. Then run VACUUM ANALYZE.
On 4/14/05, Joel Fradkin <jfradkin@wazagua.com> wrote:
> Josh from commandprompt.com had me alter the config to have
> default_statistics_target = 250.
>
> Is this somehow related to what you're asking me to do?
> I did do an analyze, but have only run the view a few times.

Well, he did suggest the right thing. However this parameter applies to newly created tables, so either recreate the tables or do the ALTER TABLE I sent earlier.

Basically it tells postgres how many values it should keep for statistics per column. The config default_statistics_target is the default (= used when creating a table) and ALTER... is a way to change it later.

The more statistics PostgreSQL has, the better it can predict how much data will be returned -- and this directly drives the choice of how to handle the data (the order in which tables should be read, whether to use an index or not, which join algorithm to use, etc.). The more statistics, the better PostgreSQL is able to predict; but also, the more statistics, the slower the planner's analysis. So you have to find a value that is just enough to predict the results accurately and no more. PostgreSQL's default of 10 is a bit conservative, hence the suggestions to increase it. :)

[ And so is random_page_cost, though some people have found that in their cases it is beneficial to reduce the value, even to below 2. ]

Hope this clarifies things a bit.

Regards,
Dawid
Dawid Kuroczko <qnex42@gmail.com> writes:
> Basically it tells postgres how many values it should keep for
> statistics per column. The config default_statistics_target
> is the default (= used when creating a table) and ALTER... is
> a way to change it later.

Not quite. default_statistics_target is the value used by ANALYZE for any column that hasn't had an explicit ALTER SET STATISTICS done on it. So you can change default_statistics_target and that will affect existing tables. (It used to work the way you are saying, but that was a few releases back...)

			regards, tom lane
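In other words, both of the following take effect at the next ANALYZE; the per-column setting simply takes precedence over the default. A sketch, with arbitrary target values:

  -- session-level (or postgresql.conf) default, picked up by the next ANALYZE
  SET default_statistics_target = 250;
  ANALYZE tblassociate;

  -- explicit per-column override; this column now ignores the default
  ALTER TABLE tbljobtitle ALTER COLUMN clientnum SET STATISTICS 250;
  ANALYZE tbljobtitle;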
I did the ALTER TABLE as described and did not see any difference in speed. I am trying to undo the symbolic link to the data array and set it up on the RAID 5 disks in the machine, just to test whether there is an issue with the config of the RAID 10 array or a problem with the controller.

I am kinda lame at Linux, so not sure I have got it yet; still testing. Still kind of puzzled why it chose two different plans, but one box is running the Windows version of postgres, so maybe that has something to do with it. The databases and configs (as far as page cost) are the same.

Joel Fradkin
Wazagua, Inc.

Tom Lane wrote:
> Not quite. default_statistics_target is the value used by ANALYZE for any
> column that hasn't had an explicit ALTER SET STATISTICS done on it.
Well, so far I have 1.5 hours with commandprompt.com and 8+ hours with Dell, and have not seen any changes in the speed of my query.

I did move the database to the RAID 5 drives and did see a 1 second improvement, from 13 secs to 12 secs (keep in mind it runs in 6 on the Optiplex). The Dell guy ran Bonnie and found 40 MB per sec read/write speed for the arrays. He also installed version 8.0.2 (it went fine on AS4; he had to uninstall 8.0.1 first). He is going to get a 6650 in his test lab to see what he can figure out.

I will say both commandprompt.com and Dell have been very professional, and I am impressed at the level of support available for Red Hat from Dell and for postgres. As always, this list has been my most useful asset, but I am glad there are folks to call on. I am trying to go live soon and need to get this resolved.

I told the guy from Dell it makes no sense that a Windows 2.4 GHz single-proc box with 750 meg of RAM can go faster than a 4-proc (3 GHz) 8 gig machine. Both databases were restored from the same file. Same view, etc. Config files are set the same except for the amount of cached RAM, although commandprompt.com had me adjust a few items that should help going into production; the planner-related stuff is basically the same.

This view returns in 3 secs on MSSQL Server on the Optiplex (750 meg, 2.4 GHz box), 6 secs using postgres on Windows, and 12-13 secs on the 4-processor box. Needless to say I am very frustrated. Maybe Dell will turn up something testing in their lab. It took a bit of perseverance to get to the right guy at Dell (the first guy actually told me to load it all on a like machine, and if it was very much slower on my original they would pursue it; otherwise it was not an issue. I was like: the machine cost 30K, are you going to send me one to test that?).

But seriously, I am open to trying anything (loading AS3, using postgres 7.4?). The fellow at Dell does not think it is a hardware problem, so if it is Linux (could very well be, but he seemed very sharp and has not come up with anything yet) or postgres config (again, Josh at commandprompt.com was very sharp), then what do I do now to isolate the issue? At least they are loading one in the lab (in theory; I can't send them my database, so who knows what they will test).

Dell changed the file system to ext2; is that going to bite me in the butt? It did not seem to change the speed of my explain analyze.

Joel Fradkin
Joel Fradkin wrote:
> I did the ALTER TABLE as described and did not see any difference in speed.
> [...]
> I am kinda lame at Linux, so not sure I have got it yet; still testing. Still
> kind of puzzled why it chose two different plans, but one box is running the
> Windows version of postgres, so maybe that has something to do with it.

That sounds like a plausible explanation. However, it could simply be that the statistics gathered on each box are sufficiently different to cause different plans.

> The databases and configs (as far as page cost) are the same.

Did you do as Dawid suggested?

> [...]
> Then run the query a couple of times (EXPLAIN ANALYZE also :)), then do:
> SET enable_seqscan = off;
> and rerun the query -- if it was significantly faster, you will want to do:
> SET enable_seqscan = on;
> and tweak:
> SET random_page_cost = 2.1;
> ...and play with values. When you reach the random_page_cost which suits
> your data, you will want to put it into postgresql.conf.
> [...]

This is above and beyond toying with the column statistics. You are basically telling the planner to use an index. Try this, and post the EXPLAIN ANALYZE for the seqscan = off case on the slow box if it doesn't speed things up for you.

David B. Held
Software Engineer/Array Services Group
It is still slower on the Linux box. (Included below is the explain with SET enable_seqscan = off; explain analyze select * from viwassoclist where clientnum ='SAKS'.)

I did a few other tests (changing drive arrays helped by 1 second; it was slower on my RAID 10 on the PowerVault). Pulling just raw data is much faster on the Linux box:

"Seq Scan on tblresponse_line  (cost=100000000.00..100089717.78 rows=4032078 width=67) (actual time=0.028..4600.431 rows=4032078 loops=1)"
"Total runtime: 6809.399 ms"

Windows box:

"Seq Scan on tblresponse_line  (cost=0.00..93203.68 rows=4031968 width=67) (actual time=16.000..11316.000 rows=4031968 loops=1)"
"Total runtime: 16672.000 ms"

I am going to reload the databases, just to see what I get. I am thinking I may have to flatten the files for postgres (eliminate joins of any kind for reporting, etc.). It might make a good deal more data, but I think from the app's point of view it is a good idea anyway; I am just not sure how to handle editing.

Joel Fradkin

"Merge Join  (cost=49697.60..50744.71 rows=14987 width=113) (actual time=11301.160..12171.072 rows=160593 loops=1)"
"  Merge Cond: ("outer".locationid = "inner".locationid)"
"  ->  Sort  (cost=788.81..789.89 rows=432 width=49) (actual time=3.318..3.603 rows=441 loops=1)"
"        Sort Key: l.locationid"
"        ->  Index Scan using ix_location on tbllocation l  (cost=0.00..769.90 rows=432 width=49) (actual time=0.145..2.283 rows=441 loops=1)"
"              Index Cond: ('SAKS'::text = (clientnum)::text)"
"  ->  Sort  (cost=48908.79..49352.17 rows=177352 width=75) (actual time=11297.774..11463.780 rows=160594 loops=1)"
"        Sort Key: a.locationid"
"        ->  Merge Right Join  (cost=26247.95..28942.93 rows=177352 width=75) (actual time=8357.010..9335.362 rows=177041 loops=1)"
"              Merge Cond: ((("outer".clientnum)::text = "inner"."?column10?") AND ("outer".id = "inner".jobtitleid))"
"              ->  Index Scan using ix_tbljobtitle_id on tbljobtitle jt  (cost=0.00..243.76 rows=6604 width=37) (actual time=0.122..12.049 rows=5690 loops=1)"
"                    Filter: (1 = presentationid)"
"              ->  Sort  (cost=26247.95..26691.33 rows=177352 width=53) (actual time=8342.271..8554.943 rows=177041 loops=1)"
"                    Sort Key: (a.clientnum)::text, a.jobtitleid"
"                    ->  Index Scan using ix_associate_clientnum on tblassociate a  (cost=0.00..10786.17 rows=177352 width=53) (actual time=0.166..1126.052 rows=177041 loops=1)"
"                          Index Cond: ((clientnum)::text = 'SAKS'::text)"
"Total runtime: 12287.502 ms"

David Held wrote:
> This is above and beyond toying with the column statistics. You are basically
> telling the planner to use an index. Try this, and post the EXPLAIN ANALYZE for
> the seqscan = off case on the slow box if it doesn't speed things up for you.
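If the flattening route is taken, one simple way to do it on 8.0 (which has no materialized views) is to rebuild a plain reporting table from the view on a schedule. The table name and refresh strategy below are only a sketch, not anything from this thread:

  -- hypothetical flattened reporting table, built once from the view
  CREATE TABLE rptassoclist AS SELECT * FROM viwassoclist;
  CREATE INDEX ix_rptassoclist_clientnum ON rptassoclist (clientnum);
  ANALYZE rptassoclist;

  -- periodic refresh (e.g. nightly from cron)
  TRUNCATE rptassoclist;
  INSERT INTO rptassoclist SELECT * FROM viwassoclist;
  ANALYZE rptassoclist;

Reports would then hit rptassoclist with no joins at all, while the editing screens keep using the normalized tables.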
"Joel Fradkin" <jfradkin@wazagua.com> writes: > "Merge Join (cost=49697.60..50744.71 rows=14987 width=113) (actual > time=11301.160..12171.072 rows=160593 loops=1)" > " Merge Cond: ("outer".locationid = "inner".locationid)" > " -> Sort (cost=788.81..789.89 rows=432 width=49) (actual > time=3.318..3.603 rows=441 loops=1)" > " Sort Key: l.locationid" > " -> Index Scan using ix_location on tbllocation l > (cost=0.00..769.90 rows=432 width=49) (actual time=0.145..2.283 rows=441 > loops=1)" > " Index Cond: ('SAKS'::text = (clientnum)::text)" > " -> Sort (cost=48908.79..49352.17 rows=177352 width=75) (actual > time=11297.774..11463.780 rows=160594 loops=1)" > " Sort Key: a.locationid" > " -> Merge Right Join (cost=26247.95..28942.93 rows=177352 > width=75) (actual time=8357.010..9335.362 rows=177041 loops=1)" > " Merge Cond: ((("outer".clientnum)::text = > "inner"."?column10?") AND ("outer".id = "inner".jobtitleid))" > " -> Index Scan using ix_tbljobtitle_id on tbljobtitle jt > (cost=0.00..243.76 rows=6604 width=37) (actual time=0.122..12.049 rows=5690 > loops=1)" > " Filter: (1 = presentationid)" > " -> Sort (cost=26247.95..26691.33 rows=177352 width=53) > (actual time=8342.271..8554.943 rows=177041 loops=1)" > " Sort Key: (a.clientnum)::text, a.jobtitleid" > " -> Index Scan using ix_associate_clientnum on > tblassociate a (cost=0.00..10786.17 rows=177352 width=53) (actual > time=0.166..1126.052 rows=177041 loops=1)" > " Index Cond: ((clientnum)::text = 'SAKS'::text)" > "Total runtime: 12287.502 ms" It strikes me as odd that the thing isn't considering hash joins for at least some of these steps. Can you force it to (by setting enable_mergejoin off)? If not, what are the datatypes of the join columns exactly? regards, tom lane
Turning off merge joins seems to have done it, but what do I need to do so I am not telling the system explicitly not to use them? I must be missing some setting.

On the Linux box:

explain analyze select * from viwassoclist where clientnum ='SAKS'

"Hash Join  (cost=988.25..292835.36 rows=15773 width=113) (actual time=23.514..3024.064 rows=160593 loops=1)"
"  Hash Cond: ("outer".locationid = "inner".locationid)"
"  ->  Hash Left Join  (cost=185.57..226218.77 rows=177236 width=75) (actual time=21.147..2221.098 rows=177041 loops=1)"
"        Hash Cond: (("outer".jobtitleid = "inner".id) AND (("outer".clientnum)::text = ("inner".clientnum)::text))"
"        ->  Seq Scan on tblassociate a  (cost=0.00..30851.25 rows=177236 width=53) (actual time=0.390..1095.385 rows=177041 loops=1)"
"              Filter: ((clientnum)::text = 'SAKS'::text)"
"        ->  Hash  (cost=152.55..152.55 rows=6604 width=37) (actual time=20.609..20.609 rows=0 loops=1)"
"              ->  Seq Scan on tbljobtitle jt  (cost=0.00..152.55 rows=6604 width=37) (actual time=0.033..12.319 rows=6603 loops=1)"
"                    Filter: (1 = presentationid)"
"  ->  Hash  (cost=801.54..801.54 rows=454 width=49) (actual time=2.196..2.196 rows=0 loops=1)"
"        ->  Index Scan using ix_location on tbllocation l  (cost=0.00..801.54 rows=454 width=49) (actual time=0.111..1.755 rows=441 loops=1)"
"              Index Cond: ('SAKS'::text = (clientnum)::text)"
"Total runtime: 3120.366 ms"

Here are the table defs and the view, if that helps. I posted the config a while back, but can do it again if you need to see it.

CREATE OR REPLACE VIEW viwassoclist AS
 SELECT a.clientnum, a.associateid, a.associatenum, a.lastname, a.firstname,
        jt.value AS jobtitle, l.name AS "location", l.locationid AS mainlocationid,
        l.divisionid, l.regionid, l.districtid,
        (a.lastname::text || ', '::text) || a.firstname::text AS assocname,
        a.isactive, a.isdeleted
   FROM tblassociate a
   LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id
        AND jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid
   JOIN tbllocation l ON a.locationid = l.locationid
        AND l.clientnum::text = a.clientnum::text;

CREATE TABLE tblassociate
(
  clientnum varchar(16) NOT NULL,
  associateid int4 NOT NULL,
  associatenum varchar(10),
  firstname varchar(50),
  middleinit varchar(5),
  lastname varchar(50),
  ssn varchar(18),
  dob timestamp,
  address varchar(100),
  city varchar(50),
  state varchar(50),
  country varchar(50),
  zip varchar(10),
  homephone varchar(14),
  cellphone varchar(14),
  pager varchar(14),
  associateaccount varchar(50),
  doh timestamp,
  dot timestamp,
  rehiredate timestamp,
  lastdayworked timestamp,
  staffexecid int4,
  jobtitleid int4,
  locationid int4,
  deptid int4,
  positionnum int4,
  worktypeid int4,
  sexid int4,
  maritalstatusid int4,
  ethnicityid int4,
  weight float8,
  heightfeet int4,
  heightinches int4,
  haircolorid int4,
  eyecolorid int4,
  isonalarmlist bool NOT NULL DEFAULT false,
  isactive bool NOT NULL DEFAULT true,
  ismanager bool NOT NULL DEFAULT false,
  issecurity bool NOT NULL DEFAULT false,
  createdbyid int4,
  isdeleted bool NOT NULL DEFAULT false,
  militarybranchid int4,
  militarystatusid int4,
  patrontypeid int4,
  identificationtypeid int4,
  workaddress varchar(200),
  testtypeid int4,
  testscore int4,
  pin int4,
  county varchar(50),
  CONSTRAINT pk_tblassociate PRIMARY KEY (clientnum, associateid),
  CONSTRAINT ix_tblassociate UNIQUE (clientnum, associatenum)
);

CREATE TABLE tbljobtitle
(
  clientnum varchar(16) NOT NULL,
  id int4 NOT NULL,
  value varchar(50),
  code varchar(16),
  isdeleted bool DEFAULT false,
  presentationid int4 NOT NULL DEFAULT 1,
  CONSTRAINT pk_tbljobtitle PRIMARY KEY (clientnum, id, presentationid)
);
CREATE TABLE tbllocation
(
  clientnum varchar(16) NOT NULL,
  locationid int4 NOT NULL,
  districtid int4 NOT NULL,
  regionid int4 NOT NULL,
  divisionid int4 NOT NULL,
  locationnum varchar(8),
  name varchar(50),
  clientlocnum varchar(50),
  address varchar(100),
  address2 varchar(100),
  city varchar(50),
  state varchar(2) NOT NULL DEFAULT 'zz'::character varying,
  zip varchar(10),
  countryid int4,
  phone varchar(15),
  fax varchar(15),
  payname varchar(40),
  contact char(36),
  active bool NOT NULL DEFAULT true,
  coiprogram text,
  coilimit text,
  coiuser varchar(255),
  coidatetime varchar(32),
  ec_note_field varchar(1050),
  locationtypeid int4,
  open_time timestamp,
  close_time timestamp,
  insurance_loc_id varchar(50),
  lpregionid int4,
  sic int4,
  CONSTRAINT pk_tbllocation PRIMARY KEY (clientnum, locationid),
  CONSTRAINT ix_tbllocation_1 UNIQUE (clientnum, locationnum, name),
  CONSTRAINT ix_tbllocation_unique_number UNIQUE (clientnum, divisionid, regionid, districtid, locationnum)
);

Tom Lane wrote:
> It strikes me as odd that the thing isn't considering hash joins for at least
> some of these steps. Can you force it to (by setting enable_mergejoin off)?
> If not, what are the datatypes of the join columns exactly?
"Joel Fradkin" <jfradkin@wazagua.com> writes: > Turning off merg joins seems to of done it but what do I need to do so I am > not telling the system explicitly not to use them, I must be missing some > setting? > " -> Hash Left Join (cost=185.57..226218.77 rows=177236 width=75) (actual > time=21.147..2221.098 rows=177041 loops=1)" > " Hash Cond: (("outer".jobtitleid = "inner".id) AND > (("outer".clientnum)::text = ("inner".clientnum)::text))" It's overestimating the cost of this join for some reason ... and I think I see why. It's not accounting for the combined effect of the two hash clauses, only for the "better" one. What are the statistics for tbljobtitle.id and tbljobtitle.clientnum --- how many distinct values of each, and are the distributions skewed to a few popular values? regards, tom lane
Joel Fradkin wrote:
> Another odd thing is when I tried turning off merge joins on the XP desktop,
> it took 32 secs to run compared to the 6 secs it was taking.
> On the Linux (4-proc box) it is now running in 3 secs with the merge joins
> turned off.
>
> Unfortunately it takes over 2 minutes to actually return the 160,000+ rows.
> I am guessing that is either network (I have gig cards on a LAN) or perhaps
> the ODBC driver (using pgAdmin III to do the select).

pgAdmin III uses libpq, not the ODBC driver.

Regards, Dave
> pgAdmin III uses libpq, not the ODBC driver.

Sorry, I am not too aware of all the semantics. I guess the question is whether it is normal to take 2 mins to get 160K records, or whether there is something else I can do (I plan on limiting the query screens using limit and offset; I realize this will only be effective for the early part of the returned record set, but I believe users don't page through a bunch of records, they probably add search criteria). But for reporting I will need to return all the records, and this seems slow to me (though it might be in line with what I get now; I will have to do some benchmarking).

The application is a mixture of .NET and ASP and will soon have Java, so I am using the .NET library for the .NET pages and the ODBC driver for the ASP pages.

I did find that using a view for the location join sped up the query a great deal; I will have to see if there are other places I can use that thinking (instead of joining on the associate table and its dependents, I can just join on a view of that data, etc.). Basically I have a view that does a join from location to the district, region, and division tables. The old viwassoclist had those joined to the assoc table inside viwassoclist; I changed it to use the view I created, where those tables are joined to the location table, and in assoclist I just join to the location view. This really made a huge difference in speed.
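For reference, the reshaped views Joel describes would look roughly like the sketch below. The district/region/division tables and their columns never appear in this thread, so viwlocation, tbldistrict, tblregion, tbldivision and their name columns are guesses rather than his actual schema:

  CREATE OR REPLACE VIEW viwlocation AS
   SELECT l.clientnum, l.locationid, l.name, l.divisionid, l.regionid, l.districtid,
          d.name  AS districtname,
          r.name  AS regionname,
          dv.name AS divisionname
     FROM tbllocation l
     JOIN tbldistrict d  ON d.clientnum  = l.clientnum AND d.districtid  = l.districtid
     JOIN tblregion   r  ON r.clientnum  = l.clientnum AND r.regionid    = l.regionid
     JOIN tbldivision dv ON dv.clientnum = l.clientnum AND dv.divisionid = l.divisionid;

  -- viwassoclist then joins the location rollup once instead of each hierarchy table
  CREATE OR REPLACE VIEW viwassoclist AS
   SELECT a.clientnum, a.associateid, a.associatenum, a.lastname, a.firstname,
          jt.value AS jobtitle, vl.name AS "location", vl.locationid AS mainlocationid,
          vl.divisionid, vl.regionid, vl.districtid,
          (a.lastname::text || ', '::text) || a.firstname::text AS assocname,
          a.isactive, a.isdeleted
     FROM tblassociate a
     LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id
          AND jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid
     JOIN viwlocation vl ON a.locationid = vl.locationid
          AND vl.clientnum::text = a.clientnum::text;

  -- and the query screens can page rather than pull all 160,000+ rows at once
  SELECT * FROM viwassoclist
   WHERE clientnum = 'SAKS'
   ORDER BY assocname
   LIMIT 50 OFFSET 0;   -- next page: OFFSET 50, and so on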