Thread: same question, little different test: MSSQL vs. Postgres
I also tried a simple query on both MSSQL and Postgres:

    select * from tblcase where clientnum = 'SAKS'

MSSQL was 3 secs, Postgres was 27 secs.

There is a key on clientnum, but on both systems (identical Dell desktops; Postgres running Linux, MSSQL running XP) it appeared that neither did an indexed search.
Joel Fradkin
Wazagua, Inc.
One of the things you'll want to do regularly is run a "vacuum analyze". You can read up on this in the postgresql docs. This is essential to the indexes being used properly. At a bare minimum, after you import a large amount of data, you'll want to run vacuum analyze.

Dennis Sacks
dennis@illusions.com
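For example, after a bulk import the maintenance step Dennis describes is a single statement; a minimal sketch, using the table from this thread:

    -- Reclaim dead space and refresh the planner's statistics;
    -- without current statistics the planner can't judge whether
    -- an index scan on clientnum is worthwhile.
    VACUUM ANALYZE tblcase;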
"Joel Fradkin" <jfradkin@wazagua.com> writes: > I also tried a simple select * from tblcase where clientum = 'SAKS' Try: explain analyze select * from tblcase where clientum = 'SAKS' Send the output. -- greg
QUERY PLAN "Seq Scan on tblcase (cost=0.00..30066.21 rows=37401 width=996) (actual time=0.344..962.260 rows=22636 loops=1)" " Filter: ((clientnum)::text = 'SAKS'::text)" "Total runtime: 1034.434 ms" Joel Fradkin Wazagua, Inc. 2520 Trailmate Dr Sarasota, Florida 34243 Tel. 941-753-7111 ext 305 jfradkin@wazagua.com www.wazagua.com Powered by Wazagua Providing you with the latest Web-based technology & advanced tools. C 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, IncThis email message is for the use of the intended recipient(s) andmay contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and delete and destroy all copies of the original message, including attachments. -----Original Message----- From: gsstark@mit.edu [mailto:gsstark@mit.edu] Sent: Wednesday, January 26, 2005 1:27 AM To: Joel Fradkin Cc: pgsql-sql@postgresql.org Subject: Re: [SQL] same question little different test MSSQL vrs Postgres "Joel Fradkin" <jfradkin@wazagua.com> writes: > I also tried a simple select * from tblcase where clientum = 'SAKS' Try: explain analyze select * from tblcase where clientum = 'SAKS' Send the output. -- greg
Joel Fradkin wrote:
> "Seq Scan on tblcase  (cost=0.00..30066.21 rows=37401 width=996)
> (actual time=0.344..962.260 rows=22636 loops=1)"
> "  Filter: ((clientnum)::text = 'SAKS'::text)"
> "Total runtime: 1034.434 ms"

That's only 1 second - to return 22,636 rows. Not 27 seconds, as in the original post.

You'll never persuade PG to use the index when some 75% of your rows match the filter - it just doesn't make sense.

-- Richard Huxton, Archonet Ltd
Well, last evening (did not try it this morning) it was taking the extra time.

I have made some adjustments to the config file per a few web sites you all recommended my looking at. It is now using 137 of 756 meg available, and it is now taking 8 secs to return 22,000 rows (using pgAdmin III in a SQL edit window). The EXPLAIN ANALYSE still shows the same as before, but the table has 344,000 recs, of which only 22,636 are clientnum = 'SAKS'.

I am still doing a seq scan (this applies to the view question, where a small result set used an index scan but a larger return set used a seq scan) in my view, but with the adjustments to the kernel I get a result in 140 secs (MSSQL was 135 secs).

This is not production. I am still very worried that I have to do all this tweaking to use this; MSSQL worked out of the box as it does (not saying it's great, but I never had to adjust a kernel setting, etc.). Since we cannot afford the 70,000 dollars they want to license it, I am not implying I can use MSSQL, but I could look at other DBs like MySQL, Firebird, etc. I have a lot of time now (two weeks) in this conversion and do not wish to give up; I will see if I can learn what is needed to get the maximum performance. I have seen much information available, and this list has been a huge resource. I really appreciate all the help.

Joel Fradkin
Wazagua, Inc.
Joel Fradkin wrote:
> Well last evening (did not try it this morning) it was taking the extra
> time.
>
> I have made some adjustments to the config file per a few web sites that
> you all recommended my looking at.

The crucial one I'd say is the performance guide at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

The first half-dozen settings are the crucial ones.

> It is now using 137 of 756 meg avail.
> it is now taking 8 secs to return 22,000 rows (using pgadminIII in a sql
> edit window).

That might be too much RAM. Don't forget PG likes to work with your operating system (unlike many other DBs). Make sure Windows is using enough RAM to cache disk space.

I'm curious as to how this takes 8 secs whereas you had 1 second earlier. Are you sure some of this isn't pgadmin's overhead to display the rows?

> The EXPLAIN ANALYSE still shows the same as below, but the table has
> 344,000 recs of which only 22636 are clientnum = 'SAKS'

That sounds like it's about the borderline between using an index and not (depending on cache size, disk speeds, etc.).

> I am still doing a seq search ... but with the adjustments to the kernel
> I get a result in 140 secs (MSSQL was 135 secs).

If you want to check whether the index would help, try issuing the following before running your query:

    SET ENABLE_SEQSCAN=FALSE;

This will force PG to use any index it can regardless of whether it thinks it will help.

> This is not production, I am still very worried that I have to do all this
> tweeking to use this, MSSQL worked out of the box as it does (not saying
> its great, but I never had to adjust a kernel setting etc).

I'm a little curious what kernel settings you are changing on Windows; I wasn't aware there was much to be done there. I'm afraid you do have to change half a dozen settings in postgresql.conf to match your workload, but PG runs on a much wider range of machines than MSSQL, so it's difficult to come up with a "reasonable" default. It takes me about 5 minutes when I set up an installation to make sure the figures are reasonable (rather than the best they can be).

-- Richard Huxton, Archonet Ltd
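A minimal way to run Richard's test against the query from earlier in the thread (enable_seqscan only affects the current session; reset it afterwards so later plans aren't skewed):

    -- Discourage seq scans so the planner reveals the index plan:
    SET enable_seqscan = false;
    EXPLAIN ANALYZE SELECT * FROM tblcase WHERE clientnum = 'SAKS';
    -- Put the planner back to normal:
    RESET enable_seqscan;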
I've tested on a relation of mine, with about 20 attributes, and here are the results:

    test=# select count(*) from gestionestareas;
     count
    --------
     447681
    (1 row)

    test=# explain analyze select * from gestionestareas where agrupable;
                                   QUERY PLAN
    ------------------------------------------------------------------------
     Seq Scan on gestionestareas  (cost=0.00..12334.81 rows=155495 width=372)
       (actual time=0.603..1176.177 rows=153530 loops=1)
       Filter: agrupable
     Total runtime: 1380.113 ms
    (3 rows)

So, doing a seq scan on a 450,000-row table and fetching 150,000 rows takes only 1.3 secs. This is a 900 MHz PIII, 1 GB mem (133 MHz), 7200 RPM IDE disk, running FreeBSD. The machine is also a web server (Apache & Tomcat), mail server, file server (smb & nfs), LDAP server, etc.

I don't use pgAdmin, I use psql (the postgresql console client). It took 2 minutes to display the results of the above query (without the EXPLAIN ANALYZE)... this makes me think: couldn't the problem be that pgAdmin, psql, etc. take too much time to display all the rows? It seems a client software problem, not a server problem.

My advice is: use EXPLAIN ANALYZE to test both servers' performance. If you want to know what the final results will be, test both databases from a self-programmed application (Java, PHP, C++, etc.).

Hope this helped.
The postgres is running on Linux Fedora Core 3 (production will be Red Hat on a Dell 4-proc, 8-gig box). My client pgAdmin III is running on XP. Sorry I was not clearer on this.

I am playing with the settings now; I got it to return in 100 secs (the view, that is, which took 135 on MSSQL). My testing uses identical Dell desktops for the MSSQL and the Linux, with a third machine for the clients.

I do not mind getting up to speed on the proper settings to optimize the hardware. I am worried that, as a production environment can be somewhat dynamic, I will have issues getting an optimized environment that will work for our needs. My whole reason for being here is that our dual-proc production MSSQL server is just no longer keeping up with the demand, so it is important that whatever I implement is going to be up to the challenge. I am still convinced Postgres was the correct choice, especially with all the guidance I have been able to get here. 100 seconds will be fine compared to the 135 of MSSQL; I just was getting worse responses before adjusting. At the moment I think I went too far, as I see it using swap and going slower, but it never used much of the 756 meg (137 max was all I ever saw it use).

I guess the swap, buffers, and cache are the important settings (at least that seems to be what is affecting the memory). Not sure exactly what would cause it to use seq vs. index, but I will try the force and see if it helps the speed.

Joel Fradkin
Wazagua, Inc.
"Joel Fradkin" <jfradkin@wazagua.com> writes: > QUERY PLAN > "Seq Scan on tblcase (cost=0.00..30066.21 rows=37401 width=996) (actual > time=0.344..962.260 rows=22636 loops=1)" > " Filter: ((clientnum)::text = 'SAKS'::text)" > "Total runtime: 1034.434 ms" Well that says it only took 1s. So it seems this is highly dependent on whether the data is in cache. Perhaps it was in cache on MSSQL when you profiled it there and not on postgres? You could put an index on clientnum, but if the data is usually in cache like this it might not even be necessary. -- greg
Joel Fradkin wrote:
> The postgres is running on Linux Fedora Core 3 (production will be Red Hat
> on a Dell 4-proc 8-gig box).
>
> My client pgAdmin III is running on XP.
>
> Sorry I was not clearer on this.

Ah! You're the gent who had the problems with SE-Linux on Fedora 3. Sorry - should have made the connection, but there's so much traffic on the lists it's easy to miss.

> I do not mind getting up to speed on the proper settings to optimize the
> hardware ... it is important that whatever I implement is going to be up
> to the challenge.

You might want to look at the overall design of the database at some point too. Also, don't forget the compromises you made when designing for MSSQL might not be useful (or even harmful) with PG.

> I am still convinced Postgres was the correct choice ... it never used
> much of the 756 meg (137 max was all I ever saw it use).

If you're on Linux then 137MB sounds like too much (for one client, far too much).

> I guess the swap, buffers and cache are the important settings (least that
> seems to be what is affecting the memory).

Try starting with your shared_buffers at say 4000-8000 (32MB to 64MB), sort_mem/work_mem at 8000-32000 (8MB-32MB), and random_page_cost somewhere between 2 and 4. Then judge how much RAM your box is using to cache disk space (free -m) and set effective_cache_size accordingly.

That's it - you may want to play around with the figures slightly, but pick the lowest numbers above, restart PG, and it'll run OK.

-- Richard Huxton, Archonet Ltd
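As a rough sketch of Richard's starting point, the corresponding postgresql.conf lines might look like the following. The figures are the low end of his ranges plus an illustrative effective_cache_size, not tuned recommendations; note that in the 7.4/8.0-era versions discussed here shared_buffers and effective_cache_size are counted in 8kB pages and sort_mem in kB:

    shared_buffers = 4000          # ~32MB of shared buffer cache
    sort_mem = 8000                # ~8MB per sort
    random_page_cost = 3           # between the suggested 2 and 4
    effective_cache_size = 50000   # ~400MB; set from the "cached" figure in free -m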
I tried the SET ENABLE_SEQSCAN=FALSE; and the result took 29 secs instead of 117.

After playing around with the cache and buffers etc. I see I am no longer doing any swapping (not sure how I got the 100 sec response; might have been shared buffers set higher, been goofing around with it all morning).

My worry here is it should obviously use an index scan, so something is not set up correctly yet. I don't want to second-guess the analyzer (or is this a normal thing?).

At least it is blowing the doors off MSSQL (which is what I touted to my boss, and I was pretty upset when I got no result last night). The 117 was before I forced the seq off, so even doing a seq I am getting results now that are better than MSSQL.

Joel Fradkin
Wazagua, Inc.
"Joel Fradkin" <jfradkin@wazagua.com> writes: > I tried the SET ENABLE_SEQSCAN=FALSE; > And the result took 29 secs instead of 117. > > After playing around with the cache and buffers etc I see I am no longer > doing any swapping (not sure how I got the 100 sec response might have been > shared buffers set higher, been goofing around with it all morning). If it's swapping you're definitely going to get bad results. You really want the *majority* of RAM left free for the OS to cache disk data. > My worry here is it should obviously use an index scan so something is not > setup correctly yet. I don't want to second guess the analyzer (or is this a > normal thing?) No that's not obvious. 22k out of 344k is a selectivity of 6.6% which is probably about borderline. The optimizer is estimating even worse at 10.9% which isn't far off but puts it well out of the range for an index scan. If you really want to get postgres using an index scan you'll have to a) improve the estimate using "alter table tblcase alter column clientnum set statistics" to raise the statistics target for that column and reanalyze. And b) lower random_page_cost. random_page_cost tells postgres how much slower indexes are than table scans and at the default setting it accurately represents most disk hardware. If your database fits within RAM and is often cached then you might have to lower it to model that fact. But you shouldn't do it based on a single query like this. -- greg
Thank you, I will look at that info.

I did do an EXPLAIN ANALYSE on the view and could see it was doing the seq scan on 3 fields, so I made an index for the three fields and it then chose an index scan and ran in 27 seconds.

I also adjusted my defaults to much smaller numbers on shared buffers (per the Tidbits page recommendation, like 8 meg for my memory size). I looked at http://www.desknow.com/kb/idx/0/061/article/ which recommended doing a vacuum verbose to determine the exact max_fsm_pages, and I set the cache to use 25% of my available memory per the recommendation on Tidbits.

Joel Fradkin
Wazagua, Inc.
Maybe you should tweak the cpu_index_tuple_cost parameter instead of disabling sequential scans. The default value is 0.001; you should change it to a lower value (0.0005 or something).
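For instance (0.0005 is the poster's own example figure; as with the other planner costs this can be tried per-session before committing it to postgresql.conf):

    -- Halve the planner's per-index-tuple CPU cost, making index
    -- scans look relatively cheaper than sequential scans:
    SET cpu_index_tuple_cost = 0.0005;
    EXPLAIN ANALYZE SELECT * FROM tblcase WHERE clientnum = 'SAKS';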
Joel Fradkin wrote:
> I also did adjust my defaults to much smaller numbers on shared buffers
> (per the Tidbits page recommendation, like 8 meg for my memory size) ...
> and I set the cache to use 25% of my available memory per the
> recommendation on Tidbits.

Note that the effective_cache_size (if I've spelt it right) just tells PG what your cache size is. You should set it based on what "free" tells you about your system's use of memory.

-- Richard Huxton, Archonet Ltd
On Tue, Jan 25, 2005 at 21:21:08 -0700, Dennis Sacks <dennis@illusions.com> wrote:

> One of the things you'll want to do regularly is run a "vacuum analyze".
> You can read up on this in the postgresql docs. This is essential to the
> indexes being used properly. At a bare minimum, after you import a large
> amount of data, you'll want to run vacuum analyze.

Note that there is no need to vacuum after inserts (only updates and deletes), so you can just do an analyze in that case.
Bruno Wolff III wrote:

> Note that there is no need to vacuum after inserts (only updates and
> deletes), so you can just do an analyze in that case.

Good point! Analyze after bulk inserts; vacuum analyze after updates/deletes and inserts. :)

Dennis Sacks
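As a sketch of that rule of thumb, using the table from this thread:

    -- After a bulk load that only added rows, refreshing statistics
    -- is enough:
    ANALYZE tblcase;

    -- After updates or deletes have left dead row versions behind,
    -- reclaim the space too:
    VACUUM ANALYZE tblcase;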
We've done some pretty extensive benchmarking and load testing on a couple of platforms, including the Xeon and Opteron. You may have already bought that Dell box, but I'll say it anyway: Xeon quad processors are a terrible platform for postgres. Trying to use more than 4GB of memory on a 32-bit machine is a waste of money. If you want performance, get a quad Opteron with the same amount of memory. I guarantee you'll see at least an order of magnitude performance improvement, and substantially more under highly concurrent loads. If you decide to go this way, HP sells a very nice box.

I also strongly recommend you investigate SuSE instead of RedHat. Fedora Core is good technology, but SuSE offers equally good technology with better support. Also make sure that your SCSI HBA is actually using the 64-bit PCI bus. There are cards out there which plug into 64-bit PCI but only actually address 32 bits (Qlogic's QLA2340 / 2342, for example).

You make no mention of the disk subsystem you plan to use. This is the most critical part of your system; database performance is almost always bound by IO, usually disk IO. Briefly: put PGDATA on the widest RAID 10 array of disks you can manage. It's not worth spending the extra money to get 15kRPM disks for this; the size of the disks involved is pretty much irrelevant, only the number of them matters. Put the WAL files on a dedicated RAID 1 pair of 15kRPM disks. Put the postgres log files (or syslog) on a separate filesystem.

-- Andrew Hammond, Database Administrator, Afilias Canada Corp.
Now you tell me. We had a fellow working here who kept screaming AMD, but I am a very paranoid person and was not aware Linux and Postgres have been running on the new chips; I don't like to be first. We have bought the Dell, and I can't tell you if the controller uses 64 bits; I just got what they had on their page for their 4-proc rack mount. Part of my reason for going Dell was we already have Dell equipment, and the Linux support is offered from Dell as well, so I have one vendor to worry about. Being a developer and Director of IT, I want the fastest and best, but sometimes I flavor my opinions with safest and easiest.

The RPM delivery is something I understand (it's easy). What is SuSE like? Is there any difference in the performance between the two vendors? I am sure we will be buying more Postgres servers in the near future (one of the big reasons we are taking the time to convert from MSSQL was so we could afford to invest in more servers; MSSQL was cost-prohibitive even for one server). As easy as Fedora was, I still had several issues getting to where I am now, so I am paranoid of something that requires even more knowledge to pull off; that being said, I never minded getting into the details to get a better end result.

As you said, we have made the investment in the Dell (25K). I feel pretty stupid if it is, as you say, a waste of money to get 8 gigs on this platform, as I made that same mistake a year ago when I bought the 2-processor boxes with Standard Edition MSSQL and 4 gigs (it only uses 2 gig). I was under the impression this machine would utilize all 8 gigs. Are you saying only 4 will be available for caching etc., or just that the chipset can't deal with numbers like 8 gig and will be slower to access them? If it is the latter, then I would imagine it would still outperform a similar box with 4 gig, assuming my demand on cache is larger than 4 gig.

Just to confirm: you have these quad Opterons (a 4-processor config, I am assuming?) in a production environment running SuSE and postgres, with hardware support from HP and software from SuSE?

You indicate three separate physical drive sets will give the best performance (one for data at 10k speeds, one for admin, one for WAL at 15k)? I am not too sophisticated at knowing how to order this arrangement and set it up in Linux. Any chance you could detail it (1 card with 2 channels, 4 10k drives on one channel, 2 15k drives on the second; do I need another channel and drive(s) for admin files?), and the drive layout to configure in postgres to utilize it? If need be, maybe we can get you to do this as a consultant, as I do understand how important the hardware and the proper config are. I found out too late with MSSQL that I should have used two separate drive arrays, one for data, one for log (this would have required the split backplane). So, not to plug a specific vendor, but if you have a production environment example with real equipment suggestions I would be very appreciative. I know that's a lot to ask, so if you don't have time that's cool. Thanks so much for bringing this up; on my next purchase I will seriously look at quad Opteron technology if it is a tried and true solution for this OS and Postgres.
Joel Fradkin
Wazagua, Inc.
Quoting Dennis Sacks <dennis@illusions.com>:

> Good point! Analyze after bulk inserts, vacuum analyze after
> updates/deletes and inserts. :)

Hmmm... in performance tests of bulk inserts into a table with six indexes, I found that without vacuum analyze (rather than just analyze), insertion slowed down, albeit something less than linearly. Table of 6M rows, about 3GB (including index files). This is 7.4.1 on SuSE Linux, RAID 5, Xeon (sigh) 2.8GHz, 4GB, nothing else running. The inserts were always done with an existing-record check (LEFT JOIN ... WHERE joinkey IS NULL).

-- "Dreams come true, not free."
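That existing-record check is the usual pattern for inserting only rows not already present. A minimal sketch, where staging_case and the casenum column are hypothetical names for the incoming batch, not objects from the thread:

    -- The LEFT JOIN pairs each incoming row with any existing match;
    -- WHERE ... IS NULL keeps only the unmatched (genuinely new) rows.
    INSERT INTO tblcase (casenum, clientnum)
    SELECT s.casenum, s.clientnum
    FROM staging_case s
    LEFT JOIN tblcase t ON t.casenum = s.casenum
    WHERE t.casenum IS NULL;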