Thread: Compared MS SQL 2000 to Postgresql 9.0 on Windows
We are in the process of deciding on how to proceed on a database upgrade. We currently have MS SQL 2000 running on Windows 2003 (on my test server). I was shocked at the cost for MS SQL 2008 R2 for a new server (2 CPU license). I started comparing DB’s and came across postgresql. It seemed to be exactly what I was after. All of our programming is in ASP.net. Since I am running MSSQL 2000 I have no benefit for .Net integration, so it is not a concern.
I ran a head to head test of MS SQL 2000 and Postgresql 9.0. Both are running on Windows 2003. What I found was quite surprising and I am wondering if anyone can point out what is going on here.
Here is the test I ran.
I created 2 tables, the main table had 5 fields with a serial ID field. The second table linked to table 1 for a state field.
I had ASP.net via MSSQL create 1,000 records in the main table. Took 9.85 seconds to complete.
Next I had ASP.net via Postgresql create 1,000 records. Took .65625 seconds.
Postgresql smoked MS SQL server on that test.
Next test is to use ASP.net and join all 1,000 rows with table 2 and then display the text out.
MS SQL took 0.76 seconds to display
select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid
Then I did the same test via Postgresql and it took 8.85 seconds! I tried it again as I thought I did something wrong. I did a few tweaks such as increasing the shared buffers. Still the best I could get it to was 7.5 seconds. This is insanely slow compared to MSSQL 2000. What am I missing? Here is my SQL statement for postgresql:
select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid
Any ideas on why the Postgres server is soooo much slower on the joins? I am trying to understand what is going on here so please don’t flame me. Any advice is appreciated.
Thanks,
Tom Polak
Rockford Area Association of Realtors
Tom Polak <tom@rockfordarearealtors.org> wrote:

> the best I could get it to was 7.5 seconds.
> select name,address,city,state,statename,stateid,other from
> pgtemp1 left join pgtemp2 on state=stateid

We'd need a lot more information. Please read this and post again:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

Be sure to include hardware info, postgresql.conf settings (excluding
comments), table layouts including indexes and constraints, and the
results of:

EXPLAIN ANALYZE select ...

-Kevin
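For anyone gathering the information Kevin asks for, a couple of queries
that pull most of it out of a running server (a sketch only; table
layouts, indexes, and constraints can be shown with psql's \d command):

    -- Non-default settings, i.e. roughly what was changed in postgresql.conf.
    SELECT name, setting, source
    FROM pg_settings
    WHERE source <> 'default';

    -- Server version and build platform, useful context for the report.
    SELECT version();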
On 12/7/2010 11:34 AM, Tom Polak wrote:
> We are in the process of deciding on how to proceed on a database
> upgrade. We currently have MS SQL 2000 running on Windows 2003 (on my
> test server). [...]
> I had ASP.net via MSSQL create 1,000 records in the main table. Took
> 9.85 seconds to complete.
> Next I had ASP.net via Postgresql create 1,000 records. Took .65625
> seconds.
> Postgresql smoked MS SQL server on that test.

Did you play with the postgresql.conf file? Maybe turn off fsync? I'd
guess the above is mssql flushing to disk while PG isn't.

> Next test is to use ASP.net and join all 1,000 rows with table 2 and
> then display the text out.
> [...]
> Any ideas on why the Postgres server is soooo much slower on the
> joins? I am trying to understand what is going on here so please don't
> flame me. Any advice is appreciated.

Did you create an index? That'd be my first guess.

Also, can you run the SQL from the command line client (psql) and see if
it takes that long? While you're in psql, stick an 'explain analyze' in
front of your query, and let's see its output.

Also, as a fair warning: mssql doesn't really care about transactions,
but PG really does. Make sure all your code is properly starting and
committing transactions.

-Andy
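For anyone following along, a minimal sketch of what Andy is suggesting,
using the table and column names from Tom's test (the index name here is
made up for illustration):

    -- Index the join column on the larger table, then refresh planner stats.
    CREATE INDEX pgtemp1_state_idx ON pgtemp1 (state);
    ANALYZE pgtemp1;
    ANALYZE pgtemp2;

    -- Run the query under EXPLAIN ANALYZE to see the actual plan and timings.
    EXPLAIN ANALYZE
    SELECT name, address, city, state, statename, stateid, other
    FROM pgtemp1
    LEFT JOIN pgtemp2 ON state = stateid;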
On Tuesday 07 December 2010 18:34:25 Tom Polak wrote:
> Then I did the same test via Postgresql and it took 8.85 seconds! I tried
> it again as I thought I did something wrong. I did a few tweaks such as
> increasing the shared buffers. Still the best I could get it to was 7.5
> seconds. This is insanely slow compared to MSSQL 2000. What am I missing?
> Here is my SQL statement for postgresql:
> select name,address,city,state,statename,stateid,other from pgtemp1 left
> join pgtemp2 on state=stateid

I think you should at least provide the exact schema and possibly some
example data (pg_dump) to get us somewhere.

I would suggest you post the output of EXPLAIN ANALYZE $yourquery - that
gives us information about how that query was executed.

Greetings,

Andres
On 12/7/10 9:34 AM, Tom Polak wrote:
> We are in the process of deciding on how to proceed on a database
> upgrade. We currently have MS SQL 2000 running on Windows 2003 (on my
> test server). [...]
> I created 2 tables, the main table had 5 fields with a serial ID field.
> The second table linked to table 1 for a state field.

Did you run ANALYZE on the database after creating it and loading the
data? If not, do it and try again (along with the other suggestions
you'll get here). ANALYZE gathers the statistics that allow the planner
to do its job. Without statistics, all bets are off.

Craig
On 12/7/2010 1:22 PM, Justin Pitts wrote:
>> Also, as a fair warning: mssql doesn't really care about transactions, but
>> PG really does. Make sure all your code is properly starting and committing
>> transactions.
>>
>> -Andy
>
> I do not understand that statement. Can you explain it a bit better?

In mssql you can write code that connects to the db, fires off updates
and inserts, and then disconnects. I believe mssql will keep all your
changes, and the transaction stuff is done for you.

In PG the first statement you fire off (like an "insert into" for
example) will start a transaction. If you don't commit before you
disconnect, that transaction will be rolled back. Even worse, if your
program does not commit but keeps the connection to the db open, the
transaction will stay open too.

There are differences in the way mssql and pg do transactions. mssql uses
a transaction log and keeps current data in the table. In mssql if you
open a transaction and write a bunch of stuff, the table contains that
new stuff and everyone can see it. (I think the default transaction
isolation level is read committed.) But if you set your isolation level
to something with repeatable read, then your program will block and have
to wait on every little change to the table (or, probably, the page; I
think mssql has page-level locking?).

Anyway, in PG, multiple versions of the same row are kept, and when you
open, and keep open, a transaction, PG has to keep a version of the row
for every change that other people make. So a long-lasting transaction
could create hundreds of versions of one row. Then when somebody goes to
select against that table, it has to scan not only the rows, but every
version of every row!

So my point is: in PG, use transactions as they were meant to be used, as
single atomic operations. Start, do some work, commit. mssql made it easy
to ignore transactions by doing it for you. Ignoring transactions in PG
will hurt you.

You can google MVCC and "postgres idle in transaction" for more.

-Andy
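As a concrete illustration of the "short, atomic transactions" point (a
sketch only; the INSERT mirrors the pgtemp1 table and sample values from
Tom's test):

    -- Keep each unit of work in its own short transaction and commit promptly,
    -- rather than leaving the session sitting "idle in transaction".
    BEGIN;
    INSERT INTO pgtemp1 (name, address, city, state)
    VALUES ('Tom1', '1231 main st', 'rockford', 1);
    COMMIT;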
On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson <andy@squeakycode.net> wrote:
> In PG the first statement you fire off (like an "insert into" for example)
> will start a transaction. If you dont commit before you disconnect that
> transaction will be rolled back. Even worse, if your program does not
> commit, but keeps the connection to the db open, the transaction will stay
> open too.

Huh - is this new? I always thought that every statement was wrapped in
its own transaction unless you explicitly start your own. So you
shouldn't need to commit before closing a connection if you never opened
a transaction to begin with.

--
Regards,
Richard Broersma Jr.
On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:
> On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson <andy@squeakycode.net> wrote:
>
> > In PG the first statement you fire off (like an "insert into" for example)
> > will start a transaction. If you dont commit before you disconnect that
> > transaction will be rolled back. Even worse, if your program does not
> > commit, but keeps the connection to the db open, the transaction will stay
> > open too.
>
> Huh - is this new? I always thought that every statement was wrapped
> in its own transaction unless you explicitly start your own. So you
> shouldn't need to commit before closing a connection if you never
> opened a transaction to begin with.
>
> --
> Regards,
> Richard Broersma Jr.

The default of autocommit unless explicitly starting a transaction with
BEGIN is the normal behavior that I have seen as well.

Cheers,
Ken
On 07/12/2010 7:43 PM, Andy Colson wrote:
> In mssql you can write code that connects to the db, fire off updates
> and inserts, and then disconnects. I believe mssql will keep all your
> changes, and the transaction stuff is done for you.
> [...]

As far as I know both MS SQL and Postgres work just the same as regards
explicit and implicit (autocommit) transactions; only the underlying
storage/logging mechanisms are different. Transactions shouldn't make any
real difference to the select/join performance being complained about,
though. It's already been stated that the insert performance of postgres
far exceeds SQL Server, which is my experience also.

As already suggested, until we see the exact table definitions including
indexes etc. there's no real way to tell what the problem is. How many
rows are in the second table? It really shouldn't take that much time to
read 1000 rows unless you have a bizarrely slow hard disk.

It would be nice to eliminate any programmatic or driver influence too.
How does the SQL select execute in Enterprise Manager for mssql and psql
or pgAdmin for postgres?

Cheers,
Gary.
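One way to do the driver-elimination check Gary describes on the Postgres
side is to run the query directly in psql with timing turned on (a sketch
only; \timing is a psql meta-command, and the database name comes from
Tom's connection string):

    -- In psql:  \c tomtemp   then   \timing on   to see per-statement elapsed time.
    SELECT name, address, city, state, statename, stateid, other
    FROM pgtemp1
    LEFT JOIN pgtemp2 ON state = stateid;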
Tom Polak wrote:
> We are in the process of deciding on how to proceed on a database
> upgrade. We currently have MS SQL 2000 running on Windows 2003 (on my
> test server). [...]
> Then I did the same test via Postgresql and it took 8.85 seconds!
> [...]
> Any ideas on why the Postgres server is soooo much slower on the joins?

Are all structures the same? Are all indexes the same? What does "explain
analyze" tell you?

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions
On 12/7/2010 2:10 PM, Kenneth Marshall wrote:
> The default of autocommit unless explicitly starting a transaction with
> BEGIN is the normal behavior that I have seen as well.
>
> Cheers,
> Ken

Crikey! You're right. I need to be more careful with my assumptions.

I maintain that people need to be more careful with pg transactions. I've
seen several posts about "idle in transaction". But it's not as bad as I
made out. My confusion comes from the library I use to hit PG, which
fires off a "begin" for me, and if I don't explicitly commit, it gets
rolled back.

Sorry, I was confused between the framework and PG.

-Andy
What I was really after was a quick comparison between the two. I did not
create anything special, just the two tables. One table SQL generated the
records for me. I did not tweak anything after installing either system.
There was a primary key on the ID field of both tables, no indexes though
in either system. The second table had 1 record in it. The hardware it is
running on is fairly good: dual Xeon CPUs, 4 GB of RAM, RAID 5. Btw, the
cost for MS SQL 2008 R2 is ~$14,000 for 2 CPUs,
http://www.cdw.com/shop/products/default.aspx?EDC=2167810 . That is why I
am pursuing this. :)

Here is the ASP.net code that I was running:

    Dim starttime As Date = Date.Now
    Dim endtime As Date
    Dim reader As NpgsqlDataReader
    Dim output2 As String = ""

    Dim oConn As New NpgsqlConnection("Server=192.168.1.5;Port=5432;Userid=postgres;Password=12345;Protocol=3;SSL=false;Pooling=true;MinPoolSize=1;MaxPoolSize=20;Encoding=UNICODE;Timeout=15;SslMode=Disable;Database=tomtemp")
    oConn.Open()
    Dim x As Integer = 0
    'For x = 0 To 1000 'uncomment to insert records.
        'Dim command As New NpgsqlCommand("insert into pgtemp1(name, address, city, state) values ('Tom" & x & "','123" & x & " main st','rockford',1) ", oConn) 'loop to put 1,000 records into the pgtemp1 table
        'Dim command As New NpgsqlCommand("insert into pgtemp2(statename, stateid, other) values ('Illinois',1,'This is a lot of fun') ", oConn) 'only sends 1 record into the table pgtemp2
        'command.ExecuteNonQuery()
    'Next

    'join tables and read 1000 rows.
    Dim command As New NpgsqlCommand("select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid", oConn)
    reader = command.ExecuteReader()
    While reader.read()
        output2 += "<tr><td>" & reader("name") & "</td><td>" & reader("address") & "</td><td>" & reader("city") & "</td><td>" & reader("statename") & "</td><td>" & reader("other") & "</td></tr>"
    End While
    oConn.Close()
    readeroutput.text = "<table><tr><td>Name:</td><td>Address:</td><td>City:</td><td>State</td><td>Other</td></tr>" & output2 & "</table>"

    endtime = Date.Now
    Dim runtime As String
    runtime = endtime.Subtract(starttime).TotalSeconds
    output.text = starttime.ToString & " " & runtime

The SQL is a straight convert from MS SQL code. I did not tweak either
system.

From EXPLAIN ANALYZE I can see the query ran much faster:

"Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual time=0.036..4.679 rows=1001 loops=1)"
"  Join Filter: (pgtemp1.state = pgtemp2.stateid)"
"  ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788) (actual time=0.010..0.764 rows=1001 loops=1)"
"  ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual time=0.000..0.001 rows=1 loops=1001)"
"        ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510) (actual time=0.006..0.008 rows=1 loops=1)"
"Total runtime: 5.128 ms"

The general question comes down to: can I expect decent performance from
Postgresql compared to MSSQL? I was hoping that Postgresql 9.0 beat MSSQL
2000 since MS 2000 is over 10 years old.

Thanks,
Tom Polak
Rockford Area Association of Realtors
815-395-6776 x203
On 12/7/10 1:29 PM, Tom Polak wrote:
> What I was really after was a quick comparison between the two. I did not
> create anything special, just the two tables. One table SQL generated the
> records for me. I did not tweak anything after installing either system.

That's not a valid test. Postgres is NOT intended to be used out of the
box. The default parameters aren't useful.

> There was a primary key on the ID field of both tables, no indexes though
> in either system. The second table had 1 record in it. The hardware it
> is running on is fairly good: dual Xeon CPUs, 4 GB of RAM, RAID 5. Btw,
> the cost for MS SQL 2008 R2 is ~$14,000 for 2 CPUs,
> http://www.cdw.com/shop/products/default.aspx?EDC=2167810 . That is why I
> am pursuing this. :)
>
> Here is the ASP.net code that I was running:
> [ASP.net insert/select code snipped]

You still haven't done an ANALYZE sql statement after filling your tables
with data. You should execute "analyze pgtemp1" and "analyze pgtemp2"
before you do any performance tests. Otherwise your results are
meaningless.

Craig

> From EXPLAIN ANALYZE I can see the query ran much faster:
> [EXPLAIN ANALYZE output snipped]
>
> The general question comes down to: can I expect decent performance from
> Postgresql compared to MSSQL? I was hoping that Postgresql 9.0 beat MSSQL
> 2000 since MS 2000 is over 10 years old.
On 07/12/2010 9:29 PM, Tom Polak wrote:
> From EXPLAIN ANALYZE I can see the query ran much faster:
> "Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual
> time=0.036..4.679 rows=1001 loops=1)"
> [...]
> "Total runtime: 5.128 ms"
>
> The general question comes down to: can I expect decent performance from
> Postgresql compared to MSSQL? I was hoping that Postgresql 9.0 beat MSSQL
> 2000 since MS 2000 is over 10 years old.

So postgres actually executed the select in around 5 milliseconds. Pretty
good, I would say. The problem therefore lies not with postgres itself,
but with what is done with the results afterwards.

Assuming that this is purely local and therefore no network issues,
perhaps there is a performance issue in this case with the Npgsql driver?
Someone who knows more about this driver could perhaps shed some light on
this.

I have used .NET (C#) with postgres before, but only using the ODBC
driver. Perhaps you could try that instead (using OdbcCommand,
OdbcDataReader etc.). I mainly use ruby (jruby) with postgres, both under
linux and Windows, but I can certainly process 1000 records of similar
structure in well under 1 second.

Cheers,
Gary.
Tom Polak <tom@rockfordarearealtors.org> wrote:
> I did not tweak anything after installing either system.

PostgreSQL is set up with defaults such that it will start up and run on
the most ancient and underpowered system people are likely to have lying
around. It is expected that people will tune it for serious production
use, although people often run for years before they hit a case where the
tuning makes enough of a difference that they do something about it. For
guidelines see this page:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

You can get a quick comparison without doing any tuning, but it won't
tell you much about how something else compares to PostgreSQL when it is
configured for production use.

> The hardware it is running on is fairly good, dual Xeon CPUs, 4 GB
> of RAM, Raid 5.

For comparison, I would set shared_buffers to at least 200 MB,
effective_cache_size to 2 to 3 GB, and I would probably drop both
seq_page_cost and random_page_cost to 0.1, unless you actually expect to
be using a database large enough that the active portion won't be cached.
(In that case, a test with tiny tables *really* means nothing, though.)
There are other settings that will also help.

> "Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298)
> (actual time=0.036..4.679 rows=1001 loops=1)"
> "Total runtime: 5.128 ms"

The 0.036 ms is how long it took to produce the first row of the result
once it started running, 4.679 ms is the total run time, and 5.128
includes miscellaneous other time, such as planning time. Of course,
EXPLAIN ANALYZE adds some overhead, so the actual run time would normally
be faster, and with tuning it might be faster still.

> The general question comes down to, can I expect decent performance
> from Postgresql compared to MSSQL?

That has been my experience. There's something about your runtime
environment which isn't playing well with PostgreSQL. If it were me, I
would make sure that as little of my stack as possible depended on
products provided by anyone with an interest in seeing PostgreSQL look
bad compared to the alternative. I can think of at least one company with
fourteen thousand reasons to do so.

-Kevin
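For reference, Kevin's suggestions would translate into postgresql.conf
entries roughly like the following (a sketch only, sized for the 4 GB
test box described above; exact values should be tuned per the wiki page
he links):

    shared_buffers = 200MB         # default is far smaller out of the box
    effective_cache_size = 2GB     # roughly what the OS can cache on a 4 GB box
    seq_page_cost = 0.1            # assumes the active data set stays cached
    random_page_cost = 0.1         # ditto; keep these two equal when cached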
> The hardware it
> is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.

For a database you'd want to consider replacing the RAID5 with a RAID1
(or RAID10). RAID5 is slow for small random updates, which are common in
databases. Since you probably have enough harddisks anyway, this won't
cost you. Linux or freebsd would also be better choices for postgres
rather than windows.

Also, as said, your issue looks very much like a problem in the way your
application communicates with postgres: if it takes postgres 5 ms to
process the query and your application gets the result 8 seconds later,
there is a problem. Note that SQL Server probably takes just a few ms for
such a simple query, too, so you're not really benchmarking SQL Server
either.
So, I am back on this topic again. I have a related question, but this
might not be the correct thread for it (please let me know). The boss is
pressing the issue because of the cost of MSSQL.

What kind of performance can I expect out of Postgres compared to MSSQL?
Let's assume that Postgres is running on CentOS x64 and MSSQL is running
on Windows 2008 x64, both on identical hardware running RAID 5 (for data
redundancy/security), SAS drives 15k RPM, dual XEON quad-core CPUs, 24 GB
of RAM. I have searched around and I do not see anyone ever really
compare the two in terms of performance. I have learned from this thread
that Postgres needs a lot of configuration to perform its best.

We provide the MLS service to our members. Our data goes back to 1997 and
nothing is ever deleted. Here is a general overview of our current MSSQL
setup. We have over 10GB of data in a couple of tables (no pictures are
stored in SQL server). Our searches do a lot of joins to combine data to
display a listing, history, comparables, etc. We probably do 3 or 4 reads
for every write in the database.

Any comparisons in terms of performance would be great. If not, how can I
quickly truly compare the two systems myself without coding everything to
work for both? Thoughts? Opinions?

Thanks,
Tom Polak
Rockford Area Association of Realtors
815-395-6776 x203
On 12/17/10 9:08 AM, Tom Polak wrote:
> So, I am back on this topic again.
> I have a related question, but this might not be the correct thread for
> it (please let me know). The boss is pressing the issue because of the
> cost of MSSQL.

You need to analyze the total cost of the system. For the price of MSSQL
and Windows, you can probably buy a couple more really nice servers, or
one Really Big Server that would walk all over a Windows/MSSQL system of
the same total cost (hardware + software).

But that said, if Postgres is properly tuned and your application tuned
to make good use of Postgres' features, it will compare well with any
modern database.

> What kind of performance can I expect out of Postgres compared to MSSQL?
> Let's assume that Postgres is running on CentOS x64 and MSSQL is running
> on Windows 2008 x64, both on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON quad-core CPUs,
> 24 GB of RAM.

RAID5 is a Really Bad Idea for any database. It is S...L...O...W. It does
NOT give better redundancy and security; RAID10 with a battery-backed
RAID controller card is massively better for performance and just as good
for redundancy and security.

Craig
On Fri, Dec 17, 2010 at 9:08 AM, Tom Polak <tom@rockfordarearealtors.org> wrote:
> Any comparisons in terms of performance would be great. If not, how can I
> quickly truly compare the two systems myself without coding everything to
> work for both? Thoughts? Opinions?

I can only offer anecdotal information. If you strictly have an OLTP
workload, with lots of simultaneous connections issuing queries across
small chunks of data, then PostgreSQL would be a good match for SQL
server.

On the other hand, if some of your workload is OLAP, with a few
connections issuing complicated queries across large chunks of data, then
PostgreSQL will not perform as well as SQL server. SQL server can divide
the processing load of complicated queries across several processors,
while PostgreSQL cannot.

So, I guess it depends upon your workload.

--
Regards,
Richard Broersma Jr.
On Fri, Dec 17, 2010 at 10:08 AM, Tom Polak <tom@rockfordarearealtors.org> wrote:
> What kind of performance can I expect out of Postgres compared to MSSQL?

You should take any generalizations with a grain of salt. I suggest that
you do a POC (proof of concept).

> Let's assume that Postgres is running on CentOS x64 and MSSQL is running
> on Windows 2008 x64, both on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON quad-core CPUs,
> 24 GB of RAM.

RAID-5 = suckage for databases.

Things to think about:
How big is your data set and how big is your working set?
Do you have a raid card? Is it properly configured?

--
Rob Wultsch
wultsch@gmail.com
On Fri, Dec 17, 2010 at 12:08 PM, Tom Polak <tom@rockfordarearealtors.org> wrote:
> What kind of performance can I expect out of Postgres compared to MSSQL?
> Let's assume that Postgres is running on CentOS x64 and MSSQL is running
> on Windows 2008 x64, both on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON quad-core CPUs,
> 24 GB of RAM. I have searched around and I do not see anyone ever really
> compare the two in terms of performance. I have learned from this thread
> that Postgres needs a lot of configuration to perform its best.

I think this is a pretty difficult question to answer. There are
certainly people who are running databases on hardware like that - even
databases much bigger than yours - on PostgreSQL, and getting acceptable
performance. But it does take some work.

In all fairness, I think that if you started on PostgreSQL and moved to
MS SQL (or any other product), you'd probably need to make some
adjustments going the other direction to get good performance, too.
You're not going to compare two major database systems across the board
and find that one of them is just twice as fast, across the board. They
have different advantages and disadvantages. When you're using one
product, you naturally do things in a way that works well for that
product, and moving to a different product means starting over. Oh,
putting this in a stored procedure was faster on MS SQL, but it's slower
on PostgreSQL. Using a view here was terrible on MS SQL, but much faster
under PostgreSQL.

The real answer here is that anything could be true for your workload,
and asking people on a mailing list to guess is a recipe for
disappointment. You probably need to do some real benchmarking, and
PostgreSQL will be slower at first, and you'll tune it, and it's LIKELY
that you'll be able to achieve parity, or close enough that it's worth it
to save the $$$. But you won't really know until you try it, I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/17/2010 11:08 AM, Tom Polak wrote:
> So, I am back on this topic again.
> [...]
> Any comparisons in terms of performance would be great. If not, how can I
> quickly truly compare the two systems myself without coding everything to
> work for both? Thoughts? Opinions?

Most of the time, the database is not the bottleneck. So find the spot
where your current database IS the bottleneck. Then write a test that
kinda matches that situation. Let's say it's 20 people doing an MLS
lookup at the exact same time, while an update is running in the
background to copy in new data. Then write a simple test (I use perl for
my simple tests) for both databases. If PG can hold up to your worst-case
situation, then maybe you'll be alright.

Also: Are you pegged right now? Do you have slowness problems? Even if PG
is a tad slower, will anybody even notice? Maybe it's not worth worrying
about? If your database isn't pegging the box, I'd bet you won't even
notice a switch.

The others that have answered have sound advice... but I thought I'd say:
I'm using raid-5! Gasp! It's true. I'm hosting maps with PostGIS, and the
slowest part of the process is the aerial imagery, which is HUGE. The
database queries sit around 1% of my cpu. I needed the disk space for the
imagery. The imagery code uses more cpu than PG does. The database is 98%
read, though, so my setup is different than yours. My maps get 100K hits
a day. The cpu's never use more than 20%.
I'm running on a $350 computer, AMD Dual core, with 4 IDE disks in software raid-5. On Slackware Linux, of course! -Andy
On 12/17/2010 11:37 AM, Robert Haas wrote:
> You're not going to compare two major database systems across the board
> and find that one of them is just twice as fast, across the board. They
> have different advantages and disadvantages. When you're using one
> product, you naturally do things in a way that works well for that
> product, and moving to a different product means starting over. Oh,
> putting this in a stored procedure was faster on MS SQL, but it's slower
> on PostgreSQL. Using a view here was terrible on MS SQL, but much faster
> under PostgreSQL.

Yeah, totally agree with that. Every database has its own personality,
and you have to work with it. Its way. Don't expect one bit of code to
work great on all the different databases. You need 5 different bits of
code, one for each database.

In the end, can PG be fast? Yes. Very. But only when you treat it as PG.
If you try to use PG as if it were mssql, you won't be a happy camper.

-Andy
> The real answer here is that anything could be true for your workload,
> and asking people on a mailing list to guess is a recipe for
> disappointment. You probably need to do some real benchmarking, and
> PostgreSQL will be slower at first, and you'll tune it, and it's LIKELY
> that you'll be able to achieve parity, or close enough that it's worth
> it to save the $$$. But you won't really know until you try it, I think.

That is what I am really after. I know that it will be a lot of work, but
at $15,000 for MSSQL server that is a lot of man hours. Before I invest a
lot of time to do some real benchmarking I need to make sure it would be
worth my time. I realize going into this that we will need to change
almost everything except maybe the simplest SELECT statements.

> How big is your data set and how big is your working set?
> Do you have a raid card? Is it properly configured?

The data set can get large. Just think of a real estate listing. When we
display a Full View, EVERYTHING must be pulled from the database.
Sometimes we are talking about 75-100 fields if not more. We can have up
to 300 members logged in to the system at one time doing various tasks
(we usually peak at about 30-50 requests per second). The servers would
be running on a hardware RAID solution, so it would all be offloaded from
the CPU. I will have to check out RAID 10 for the next server.

Thanks for all your help and opinions.

Thanks,
Tom Polak
Rockford Area Association of Realtors
On Fri, Dec 17, 2010 at 12:49 PM, Tom Polak <tom@rockfordarearealtors.org> wrote:
> That is what I am really after. I know that it will be a lot of work, but
> at $15,000 for MSSQL server that is a lot of man hours. Before I invest a
> lot of time to do some real benchmarking I need to make sure it would be
> worth my time. I realize going into this that we will need to change
> almost everything except maybe the simplest SELECT statements.

I doubt it will be as bad as all that. I think you'll need to spend some
time getting the database configured properly (you can ask for help here,
or buy support) and then I'd guess that much of it will just work. 60%?
80%? 95%? And then there will be some number of problem cases that you'll
need to spend time beating into submission.

I've had really good luck with PG over the years, and in fact switched to
it originally because I was having problems with another database, and
when I switched to PG they just... went away. Now your data set is bigger
than the ones I've worked with, so that tends to make things a bit more
complicated, but the important thing is to have some patience and don't
assume that any problems you run into are insoluble. They probably
aren't. Run EXPLAIN ANALYZE a lot, read the documentation, ask questions,
and if all else fails pay somebody a few bucks to help you get through
it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 17, 2010 at 10:32 AM, Craig James <craig_james@emolecules.com> wrote:
> RAID5 is a Really Bad Idea for any database. It is S...L...O...W. It does
> NOT give better redundancy and security; RAID 10 with a battery-backed RAID
> controller card is massively better for performance and just as good for
> redundancy and security.

The real performance problem with RAID 5 won't show up until a drive dies
and it starts rebuilding, at which point it's then WAYYYY slower, and
while it's rebuilding you don't have redundancy. If you HAVE to use
stripes with redundancy, use RAID-6. It's no faster when working right,
but with a single dead drive it's still ok on performance and can rebuild
at leisure since there's still redundancy in the system.

But really, if you're running a db on anything other than RAID-10 you
need to reassess your priorities.
Hello Scott!

Fri, 17 Dec 2010 19:06:15 -0700, you wrote:

> On Fri, Dec 17, 2010 at 10:32 AM, Craig James
> <craig_james@emolecules.com> wrote:
>> RAID5 is a Really Bad Idea for any database. It is S...L...O...W. It does
>> NOT give better redundancy and security; RAID 10 with a battery-backed RAID
>> controller card is massively better for performance and just as good for
>> redundancy and security.

> The real performance problem with RAID 5 won't show up until a drive
> dies and it starts rebuilding

I don't agree with that. RAID5 is very slow for random writes, since it
needs to:

1. Read a copy of the old sector you are writing (usually in cache, but
   not always);
2. Read a copy of the parity sector corresponding to it;
3. Recompute the parity;
4. Write the new data on the sector you are writing;
5. Write the new parity data.

Operation 3 is fast, but that's still 2 reads and 2 writes for writing a
sector, and the writes have to come after the reads, so it can't even be
fully parallelised.

And if the database has heavy indexes, any INSERT/UPDATE will trigger
random writes to update the indexes. Increasing the checkpointing
interval can group some of the random writes, but they'll still occur.

A RAID controller with a lot of cache can mitigate the random write
slowness, but with enough random writes, the cache will be saturated
anyway.

As other people commented, RAID10 is much more efficient for databases,
even if it "costs" a bit more (if you put 4 disks in RAID10, you have 2x
the capacity of one disk; if you put them in RAID5 you have 3x the
capacity of one disk).

--
Gaël Le Mignot - gael@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com
2010/12/18 Gael Le Mignot <gael@pilotsystems.net>:
> I don't agree with that. RAID5 is very slow for random writes, since
> it needs to:
> [...]

Trust me, I'm well aware of how bad RAID 5 is for write performance. But
as bad as that is, when the array is degraded it's 100 times worse. For a
lot of workloads, the meh-grade performance of a working RAID-5 is ok.
"Not a lot of writes" data warehousing often runs just fine on RAID-5.
Until the array degrades. Then it's much, much slower than even a single
drive would be.
>> The real performance problem with RAID 5 won't show up until a drive
>> dies and it starts rebuilding
>
> I don't agree with that. RAID5 is very slow for random writes, since
> it needs to:

"The real problem" is when RAID5 loses a drive and goes from the
"acceptable" kind of slow to the "someone's fired" kind of slow. Then of
course, in the middle of the rebuild, a bad sector is discovered in some
place the filesystem has never visited yet on one of the remaining
drives, and all hell breaks loose.

RAID6 is only one extra disk...
> If you strictly have an OLTP workload, with lots of simultaneous
> connections issuing queries across small chunks of data, then
> PostgreSQL would be a good match for SQL server.

This matches my observations. In fact, PostgreSQL's MVCC seems to work
heavily in my favor in OLTP workloads.

> On the other hand, if some of your workload is OLAP, with a few
> connections issuing complicated queries across large chunks of data,
> then PostgreSQL will not perform as well as SQL server. SQL server
> can divide the processing load of complicated queries across several
> processors, while PostgreSQL cannot.

While I agree with this in theory, it may or may not have a big impact in
practice. If you're not seeing multi-cpu activity spike up on your MSSQL
box during complex queries, you aren't likely to benefit much. You can
test by timing a query with and without a query hint that limits it to
one processor:

    select * from foo option (maxdop 1)

If it runs just as fast on one processor, then this feature isn't
something you'll miss.

Another set of features that could swing performance in MSSQL's favor are
covering indexes and clustered indexes. You can sort-of get around
clustered indexes being unavailable in PostgreSQL - especially on
low-churn tables - by scheduling CLUSTER commands. I've seen discussions
recently that one or both of these features are being looked at pretty
closely for inclusion in PostgreSQL.
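A sketch of the PostgreSQL-side workaround mentioned above, using the
tables from earlier in the thread (the index name is illustrative and
assumes an index on the lookup column already exists):

    -- Physically reorder the table by an existing index, approximating a
    -- clustered index; the ordering is not maintained as rows change, so
    -- re-run this on a schedule (cron, etc.) for low-churn tables.
    CLUSTER pgtemp1 USING pgtemp1_state_idx;

    -- Refresh planner statistics after the rewrite.
    ANALYZE pgtemp1;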