Thread: Benchmark-Comparison PostGreSQL vs. SQL Server
Hi, this is my first question here, and also, it's somewhat delicate. So please be patient.
My question is, CAN PostGreSQL perform in the SQL Server area when it comes to speed?
In other words, are there explanations for the results I found (see below)?
Thanks,
Robert
-----
Background:
1. I read people were using PostGreSQL with TeraBytes of data sometimes, or thousands of users. These are things that could easily break SQL Server. - So I thought PostGreSQL might be similar fast to SQL Server.
2. I did some tests:
Windows XP SP2
Several GIGs free harddisk, ~400 MB free RAM
Java 1.5 / JDBC
PostGreSQL 8.0 beta (through Windows Installer), default configuration, default driver
SQL Server 2000 SP3a, default configuration, JDTS driver
Tablespaces of both databases on the same partition
Write-Test: Creating tables (slightly modified TCP-W benchmark)
Read-Test: Simple SELECT statements on all tables, returning the first 1000 rows (cursor variants: read-only and non-locking, resp. updatable and locking)
Results:
Writing: SQL Server 25 times faster.
Reading: SQL Server 100 times faster.
Robert Soeding wrote: > Hi, this is my first question here, and also, it's somewhat delicate. > So please be patient. > > My question is, CAN PostGreSQL perform in the SQL Server area when it > comes to speed? In other words, are there explanations for the > results I found (see below)? Faster in some cases, slower in others in my experience. Oh, and publishing performance comparisons with another database might be in breach of your ms-sql server licencing. > Thanks, Robert > > ----- Background: 1. I read people were using PostGreSQL with > TeraBytes of data sometimes, or thousands of users. These are things > that could easily break SQL Server. - So I thought PostGreSQL might > be similar fast to SQL Server. Some people have very large installations. This obviously isn't on Windows, and not necessarily on x86 hardware. > 2. I did some tests: Windows XP SP2 Several GIGs free harddisk, ~400 > MB free RAM Java 1.5 / JDBC PostGreSQL 8.0 beta (through Windows > Installer), default configuration, default driver SQL Server 2000 > SP3a, default configuration, JDTS driver Tablespaces of both > databases on the same partition Write-Test: Creating tables (slightly > modified TCP-W benchmark) Read-Test: Simple SELECT statements on all > tables, returning the first 1000 rows (cursor variants: read-only and > non-locking, resp. updatable and locking) > > Results: Writing: SQL Server 25 times faster. Reading: SQL Server 100 > times faster. The figures sound wrong. The Windows port isn't likely to be as fast as the *nix versions (certainly not yet) but those figures don't match for my experience with PG on Linux. Unfortunately, although you provide a lot of information, almost none of it tells us what the problem is. So - 1. What configuration changes have you made? 2. How many concurrent connections was this? 3. Were you selecting 1000 rows (LIMIT 1000), selecting all the rows (and only fetching 1000) or actually defining an SQL cursor. 4. What was the load on the machine - CPU or DISK peaking? 5. What was the RAM usage like? -- Richard Huxton Archonet Ltd
> 1. What configuration changes have you made? None, both installations are default configured. > 2. How many concurrent connections was this? One. > 3. Were you selecting 1000 rows (LIMIT 1000), selecting all the rows > (and only fetching 1000) or actually defining an SQL cursor. I used "LIMIT 1000", resp. "TOP 1000" statements. > 4. What was the load on the machine - CPU or DISK peaking? > 5. What was the RAM usage like? CPU (< 10%) and RAM (<10%) usage were very low. - I guess (as you mentioned below) it's the NT file system. When running PostgreSQL queries I can hear the harddisk buzzing, but not with SQL Server queries. On the other hand, if an application has to "fight" against the file system, I would suppose it to increase RAM and CPU usagesignificantly. -----Original Message----- From: Richard Huxton [mailto:dev@archonet.com] Sent: Thursday, November 25, 2004 12:34 PM To: Robert Soeding Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Benchmark-Comparison PostGreSQL vs. SQL Server Robert Soeding wrote: > Hi, this is my first question here, and also, it's somewhat delicate. > So please be patient. > > My question is, CAN PostGreSQL perform in the SQL Server area when it > comes to speed? In other words, are there explanations for the results > I found (see below)? Faster in some cases, slower in others in my experience. Oh, and publishing performance comparisons with another databasemight be in breach of your ms-sql server licencing. > Thanks, Robert > > ----- Background: 1. I read people were using PostGreSQL with > TeraBytes of data sometimes, or thousands of users. These are things > that could easily break SQL Server. - So I thought PostGreSQL might be > similar fast to SQL Server. Some people have very large installations. This obviously isn't on Windows, and not necessarily on x86 hardware. > 2. I did some tests: Windows XP SP2 Several GIGs free harddisk, ~400 > MB free RAM Java 1.5 / JDBC PostGreSQL 8.0 beta (through Windows > Installer), default configuration, default driver SQL Server 2000 > SP3a, default configuration, JDTS driver Tablespaces of both databases > on the same partition Write-Test: Creating tables (slightly modified > TCP-W benchmark) Read-Test: Simple SELECT statements on all tables, > returning the first 1000 rows (cursor variants: read-only and > non-locking, resp. updatable and locking) > > Results: Writing: SQL Server 25 times faster. Reading: SQL Server 100 > times faster. The figures sound wrong. The Windows port isn't likely to be as fast as the *nix versions (certainly not yet) but those figuresdon't match for my experience with PG on Linux. Unfortunately, although you provide a lot of information, almost none of it tells us what the problem is. So - 1. What configurationchanges have you made? 2. How many concurrent connections was this? 3. Were you selecting 1000 rows (LIMIT 1000), selecting all the rows (and only fetching 1000) or actually defining an SQLcursor. 4. What was the load on the machine - CPU or DISK peaking? 5. What was the RAM usage like? -- Richard Huxton Archonet Ltd
Robert Soeding wrote: >> 1. What configuration changes have you made? > > None, both installations are default configured. You'll want to do at least some tuning on PG. Try the URL below for a quick introduction - just the basic stuff is a good start. http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html >> 2. How many concurrent connections was this? > One. OK. That keeps things nice and simple. >> 3. Were you selecting 1000 rows (LIMIT 1000), selecting all the >> rows (and only fetching 1000) or actually defining an SQL cursor. > > > I used "LIMIT 1000", resp. "TOP 1000" statements. OK. The reason I asked was that if you ask for 1000000 rows, then PG will find them all and return them all in a bunch. Most other servers return the first row once it's available. >> 4. What was the load on the machine - CPU or DISK peaking? 5. What >> was the RAM usage like? > > CPU (< 10%) and RAM (<10%) usage were very low. > > - I guess (as you mentioned below) it's the NT file system. When > running PostgreSQL queries I can hear the harddisk buzzing, but not > with SQL Server queries. If you're seeing hard-disk activity that means the data isn't cached. > On the other hand, if an application has to "fight" against the file > system, I would suppose it to increase RAM and CPU usage > significantly. It shouldn't be fighting the file system, but it does use it, and rely on it for caching (rather than bypassing your filesystem cache). For MS-SQL server I'm guessing you're allocating a lot of memory to SQL server and not much to the file-cache. For PG you'll want it the other way around. -- Richard Huxton Archonet Ltd
On Thursday 25 Nov 2004 6:22 pm, Richard Huxton wrote: > > On the other hand, if an application has to "fight" against the file > > system, I would suppose it to increase RAM and CPU usage > > significantly. > > It shouldn't be fighting the file system, but it does use it, and rely > on it for caching (rather than bypassing your filesystem cache). For > MS-SQL server I'm guessing you're allocating a lot of memory to SQL > server and not much to the file-cache. For PG you'll want it the other > way around. Do we have some sort of document about how caching on windows works? It is very simple on linux and BSDs but for other OSs, I haven't seen many suggestions. I am sure OS specific hints would help OP a lot. Shridhar
Shridhar Daithankar wrote: > > Do we have some sort of document about how caching on windows works? It is > very simple on linux and BSDs but for other OSs, I haven't seen many > suggestions. > > I am sure OS specific hints would help OP a lot. Microsoft's own resources are good for this sort of stuff (apologies if URLs wrap): http://www.microsoft.com/technet/prodtechnol/windows2000serv/maintain/optimize/wperfch7.mspx http://www.microsoft.com/resources/documentation/Windows/2000/server/reskit/en-us/Default.asp?url=/resources/documentation/Windows/2000/server/reskit/en-us/core/fnec_evl_ACKS.asp The simplest setting is the simple application/file-sharing switch detailed in article #2 (which was introduced in NT4 iirc). -- Richard Huxton Archonet Ltd
New question: i have tables like table_20041124, table_20041125, etc... i'm trying to make function (for example): ===================================== CREATE FUNCTION get_count(text, text) RETURNS int2 AS ' DECLARE cnt int4; BEGIN SELECT INTO cnt COUNT(*) FROM table_$1 -- That doesn't work WHERE key = $2; RETURN cnt; END;' LANGUAGE 'plpgsql'; ===================================== call this function by: ===================================== SELECT get_count("20041124", "something"); ===================================== string in funstion - FROM table_$1 how could i get a final correct table name here? Thanx!
I think you would have to do it something like this, although whether the SELECT INTO works in an EXECUTE context I am not sure (note, completely untested code!) CREATE FUNCTION get_count(text, text) RETURNS int2 AS ' DECLARE cnt int4; BEGIN EXECUTE ''SELECT INTO cnt COUNT(*) FROM table_'' || $1 || '' WHERE key = '' || $2; RETURN cnt; END;' LANGUAGE 'plpgsql'; > New question: > > i have tables like > table_20041124, > table_20041125, > etc... > > i'm trying to make function (for example): > ===================================== > CREATE FUNCTION get_count(text, text) > RETURNS int2 AS ' > DECLARE > cnt int4; > BEGIN > SELECT INTO cnt COUNT(*) > FROM table_$1 -- That doesn't work > WHERE key = $2; > > RETURN cnt; > END;' > LANGUAGE 'plpgsql'; > ===================================== > > call this function by: > > ===================================== > SELECT get_count("20041124", "something"); > ===================================== > > string in funstion - FROM table_$1 > > how could i get a final correct table name here? > > Thanx! > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Adam Witney wrote: > I think you would have to do it something like this, although whether the > SELECT INTO works in an EXECUTE context I am not sure (note, completely > untested code!) > > CREATE FUNCTION get_count(text, text) RETURNS int2 AS ' > DECLARE > cnt int4; > BEGIN > > EXECUTE ''SELECT INTO cnt COUNT(*) FROM table_'' || $1 || '' WHERE key = > '' || $2; That won't work either, you'll need to user FOR..IN..EXECUTE: CREATE TABLE exectest (a integer, b text, PRIMARY KEY (a)); COPY exectest FROM stdin; 1 aaa 2 bbb 3 ccc \. CREATE FUNCTION demo_exec_fn() RETURNS boolean AS ' DECLARE r RECORD; BEGIN FOR r IN EXECUTE ''SELECT * FROM exectest'' LOOP RAISE NOTICE ''a=%, b=%'', r.a, r.b; END LOOP; RETURN true; END ' LANGUAGE plpgsql; SELECT demo_exec_fn(); -- Richard Huxton Archonet Ltd
Hi, Am Donnerstag, den 25.11.2004, 19:42 +0300 schrieb ON.KG: > New question: > > i have tables like > table_20041124, > table_20041125, > etc... > > i'm trying to make function (for example): > ===================================== > CREATE FUNCTION get_count(text, text) > RETURNS int2 AS ' > DECLARE > cnt int4; > BEGIN > SELECT INTO cnt COUNT(*) > FROM table_$1 -- That doesn't work > WHERE key = $2; > > RETURN cnt; > END;' > LANGUAGE 'plpgsql'; > ===================================== > > call this function by: > > ===================================== > SELECT get_count("20041124", "something"); > ===================================== > > string in funstion - FROM table_$1 > > how could i get a final correct table name here? You need to build your query in your function as a big string and pass it to EXECUTE (See http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN and http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING ) however encoding data into table names does not sound so elegant for me - why not make it a real column? The advantage would be you'd have a real query and let postgres compile a plan for it - which is not possible for execute. If you are concerned on index size you can always use partial indices based on your "table number". Regards Tino
hi, ON.KG wrote: > New question: > > i have tables like > table_20041124, > table_20041125, > etc... > > i'm trying to make function (for example): > ===================================== > CREATE FUNCTION get_count(text, text) > RETURNS int2 AS ' > DECLARE > cnt int4; > BEGIN > SELECT INTO cnt COUNT(*) > FROM table_$1 -- That doesn't work > WHERE key = $2; > > RETURN cnt; > END;' > LANGUAGE 'plpgsql'; > ===================================== > > call this function by: > > ===================================== > SELECT get_count("20041124", "something"); > ===================================== > > string in funstion - FROM table_$1 > > how could i get a final correct table name here? You can use execute for dynamic sql. CREATE FUNCTION get_count(text, text) RETURNS int2 AS ' declare rec record; begin for rec in execute ''select COUNT(*) as num from table_''||$1||'' where key=''''||$2'''' ''; loop return rec.num; end loop; return; end; PS: anyway, you want returns int2 , but you declared int4 :) C.
On Thu, 25 Nov 2004 11:36:33 +0100 "Robert Soeding" <robert.soeding@lisocon.de> wrote: > Hi, this is my first question here, and also, it's somewhat delicate. > So please be patient. > My question is, CAN PostGreSQL perform in the SQL Server area when it > comes to speed? In other words, are there explanations for the > results I found (see below)? > Thanks, > Robert > > ----- > Background: > 1. I read people were using PostGreSQL with TeraBytes of data > sometimes, or thousands of users. These are things that could easily > break SQL Server. - So I thought PostGreSQL might be similar fast to > SQL Server. 2. I did some tests: Windows XP SP2 > Several GIGs free harddisk, ~400 MB free RAM > Java 1.5 / JDBC > PostGreSQL 8.0 beta (through Windows Installer), default > configuration, default driver SQL Server 2000 SP3a, default > configuration, Here's your problem right here. You're never going to get a fair comparison unless you tune the crap out of *both* contenders. PostgreSQL's default configuration is extremely conservative to allow it to be run on very limited resources. > JDTS driver Tablespaces of both databases on the same > partition Write-Test: Creating tables (slightly modified TCP-W > benchmark) Read-Test: Simple SELECT statements on all tables, > returning the first 1000 rows (cursor variants: read-only and > non-locking, resp. updatable and locking) Results: > Writing: SQL Server 25 times faster. > Reading: SQL Server 100 times faster. -- Russ
> > Background: > > 1. I read people were using PostGreSQL with TeraBytes of data > > sometimes, or thousands of users. These are things that could easily > > break SQL Server. - So I thought PostGreSQL might be similar fast to > > SQL Server. 2. I did some tests: Windows XP SP2 > > Several GIGs free harddisk, ~400 MB free RAM > > Java 1.5 / JDBC > > PostGreSQL 8.0 beta (through Windows Installer), default > > configuration, default driver SQL Server 2000 SP3a, default > > configuration, SQL Server like mysql, is very 'noob friendly'...it doesn't require a lot of configuration to be quick is forgiving of lousy access strategies to the database (these are good things btw). PostgreSQL 8.0b was still using old style win32 fsync which maps to flushfilebuffers()...this is an an ultra conservative way of flushing ot disk so this is not apples to apples comparison. That said, SQL Server will be very hard to beat with write performance on windows because it is very tuned to the win32 api and has very efficient disk access. In the mean time for a fairer test you have to turn fsync off in postgresql.conf and you definately want to be running pg 8.1 which has tons of improvements for win32 and a more efficient syncer...although in windows for production database I would highly suggest targeting a caching raid controller. I am right now in the development planning phase of a PostgreSQL databse which has to scale well into the terabyte range. I can tell you without hesitation that the database can handle it and deserves the reputation of a workhorse that it has. Administrating pg is a bit harder to get into but well help you understand what a database is really all about and how to make your data access elegant and efficient. Merlin