Thread: performance comparison PostgreSQL / MS SQL Server
hi list,

I want to convince people to use PostgreSQL instead of MS SQL Server, so I
set up a kind of comparison: insert data / select data from PostgreSQL /
MS SQL Server.

The table I used was pretty basic:

    id   bigserial
    dist float8
    x    float8
    y    float8
    z    float8

I filled the table with a function that fills x, y, z with incrementally
increasing values (1,2,3,4,5,6...) and computes the dist value for every
tuple from them (sqrt((x*x)+(y*y)+(z*z))). This works fine on both DBMSs:

    PostgreSQL needs 13:37 min for 10,000,000 tuples;
    MS SQL needs 1:01:27 h for 10,000,000 tuples.

So far so good. I attached an index to the dist column and started to query
the databases with scripts that select a serial (contiguous) range of
100,000 / 200,000 / 500,000 tuples based on the dist column. I randomly
compute the start and end distance and run

    select avg(dist) from table where dist > startdist and dist < enddist

I did the same with a table of 50,000,000 tuples in MS SQL and PostgreSQL.
The outcome so far:

    100,000 from 50,000,000:  postgres: 0.88 sec   ms-sql: 0.38 sec
    200,000 from 50,000,000:  postgres: 1.57 sec   ms-sql: 0.54 sec
    500,000 from 50,000,000:  postgres: 3.66 sec   ms-sql: 1.18 sec

I tried a lot of changes to postgresql.conf following "Tuning PostgreSQL
for performance" by Shridhar Daithankar and Josh Berkus, which did not make
a big difference to the response times from PostgreSQL. I'm quite happy
with the insert time...

Do you have any hints, like compiler flags and so on, to get the response
time from PostgreSQL equal to MS SQL? (BTW, both DBMSs were running on
exactly the same hardware.)

I use:
    SuSE 8.1
    PostgreSQL 7.2, compiled from the RPMs for using PostGIS, but that is
    another story...
    1.5 GB RAM
    1.8 GHz Intel CPU

Every bit of help is welcome.

best regards
heiko
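For readers who want to reproduce the setup, a minimal sketch in SQL and
PL/pgSQL follows. The table name massive2 is taken from the EXPLAIN output
later in the thread; the fill function is a reconstruction of what Heiko
describes (row i gets x = y = z = i, so dist = i * sqrt(3), which matches
the ~499,999 rows his later queries return between 1000000*sqrt(3.0) and
1500000*sqrt(3.0)), not his original script:

    -- Benchmark table as described above:
    CREATE TABLE massive2 (
        id   bigserial,
        dist float8,
        x    float8,
        y    float8,
        z    float8
    );

    -- Reconstructed fill function, in 7.2-era PL/pgSQL (run
    -- "createlang plpgsql test" first if the language is not installed).
    -- i is cast to float8 before squaring to avoid int4 overflow.
    CREATE FUNCTION fill_massive2(integer) RETURNS integer AS '
    DECLARE
        n ALIAS FOR $1;
    BEGIN
        FOR i IN 1..n LOOP
            INSERT INTO massive2 (x, y, z, dist)
            VALUES (i, i, i,
                    sqrt((i::float8 * i) + (i::float8 * i) + (i::float8 * i)));
        END LOOP;
        RETURN n;
    END;
    ' LANGUAGE 'plpgsql';

    SELECT fill_massive2(10000000);

    -- Index on dist, then the benchmark query with example bounds:
    CREATE INDEX massive2_dist ON massive2 (dist);
    SELECT avg(dist) FROM massive2
     WHERE dist > (1000000*sqrt(3.0))::float8
       AND dist < (1500000*sqrt(3.0))::float8;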
Heiko,

> 100,000 from 50,000,000:
>   postgres: 0.88 sec
>   ms-sql:   0.38 sec
>
> 200,000 from 50,000,000:
>   postgres: 1.57 sec
>   ms-sql:   0.54 sec
>
> 500,000 from 50,000,000:
>   postgres: 3.66 sec
>   ms-sql:   1.18 sec

Questions:

1. Is this the time to return *all rows*, or just the first row? Given the
   different way that PostgreSQL fetches rows to the client compared with
   MSSQL, it makes a difference.

2. What are your sort_mem and shared_buffers settings?

3. Have you tried clustering the table?

4. Have you done a comparison selecting random or scattered rows instead of
   serial ones? MSSQL has a tendency to physically store rows "in order",
   which gives it a certain advantage in this kind of query.

--
Josh Berkus
Aglio Database Solutions
San Francisco
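For reference, questions 2 and 3 translate into postgresql.conf settings
and the CLUSTER command. A minimal sketch with 7.2-era parameter names; the
values are merely illustrative for a 1.5 GB machine, not a recommendation:

    -- postgresql.conf (restart the postmaster after changing
    -- shared_buffers; sort_mem can also be SET per session):
    --   shared_buffers = 8192    # in 8 kB pages, so ~64 MB
    --   sort_mem       = 16384   # in kB, so 16 MB per sort

    -- Question 3: physically reorder the table along the dist index,
    -- which favours range predicates such as dist > x AND dist < y
    -- (7.2 syntax; CLUSTER rewrites the table, so check the docs for
    -- caveats before running it on 50,000,000 rows):
    CLUSTER massive2_dist ON massive2;
    ANALYZE massive2;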
Heiko Kehlenbrink wrote:
> I want to convince people to use PostgreSQL instead of MS SQL Server, so
> I set up a kind of comparison: insert data / select data from PostgreSQL /
> MS SQL Server.
> [...]
> I attached an index to the dist column and started to query the databases
> with scripts that select a serial (contiguous) range of 100,000 / 200,000
> / 500,000 tuples based on the dist column. I randomly compute the start
> and end distance and run
>
>     select avg(dist) from table where dist > startdist and dist < enddist

Some basics to check quickly:

1. VACUUM ANALYZE the table before you start selecting.

2. For slow-running queries, check the EXPLAIN ANALYZE output and find out
   what takes the most time.

3. Check for typecasting. You need to typecast the query correctly, e.g.

       select avg(dist) from table
        where dist > startdist::float8 and dist < enddist::float8;

   This might still end up as a sequential scan depending on the plan, but
   if an index scan is picked, it should be plenty fast.

Post the EXPLAIN ANALYZE output for the queries if things don't improve.

HTH
Shridhar
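Applied to the table from this thread, the three checks might look like
the sketch below (the bounds are the ones Heiko posts later; adjust to
taste):

    -- 1. Refresh the planner statistics before benchmarking:
    VACUUM ANALYZE massive2;

    -- 2 + 3. Time the query and make the float8 constants explicit, so
    -- the planner can match them against the float8 index on dist:
    EXPLAIN ANALYZE
    SELECT avg(dist)
      FROM massive2
     WHERE dist > (1000000*sqrt(3.0))::float8
       AND dist < (1500000*sqrt(3.0))::float8;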
"Heiko Kehlenbrink" <Heiko.Kehlenbrink@vermes.fh-oldenburg.de> writes: > i use suse 8.1 > postgresql 7.2 compiled from the rpms for using postgis, but that is > anothe story... 7.4 might be a little quicker; but in any case you should be doing this sort of comparison using the current release, no? regards, tom lane
Heiko Kehlenbrink wrote:
> I use SuSE 8.1, PostgreSQL 7.2 compiled from the RPMs for using PostGIS,
> [...]

Try v7.4; there are many performance improvements. It may not make up all
of the difference, but it should help.
Heiko Kehlenbrink wrote:
> hkehlenbrink@lin0493l:~> psql -d test -c 'explain analyse select avg(dist)
> from massive2 where dist > (1000000*sqrt(3.0))::float8 and dist <
> (1500000*sqrt(3.0))::float8;'
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=14884.61..14884.61 rows=1 width=8) (actual
> time=3133.24..3133.24 rows=1 loops=1)
>   ->  Index Scan using massive2_dist on massive2  (cost=0.00..13648.17
>       rows=494573 width=8) (actual time=0.11..2061.38 rows=499999 loops=1)
> Total runtime: 3133.79 msec
>
> EXPLAIN
>
> Seems to me that most of the time was needed for the index scan...

Hmm... I would suggest that if you are testing, you should try 7.4.2. 7.4
has some good optimisations for hash aggregates, though I am not sure
whether they apply to averaging.

Also try forcing a seq scan by turning off index scans. I guess an index
scan for so many rows is not exactly a good thing, even if the tuple size
is pretty small.

Shridhar
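To try the comparison Shridhar suggests without dropping the index, the
planner can simply be steered away from index scans for the session
(enable_indexscan only discourages that plan type; it does not touch the
index itself):

    SET enable_indexscan = off;

    EXPLAIN ANALYZE
    SELECT avg(dist)
      FROM massive2
     WHERE dist > (1000000*sqrt(3.0))::float8
       AND dist < (1500000*sqrt(3.0))::float8;

    -- Restore the default afterwards:
    SET enable_indexscan = on;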
Heiko Kehlenbrink wrote:
>> Hmm... I would suggest that if you are testing, you should try 7.4.2.
>> 7.4 has some good optimisations for hash aggregates, though I am not
>> sure whether they apply to averaging.
>
> That would be the last option, as we are running other applications on
> that 7.2 system.

I can understand.

>> Also try forcing a seq scan by turning off index scans. I guess an index
>> scan for so many rows is not exactly a good thing, even if the tuple
>> size is pretty small.
>
> A sequential scan gives me the following result:
>
> HKehlenbrink@lin0493l:~> time psql -d test -c 'explain analyse select
> avg(dist) from massive2 where dist > 1000000*sqrt(3.0)::float8 and dist <
> 1500000*sqrt(3.0)::float8;'
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=1193714.43..1193714.43 rows=1 width=8) (actual
> time=166718.54..166718.54 rows=1 loops=1)
>   ->  Seq Scan on massive2  (cost=0.00..1192478.00 rows=494573 width=8)
>       (actual time=3233.22..165576.40 rows=499999 loops=1)
> Total runtime: 166733.73 msec

Certainly bad, and not an option. I can't think of anything offhand to
speed this up.

Shridhar
Heiko Kehlenbrink wrote:
> I want to convince people to use PostgreSQL instead of MS SQL Server, so
> I set up a kind of comparison: insert data / select data from PostgreSQL /
> MS SQL Server.
> [...]
> Do you have any hints, like compiler flags and so on, to get the response
> time from PostgreSQL equal to MS SQL?
>
> (BTW, both DBMSs were running on exactly the same hardware.)
>
> I use SuSE 8.1, PostgreSQL 7.2 compiled from the RPMs for using PostGIS
> (but that is another story...), 1.5 GB RAM, 1.8 GHz Intel CPU.
>
> Every bit of help is welcome.

SuSE 8.1 comes with a 2.4-series kernel, I suppose. Many have witnessed a
speed increase when using a 2.6-series kernel, so you might consider this
too, besides the newer PostgreSQL version already suggested. 2.6 has some
scheduling options that are not enabled by default but may enhance database
performance
(http://story.news.yahoo.com/news?tmpl=story&cid=75&e=2&u=/nf/20040405/tc_nf/23603).

Kaarel
hi shridhar,

> Some basics to check quickly:
>
> 1. VACUUM ANALYZE the table before you start selecting.

Was done.

> 2. For slow-running queries, check the EXPLAIN ANALYZE output and find
>    out what takes the most time.

hkehlenbrink@lin0493l:~> psql -d test -c 'explain analyse select avg(dist)
from massive2 where dist > (1000000*sqrt(3.0))::float8 and dist <
(1500000*sqrt(3.0))::float8;'
NOTICE:  QUERY PLAN:

Aggregate  (cost=14884.61..14884.61 rows=1 width=8) (actual
time=3133.24..3133.24 rows=1 loops=1)
  ->  Index Scan using massive2_dist on massive2  (cost=0.00..13648.17
      rows=494573 width=8) (actual time=0.11..2061.38 rows=499999 loops=1)
Total runtime: 3133.79 msec

EXPLAIN

Seems to me that most of the time was needed for the index scan...

> 3. Check for typecasting. You need to typecast the query correctly, e.g.
>
>     select avg(dist) from table
>      where dist > startdist::float8 and dist < enddist::float8;

Nope, the dist column is float8 and the query bounds are float8 too; the
EXPLAIN also shows that an index scan was done.

> Post the EXPLAIN ANALYZE output for the queries if things don't improve.

See above.

best regards
heiko