Thread: performance comparission postgresql/ms-sql server

performance comparission postgresql/ms-sql server

From
"Heiko Kehlenbrink"
Date:
hi list,

i want to convince people to use postgresql instead of ms-sql server, so i
set up a kind of comparission insert data / select data from postgresql /
ms-sql server

the table i use was pretty basic,

id   bigserial
dist  float8
x     float8
y     float8
z     float8

i filled the table with a function which filled x,y,z with incremental
increasing values (1,2,3,4,5,6...) and computing from that the dist value
for every tupel (sqrt((x*x)+(y*y)+(z*z))).

this works fine for both dbms

postgresql needs 13:37 min for 10.000.000 tupel,
ms-sql     needs 1:01:27 h for 10.000.000 tupel.

so far so good.

i attached an index on the dist row and started to query the dbs with
scripts which select a serial row of 100.000,200.000,500.000 tupels based
on the dist row.
i randomizly compute the start and the end distance and made a "select
avg(dist) from table where dist > startdist and dist < enddist"

Did the same with a table with 50.000.000 tupel in ms-sql and postgres.

the outcome so far:

100.000 from 50.000.000:

postgres: 0.88 sec
ms-sql: 0.38 sec

200.000 from 50.000.000:

postgres: 1.57 sec
ms-sql: 0.54 sec

500.000 from 50.000.000:

postgres: 3.66 sec
ms-sql: 1.18 sec

i try a lot of changes to the postgresql.conf regarding  "Tuning
PostgreSQL for performance"
by
Shridhar Daithankar, Josh Berkus

which did not make a big diffrence to the answering times from postgresql.

i'm pretty fine with the insert time...

do you have any hints like compiler-flags and so on to get the answering
time from postgresql equal to ms-sql?

(btw both dbms were running on exactly the same hardware)

i use suse 8.1
      postgresql 7.2 compiled from the rpms for using postgis, but that is
anothe story...
      1.5 gig ram
      1.8 mhz intel cpu


every help welcome

best regards heiko




Re: performance comparission postgresql/ms-sql server

From
Josh Berkus
Date:
Heiko,

> 100.000 from 50.000.000:
>
> postgres: 0.88 sec
> ms-sql: 0.38 sec
>
> 200.000 from 50.000.000:
>
> postgres: 1.57 sec
> ms-sql: 0.54 sec
>
> 500.000 from 50.000.000:
>
> postgres: 3.66 sec
> ms-sql: 1.18 sec

Questions:

1. Is this the time to return *all rows* or just the first row?   Given the
different way that PostgreSQL fetches rows to the client from MSSQL, it makes
a difference.

2. What are your sort-mem and shared-mem settings?

3. Have you tried clustering the table?

4. Have you done a comparison of selecting random or scattered, instead of
serial rows?   MSSQL has a tendency to physically store rows in "order" which
gives it a certain advantage in this kind of query.


--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: performance comparission postgresql/ms-sql server

From
Shridhar Daithankar
Date:
Heiko Kehlenbrink wrote:

> hi list,
>
> i want to convince people to use postgresql instead of ms-sql server, so i
> set up a kind of comparission insert data / select data from postgresql /
> ms-sql server
>
> the table i use was pretty basic,
>
> id   bigserial
> dist  float8
> x     float8
> y     float8
> z     float8
>
> i filled the table with a function which filled x,y,z with incremental
> increasing values (1,2,3,4,5,6...) and computing from that the dist value
> for every tupel (sqrt((x*x)+(y*y)+(z*z))).
>
> this works fine for both dbms
>
> postgresql needs 13:37 min for 10.000.000 tupel,
> ms-sql     needs 1:01:27 h for 10.000.000 tupel.
>
> so far so good.
>
> i attached an index on the dist row and started to query the dbs with
> scripts which select a serial row of 100.000,200.000,500.000 tupels based
> on the dist row.
> i randomizly compute the start and the end distance and made a "select
> avg(dist) from table where dist > startdist and dist < enddist"

Some basics to check quickly.

1. vacuum analyze the table before you start selecting.
2. for slow running queries, check explain analyze output and find out who takes
maximum time.
3. Check for typecasting. You need to typecast the query correctly e.g.

select avg(dist) from table where dist >startdist::float8 and dist<enddist::float8..

This might still end up with sequential scan depending upon the plan. but if
index scan is picked up, it might be plenty fast..

Post explain analyze for the queries if things don't improve.

  HTH

  Shridhar


Re: performance comparission postgresql/ms-sql server

From
Tom Lane
Date:
"Heiko Kehlenbrink" <Heiko.Kehlenbrink@vermes.fh-oldenburg.de> writes:
> i use suse 8.1
>       postgresql 7.2 compiled from the rpms for using postgis, but that is
> anothe story...

7.4 might be a little quicker; but in any case you should be doing this
sort of comparison using the current release, no?

            regards, tom lane

Re: performance comparission postgresql/ms-sql server

From
"Matthew T. O'Connor"
Date:
Heiko Kehlenbrink wrote:

>i use suse 8.1
>      postgresql 7.2 compiled from the rpms for using postgis, but that is
>
>
>
Try v7.4, there are many performance improvements.  It may not make up
all the differences but it should help.

Re: performance comparission postgresql/ms-sql server

From
Shridhar Daithankar
Date:
Heiko Kehlenbrink wrote:

> hkehlenbrink@lin0493l:~> psql -d test -c 'explain analyse select avg(dist)
> from massive2 where dist > (1000000*sqrt(3.0))::float8 and dist <
> (1500000*sqrt(3.0))::float8;'
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=14884.61..14884.61 rows=1 width=8) (actual
> time=3133.24..3133.24 rows=1 loops=1)
>   ->  Index Scan using massive2_dist on massive2  (cost=0.00..13648.17
> rows=494573 width=8) (actual time=0.11..2061.38 rows=499999 loops=1)
> Total runtime: 3133.79 msec
>
> EXPLAIN
>
> seems to me that most time was needed for the index scanning...

Hmm... I would suggest if you are testing, you should try 7.4.2. 7.4 has some
good optimisation for hash agregates though I am not sure if it apply to averaging.

Also try forcing a seq. scan by turning off index scan. I guess index scan for
so many rows is not exactly good thing even if tuple size if pretty small.

  Shridhar

Re: performance comparission postgresql/ms-sql server

From
Shridhar Daithankar
Date:
Heiko Kehlenbrink wrote:
>>Hmm... I would suggest if you are testing, you should try 7.4.2. 7.4 has
>>some
>>good optimisation for hash agregates though I am not sure if it apply to
>>averaging.
> would be the last option till we are runing other applications on that 7.2
> system

I can understand..

>>Also try forcing a seq. scan by turning off index scan. I guess index scan
>>for
>>so many rows is not exactly good thing even if tuple size if pretty small.
> a sequential scann gives me the following result:
>
> HKehlenbrink@lin0493l:~> time psql -d test -c 'explain analyse select
> avg(dist) from massive2 where dist > 1000000*sqrt(3.0)::float8 and dist <
> 1500000*sqrt(3.0)::float8 ;'
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=1193714.43..1193714.43 rows=1 width=8) (actual
> time=166718.54..166718.54 rows=1 loops=1)
>   ->  Seq Scan on massive2  (cost=0.00..1192478.00 rows=494573 width=8)
> (actual time=3233.22..165576.40 rows=499999 loops=1)
> Total runtime: 166733.73 msec

Certainly bad and not an option.. I can't think of anything offhand to speed
this up..

  Shridhar

Re: performance comparission postgresql/ms-sql server

From
Kaarel
Date:
Heiko Kehlenbrink wrote:

>i want to convince people to use postgresql instead of ms-sql server, so i
>set up a kind of comparission insert data / select data from postgresql /
>ms-sql server
>
>
[...]

>do you have any hints like compiler-flags and so on to get the answering
>time from postgresql equal to ms-sql?
>
>(btw both dbms were running on exactly the same hardware)
>
>i use suse 8.1
>      postgresql 7.2 compiled from the rpms for using postgis, but that is
>anothe story...
>      1.5 gig ram
>      1.8 mhz intel cpu
>
>
>every help welcome
>
>
Suse 8.1 comes with 2.4 series kernel I suppose. Many have witnessed a
speed increase when using 2.6 series kernel. Might consider this too
besides the newer PostgreSQL version already suggested. 2.6 has some
scheduling options that are not enabled by default but may enhance
database performance
(http://story.news.yahoo.com/news?tmpl=story&cid=75&e=2&u=/nf/20040405/tc_nf/23603).

Kaarel

Re: performance comparission postgresql/ms-sql server

From
"Heiko Kehlenbrink"
Date:
hi shridhar,


> Heiko Kehlenbrink wrote:
>
>> hi list,
>>
>> i want to convince people to use postgresql instead of ms-sql server, so i
>> set up a kind of comparission insert data / select data from postgresql /
>> ms-sql server
>>
>> the table i use was pretty basic,
>>
>> id   bigserial
>> dist  float8
>> x     float8
>> y     float8
>> z     float8
>>
>> i filled the table with a function which filled x,y,z with incremental
increasing values (1,2,3,4,5,6...) and computing from that the dist
value
>> for every tupel (sqrt((x*x)+(y*y)+(z*z))).
>>
>> this works fine for both dbms
>>
>> postgresql needs 13:37 min for 10.000.000 tupel,
>> ms-sql     needs 1:01:27 h for 10.000.000 tupel.
>>
>> so far so good.
>>
>> i attached an index on the dist row and started to query the dbs with
scripts which select a serial row of 100.000,200.000,500.000 tupels
based
>> on the dist row.
>> i randomizly compute the start and the end distance and made a "select
avg(dist) from table where dist > startdist and dist < enddist"
>
> Some basics to check quickly.
>
> 1. vacuum analyze the table before you start selecting.

was done,

> 2. for slow running queries, check explain analyze output and find out
who takes
> maximum time.

hkehlenbrink@lin0493l:~> psql -d test -c 'explain analyse select avg(dist)
from massive2 where dist > (1000000*sqrt(3.0))::float8 and dist <
(1500000*sqrt(3.0))::float8;'
NOTICE:  QUERY PLAN:

Aggregate  (cost=14884.61..14884.61 rows=1 width=8) (actual
time=3133.24..3133.24 rows=1 loops=1)
  ->  Index Scan using massive2_dist on massive2  (cost=0.00..13648.17
rows=494573 width=8) (actual time=0.11..2061.38 rows=499999 loops=1) Total
runtime: 3133.79 msec

EXPLAIN

seems to me that most time was needed for the index scanning...

> 3. Check for typecasting. You need to typecast the query correctly e.g.
>
> select avg(dist) from table where dist >startdist::float8 and
> dist<enddist::float8..
>
> This might still end up with sequential scan depending upon the plan.
but if
> index scan is picked up, it might be plenty fast..
>
nope, the dist row is float8 and the query-borders are float8 too, also
the explain says that an index scann was done.

> Post explain analyze for the queries if things don't improve.
>
see above..

>   HTH
>
>   Shridhar
>
best regards
heiko


>
>