prelimiary performance comparison pgsql vs mysql - Mailing list pgsql-general

From Rick Schumeyer
Subject prelimiary performance comparison pgsql vs mysql
Date
Msg-id 010001c5288c$5e3b3c40$0200a8c0@dell8200
Whole thread Raw
Responses Re: prelimiary performance comparison pgsql vs mysql  (Miroslav Šulc <miroslav.sulc@startnet.cz>)
Re: prelimiary performance comparison pgsql vs mysql  (Richard Huxton <dev@archonet.com>)
Re: prelimiary performance comparison pgsql vs mysql  (Mark Rae <mrae@purplebat.com>)
Re: prelimiary performance comparison pgsql vs mysql  (Scott Marlowe <smarlowe@g2switchworks.com>)
List pgsql-general
Below are some PRELIMINARY results in comparing the performance of pgsql and
mysql.

These results are for a single process populating a table with 934k rows,
and then performing some selects.  I also compared the effect of creating
indexes on some of the columns.

I have not yet done any testing of transactions, multiple concurrent
processes, etc.

I did not make any changes to the default config settings.  I can do
so if someone has some suggestions.

My machine is a 3.0 GHz P4 with 1 GB ram, running FC 3.

I used pg 8.0.1 and mysql 5.0.2 alpha.

I compiled pg from source, but I downloaded an binary for mysql.  If
someone thinks this is significant, and can point me to a good
binary for pg, I will give it a try.

All timings are as reported by the db.

I included the pg script below..

Finally, I don't have an axe to grind either way.  I'm trying to be
fair, but I am the first to admit I am not an expert in db tuning.
I welcome constructive comments and advice.

**************** data and queries

The data comes from some experimental data I have been working with.

I load data into a table with relavant columns fid, rid, x.
The combination of fid,rid is unique.
x is a location, and is not unique.

I loaded the data using COPY (pg) and LOAD (mysql).

The queries were:

select count(*) from data where fid=2 and rid=6;             count = 100
select count(*) from data where x > 5000 and x < 5500;       count = 35986
select count(*) from data where x > 5000 and x < 5020;       count = 1525

***************** preliminary conclusions

As suspected, MyISAM is very fast.  In the tested case (only
one process, only one big load and some selects) MyISAM tables are
much faster than pg or InnoDB.

For queries, InnoDB and pg are roughly equivalent.  In some cases
one or the other is a little faster, but they are mostly in the
same ballpark.  The one exception seems to be that pg has an edge
in seq scans.

pg is slower loading data when it has to create an index.  Also,
I found that is is critical to run "vacuum analyze" in pg.  Running
"analyze" in mysql did not seem to make much difference.  I'm guessing
that mysql builds statistics while it is loading data, and does not
actually run an analyze since the table has not changed.

******************* preliminary results ***************************
******************* all times in seconds **************************

note: input table has 934500 rows.

                                                 mysql 5.0.2 alpha
                                   PG 8.0.1      MyISAM   InnoDB
   NO INDEXES

Load file                            22.3         3.9      22.1
select count fid=?,rid=?              3.0         0.23      2.07
select count x > 5000, x < 5500       1.2         0.27      1.59
select count x > 5000, x < 5020       0.63        0.29      1.58

  INDEXES on (fid,rid)

Load file                            36.         13.5      30.1
vacuum analyze                        3.6
select count fid=?,rid=?              0.0         0.00      0.02
select count x > 5000, x < 5500       0.702       0.29      2.07
select count x > 5000, x < 5020       0.713       0.28      1.59

  INDEXES on (fid,rid) and (x)

Load file                           202.         24.      151.
vacuum analyze                       11.
select count fid=?,rid=?              0.002       0.00      0.02

select count x > 5000, x < 5500       0.9         0.06      0.75
select count x > 5000, x < 5020       0.048       0.01      0.01

********************* PG-SQL script ************************

\timing

--
-- Load table, no indexes
--

drop table data cascade;

create table data (
    fid    integer,
    rid    integer,
    range    real,
    x    real,
    y    real,
    z    real,
    bs    real,
    snr    real,
    rvel    real,
    cfar    smallint);

COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM
'/home/rick/bed/data/data.dat';
select count(*) from data where fid=2 and rid=6;
select count(*) from data where x > 5000 and x < 5500;
select count(*) from data where x > 5000 and x < 5020;

--
-- Load table, index on (fid,rid)
--

drop table data cascade;

create table data (
    fid    integer,
    rid    integer,
    range    real,
    x    real,
    y    real,
    z    real,
    bs    real,
    snr    real,
    rvel    real,
    cfar    smallint);

create index fidrid_data on data (fid,rid);
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM
'/home/rick/bed/data/data.dat';
vacuum analyze data;
select count(*) from data where fid=2 and rid=6;
select count(*) from data where x > 5000 and x < 5500;
select count(*) from data where x > 5000 and x < 5020;

--
-- Load table, index on (fid,rid) and (x)
--

drop table data cascade;

create table data (
    fid    integer,
    rid    integer,
    range    real,
    x    real,
    y    real,
    z    real,
    bs    real,
    snr    real,
    rvel    real,
    cfar    smallint);

create index fidrid_data on data (fid,rid);
create index fidx on data (x);
COPY data (fid,rid,range,snr,bs,rvel,cfar,x,y,z) FROM
'/home/rick/bed/data/data.dat';
vacuum analyze data;
select count(*) from data where fid=2 and rid=6;
select count(*) from data where x > 5000 and x < 5500;
select count(*) from data where x > 5000 and x < 5020;




pgsql-general by date:

Previous
From: "Magnus Hagander"
Date:
Subject: Re: New user: Windows, Postgresql, Python
Next
From: "Net Virtual Mailing Lists"
Date:
Subject: Re: Peculiar performance observation....