Thread: PostgreSQL vs. InnoDB performance

PostgreSQL vs. InnoDB performance

From
Peter Eisentraut
Date:
On a particular system, loading 1 million rows (100 bytes, nothing
fancy) into PostgreSQL one transaction at a time takes about 90
minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
is supposed to have a similar level of functionality as far as the
storage manager is concerned, so I'm puzzled about how this can be.
Does anyone know whether InnoDB is taking some kind of questionable
shortcuts it doesn't tell me about?  The client interface is DBI.  This
particular test is supposed to simulate a lot of transactions happening
in a short time, so turning off autocommit is not relevant.

As you might imagine, it's hard to argue when the customer sees these
kinds of numbers.  So I'd take any FUD I can send back at them. :)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: PostgreSQL vs. InnoDB performance

From
"Joshua D. Drake"
Date:
Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes.


Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about?

What about fsync/opensync and wal segments?

What happens if we turn off fsync entirely?


  The client interface is DBI.  This
> particular test is supposed to simulate a lot of transactions happening
> in a short time, so turning off autocommit is not relevant.
>
> As you might imagine, it's hard to argue when the customer sees these
> kinds of numbers.  So I'd take any FUD I can send back at them. :)
>


--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

Re: PostgreSQL vs. InnoDB performance

From
David Fetter
Date:
On Fri, Jun 03, 2005 at 12:36:29AM +0200, Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.
> InnoDB is supposed to have a similar level of functionality as far
> as the storage manager is concerned, so I'm puzzled about how this
> can be.  Does anyone know whether InnoDB is taking some kind of
> questionable shortcuts it doesn't tell me about?  The client
> interface is DBI.  This particular test is supposed to simulate a
> lot of transactions happening in a short time, so turning off
> autocommit is not relevant.

This doesn't sound like a very good test.  Have they tried the OSDL
stuff and/or Jan Wieck's PHP-TPCW?

http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/
http://pgfoundry.org/projects/tpc-w-php/

> As you might imagine, it's hard to argue when the customer sees
> these kinds of numbers.  So I'd take any FUD I can send back at
> them. :)

HTH :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

Re: PostgreSQL vs. InnoDB performance

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.

What sort of hardware, exactly?

Simple division says that that's about 11K transactions per minute,
which is more or less what you could expect to get with a 15000RPM
drive if everyone is honest and a commit actually involves bits hitting
a platter.  Now we've talked about schemes for committing more than one
transaction per disk revolution, but there's no way we could get to 30
per revolution given our lack of knowledge about the actual disk layout.

I don't think I believe that InnoDB is really truly committing 330K
transactions per minute.  Suggest that the customer try a pull-the-plug
type of test.  Does the DB come back at all, and if so how close to the
last reported-committed row has it got?

            regards, tom lane

Re: PostgreSQL vs. InnoDB performance

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Something ain't kosher. I tried the same test with the latest and greatest
DBI, DBD::Pg, and PostgreSQL, tuned everything up, and still got around
10,000 transactions per minute or so. There is no way MySQL is doing an
order of magnitude or more better than that and using properly transactioned
inserts.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200506022050
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFCn6mavJuQZxSWSsgRAq4JAJ9SszAYi6i+RhhS0AQTLGr/+JqI6ACgk9Dj
3qXjrSk1nnh4vdnGmY/R3e0=
=kJkK
-----END PGP SIGNATURE-----



Re: PostgreSQL vs. InnoDB performance

From
Tino Wildenhain
Date:
Am Freitag, den 03.06.2005, 00:36 +0200 schrieb Peter Eisentraut:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about?  The client interface is DBI.  This
> particular test is supposed to simulate a lot of transactions happening
> in a short time, so turning off autocommit is not relevant.

Maybe postgres' actually working ref-integrity checks bite here?
That test is a bit vague - maybe we can see more details? :-)

> As you might imagine, it's hard to argue when the customer sees these
> kinds of numbers.  So I'd take any FUD I can send back at them. :)
>


Re: PostgreSQL vs. InnoDB performance

From
Peter Eisentraut
Date:
Am Freitag, 3. Juni 2005 00:36 schrieb Peter Eisentraut:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about?

So here's another little gem about our friends from Uppsala: If you create a
table with InnoDB storage and your server does not have InnoDB configured, it
falls back to MyISAM without telling you.

As it turns out, the test done with PostgreSQL vs. real InnoDB results in just
about identical timings (90 min).  The test done using PostgreSQL with fsync
off vs. MyISAM also results in about identical timings (3 min).  So that
looks much better, although the update performance of PostgreSQL is still a
lot worse.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: PostgreSQL vs. InnoDB performance

From
Jochem van Dieten
Date:
Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about?

MySQL/InnoDB offers the same knobs to force commits to disk as
PostgreSQL does. Look at innodb_flush_log_at_trx_commit and
innodb_flush_method:
http://dev.mysql.com/doc/mysql/en/innodb-start.html

Jochem


Re: PostgreSQL vs. InnoDB performance

From
Marco Colombo
Date:
On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
> Am Freitag, 3. Juni 2005 00:36 schrieb Peter Eisentraut:
> > On a particular system, loading 1 million rows (100 bytes, nothing
> > fancy) into PostgreSQL one transaction at a time takes about 90
> > minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
> > is supposed to have a similar level of functionality as far as the
> > storage manager is concerned, so I'm puzzled about how this can be.
> > Does anyone know whether InnoDB is taking some kind of questionable
> > shortcuts it doesn't tell me about?
>
> So here's another little gem about our friends from Uppsala: If you create a
> table with InnoDB storage and your server does not have InnoDB configured, it
> falls back to MyISAM without telling you.

Silently falling back to something unexpected seems to be quite common
there. For sure it's not the only case.  :-|

> As it turns out, the test done with PostgreSQL vs. real InnoDB results in just
> about identical timings (90 min).  The test done using PostgreSQL with fsync
> off vs. MyISAM also results in about identical timings (3 min).

The hardware seems to be the bottleneck. Try improving the performance
of your disk systems. It's very unlikely to get _exactly_ the same
figures from such two different RDBMS. You expect them to be close, but
not identical.

BTW, make sure the test correctly emulated multiple clients (say 25, 50
or 100). There's little point in stressing transaction support of a
RDBMS when there's only one single actor in the system, and therefore no
contention. Transaction code takes always the fast path that way and
you're testing the less important part of it.

Check out some performance tuning pages, you may need to adjust some OS
and PostgreSQL configuration parameters to allow and effectively handle
100+ connections (shared buffers come to mind). I believe the same is
true for MySQL.

.TM.
--
      ____/  ____/   /
     /      /       /                   Marco Colombo
    ___/  ___  /   /                  Technical Manager
   /          /   /                      ESI s.r.l.
 _____/ _____/  _/                      Colombo@ESI.it


Re: PostgreSQL vs. InnoDB performance

From
Christopher Browne
Date:
After takin a swig o' Arrakan spice grog, pgsql@esiway.net (Marco Colombo) belched out:
> On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
>> Am Freitag, 3. Juni 2005 00:36 schrieb Peter Eisentraut:
>> > On a particular system, loading 1 million rows (100 bytes, nothing
>> > fancy) into PostgreSQL one transaction at a time takes about 90
>> > minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
>> > is supposed to have a similar level of functionality as far as the
>> > storage manager is concerned, so I'm puzzled about how this can be.
>> > Does anyone know whether InnoDB is taking some kind of questionable
>> > shortcuts it doesn't tell me about?
>>
>> So here's another little gem about our friends from Uppsala: If you create a
>> table with InnoDB storage and your server does not have InnoDB configured, it
>> falls back to MyISAM without telling you.
>
> Silently falling back to something unexpected seems to be quite common
> there. For sure it's not the only case.  :-|
>
>> As it turns out, the test done with PostgreSQL vs. real InnoDB results in just
>> about identical timings (90 min).  The test done using PostgreSQL with fsync
>> off vs. MyISAM also results in about identical timings (3 min).
>
> The hardware seems to be the bottleneck. Try improving the performance
> of your disk systems. It's very unlikely to get _exactly_ the same
> figures from such two different RDBMS. You expect them to be close, but
> not identical.

If the bottleneck is in the identical place, and they are otherwise
well-tuned, it is actually *not* that surprising that the timings for
"PostgreSQL vs real InnoDB" would be pretty close.

If both are being bottlenecked by the same notion of "how fast does
the disk spin," then the differences in performance won't be dramatic.

> BTW, make sure the test correctly emulated multiple clients (say 25,
> 50 or 100). There's little point in stressing transaction support of
> a RDBMS when there's only one single actor in the system, and
> therefore no contention. Transaction code takes always the fast path
> that way and you're testing the less important part of it.

Actually, if you can demonstrate near-identical performance under a
common set of conditions, that's a really useful datum to start with.

It would then certainly be interesting to see how the behaviour
changes as various stresses are introduced...
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://linuxdatabases.info/info/spreadsheets.html
Signs  of a   Klingon Programmer  -  16.  "Klingon programs   don't do
accountancy. For that, you need a Ferengi."

Re: PostgreSQL vs. InnoDB performance

From
Leonardo Francalanci
Date:
Have a look at Mysql gotchas...

http://sql-info.de/mysql/database-definition.html#2_4




>
> So here's another little gem about our friends from Uppsala: If you create a
> table with InnoDB storage and your server does not have InnoDB configured, it
> falls back to MyISAM without telling you.
>
> As it turns out, the test done with PostgreSQL vs. real InnoDB results in just
> about identical timings (90 min).  The test done using PostgreSQL with fsync
> off vs. MyISAM also results in about identical timings (3 min).  So that
> looks much better, although the update performance of PostgreSQL is still a
> lot worse.
>


Re: PostgreSQL vs. InnoDB performance

From
Marco Colombo
Date:
On Fri, 2005-06-03 at 08:43 -0400, Christopher Browne wrote:
> After takin a swig o' Arrakan spice grog, pgsql@esiway.net (Marco Colombo) belched out:
> > The hardware seems to be the bottleneck. Try improving the performance
> > of your disk systems. It's very unlikely to get _exactly_ the same
> > figures from such two different RDBMS. You expect them to be close, but
> > not identical.
>
> If the bottleneck is in the identical place, and they are otherwise
> well-tuned, it is actually *not* that surprising that the timings for
> "PostgreSQL vs real InnoDB" would be pretty close.
>
> If both are being bottlenecked by the same notion of "how fast does
> the disk spin," then the differences in performance won't be dramatic.

That's my point. If the purpose of the test is to compare "PostgreSQL vs
real InnoDB", there should not be any other bottleneck than software
itself.

> > BTW, make sure the test correctly emulated multiple clients (say 25,
> > 50 or 100). There's little point in stressing transaction support of
> > a RDBMS when there's only one single actor in the system, and
> > therefore no contention. Transaction code takes always the fast path
> > that way and you're testing the less important part of it.
>
> Actually, if you can demonstrate near-identical performance under a
> common set of conditions, that's a really useful datum to start with.
>
> It would then certainly be interesting to see how the behaviour
> changes as various stresses are introduced...

I take the purpose of the test is also to measure performance under
transactional load. Otherwise, inserting 1,000,000 rows one transaction
a time is just silly. I was able to do 12,000 row/s with COPY on very
cheap hardware (that's 1,000,000 rows in about 90 seconds, not minutes).
I think that if you benchmark how things perform in doing silly things,
you should expect silly results...

So, if you want transactions, make them _real_ transactions. One client
sequentially issuing transactions means nothing in this context.
Transactions is all about contention. You need a bunch of concurrent
clients processes, at least, and possibly a N-way system on the server
(to measure "real" contention at OS level too).

Otherwise you'd better measure many inserts per transaction (and COPY in
PostgreSQL) on a single client (which does make sense on its own).

.TM. (who has not been to Arrakis recently)
--
      ____/  ____/   /
     /      /       /                   Marco Colombo
    ___/  ___  /   /                  Technical Manager
   /          /   /                      ESI s.r.l.
 _____/ _____/  _/                      Colombo@ESI.it


field alias in where condition

From
Havasvölgyi Ottó
Date:
Hi all,

I issued the following queries:


select substring(proname from 1 to 1) as nevresz, count(*)
from pg_proc
where nevresz = 'a'
order by nevresz
group by nevresz;

select substring(proname from 1 to 1) as nevresz, count(*)
from pg_proc
order by nevresz
group by nevresz;

The first query fails, and says that column 'nevresz' does not exist.
The second is OK.

What is the problem? I cannot use column alias in where condition?

Thanks,
Otto



Re: field alias in where condition

From
Richard Huxton
Date:
Havasvölgyi Ottó wrote:
> Hi all,
>
> I issued the following queries:
>
>
> select substring(proname from 1 to 1) as nevresz, count(*)
> from pg_proc
> where nevresz = 'a'
> order by nevresz
> group by nevresz;
>
> select substring(proname from 1 to 1) as nevresz, count(*)
> from pg_proc
> order by nevresz
> group by nevresz;
>
> The first query fails, and says that column 'nevresz' does not exist.
> The second is OK.
>
> What is the problem? I cannot use column alias in where condition?

Correct. SQL defines it that way. You could reformulate as a sub-query
if you wanted:

SELECT nevresz,count(*)
FROM
(
   SELECT substring(proname from 1 to 1) as nevresz FROM pg_proc
) AS dummy
GROUP BY nevresz
ORDER BY nevresz


--
   Richard Huxton
   Archonet Ltd


Re: field alias in where condition

From
Tom Lane
Date:
=?ISO-8859-1?Q?Havasv=F6lgyi_Ott=F3?= <h.otto@freemail.hu> writes:
> select substring(proname from 1 to 1) as nevresz, count(*)
> from pg_proc
> where nevresz = 'a'
> order by nevresz
> group by nevresz;

> What is the problem? I cannot use column alias in where condition?

Exactly.  Per the SQL spec, the WHERE condition is evaluated before the
SELECT list, so it makes no logical sense to do that.  Consider for
example trying to avoid division-by-zero failure like this:

    SELECT 1/x AS y WHERE x <> 0;

You'd be really unhappy if 1/x were computed so that it could be made
available in the WHERE condition.

The fact that you're allowed to refer to those aliases in ORDER BY/GROUP
BY is a historical accident stemming from the limited ORDER BY
facilities in ancient versions of the spec.

            regards, tom lane

Re: PostgreSQL vs. InnoDB performance

From
Scott Marlowe
Date:
On Fri, 2005-06-03 at 04:38, Peter Eisentraut wrote:
> Am Freitag, 3. Juni 2005 00:36 schrieb Peter Eisentraut:
> > On a particular system, loading 1 million rows (100 bytes, nothing
> > fancy) into PostgreSQL one transaction at a time takes about 90
> > minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
> > is supposed to have a similar level of functionality as far as the
> > storage manager is concerned, so I'm puzzled about how this can be.
> > Does anyone know whether InnoDB is taking some kind of questionable
> > shortcuts it doesn't tell me about?
>
> So here's another little gem about our friends from Uppsala: If you create a
> table with InnoDB storage and your server does not have InnoDB configured, it
> falls back to MyISAM without telling you.

If you're as used to PostgreSQL doing the right thing as I'm certain you
are, you will get frustrated with type of behaviour from MySQL very
quickly.

The PostgreSQL design philosophy of doing it right, and throwing errors
otherwise is the polar opposite of the MySQL philosophy of design.

> As it turns out, the test done with PostgreSQL vs. real InnoDB results in just
> about identical timings (90 min).  The test done using PostgreSQL with fsync
> off vs. MyISAM also results in about identical timings (3 min).  So that
> looks much better, although the update performance of PostgreSQL is still a
> lot worse.

I wonder how well they're both run as you increase parallel contention
(both readers and writers) and with a RAID controller with battery
backed cache.

Also, I wonder how well both databases will survive having power removed
while under heavy load...

Re: PostgreSQL vs. InnoDB performance

From
Zlatko Calusic
Date:
Christopher Browne <cbbrowne@acm.org> writes:

> After takin a swig o' Arrakan spice grog, pgsql@esiway.net (Marco Colombo) belched out:
>> On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
>>> Am Freitag, 3. Juni 2005 00:36 schrieb Peter Eisentraut:
>>> > On a particular system, loading 1 million rows (100 bytes, nothing
>>> > fancy) into PostgreSQL one transaction at a time takes about 90
>>> > minutes.  Doing the same in MySQL/InnoDB takes about 3 minutes.  InnoDB
>>> > is supposed to have a similar level of functionality as far as the
>>> > storage manager is concerned, so I'm puzzled about how this can be.
>>> > Does anyone know whether InnoDB is taking some kind of questionable
>>> > shortcuts it doesn't tell me about?
>>>
>>> So here's another little gem about our friends from Uppsala: If you create a
>>> table with InnoDB storage and your server does not have InnoDB configured, it
>>> falls back to MyISAM without telling you.
>>
>> Silently falling back to something unexpected seems to be quite common
>> there. For sure it's not the only case.  :-|
>>
>>> As it turns out, the test done with PostgreSQL vs. real InnoDB results in just
>>> about identical timings (90 min).  The test done using PostgreSQL with fsync
>>> off vs. MyISAM also results in about identical timings (3 min).
>>
>> The hardware seems to be the bottleneck. Try improving the performance
>> of your disk systems. It's very unlikely to get _exactly_ the same
>> figures from such two different RDBMS. You expect them to be close, but
>> not identical.
>
> If the bottleneck is in the identical place, and they are otherwise
> well-tuned, it is actually *not* that surprising that the timings for
> "PostgreSQL vs real InnoDB" would be pretty close.
>
> If both are being bottlenecked by the same notion of "how fast does
> the disk spin," then the differences in performance won't be dramatic.

Yes, I also think so. One transaction is one transaction, so if
neither database is lying, they really should come out with similar
results.

Having said that, I'm getting much better speed doing very simple
transactions, and that is on the low end hardware (Dual PIII 1GHz, IDE
disk 7200rpm, Linux 2.6, ext3fs with barrier=1 mount option - so the
disk cache can safely be left turned on). I'm getting around 950
transactions with the attached app.

Also, observing the output of the iostat utility, it can be seen that
disk is quite busy and that it is running with the number of writes
comparable to the number of transactions (and the average size of one
write operation is near 8KB, which is the default PostgreSQL's block
size).

                             extended device statistics
device mgr/s mgw/s    r/s    w/s    kr/s    kw/s   size queue   wait svc_t  %b
hda        0   995    0.4  951.3     1.7  7785.3    8.2   4.2    4.4   0.6  59

zcalusic=# \d words
            Table "public.words"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 word   | character varying(256) |

#! /usr/bin/perl

use DBI;
use strict;
use warnings;

$| = 1;

my $dbh = DBI->connect('dbi:Pg:dbname=zcalusic',
               'zcalusic', 'useyours',
               {PrintError => 1, RaiseError => 1, AutoCommit => 1});

my $sth = $dbh->prepare("INSERT INTO words VALUES (?)");

$dbh->do("TRUNCATE TABLE words");

open(WORDS, "</usr/share/dict/words")
  or die "can't opet words file for reading: $!\n";

my $sofar;
my $start = time();
my $time = $start;
my $oldtime = $start;

while (my $word = <WORDS>) {
  chomp $word;
  $sth->execute($word);
  $sofar++;
  if (($time = time()) > $oldtime) {
    print int($sofar / ($time - $start)), " inserts/second     \r";
    $oldtime = $time;
  }
}
print int($sofar / ($time - $start)), " inserts/second\n";

close(WORDS);

exit 0;

--
Zlatko

Re: PostgreSQL vs. InnoDB performance

From
Zlatko Calusic
Date:
Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Also, I wonder how well both databases will survive having power removed
> while under heavy load...

It depends more on the underlying hardware setup (disk/raid array)
than on the any other aspect (like OS). Assuming you have fsync
enabled, of course.

There is a very interesting test that you can do (if you have two
machines) to see what happens if one of your machines suddenly loses
power. You can read about that here:
   http://www.livejournal.com/users/brad/2116715.html

Most of todays IDE disks comes with write caching turned on by
default, and if you lose power, you'll lose some of unwriten data for
sure. Turn it off, or if you're using ext3 on Linux 2.6, you can mount
your partitions with barrier=1 option which will make your fsyncs safe
and still let you get some benefits from write caching.

Of course, your production quality database would be on the powerful
SCSI disk array behind a good RAID controller. Question there is have
you bought the (often) additional battery backup for your RAID card?
If notm turn the write caching off, once again, or you WILL lose your
data and corrupt your database if you suddenly lose power.
--
Zlatko