Thread: PostgreSQL vs. InnoDB performance
On a particular system, loading 1 million rows (100 bytes, nothing fancy)
into PostgreSQL one transaction at a time takes about 90 minutes. Doing the
same in MySQL/InnoDB takes about 3 minutes. InnoDB is supposed to have a
similar level of functionality as far as the storage manager is concerned,
so I'm puzzled about how this can be. Does anyone know whether InnoDB is
taking some kind of questionable shortcuts it doesn't tell me about? The
client interface is DBI.

This particular test is supposed to simulate a lot of transactions
happening in a short time, so turning off autocommit is not relevant.

As you might imagine, it's hard to argue when the customer sees these
kinds of numbers. So I'd take any FUD I can send back at them. :)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes. InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about?

What about fsync/opensync and WAL segments? What happens if we turn off
fsync entirely?

> The client interface is DBI. This
> particular test is supposed to simulate a lot of transactions happening
> in a short time, so turning off autocommit is not relevant.
>
> As you might imagine, it's hard to argue when the customer sees these
> kinds of numbers. So I'd take any FUD I can send back at them. :)

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
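For readers following along, the knobs asked about here live in
postgresql.conf. A minimal, illustrative sketch of the durability-related
settings of that era (the values shown are examples, not recommendations):

```
# postgresql.conf -- commit-durability settings (illustrative values)
fsync = on                    # off trades crash safety for speed
wal_sync_method = fdatasync   # other options: fsync, open_sync, open_datasync
checkpoint_segments = 3       # 16MB WAL segments kept between checkpoints
```

Turning fsync off makes each commit a pure memory operation, which is the
usual explanation when a database appears to commit far faster than the
disk can spin.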
On Fri, Jun 03, 2005 at 12:36:29AM +0200, Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes.
> InnoDB is supposed to have a similar level of functionality as far
> as the storage manager is concerned, so I'm puzzled about how this
> can be. Does anyone know whether InnoDB is taking some kind of
> questionable shortcuts it doesn't tell me about? The client
> interface is DBI. This particular test is supposed to simulate a
> lot of transactions happening in a short time, so turning off
> autocommit is not relevant.

This doesn't sound like a very good test. Have they tried the OSDL
stuff and/or Jan Wieck's PHP-TPCW?

http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/
http://pgfoundry.org/projects/tpc-w-php/

> As you might imagine, it's hard to argue when the customer sees
> these kinds of numbers. So I'd take any FUD I can send back at
> them. :)

HTH :)

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778
Remember to vote!
Peter Eisentraut <peter_e@gmx.net> writes:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes.

What sort of hardware, exactly? Simple division says that that's about
11K transactions per minute, which is more or less what you could expect
to get with a 15000 RPM drive if everyone is honest and a commit actually
involves bits hitting a platter. Now we've talked about schemes for
committing more than one transaction per disk revolution, but there's no
way we could get to 30 per revolution given our lack of knowledge about
the actual disk layout. I don't think I believe that InnoDB is really
truly committing 330K transactions per minute.

Suggest that the customer try a pull-the-plug type of test. Does the DB
come back at all, and if so how close to the last reported-committed row
has it got?

			regards, tom lane
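The back-of-the-envelope division above can be checked directly; a quick
SQL sketch using only the figures from the thread (one commit per
revolution of a 15000 RPM drive is the physical ceiling being assumed):

```sql
-- 1,000,000 rows in 90 min vs. 3 min, against the 15000 RPM ceiling
SELECT 1000000 / 90 AS pg_txn_per_min,         -- ~11111, plausible
       1000000 / 3  AS innodb_txn_per_min,     -- ~333333, not plausible
       15000        AS commits_per_min_ceiling_at_15krpm;
```

The claimed InnoDB rate exceeds the one-commit-per-revolution ceiling by
more than 20x, which is what makes the result suspicious before any
tuning is discussed.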
Something ain't kosher. I tried the same test with the latest and
greatest DBI, DBD::Pg, and PostgreSQL, tuned everything up, and still
got around 10,000 transactions per minute or so. There is no way MySQL
is doing an order of magnitude or more better than that and using
properly transactioned inserts.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200506022050
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Friday, 2005-06-03 at 00:36 +0200, Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes. InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about? The client interface is DBI. This
> particular test is supposed to simulate a lot of transactions happening
> in a short time, so turning off autocommit is not relevant.

Maybe PostgreSQL's actually-working referential-integrity checks bite
here? That test is a bit vague - maybe we can see more details? :-)

> As you might imagine, it's hard to argue when the customer sees these
> kinds of numbers. So I'd take any FUD I can send back at them. :)
On Friday, 3 June 2005 00:36, Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes. InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about?

So here's another little gem about our friends from Uppsala: If you
create a table with InnoDB storage and your server does not have InnoDB
configured, it falls back to MyISAM without telling you.

As it turns out, the test done with PostgreSQL vs. real InnoDB results
in just about identical timings (90 min). The test done using PostgreSQL
with fsync off vs. MyISAM also results in about identical timings
(3 min). So that looks much better, although the update performance of
PostgreSQL is still a lot worse.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> On a particular system, loading 1 million rows (100 bytes, nothing
> fancy) into PostgreSQL one transaction at a time takes about 90
> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes. InnoDB
> is supposed to have a similar level of functionality as far as the
> storage manager is concerned, so I'm puzzled about how this can be.
> Does anyone know whether InnoDB is taking some kind of questionable
> shortcuts it doesn't tell me about?

MySQL/InnoDB offers the same knobs to force commits to disk as
PostgreSQL does. Look at innodb_flush_log_at_trx_commit and
innodb_flush_method:

http://dev.mysql.com/doc/mysql/en/innodb-start.html

Jochem
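For reference, the InnoDB knobs named above are set in my.cnf. A sketch
of the durable configuration, roughly comparable to PostgreSQL's
fsync=on (values illustrative):

```
# my.cnf -- make InnoDB commits hit the platter at every COMMIT
[mysqld]
innodb_flush_log_at_trx_commit = 1   # flush and fsync the log per commit
                                     # (0 or 2 defer the fsync and are faster
                                     #  but can lose committed transactions)
innodb_flush_method = O_DSYNC        # how the log/data flushes are performed
```

Setting innodb_flush_log_at_trx_commit to 0 or 2 is the kind of shortcut
that can make a benchmark look an order of magnitude faster than a
configuration that actually syncs on commit.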
On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
> On Friday, 3 June 2005 00:36, Peter Eisentraut wrote:
>> On a particular system, loading 1 million rows (100 bytes, nothing
>> fancy) into PostgreSQL one transaction at a time takes about 90
>> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes. InnoDB
>> is supposed to have a similar level of functionality as far as the
>> storage manager is concerned, so I'm puzzled about how this can be.
>> Does anyone know whether InnoDB is taking some kind of questionable
>> shortcuts it doesn't tell me about?
>
> So here's another little gem about our friends from Uppsala: If you
> create a table with InnoDB storage and your server does not have InnoDB
> configured, it falls back to MyISAM without telling you.

Silently falling back to something unexpected seems to be quite common
there. For sure it's not the only case. :-|

> As it turns out, the test done with PostgreSQL vs. real InnoDB results
> in just about identical timings (90 min). The test done using
> PostgreSQL with fsync off vs. MyISAM also results in about identical
> timings (3 min).

The hardware seems to be the bottleneck. Try improving the performance
of your disk systems. It's very unlikely to get _exactly_ the same
figures from two such different RDBMS. You expect them to be close, but
not identical.

BTW, make sure the test correctly emulates multiple clients (say 25, 50
or 100). There's little point in stressing the transaction support of an
RDBMS when there's only one single actor in the system, and therefore no
contention. Transaction code always takes the fast path that way, and
you're testing the less important part of it.

Check out some performance tuning pages; you may need to adjust some OS
and PostgreSQL configuration parameters to allow and effectively handle
100+ connections (shared buffers come to mind). I believe the same is
true for MySQL.

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it
After takin a swig o' Arrakan spice grog, pgsql@esiway.net (Marco
Colombo) belched out:
> On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
>> On Friday, 3 June 2005 00:36, Peter Eisentraut wrote:
>>> On a particular system, loading 1 million rows (100 bytes, nothing
>>> fancy) into PostgreSQL one transaction at a time takes about 90
>>> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes.
>>> InnoDB is supposed to have a similar level of functionality as far
>>> as the storage manager is concerned, so I'm puzzled about how this
>>> can be. Does anyone know whether InnoDB is taking some kind of
>>> questionable shortcuts it doesn't tell me about?
>>
>> So here's another little gem about our friends from Uppsala: If you
>> create a table with InnoDB storage and your server does not have
>> InnoDB configured, it falls back to MyISAM without telling you.
>
> Silently falling back to something unexpected seems to be quite common
> there. For sure it's not the only case. :-|
>
>> As it turns out, the test done with PostgreSQL vs. real InnoDB
>> results in just about identical timings (90 min). The test done using
>> PostgreSQL with fsync off vs. MyISAM also results in about identical
>> timings (3 min).
>
> The hardware seems to be the bottleneck. Try improving the performance
> of your disk systems. It's very unlikely to get _exactly_ the same
> figures from two such different RDBMS. You expect them to be close,
> but not identical.

If the bottleneck is in the identical place, and they are otherwise
well-tuned, it is actually *not* that surprising that the timings for
"PostgreSQL vs real InnoDB" would be pretty close.

If both are being bottlenecked by the same notion of "how fast does the
disk spin," then the differences in performance won't be dramatic.

> BTW, make sure the test correctly emulated multiple clients (say 25,
> 50 or 100). There's little point in stressing the transaction support
> of an RDBMS when there's only one single actor in the system, and
> therefore no contention. Transaction code always takes the fast path
> that way and you're testing the less important part of it.

Actually, if you can demonstrate near-identical performance under a
common set of conditions, that's a really useful datum to start with.

It would then certainly be interesting to see how the behaviour changes
as various stresses are introduced...

--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://linuxdatabases.info/info/spreadsheets.html
Signs of a Klingon Programmer - 16. "Klingon programs don't do
accountancy. For that, you need a Ferengi."
Have a look at MySQL gotchas...

http://sql-info.de/mysql/database-definition.html#2_4

> So here's another little gem about our friends from Uppsala: If you
> create a table with InnoDB storage and your server does not have InnoDB
> configured, it falls back to MyISAM without telling you.
>
> As it turns out, the test done with PostgreSQL vs. real InnoDB results
> in just about identical timings (90 min). The test done using
> PostgreSQL with fsync off vs. MyISAM also results in about identical
> timings (3 min). So that looks much better, although the update
> performance of PostgreSQL is still a lot worse.
On Fri, 2005-06-03 at 08:43 -0400, Christopher Browne wrote:
> After takin a swig o' Arrakan spice grog, pgsql@esiway.net (Marco
> Colombo) belched out:
>> The hardware seems to be the bottleneck. Try improving the performance
>> of your disk systems. It's very unlikely to get _exactly_ the same
>> figures from two such different RDBMS. You expect them to be close,
>> but not identical.
>
> If the bottleneck is in the identical place, and they are otherwise
> well-tuned, it is actually *not* that surprising that the timings for
> "PostgreSQL vs real InnoDB" would be pretty close.
>
> If both are being bottlenecked by the same notion of "how fast does
> the disk spin," then the differences in performance won't be dramatic.

That's my point. If the purpose of the test is to compare "PostgreSQL vs
real InnoDB", there should not be any bottleneck other than the software
itself.

>> BTW, make sure the test correctly emulated multiple clients (say 25,
>> 50 or 100). There's little point in stressing the transaction support
>> of an RDBMS when there's only one single actor in the system, and
>> therefore no contention. Transaction code always takes the fast path
>> that way and you're testing the less important part of it.
>
> Actually, if you can demonstrate near-identical performance under a
> common set of conditions, that's a really useful datum to start with.
>
> It would then certainly be interesting to see how the behaviour
> changes as various stresses are introduced...

I take it the purpose of the test is also to measure performance under
transactional load. Otherwise, inserting 1,000,000 rows one transaction
at a time is just silly. I was able to do 12,000 rows/s with COPY on
very cheap hardware (that's 1,000,000 rows in about 90 seconds, not
minutes). I think that if you benchmark how things perform in doing
silly things, you should expect silly results...

So, if you want transactions, make them _real_ transactions. One client
sequentially issuing transactions means nothing in this context.
Transactions are all about contention. You need a bunch of concurrent
client processes, at least, and possibly an N-way system on the server
(to measure "real" contention at OS level too). Otherwise you'd better
measure many inserts per transaction (and COPY in PostgreSQL) on a
single client (which does make sense on its own).

.TM. (who has not been to Arrakis recently)
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it
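One readily available way to generate the kind of concurrent-client load
described above is contrib/pgbench, which ships with PostgreSQL. An
invocation sketch (database name and scale are illustrative):

```
# initialize the pgbench tables at scale factor 10, then run
# 50 concurrent clients doing 1000 transactions each
pgbench -i -s 10 bench
pgbench -c 50 -t 1000 bench
```

This exercises the contended path of the transaction machinery, which is
exactly what a single sequential client never touches.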
Hi all,

I issued the following queries:

select substring(proname from 1 for 1) as nevresz, count(*)
from pg_proc
where nevresz = 'a'
group by nevresz
order by nevresz;

select substring(proname from 1 for 1) as nevresz, count(*)
from pg_proc
group by nevresz
order by nevresz;

The first query fails, and says that column 'nevresz' does not exist.
The second is OK.

What is the problem? Can't I use a column alias in a where condition?

Thanks,
Otto
Havasvölgyi Ottó wrote:
> Hi all,
>
> I issued the following queries:
>
> select substring(proname from 1 for 1) as nevresz, count(*)
> from pg_proc
> where nevresz = 'a'
> group by nevresz
> order by nevresz;
>
> select substring(proname from 1 for 1) as nevresz, count(*)
> from pg_proc
> group by nevresz
> order by nevresz;
>
> The first query fails, and says that column 'nevresz' does not exist.
> The second is OK.
>
> What is the problem? Can't I use a column alias in a where condition?

Correct. SQL defines it that way. You could reformulate as a sub-query
if you wanted:

SELECT nevresz, count(*)
FROM (
  SELECT substring(proname from 1 for 1) AS nevresz FROM pg_proc
) AS dummy
WHERE nevresz = 'a'
GROUP BY nevresz
ORDER BY nevresz;

--
Richard Huxton
Archonet Ltd
Havasvölgyi Ottó <h.otto@freemail.hu> writes:
> select substring(proname from 1 for 1) as nevresz, count(*)
> from pg_proc
> where nevresz = 'a'
> group by nevresz
> order by nevresz;

> What is the problem? Can't I use a column alias in a where condition?

Exactly. Per the SQL spec, the WHERE condition is evaluated before the
SELECT list, so it makes no logical sense to do that. Consider for
example trying to avoid a division-by-zero failure like this:

	SELECT 1/x AS y FROM tab WHERE x <> 0;

You'd be really unhappy if 1/x were computed first so that it could be
made available in the WHERE condition. The fact that you're allowed to
refer to those aliases in ORDER BY/GROUP BY is a historical accident
stemming from the limited ORDER BY facilities in ancient versions of
the spec.

			regards, tom lane
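Given that evaluation order, the two standard workarounds are to repeat
the expression in WHERE or to push it into a derived table. A sketch of
the first form, using the tables from the question (the alias is still
usable in GROUP BY/ORDER BY, per the "historical accident" noted above):

```sql
-- The WHERE clause must use the expression, not the output alias:
SELECT substring(proname from 1 for 1) AS nevresz, count(*)
FROM pg_proc
WHERE substring(proname from 1 for 1) = 'a'
GROUP BY nevresz
ORDER BY nevresz;
```

The planner will typically evaluate the repeated expression only once
per row, so the duplication costs little in practice.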
On Fri, 2005-06-03 at 04:38, Peter Eisentraut wrote:
> On Friday, 3 June 2005 00:36, Peter Eisentraut wrote:
>> On a particular system, loading 1 million rows (100 bytes, nothing
>> fancy) into PostgreSQL one transaction at a time takes about 90
>> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes. InnoDB
>> is supposed to have a similar level of functionality as far as the
>> storage manager is concerned, so I'm puzzled about how this can be.
>> Does anyone know whether InnoDB is taking some kind of questionable
>> shortcuts it doesn't tell me about?
>
> So here's another little gem about our friends from Uppsala: If you
> create a table with InnoDB storage and your server does not have InnoDB
> configured, it falls back to MyISAM without telling you.

If you're as used to PostgreSQL doing the right thing as I'm certain you
are, you will get frustrated with this type of behaviour from MySQL very
quickly. The PostgreSQL design philosophy of doing it right, and
throwing errors otherwise, is the polar opposite of the MySQL philosophy
of design.

> As it turns out, the test done with PostgreSQL vs. real InnoDB results
> in just about identical timings (90 min). The test done using
> PostgreSQL with fsync off vs. MyISAM also results in about identical
> timings (3 min). So that looks much better, although the update
> performance of PostgreSQL is still a lot worse.

I wonder how well they both run as you increase parallel contention
(both readers and writers) and with a RAID controller with
battery-backed cache. Also, I wonder how well both databases will
survive having power removed while under heavy load...
Christopher Browne <cbbrowne@acm.org> writes:
> After takin a swig o' Arrakan spice grog, pgsql@esiway.net (Marco
> Colombo) belched out:
>> On Fri, 2005-06-03 at 11:38 +0200, Peter Eisentraut wrote:
>>> On Friday, 3 June 2005 00:36, Peter Eisentraut wrote:
>>>> On a particular system, loading 1 million rows (100 bytes, nothing
>>>> fancy) into PostgreSQL one transaction at a time takes about 90
>>>> minutes. Doing the same in MySQL/InnoDB takes about 3 minutes.
>>>> InnoDB is supposed to have a similar level of functionality as far
>>>> as the storage manager is concerned, so I'm puzzled about how this
>>>> can be. Does anyone know whether InnoDB is taking some kind of
>>>> questionable shortcuts it doesn't tell me about?
>>>
>>> So here's another little gem about our friends from Uppsala: If you
>>> create a table with InnoDB storage and your server does not have
>>> InnoDB configured, it falls back to MyISAM without telling you.
>>
>> Silently falling back to something unexpected seems to be quite
>> common there. For sure it's not the only case. :-|
>>
>>> As it turns out, the test done with PostgreSQL vs. real InnoDB
>>> results in just about identical timings (90 min). The test done
>>> using PostgreSQL with fsync off vs. MyISAM also results in about
>>> identical timings (3 min).
>>
>> The hardware seems to be the bottleneck. Try improving the
>> performance of your disk systems. It's very unlikely to get
>> _exactly_ the same figures from two such different RDBMS. You expect
>> them to be close, but not identical.
>
> If the bottleneck is in the identical place, and they are otherwise
> well-tuned, it is actually *not* that surprising that the timings for
> "PostgreSQL vs real InnoDB" would be pretty close.
>
> If both are being bottlenecked by the same notion of "how fast does
> the disk spin," then the differences in performance won't be dramatic.

Yes, I also think so.
One transaction is one transaction, so if neither database is lying,
they really should come out with similar results.

Having said that, I'm getting much better speed doing very simple
transactions, and that is on low-end hardware (dual PIII 1GHz, IDE disk
7200 rpm, Linux 2.6, ext3fs with the barrier=1 mount option - so the
disk cache can safely be left turned on). I'm getting around 950
transactions per second with the attached app.

Also, observing the output of the iostat utility, it can be seen that
the disk is quite busy and that it is running with a number of writes
comparable to the number of transactions (and the average size of one
write operation is near 8KB, which is PostgreSQL's default block size).

                        extended device statistics
device  mgr/s  mgw/s    r/s     w/s   kr/s    kw/s  size queue  wait svc_t  %b
hda         0    995    0.4   951.3    1.7  7785.3   8.2   4.2   4.4   0.6  59

zcalusic=# \d words
          Table "public.words"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 word   | character varying(256) |

#! /usr/bin/perl

use DBI;
use strict;
use warnings;

$| = 1;

my $dbh = DBI->connect('dbi:Pg:dbname=zcalusic', 'zcalusic', 'useyours',
                       {PrintError => 1, RaiseError => 1, AutoCommit => 1});
my $sth = $dbh->prepare("INSERT INTO words VALUES (?)");

$dbh->do("TRUNCATE TABLE words");

open(WORDS, "</usr/share/dict/words")
    or die "can't open words file for reading: $!\n";

my $sofar;
my $start   = time();
my $time    = $start;
my $oldtime = $start;

while (my $word = <WORDS>) {
    chomp $word;
    $sth->execute($word);    # one INSERT per transaction (AutoCommit is on)
    $sofar++;
    if (($time = time()) > $oldtime) {
        print int($sofar / ($time - $start)), " inserts/second \r";
        $oldtime = $time;
    }
}
print int($sofar / ($time - $start)), " inserts/second\n";

close(WORDS);

exit 0;

--
Zlatko
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> Also, I wonder how well both databases will survive having power
> removed while under heavy load...

It depends more on the underlying hardware setup (disk/RAID array) than
on any other aspect (like the OS). Assuming you have fsync enabled, of
course.

There is a very interesting test that you can do (if you have two
machines) to see what happens if one of your machines suddenly loses
power. You can read about that here:

http://www.livejournal.com/users/brad/2116715.html

Most of today's IDE disks come with write caching turned on by default,
and if you lose power, you'll lose some of the unwritten data for sure.
Turn it off, or if you're using ext3 on Linux 2.6, you can mount your
partitions with the barrier=1 option, which will make your fsyncs safe
and still let you get some benefit from write caching.

Of course, your production-quality database would be on a powerful SCSI
disk array behind a good RAID controller. The question there is: have
you bought the (often optional) battery backup for your RAID card? If
not, turn the write caching off, once again, or you WILL lose your data
and corrupt your database if you suddenly lose power.

--
Zlatko
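On Linux, the IDE write cache discussed above can be inspected and
disabled with hdparm (a sketch; the device name /dev/hda is an example
matching the iostat output earlier in the thread):

```
# check whether the drive's write cache is currently enabled
hdparm -I /dev/hda | grep -i 'write cache'

# disable the write cache so fsync actually reaches the platter
hdparm -W 0 /dev/hda
```

Note that the setting typically does not survive a power cycle, so it is
usually reapplied from a boot script.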