[jforman@tecso.com.br: RES: Database comparison question] - Mailing list pgsql-advocacy

From Karsten Hilbert
Subject [jforman@tecso.com.br: RES: Database comparison question]
Date
Msg-id 20040421105240.I817@hermes.hilbert.loc
Whole thread Raw
Responses Re: [jforman@tecso.com.br: RES: Database comparison question]
Re: [jforman@tecso.com.br: RES: Database comparison question]
List pgsql-advocacy
Dear advocacy team,

attached find a posting from the openhealth list. The story
behind this is that a colleague of one of the posters there
(Dan Johnson at Mayo Clinics) asked the OP to help him divert
the flak he was getting at his hospital IT dept. for
proffering OS solutions. The "bad guys" compared MS SQL and
MySQL concluding the latter and thusly all OS DBs are crap
(find details in the openhealth archives).

In the course of discussion someone posted the attached
comparison between PG and MySQL (note, it is not the poster's
work). There are gaping errors in there that I wanted to bring
to your attention.

Now, the people on openhealth are typically very clever and
well-mannered if that has any influence on how you might react
to the attached posting should you choose to do so.

I already posted the URL to your site and the URL to the MySQL
Gotcha's site.

Oh, me personally, I hear you say ? Well, I am just a humble
developer with GnuMed (http://www.gnumed.org) -- based on
PostgreSQL, of course :-)

Karsten Hilbert, MD
GnuMed i18n coordinator
Leipzig, Germany
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

This is one side (I lost the URL where I found this comparison)...



1.10.2.2 Featurewise Comparison of MySQL and PostgreSQL
On the crash-me page (http://www.mysql.com/information/crash-me.php) you can
find a list of those database constructs and limits that one can detect
automatically with a program. Note, however, that a lot of the numerical
limits may be changed with startup options for their respective databases.
This web page is, however, extremely useful when you want to ensure that
your applications work with many different databases or when you want to
convert your application from one database to another.

MySQL Server offers the following advantages over PostgreSQL:

MySQL Server is generally much faster than PostgreSQL. MySQL 4.0.1 also has
a query cache that can boost up the query speed for mostly-read-only sites
many times.
MySQL has a much larger user base than PostgreSQL. Therefore, the code is
tested more and has historically proven more stable than PostgreSQL. MySQL
Server is used more in production environments than PostgreSQL, mostly
thanks to the fact that MySQL AB, formerly TCX DataKonsult AB, has provided
top-quality commercial support for MySQL Server from the day it was
released, whereas until recently PostgreSQL was unsupported.
MySQL Server works better on Windows than PostgreSQL does. MySQL Server runs
as a native Windows application (a service on NT/2000/XP), while PostgreSQL
is run under the Cygwin emulation. We have heard that PostgreSQL is not yet
that stable on Windows but we haven't been able to verify this ourselves.
MySQL has more APIs to other languages and is supported by more existing
programs than PostgreSQL. See section B Contributed Programs.
MySQL Server works on 24/7 heavy-duty systems. In most circumstances you
never have to run any cleanups on MySQL Server. PostgreSQL doesn't yet
support 24/7 systems because you have to run VACUUM once in a while to
reclaim space from UPDATE and DELETE commands and to perform statistics
analyses that are critical to get good performance with PostgreSQL. VACUUM
is also needed after adding a lot of new rows to a table. On a busy system
with lots of changes, VACUUM must be run very frequently, in the worst cases
even many times a day. During the VACUUM run, which may take hours if the
database is big, the database is, from a production standpoint, practically
dead. Please note: in PostgreSQL version 7.2, basic vacuuming no longer
locks tables, thus allowing normal user access during the vacuum. A new
VACUUM FULL command does old-style vacuum by locking the table and shrinking
the on-disk copy of the table.
MySQL replication has been thoroughly tested, and is used by sites like:
Yahoo Finance (http://finance.yahoo.com/)
Mobile.de (http://www.mobile.de/)
Slashdot (http://www.slashdot.org/)
Included in the MySQL distribution are two different testing suites,
`mysql-test-run' and crash-me
(http://www.mysql.com/information/crash-me.php), as well as a benchmark
suite. The test system is actively updated with code to test each new
feature and almost all reproducible bugs that have come to our attention. We
test MySQL Server with these on a lot of platforms before every release.
These tests are more sophisticated than anything we have seen from
PostgreSQL, and they ensure that the MySQL Server is kept to a high
standard.
There are far more books in print about MySQL Server than about PostgreSQL.
O'Reilly, SAMS, Que, and New Riders are all major publishers with books
about MySQL. All MySQL features are also documented in the MySQL online
manual because when a new feature is implemented, the MySQL developers are
required to document it before it's included in the source.
MySQL Server supports more of the standard ODBC functions than PostgreSQL.
MySQL Server has a much more sophisticated ALTER TABLE.
MySQL Server has support for tables without transactions for applications
that need all the speed they can get. The tables may be memory-based, HEAP
tables or disk based MyISAM. See section 7 MySQL Table Types.
MySQL Server has support for two different storage engines that support
transactions, InnoDB, and BerkeleyDB. Because every transaction engine
performs differently under different conditions, this gives the application
writer more options to find an optimal solution for his or her setup, if
need be per individual table. See section 7 MySQL Table Types.
MERGE tables gives you a unique way to instantly make a view over a set of
identical tables and use these as one. This is perfect for systems where you
have log files that you order, for example, by month. See section 7.2 MERGE
Tables.
The option to compress read-only tables, but still have direct access to the
rows in the table, gives you better performance by minimising disk reads.
This is very useful when you are archiving things. See section 4.7.4
myisampack, The MySQL Compressed Read-only Table Generator.
MySQL Server has internal support for full-text search. See section 6.8
MySQL Full-text Search.
You can access many databases from the same connection (depending, of
course, on your privileges).
MySQL Server is coded from the start to be multi-threaded, while PostgreSQL
uses processes. Context switching and access to common storage areas is much
faster between threads than between separate processes. This gives MySQL
Server a big speed advantage in multi-user applications and also makes it
easier for MySQL Server to take full advantage of symmetric multiprocessor
(SMP) systems.
MySQL Server has a much more sophisticated privilege system than PostgreSQL.
While PostgreSQL only supports INSERT, SELECT, and UPDATE/DELETE grants per
user on a database or a table, MySQL Server allows you to define a full set
of different privileges on the database, table, and column level. MySQL
Server also allows you to specify the privilege on host and user
combinations. See section 4.3.1 GRANT and REVOKE Syntax.
MySQL Server supports a compressed client/server protocol which improves
performance over slow links.
MySQL Server employs a ``storage engine'' concept, and is the only
relational database we know of built around this concept. This allows
different low-level table types to be called from the SQL engine, and each
table type can be optimised for different performance characteristics.
All MySQL table types (except InnoDB) are implemented as files (one table
per file), which makes it really easy to back up, move, delete, and even
symlink databases and tables, even when the server is down.
Tools exist to repair and optimise MyISAM tables (the most common MySQL
table type). A repair tool is only needed when a physical corruption of a
datafile happens, usually from a hardware failure. It allows a majority of
the data to be recovered.
Upgrading MySQL Server is painless. When you are upgrading MySQL Server, you
don't need to dump/restore your data, as you have to do with most PostgreSQL
upgrades.
Drawbacks with MySQL Server compared to PostgreSQL:

Because MySQL Server uses threads, which are not yet flawless on many OSs,
one must either use binaries from http://www.mysql.com/downloads/, or
carefully follow our instructions in section 2.3 Installing a MySQL Source
Distribution to get an optimal binary that works in all cases.
Table locking, as used by the non-transactional MyISAM tables, is in many
cases faster than page locks, row locks, or versioning. The drawback,
however, is that if one doesn't take into account how table locks work, a
single long-running query can block a table for updates for a long time.
This can usually be avoided when designing the application. If not, one can
always switch the trouble table to use one of the transactional table types.
See section 5.3.2 Table Locking Issues.
With UDF (user-defined functions) one can extend MySQL Server with both
normal SQL functions and aggregates, but this is not yet as easy or as
flexible as in PostgreSQL. See section 10.2 Adding New Functions to MySQL.
Updates that run over multiple tables used to be harder to do in MySQL
Server. However, this has been fixed in MySQL Server 4.0.2 with multi-table
UPDATE and in MySQL Server 4.1 with subqueries. In MySQL Server 4.0 one can
use multi-table deletes to delete from many tables at the same time. See
section 6.4.6 DELETE Syntax.
PostgreSQL currently offers the following advantages over MySQL Server:

Note that because we know the MySQL road map, we have included in the
following table the version when MySQL Server should support this feature.
Unfortunately we couldn't do this for previous comparisons, because we don't
know the PostgreSQL roadmap.

Feature
 MySQL version

Subqueries
 4.1

Foreign keys
 5.1 (3.23 with InnoDB)

Views
 6.0

Stored procedures
 5.0

Triggers
 5.1

Unions
 4.0

Full outer join
 5.1

Constraints
 5.1

Cursors
 5.0

R-trees
 4.1 (for MyISAM tables)

Inherited tables
 Not planned

Extensible type system
 Not planned


Other reasons someone may consider using PostgreSQL:

Standard usage in PostgreSQL is closer to standard SQL in some cases.
One can speed up PostgreSQL by coding things as stored procedures.
The PostgreSQL optimiser can do some optimisation that the current MySQL
optimiser can't do. Most notable is doing joins when you don't have the
proper keys in place and doing a join where you are using different keys
combined with OR. The MySQL benchmark suite at
http://www.mysql.com/information/benchmarks.html shows you what kind of
constructs you should watch out for when using different databases.
PostgreSQL has a bigger team of developers that contribute to the server.
Drawbacks with PostgreSQL compared to MySQL Server:

VACUUM makes PostgreSQL hard to use in a 24/7 environment.
Only transactional tables.
Much slower INSERT, DELETE, and UPDATE.
For a complete list of drawbacks, you should also examine the first table in
this section.

1.10.2.3 Benchmarking MySQL and PostgreSQL
The only Open Source benchmark that we know of that can be used to benchmark
MySQL Server and PostgreSQL (and other databases) is our own. It can be
found at http://www.mysql.com/information/benchmarks.html.

We have many times asked the PostgreSQL developers and some PostgreSQL users
to help us extend this benchmark to make it the definitive benchmark for
databases, but unfortunately we haven't gotten any feedback for this.

We, the MySQL developers, have, because of this, spent a lot of hours to get
maximum performance from PostgreSQL for the benchmarks, but because we don't
know PostgreSQL intimately, we are sure that there are things that we have
missed. We have on the benchmark page documented exactly how we did run the
benchmark so that it should be easy for anyone to repeat and verify our
results.

The benchmarks are usually run with and without the --fast option. When run
with --fast we are trying to use every trick the server can do to get the
code to execute as fast as possible. The idea is that the normal run should
show how the server would work in a default setup and the --fast run shows
how the server would do if the application developer would use extensions in
the server to make his application run faster.

When running with PostgreSQL and --fast we do a VACUUM after every major
table UPDATE and DROP TABLE to make the database in perfect shape for the
following SELECTs. The time for VACUUM is measured separately.

When running with PostgreSQL 7.1.1 we could, however, not run with --fast
because during the INSERT test, the postmaster (the PostgreSQL daemon) died
and the database was so corrupted that it was impossible to restart
postmaster. After this happened twice, we decided to postpone the --fast
test until the next PostgreSQL release. The details about the machine we run
the benchmark on can be found on the benchmark page.

Before going to the other benchmarks we know of, we would like to give some
background on benchmarks.

It's very easy to write a test that shows any database to be the best
database in the world, by just restricting the test to something the
database is very good at and not testing anything that the database is not
good at. If one, after doing this, summarises the result as a single figure,
things are even easier.

This would be like us measuring the speed of MySQL Server compared to
PostgreSQL by looking at the summary time of the MySQL benchmarks on our web
page. Based on this MySQL Server would be more than 40 times faster than
PostgreSQL, something that is, of course, not true. We could make things
even worse by just taking the test where PostgreSQL performs worst and claim
that MySQL Server is more than 2000 times faster than PostgreSQL.

The case is that MySQL does a lot of optimisations that PostgreSQL doesn't
do. This is, of course, also true the other way around. An SQL optimiser is
a very complex thing, and a company could spend years just making the
optimiser faster and faster.

When looking at the benchmark results you should look for things that you do
in your application and just use these results to decide which database
would be best suited for your application. The benchmark results also show
things a particular database is not good at and should give you a notion
about things to avoid and what you may have to do in other ways.

We know of two benchmark tests that claim that PostgreSQL performs better
than MySQL Server. These are both multi-user tests, a test that we here at
MySQL AB haven't had time to write and include in the benchmark suite,
mainly because it's a big task to do this in a manner that is fair to all
databases.

One is the benchmark paid for by Great Bridge, the company that for 16
months attempted to build a business based on PostgreSQL but now has ceased
operations. This is probably the worst benchmark we have ever seen anyone
conduct. This was not only tuned to only test what PostgreSQL is absolutely
best at, but it was also totally unfair to every other database involved in
the test.

Note: We know that even some of the main PostgreSQL developers did not like
the way Great Bridge conducted the benchmark, so we don't blame the
PostgreSQL team for the way the benchmark was done.

This benchmark has been condemned in a lot of postings and newsgroups, so
here we will just briefly repeat some things that were wrong with it.

The tests were run with an expensive commercial tool that makes it
impossible for an Open Source company like us to verify the benchmarks, or
even check how the benchmarks were really done. The tool is not even a true
benchmark tool, but an application/setup testing tool. To refer to this as a
``standard'' benchmark tool is to stretch the truth a long way.
Great Bridge admitted that they had optimised the PostgreSQL database (with
VACUUM before the test) and tuned the startup for the tests, something they
hadn't done for any of the other databases involved. They say ``This process
optimises indexes and frees up disk space a bit. The optimised indexes boost
performance by some margin.'' Our benchmarks clearly indicate that the
difference in running a lot of selects on a database with and without VACUUM
can easily differ by a factor of 10.
The test results were also strange. The AS3AP test documentation mentions
that the test does ``selections, simple joins, projections, aggregates,
one-tuple updates, and bulk updates.'' PostgreSQL is good at doing SELECTs
and JOINs (especially after a VACUUM), but doesn't perform as well on
INSERTs or UPDATEs. The benchmarks seem to indicate that only SELECTs were
done (or very few updates). This could easily explain the good results for
PostgreSQL in this test. The bad results for MySQL will be obvious a bit
down in this document.
They did run the so-called benchmark from a Windows machine against a Linux
machine over ODBC, a setup that no normal database user would ever do when
running a heavy multi-user application. This tested more the ODBC driver and
the Windows protocol used between the clients than the database itself.
When running the database against Oracle and MS-SQL (Great Bridge has
indirectly indicated the databases they used in the test), they didn't use
the native protocol but instead ODBC. Anyone that has ever used Oracle knows
that all real applications use the native interface instead of ODBC. Doing a
test through ODBC and claiming that the results had anything to do with
using the database in a real-world situation can't be regarded as fair. They
should have done two tests with and without ODBC to provide the right facts
(after having gotten experts to tune all involved databases, of course).
They refer to the TPC-C tests, but they don't mention anywhere that the test
they did was not a true TPC-C test and they were not even allowed to call it
a TPC-C test. A TPC-C test can only be conducted by the rules approved by
the TPC Council (http://www.tpc.org/). Great Bridge didn't do that. By doing
this they have both violated the TPC trademark and miscredited their own
benchmarks. The rules set by the TPC Council are very strict to ensure that
no one can produce false results or make unprovable statements. Apparently
Great Bridge wasn't interested in doing this.
After the first test, we contacted Great Bridge and mentioned to them some
of the obvious mistakes they had done with MySQL Server:
Running with a debug version of our ODBC driver
Running on a Linux system that wasn't optimised for threads
Using an old MySQL version when there was a recommended newer one available
Not starting MySQL Server with the right options for heavy multi-user use
(the default installation of MySQL Server is tuned for minimal resource use)

Great Bridge did run a new test, with our optimised ODBC driver and with
better startup options for MySQL Server, but refused to either use our
updated glibc library or our standard binary (used by 80% of our users),
which was statically linked with a fixed glibc library. From what we are
able to determine, Great Bridge did nothing to ensure that the other
databases were set up correctly to run well in their test environment. We
are sure, however, that they didn't contact Oracle or Microsoft to ask for
their advice in this matter.

The benchmark was paid for by Great Bridge, and they decided to publish only
partial, chosen results (instead of publishing it all).
Tim Perdue, a long-time PostgreSQL fan and a reluctant MySQL user, published
a comparison on PHPbuilder
(http://www.phpbuilder.com/columns/tim20001112.php3).

When we became aware of the comparison, we phoned Tim Perdue about this
because there were a lot of strange things in his results. For example, he
claimed that MySQL Server had a problem with five users in his tests, when
we know that there are users with similar machines running MySQL Server with
2000 simultaneous connections doing 400 queries per second. (In this case
the limit was the web bandwidth, not the database.)

It sounded like he was using a Linux kernel that had some problems with many
threads, such as kernels before 2.4, which had a problem with many threads
on multi-CPU machines. This manual describes the fix for this and Tim should
be aware of this problem.

The other possible problem could have been an old glibc library and that Tim
didn't use a MySQL binary from our site, which is linked with a corrected
glibc library, but had compiled a version of his own. In any of these cases,
the symptom would have been exactly what Tim had measured.

We asked Tim if we could get access to his data so that we could repeat the
benchmark and if he could check the MySQL version on the machine to find out
what was wrong and he promised to come back to us about this. He has not
done that yet.

Because of this we can't put any trust in this benchmark either.

Over time things also change and the preceding benchmarks are no longer very
relevant. MySQL Server now has a couple of different storage engines with
different speed/concurrency tradeoffs. See section 7 MySQL Table Types. It
would be interesting to see how the above tests would run with the different
transactional table types in MySQL Server. PostgreSQL has, of course, also
got new features since the test was made. As these tests are not publicly
available there is no way for us to know how the database would perform in
the same tests today.

Conclusion:

The only benchmarks that exist today that anyone can download and run
against MySQL Server and PostgreSQL are the MySQL benchmarks. We here at
MySQL AB believe that Open Source databases should be tested with Open
Source tools. This is the only way to ensure that no one does tests that
nobody can reproduce and use this to claim that one database is better than
another. Without knowing all the facts it's impossible to answer the claims
of the tester.

The thing we find strange is that every test we have seen about PostgreSQL,
that is impossible to reproduce, claims that PostgreSQL is better in most
cases while our tests, which anyone can reproduce, clearly show otherwise.
With this we don't want to say that PostgreSQL isn't good at many things (it
is!) or that it isn't faster than MySQL Server under certain conditions. We
would just like to see a fair test where PostgreSQL performs very well, so
that we could get some friendly competition going.

For more information about our benchmark suite, see section 5.1.4 The MySQL
Benchmark Suite.

We are working on an even better benchmark suite, including multi-user
tests, and a better documentation of what the individual tests really do and
how to add more tests to the suite.



----------------------------------------------------------------------------
--
John L. Forman                 jforman@tecso.com.br
Tecso Informática Ltda.      www.tecso.com.br
Rua da Glória 190/1002      Fone: +55 (21) 2224-4643
Rio de Janeiro - Brasil        Fax: +55 (21) 2509-0023
----------------------------------------------------------------------------
--

-----Mensagem original-----
De: Jeff Gunther [mailto:jeff.gunther@intalgent.com]
Enviada em: terça-feira, 20 de abril de 2004 18:04
Para: openhealth-list@minoru-development.com
Assunto: Re: Database comparison question






Hi Dan,

Everything listed in this document looks correct except for the "Transaction
Support" item. MySQL with the InnoDB table type does support transactions.
Nevertheless, MySQL has been an excellent solution for many of our projects.
This table should include another row "Cross Platform".
MySQL runs on many different platforms including Windows, Linux, etc.

Unfortunately, since I've never used PostgreSQL, I cannot comment on the
PostgreSQL limitations. However, I do think PostgreSQL does have foreign key
support.

Regards,

Jeff Gunther

"Daniel L. Johnson" <johnson.danl@mayo.edu> wrote on 04/20/2004 04:48:03
PM:

> Dear List,
>
> I was recently sent the following table comparing SQL databases,
> comparing MySQL with SQL Server 2000.
> http://www.danlj.org/~danlj/OpenSource/Database_Comparisons.doc.html
>
> It does not seem accurate to me, and it omits PostgreSQL.
>
> 1: are the MySQL feature limitations cited accurate?
> 2: would PostgreSQL have fewer limitations?
>
> Thanks,
>
> Dan Johnson
>



pgsql-advocacy by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] What can we learn from MySQL?
Next
From: "Matthew T. O'Connor"
Date:
Subject: Re: [HACKERS] What can we learn from MySQL?