Re: MySQL or Postgres ? - Mailing list pgsql-general

From scott.marlowe
Subject Re: MySQL or Postgres ?
Msg-id Pine.LNX.4.44.0208021204220.28034-100000@css120.ihs.com
In response to Re: MySQL or Postgres ?  (antti@exadata.fi (Antti Halonen))
List pgsql-general
A couple of comments.  Not flamage, just more info.

On 1 Aug 2002, Antti Halonen wrote:

> Fabiàn R. Breschi <fabian.breschi@acm.org> wrote in message news:<3D48185C.B6663ED7@acm.org>...
> > Hi there,
> >
> > If you'd have to suggest which type of database one migh use, how would
> > you reflect the differences between them to show up pros and cons ?
>
> Some thoughts I posted last night to another group under basically the same subject:
>
> MySQL has pretty robust replication and multiple ways to handle
> backups. You can now also get hot backup for MySQL if you run on InnoDB
> tables (www.innodb.com), which you probably should, as it offers
> transactions, row-level locking and a rough foreign key implementation.
> Hot backup means that you can back up your live database on the fly.

Note that hot backups of InnoDB are not free, and are not done with free
software.  http://www.innodb.com/hotbackup.html says that it's 400 euros a
year, or 1000 euros in perpetuity, for the backup tool.

> AFAIK, Postgre doesn't have replication. The Great Bridge gang was doing
> something but it never happened, I think.

There are (at least) two working means of replication available.
One ships with the postgresql-7.x.x.tar.gz file in the contrib/rserv
directory.  Others are available on the web site.

I've heard the semantic argument before that "if it's not built in it
doesn't count", which is really silly.  Being built in does not assure
proper operation and integration any more than not being built in assures
improper operation.  I'll take a replication system (for any database, not
just postgresql) that works, try it out, and test it.  And if there's more
than one, there's probably a good reason for it, i.e. different design
goals.  P.S. This isn't aimed at your message; it's a pre-emptive answer
to that argument should someone feel the need to bring it up.

> For basic backups both of them should be fine. Note that with Postgre,
> if you get lots of updates you have to run vacuum to clean up the
> transaction mess, and this basically halts the db.

Note that ALL postgresql backups are hot backups with transactional
integrity.  There is no other way to really do them with the delivered
tool set, and no need to do them any other way.  I.e. you ALWAYS get a
transaction-safe dataset from a backup, no matter how many connections are
open and running transactions when you do the backup.  Did I mention that
tool is free?
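
For instance (a minimal sketch; the database and file names are made up):

    pg_dump mydb > mydb_backup.sql    # consistent dump, taken while live
    psql -f mydb_backup.sql newdb     # restore into a fresh database

pg_dump does its work inside a single transaction, so the dump reflects
one consistent point in time no matter what else is running.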

Bzzzzzt.  Wrong.  But thanks for playing.  :-)  Since 7.2, plain vacuums
do not block much of anything.  On my little test box, pgbench goes from
~70 tps to ~65 tps when I run a script that runs a vacuum in the
background continuously with a 1 second break between runs.
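
Something along these lines reproduces that kind of test (a sketch; the
database name is made up, and the pgbench flags are just an example;
pgbench ships in contrib/):

    # terminal 1: the benchmark
    pgbench -c 10 -t 1000 testdb

    # terminal 2: continuous vacuum with a 1 second break between runs
    while true; do
        psql -c "VACUUM;" testdb
        sleep 1
    done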

Note that 7.1.x had some truly heinous behavior in a heavily transactional
environment running background vacuums, so until 7.2.x came out, your
statement was very much true.

> In some situations MySQL is slower, like some multi-join
> queries, and particularly when running on MyISAM tables in situations
> when selects and updates are coming in on the same table in a steady
> flow. This is because MyISAM handles locking on a _table_ level, which
> is not that nice and makes MySQL not so scalable in those situations.
> Now again, if you instead use Inno tables you have no problems because
> of the row locking.

Keep in mind that a table lock may be no big deal on a system with only a
few connections running at once, but when the number of connections runs
into the hundreds or thousands, the response time on the database will
increase in a rather non-linear fashion.  Note that MyISAM write locks
block readers as well as writers, which is why heavy parallel access on a
table-locking database is usually a Bad Thing (TM).  I.e. only a tiny
percentage of your queries need to be updates/inserts/deletes to slow the
server to a crawl.

InnoDB would appear to be the answer to this problem.
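
For what it's worth, moving to row-level locking in MySQL is just a table
type change (a sketch; the table and column names are invented):

    -- create new tables as InnoDB from the start
    CREATE TABLE accounts (
        id      INT NOT NULL PRIMARY KEY,
        balance DECIMAL(10,2)
    ) TYPE=InnoDB;

    -- or convert an existing MyISAM table in place
    ALTER TABLE accounts TYPE=InnoDB;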

> Personally, I have not used Postgre in really high-traffic situations,

I've tested it under high traffic.  At 7.2.x it is now finally capable of
true 24/7 operation with background vacuuming via a cron job.  My
company's intranet site runs on it, but we only average maybe 5 to 10
backends open at a time max.  We can handle about 200 comfortably on our
jack-of-all-trades web server / database machine with 512 Meg of RAM.  We
can handle, slowly, about 500 to 700 connections.  Under heavy parallel
testing (>200 concurrent users) we have had no problems with Postgresql.
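
The cron part is nothing fancy.  A crontab entry along these lines is a
reasonable sketch (the every-15-minutes schedule is just an example):

    # vacuum every database in the cluster every 15 minutes
    */15 * * * *    vacuumdb --all --quiet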

> but in some situations it simply is better because of its feature
> set. Data integrity specifically being one of them. In many cases the
> most important thing is INTEGRITY on a server level. Performance comes
> next. And no matter what some MySQL idealists say, in some cases you
> simply cannot implement integrity checks on a client level (like
> making your client do, say, foreign key checks). What if you have many
> different types of client applications modifying the same data? What if
> you have to use some GUI client where nothing can be implemented?
> Whatever. Besides, that's inherently wrong thinking anyways.

Agreed.  I wish more people understood the issues caused by concurrent
updates well enough to prevent the problems they run into trying to do
database stuff in their client code.  Note that if you are willing to
apply table locks willy-nilly, you could theoretically write transactions
in client code on a MySQL ISAM table.  But your performance would be
horrible under any kind of parallel load due to excessive table level
locking.  And you'd still probably have to look out for odd-case
concurrent updates that could bite you in the butt about 1 in every 10^6
accesses or so.
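
To illustrate the kind of client-side "transaction" I mean (a sketch; the
table and column names are invented, and note there is no rollback if
anything fails partway through):

    LOCK TABLES accounts WRITE;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    UNLOCK TABLES;

Every table the "transaction" touches has to be write-locked up front,
which blocks readers too, and that is exactly where the parallel
performance goes out the window.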

> It all boils down to what is important to _you_. Most of the database
> implementations are unique so making comparisons is pointless. Do you
> need foreign keys? Do you need sub-selects? No? Stored procedures? Do
> you need speed on bulk inserts? Speed on certain selects? If you are
> running a website and clients are doing the same queries most of the
> time, then maybe a query cache might help?

Agreed again.  This is another way of saying: decide which data points are
important in your design phase, look for the database whose data points
best match your requirements, then test it for applicability and
performance, and accept or reject it based on your own testing.  The best
all these MySQL versus Postgresql versus brand X database comparisons can
ever hope to do is provide a kind of jumping-off point for deciding which
package to evaluate.
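
(As a concrete example of such a data point: a simple subselect like the
one below runs on Postgresql but not on any MySQL version released as of
this writing.  The table and column names are invented.)

    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE total > 100);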

My final point would be that if you benchmarked postgresql before version
7.2.x, then your data is out of date.  So far, I've found 7.2.x to be on
par with MSSQL, MySQL, and most other databases for speed nowadays, if not
faster for a lot of things.  And that's strictly testing single-user
access.  In a more concurrent environment, it really shines.

Scott Marlowe

