Re: postgre vs MySQL - Mailing list pgsql-general

From paul rivers
Subject Re: postgre vs MySQL
Date
Msg-id 47D836E5.5070303@gmail.com
Whole thread Raw
In response to Re: postgre vs MySQL  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: postgre vs MySQL
List pgsql-general
Alvaro Herrera wrote:
> Ivan Sergio Borgonovo wrote:
>
>> On Wed, 12 Mar 2008 09:13:14 -0700
>> paul rivers <rivers.paul@gmail.com> wrote:
>>
>>
>>> For a database of InnoDB tables, people tend to replicate the
>>> database, and then backup the slave (unless the db is trivially
>>>
>> That recalled me the *unsupported* feeling I have that it is easier
>> to setup a HA replication solution on MySQL.
>>
>
> Well, if you have a crappy system that cannot sustain concurrent load or
> even be backed up concurrently with regular operation, one solution is
> to write a kick-ass replication system.
>
> The other solution is to enhance the ability of the system to deal with
> concurrent operation.
>
> We keep hearing how great all those Web 2.0 sites are; Slashdot, Flickr,
> etc; and they all run on farms and farms of MySQL servers, "because
> MySQL replication is so good".  I wonder if replication is an actual
> _need_ or it's there just because the other aspects of the system are so
> crappy

"Kick-ass" imho really means "really simple to setup" and included as
part of the standard db.

There are all kinds of corner cases that can bite you with MySQL
replication. Offhand, I wager most of these (at least in InnoDB) result
from the replication "commit" status of a transaction is in the binlogs,
which is not the same as the InnoDB database commit status in the .ibd
files. Writing out binlog entries happens at a higher level than the
storage engine, and so it's not hard to imagine what can go wrong there.
There are a few my.cnf settings that let you really roll the dice with
data integrity based on this dichotomy, if you so choose.

In those high volume shops, imho replication is a requirement, but in
part to overcome technical limitations of MySQL. Or to phrase it from a
MySQL point of view, to do it the MySQL way. If you have 50-ish minutes,
this video by the YouTube people talks about their evolution with MySQL
(among many other things) :

http://video.google.com/videoplay?docid=-6304964351441328559

The summary from the video is:

- Start with a MySQL instance using InnoDB
- Go to 1-M replication, and use the replicants as read-only version.
- Eventually the cost of replication outweighs the gains, so go to
database sharding
- Keep 1-M replication within a shard group to allow easy backups of a
slave, some read-only use of the slaves, and a new master in case of
master failure (i.e. high availability)


Almost everyone finds MyISAM unworkable in large scale environments
because of the repairs necessary post-crash.


Big complaints about MySQL high-volume shops often, imho, come back to :

- You can only have so many active threads in the InnoDB storage engine
module at a time. See e.g.:

http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#option_mysqld_innodb_thread_concurrency

- Auto_increment columns as pkeys in InnoDB tables are practically
required, yet severely limited scalability due to how a transaction
would lock the structure to get the next auto-increment (significantly
improved in 5.1)

- Shutting down a MySQL engine can take forever, due partly dirty page
writes, partly due to insert buffer merging. See:

http://dev.mysql.com/doc/refman/5.1/en/innodb-insert-buffering.html


There are other complaints you'd expect people to have, but don't seem
to get talked about much, because people are so used to (from my point
of view) working around them. For example, statistics on an InnoDB table
are calculated when the table is first accessed, but not stored
anywhere, so there are extra costs on database startup. The backup issue
with InnoDB has already been covered. Tablespace management in InnoDB
seems exceptionally primitive, and is helped somewhat by the
tablespace-per-table option. There are many more, again imho.

Paul







pgsql-general by date:

Previous
From: "Pavel Stehule"
Date:
Subject: Re: PostgreSQL user documentation wiki open for business
Next
From: Vivek Khera
Date:
Subject: Re: pain of postgres upgrade with extensions