Thread: Replication

Replication

From
Pailloncy Jean-Gérard
Date:
Hi,

I just saw that MySQL will offer, at the end of the month, a fully
synchronous replication system with auto-recovery.
http://www.mysql.com/products/cluster/
We will have to see when a stable version is released.

I use PostgreSQL and I would appreciate having the same features in
PostgreSQL.

Any comments ? (no flame, please)

Cordialement,
Jean-Gérard Pailloncy

Re: Replication

From
Andrew Sullivan
Date:
On Tue, Apr 20, 2004 at 11:26:24AM +0200, Pailloncy Jean-Gérard wrote:
> Hi,
>
> I just see that Mysql will propose at the end of the month a full
> synchronous replication system with auto-recovery.

Well, sort of.  It seems to be yet another 80/20 Solution From MySQL
(tm).

It looks like it's based on a new table type.  It stores everything
in memory, and then writes out asynchronously.  This strikes me as
pretty dangerous from the point of view of reliability: what if the
box dies before the write is complete?  (And don't tell me about
super-redundant high-availability hardware.  I _have_ all that.  All
hardware sucks; HA stuff just sucks less often at a higher price.)
Also, it doesn't support the other table types.  I don't want to
contemplate the horrible mess you'd have to clean up if you had a
transaction crossing three table types and get a hardware failure.

I'm afraid I agree with the recently-posted Oracle Veep interview:
this does not represent any serious challenge to the core ORAC
market.

> I use PostgreSQL and I would appreciate to have the same features in
> PostgreSQL.

Sure, so would I.  Talk to Jan Wieck about what he plans to do
about it, and maybe consider supporting that development work too ;-)

A

--
Andrew Sullivan  | ajs@crankycanuck.ca

Re: Replication

From
Jan Wieck
Date:
Andrew Sullivan wrote:

> On Tue, Apr 20, 2004 at 11:26:24AM +0200, Pailloncy Jean-Gérard wrote:
>> Hi,
>>
>> I just see that Mysql will propose at the end of the month a full
>> synchronous replication system with auto-recovery.
>
> Well, sort of.  It seems to be yet another 80/20 Solution From MySQL
> (tm).
>
> It looks like it's based on a new table type.  It stores everything
> in memory, and then writes out asynchronously.  This strikes me as
> pretty dangerous from the point of view of reliability: what if the
> box dies before the write is complete?  (And don't tell me about
> super-redundant high-availability hardware.  I _have_ all that.  All
> hardware sucks; HA stuff just sucks less often at a higher price.)
> Also, it doesn't support the other table types.  I don't want to
> contemplate the horrible mess you'd have to clean up if you had a
> transaction crossing three table types and get a hardware failure.
>
> I'm afraid I agree with the recently-posted Oracle Veep interview:
> this does not represent any serious challenge to the core ORAC
> market.

Quoting from the MySQL(tm) FAQ about MySQL(tm) Cluster(tm) available at
http://www.mysql.com/products/cluster/faq.html

<quote>
Q: Does MySQL Cluster work with MyISAM and InnoDB?

A: MySQL Cluster can include the MyISAM and InnoDB storage engines. Of
these, the high-availability data must reside in the MySQL Cluster
storage engine.

The MySQL Cluster DB node stores MySQL Cluster data, the MySQL Server
parses SQL and sends requests to the DB node. The MySQL Server does not
store any data belonging to the MySQL Cluster storage engine.

InnoDB/MyISAM data is still stored in the MySQL server and can be used
in the standard way, but that data is not replicated, so that data is
not visible from any other MySQL server that is connected to the MySQL
Cluster.
</quote>

It is just another table handler made available for the SQL query
engine. Touting loudly and on all available channels that "MySQL Cluster
combines the world's most popular open source database with a
parallel-server" naturally leads to the misinterpretation that all the
wonderful new features like foreign keys, MVCC and rollback will now
horizontally scale over multiple, highly available nodes. This is not true.

The NDB table type does not have support for foreign keys, constraints,
triggers. It does support transactions, but these transactions are not
the same transactions as the ones of the InnoDB table handler, so a
COMMIT is not atomic across different table types. MySQL likes to point
out that the largest systems like SAP R/3 do not use referential
integrity on the database level. That is true so far, but having worked
for many years as an SAP base consultant I can tell you that the reason
for that is NOT performance. SAP spends that effort multiple times by
implementing their own, custom integrity control and data domain system
in the DB abstraction layer, to gain DB vendor independence. That
abstraction layer is larger than PHP and Apache together, so this
example is IMHO totally irrelevant for the typical MySQL user.
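
The point above about COMMIT not being atomic across table types can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Engine` class, the `transfer` function and the engine names are hypothetical and model two storage engines that each keep their own independent transaction state; it is not MySQL code and does not reproduce how NDB or InnoDB actually behave internally.

```python
class Engine:
    """Toy storage engine with its own, independent transaction state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.committed = []  # rows that survived a commit
        self.pending = []    # rows written but not yet committed

    def write(self, row):
        self.pending.append(row)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def crash(self):
        # Uncommitted work is lost when the engine goes down.
        self.pending = []


def transfer(first, second, row, fail_between_commits=False):
    """Write one logical change to both engines, then commit each in turn."""
    first.write(row)
    second.write(row)
    first.commit()
    if fail_between_commits:
        # Failure after the first commit but before the second one.
        second.crash()
        return
    second.commit()


innodb_like = Engine("innodb_like")
ndb_like = Engine("ndb_like")
transfer(innodb_like, ndb_like, "order-1", fail_between_commits=True)

# One engine kept the row, the other lost it: the "transaction" was not atomic.
print(innodb_like.committed, ndb_like.committed)  # ['order-1'] []
```

Without a coordinator that spans both engines (for example some form of two-phase commit), nothing ties the two commits together, which is the mess described above.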

Also, the NDB table type is based on an in-memory, partitioned storage
engine (that's where the speed comes from) and to get high availability
one needs at least two times the full database size in RAM (plus some
for the OS and other overhead), and a higher factor to really achieve
the 99.999%. So to serve let's say a 100 GB database, we're talking
about 220-240 GB of RAM. Now that's 8 boxes with 32GB each? And
according to a MySQL consultant I spoke with, the real bottleneck is the
network, so these boxes like to have "better than Gigabit Ethernet" as a
backbone. Those are some decent hardware requirements; make sure you have
a forklift on your next shopping list.
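
The arithmetic behind those numbers can be sketched as follows. The 10% and 20% overhead figures are an assumption, chosen only because they reproduce the 220-240 GB range quoted above; the function names are hypothetical.

```python
import math


def cluster_ram_gb(db_size_gb, replicas=2, overhead_pct=10):
    """RAM needed to keep `replicas` full in-memory copies, plus overhead.

    overhead_pct is an assumed allowance for the OS and other overhead.
    """
    return db_size_gb * replicas * (100 + overhead_pct) / 100


def boxes_needed(total_ram_gb, ram_per_box_gb=32):
    """Number of machines of a given RAM size needed to hold total_ram_gb."""
    return math.ceil(total_ram_gb / ram_per_box_gb)


low = cluster_ram_gb(100, overhead_pct=10)   # 220.0
high = cluster_ram_gb(100, overhead_pct=20)  # 240.0

print(low, high)                              # 220.0 240.0
print(boxes_needed(low), boxes_needed(high))  # 7 8
```

With two copies this comes out at 7 to 8 boxes of 32 GB each; a higher replication factor, as needed for five nines, pushes the count up further.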

So what one gets with NDB on the bottom line is another table type that
is useful for some special cases. I can imagine for example systems
that read sensor data, which cannot be interrupted. Sensors usually
don't care much about referential integrity, so for the logging system
this is in fact irrelevant, the data has to be stored now and corrected
later. I think it is indeed a big plus for a system, to make that
logging data available inside the same SQL query engine where the more
complicated bits and pieces of the application are implemented in. But
that is all, and that can pretty easily be achieved by doing bulk-loads of
the log data into regular database tables. Unless one really needs the
ability to query and analyse up to the last second of log data, running
several hundred thousand dollars' worth of hardware and network equipment
just for the fun of a memory cluster solution is a bit overkill.

As the Oracle VP of product strategy, Ken Jacobs, pointed out: "MySQL is
trying to address certain product shortcomings by acquiring a
third-party technology. This does not mean they now have a product that
is competitive with Oracle—or even other—database products, whether
clustered or not." Absolutely right, Mr. Jacobs: they have done that
before by adding InnoDB, and now they have added some limited multimaster
replication capabilities. But instead of developing an integrated
solution that includes the InnoDB table handler, where this
functionality would be useful, they just added a fifth wheel to the cart.

>
>> I use PostgreSQL and I would appreciate to have the same features in
>> PostgreSQL.
>
> Sure, so would I.  Talk to Jan Wieck about what he plans to do
> about it, and maybe consider supporting that development work too ;-)

Ken Jacobs further said "No one has anything at all like Oracle's Real
Application Clusters". And that is right too. However well PostgreSQL by
now compares on SQL features and standalone DB performance, on
replication we are 2 years or more behind.

Right now we need to get the Slony-I project out the door and let that
settle a bit and maybe get enhanced over one more release. With that as
the base, we will start designing a synchronous multimaster system that
can be jump-started from a running, asynchronous replication setup. All
this "high-availability" babble is IMHO totally pointless as long as
there is no way of (re)creating a (failed) node from scratch without
taking an outage. And that functionality is listed on the MySQL roadmap
for 5.1 ... so somewhere in 2008? Slony does that for async master-slave
right today.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Replication

From
Andrew Sullivan
Date:
On Wed, Apr 21, 2004 at 11:23:51AM -0400, Jan Wieck wrote:
> for that is NOT performance. SAP spends that effort multiple times by
> implementing their own, custom integrity control and data domain system
> in the DB abstraction layer, to gain DB vendor independence. That
> abstraction layer is larger than PHP and Apache together, so this
> example is IMHO totally irrelevant for the typical MySQL user.

Actually, I think it _is_ relevant.  It's proof, IMNSHO, that the
strategy of "doing it in the client" is completely bankrupt.  It's
one thing to do it this way if you have software which is a
category-killer the way SAP is, because you can afford the overhead
of all those developers doing all that extra work, and you can make
your customers buy trillion-dollar hardware to run your bloated
masterpiece.  The Rest Of Us, however, need to do things efficiently,
and that means doing the work in the place where it is least likely
to need to be checked again.  For most database applications, that's
inside the database.  (I'll not now start my rant on the mess caused
by developers who are careless with this principle.)
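
As a minimal sketch of what "doing the work inside the database" looks like, the example below declares the rules once in the schema and lets the database reject bad rows, whichever client sends them. It uses Python's built-in `sqlite3` module purely as a convenient stand-in (an assumption, not something discussed in this thread); the table names and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount INTEGER NOT NULL CHECK (amount > 0)
    )
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")


def try_insert(customer_id, amount):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert(1, 50),    # valid row
    try_insert(999, 50),  # no such customer: foreign key violation
    try_insert(1, -5),    # negative amount: CHECK violation
]
for line in results:
    print(line)
```

No client has to re-check these rules, and no client can forget to.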

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The plural of anecdote is not data.
        --Roger Brinner

Re: Replication

From
"Uwe C. Schroeder"
Date:
On Wednesday 21 April 2004 10:28 am, Andrew Sullivan wrote:
> On Wed, Apr 21, 2004 at 11:23:51AM -0400, Jan Wieck wrote:
> > for that is NOT performance. SAP spends that effort multiple times by
> > implementing their own, custom integrity control and data domain system
> > in the DB abstraction layer, to gain DB vendor independence. That
> > abstraction layer is larger than PHP and Apache together, so this
> > example is IMHO totally irrelevant for the typical MySQL user.
>
> Actually, I think it _is_ relevant.  It's proof, IMNSHO, that the
> strategy of "doing it in the client" is completely bankrupt.  It's
> one thing to do it this way if you have software which is a
> category-killer the way SAP is, because you can afford the overhead
> of all those developers doing all that extra work, and you can make
> your customers buy trillion-dollar hardware to run your bloated
> masterpiece.  The Rest Of Us, however, need to do things efficiently,
> and that means doing the work in the place where it is least likely
> to need to be checked again.  For most database applications, that's
> inside the database.  (I'll not now start my rant on the mess caused
> by developers who are careless with this principle.)

I concur. However, the problem SAP had some 18 years ago when they invented
their system was the massive differences between databases. The scope they had
in mind didn't allow for whole database layers to be redundant just for the
sake of being able to talk to several database engines - ergo they wrote one
layer and omitted using vendor-dependent database features.
Nowadays most relevant databases are pretty compatible when it comes to
constraints, so if you stick to the basics you should be fine now.

    UC

--
Open Source Solutions 4U, LLC    2570 Fleetwood Drive
Phone:  +1 650 872 2425        San Bruno, CA 94066
Cell:   +1 650 302 2405        United States
Fax:    +1 650 872 2417


Re: Replication

From
Christopher Browne
Date:
Martha Stewart called it a Good Thing when ajs@crankycanuck.ca (Andrew Sullivan) wrote:
> On Wed, Apr 21, 2004 at 11:23:51AM -0400, Jan Wieck wrote:
>> for that is NOT performance. SAP spends that effort multiple times
>> by implementing their own, custom integrity control and data domain
>> system in the DB abstraction layer, to gain DB vendor
>> independence. That abstraction layer is larger than PHP and Apache
>> together, so this example is IMHO totally irrelevant for the
>> typical MySQL user.
>
> Actually, I think it _is_ relevant.  It's proof, IMNSHO, that the
> strategy of "doing it in the client" is completely bankrupt.  It's
> one thing to do it this way if you have software which is a
> category-killer the way SAP is, because you can afford the overhead
> of all those developers doing all that extra work, and you can make
> your customers buy trillion-dollar hardware to run your bloated
> masterpiece.  The Rest Of Us, however, need to do things
> efficiently, and that means doing the work in the place where it is
> least likely to need to be checked again.  For most database
> applications, that's inside the database.  (I'll not now start my
> rant on the mess caused by developers who are careless with this
> principle.)

There's a further issue, namely that these "bankrupt" ways were
adopted literally decades ago, and involve the "trillion-dollar"
investments.

Way back when, R/2 was implemented on MVS using IMS, and SAP
implemented their own toolset to manage that, including their own
more-or-less-implicit transaction manager.  Thirty years later, they
have ported it to run on systems that didn't exist back then, but it's
still, in essence, a "mainframe" thing, even if you're running it atop
Windows NT and Microsoft's port of Sybase.  It persists because
there's 30 years worth of ABAP/4 code customized for a zillion
purposes that SAP can keep on selling.

The problem is not one of carelessness; it is that R/2 was written to
run on databases like IMS, a pre-relational hierarchical system, and
Adabas, which couldn't support having more than 256 tables, once upon
a time...  They had to create an ad-hoc variation on CICS, and
starting from scratch would cost trillions.

Even they couldn't afford that.  Microsoft has been trying to do some
"reinvention" of Windows in the new "Longhorn" thing, and despite
having $Billion$ in the bank, it looks like those ambitions are dying
the death of a thousand paper cuts.

Jan's right, in that the typical MySQL user that's building an
unambitious little PHP application doesn't care about the extra
layers.  They wouldn't have bought CICS, whether from IBM or from BEA,
or something more modern, like Tuxedo; if they're using MySQL as a
step up from MS Access, "doing things right" wasn't a notion that they
had in their heads to even contemplate.

It's like deciding to prefer Windows because if you visit Office
Depot, Staples, CompUSA, and Circuit City, that's the only system you
see boxed software for; it's not a measure of goodness, but merely the
fact that it's visible, and can be made serviceable enough.

Sleepycat DB can accurately claim to have hundreds of millions of
deployments simply out of the fact that practically every Linux system
links to it.  (I see 67 programs in my /usr/bin that link to
libdb3.so.3...)  They're obviously far and away the "most popular open
source database."  Entertainingly enough, they have a replication
system, too, and even an XA interface to support 2PC :-).  No SQL,
though...
--
"cbbrowne","@","acm.org"
http://www.ntlug.org/~cbbrowne/linux.html
Why do we drive on parkways and park on driveways?

Re: Replication

From
Christopher Browne
Date:
Quoth uwe@oss4u.com ("Uwe C. Schroeder"):
> I concur. However, the problem SAP had some 18 years ago when they
> invented their system was the massive differences between
> databases. The scope they had in mind didn't allow for whole
> database layers to be redundant just for the sake of being able to
> talk to several database engines - ergo they wrote one layer and
> omitted using vendor-dependent database features. Nowadays most
> relevant databases are pretty compatible when it comes to
> constraints, so if you stick to the basics you should be fine now.

One of the issues was always that of locking.  Different systems still
have different semantics.
--
output = reverse("gro.gultn" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/nonrdbms.html
I've implemented a parser combinator library in Generic C#, and indeed
what is  pretty clear   in   a functional language   looks   extremely
scientific in an object-oriented one.  -- Peter Sestoft

Re: Replication

From
Michael Chaney
Date:
On Wed, Apr 21, 2004 at 10:08:07PM -0400, Christopher Browne wrote:
> or something more modern, like Tuxedo; if they're using MySQL as a
> step up from MS Access,
       ^^
That's spelled "down".  Access is almost 100% SQL-92 compliant, allows
subselects, and does pretty good query optimization.  MySQL has nothing
on it.  And, no, I'm not some Microsofty.

Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/

Re: Replication

From
Andrew Sullivan
Date:
On Wed, Apr 21, 2004 at 10:08:07PM -0400, Christopher Browne wrote:
> The problem is not one of carelessness;

No, I agree that in SAP's case it wasn't carelessness.  As you say,
that was a long time ago.  What I am arguing is that it is careless
to design things using that approach today.  And people do.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca

Re: Replication

From
"Eric Comeau"
Date:
>
> On Tue, Apr 20, 2004 at 11:26:24AM +0200, Pailloncy Jean-Gérard wrote:
> > Hi,
> >
> > I just see that Mysql will propose at the end of the month a full
> > synchronous replication system with auto-recovery.
>
> Well, sort of.  It seems to be yet another 80/20 Solution From MySQL
> (tm).
>
> It looks like it's based on a new table type.  It stores everything
> in memory, and then writes out asynchronously.  This strikes me as
> pretty dangerous from the point of view of reliability: what if the
> box dies before the write is complete?  (And don't tell me about
> super-redundant high-availability hardware.  I _have_ all that.  All
> hardware sucks; HA stuff just sucks less often at a higher price.)
> Also, it doesn't support the other table types.  I don't want to
> contemplate the horrible mess you'd have to clean up if you had a
> transaction crossing three table types and get a hardware failure.
>
> I'm afraid I agree with the recently-posted Oracle Veep interview:
> this does not represent any serious challenge to the core ORAC
> market.

What is Oracle selling as their replication solution these days?
When I still had a MetaLink userid they had posted a
   "Product Obsolescence Desupport Notice"
for
    "Oracle Replication Services"
The dates were something like:

Desupport End Dates
 Error Correction Support: 01-SEP-2002
 Extended Assistance Support: 01-SEP-2005

Under "Oracle recommends customers upgrade/migrate to the following..."
the notice said that no migration path exists, as no new versions will be
released and no replacement product is available.

Their ORAC, if I understand it correctly, is a "cluster" solution and
not a "replication" solution.

Guess I should visit their web site and see what they are peddling for
replication over WAN links these days.

>
> > I use PostgreSQL and I would appreciate to have the same features in
> > PostgreSQL.
>
> Sure, so would I.  Talk to Jan Wieck about what he plans to do
> about it, and maybe consider supporting that development work too ;-)
>
> A
>
> --
> Andrew Sullivan  | ajs@crankycanuck.ca
>

Re: Replication

From
Andrew Sullivan
Date:
On Thu, Apr 22, 2004 at 09:42:12AM -0400, Eric Comeau wrote:
>
> What is Oracle selling as their replication solution these days?

[. . .]

> Their ORAC, if I understand it correctly, is a "cluster" solution and
> not a "replication" solution.

This is an example of why I think most of the discussion about
"replication" is so confusing.  ORAC is certainly a kind of
replication: it provides always-on, hot redundancy in a cluster of
machines.  It's multi-master, and something very close to
asynchronous.  It's a _very_ clever system, but it'll do you not one
whit of good if your primary site fails.  Also, it's not suitable for
use on unreliable hardware: every cluster member failure causes a
"remastering" event which causes everything to stop while remastering
happens.  Finally, it requires some nifty but expensive storage --
storage which itself could be a single point of failure, if it failed
in the right ways.

To solve all of that, Oracle also offers Data Guard.  This is
basically a standard log-shipping technique.  The off-site "standby"
databases can't be used while in standby mode.  This has all the
standard caveats of asynchronous WAN replication, not least of which
is that if you processed a $100 million transaction right before your
master failed, and then you recovered onto a slave which didn't have
that last moment of data, you might find yourself making a $100
million mistake.
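
That failure mode can be shown with a toy model of asynchronous log shipping. This is a sketch only: the `Primary` and `Standby` classes and the transaction labels are hypothetical, and it models the general technique rather than Data Guard itself.

```python
class Standby:
    """Toy standby that only knows what has been shipped to it."""

    def __init__(self):
        self.applied = []


class Primary:
    """Toy primary that commits locally and ships its log later."""

    def __init__(self):
        self.log = []      # committed transactions, in commit order
        self.shipped = 0   # number of log entries already sent to the standby

    def commit(self, txn):
        # The client gets its acknowledgement here, before any shipping happens.
        self.log.append(txn)

    def ship(self, standby):
        """Send every not-yet-shipped log entry to the standby."""
        standby.applied.extend(self.log[self.shipped:])
        self.shipped = len(self.log)


primary, standby = Primary(), Standby()
primary.commit("txn-1: $50")
primary.commit("txn-2: $75")
primary.ship(standby)
primary.commit("txn-3: $100 million")  # acknowledged, but not yet shipped

# The primary fails here and we recover onto the standby.
lost = primary.log[len(standby.applied):]
print(standby.applied)  # ['txn-1: $50', 'txn-2: $75']
print(lost)             # ['txn-3: $100 million']
```

Everything committed after the last shipment exists only on the failed primary, so the standby comes up without it.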

So, Oracle Corp offers two different ways to keep you up nights. :)
I'm sure they're both wonderful products.  But they certainly don't
have a one-size-fits-all approach.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca