Thread: Proposal for a cascaded master-slave replication system

Proposal for a cascaded master-slave replication system

From

Jan Wieck

Date:

11 November 2003, 11:34:25

Dear community,

for some reason the post I sent last night still has not shown up on
the mailing lists. I have set up some links on the developer site under
http://developer.postgresql.org/~wieck/slony1.html

The concept will be the basis for some of my work as a Software Engineer
here at Afilias USA INC. in the near future. Afilias, like many of you,
is in need of reliable, high-performance replication solutions for backup
and failover purposes. We started this work a couple of weeks ago by
defining the goals and required features for our usage of PostgreSQL.

Slony-I will be the first of 2 distinct replication systems designed
with the 24/7 datacenter in mind.

We want to build this system as a community project. The plan from the
beginning was to release the product under the BSD license, and we think
it is best to start it as such and to ask for suggestions already during
the design phase.

I would like to start developing the replication engine itself as soon
as possible, and as a PostgreSQL CORE developer I will surely put some of
my spare time into this as well. On the other hand, there is absolutely
no design yet for the frontend tools, other than "they mostly call some
stored procedures", and I think that we need some really good admin
tools in the end.

I look forward to your comments.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: Proposal for a cascaded master-slave replication system

From

Joe Conway

Date:

11 November 2003, 13:16:36

Jan Wieck wrote:
> http://developer.postgresql.org/~wieck/slony1.html

Very interesting read. Nice work!

> We want to build this system as a community project. The plan was from
> the beginning to release the product under the BSD license. And we think
> it is best to start it as such and to ask for suggestions during the
> design phase already.

I couldn't quite tell from the design doc -- do you intend to support
conditional replication at a row level?

I'm also curious, with cascaded replication, how do you handle the case
where a second level slave has a transaction failure for some reason, i.e.:

             M
            / \
           /   \
         Sa     Sb
        /  \   /  \
       Sc  Sd Se  Sf

What happens if data is successfully replicated to Sa, Sb, Sc, and Sd,
and then an exception/rollback occurs on Se?

Joe

Re: Proposal for a cascaded master-slave replication system

From

Andrew Rawnsley

Date:

11 November 2003, 13:44:58

On Nov 11, 2003, at 12:11 PM, Joe Conway wrote:

> Jan Wieck wrote:
>> http://developer.postgresql.org/~wieck/slony1.html
>
> Very interesting read. Nice work!

Ditto. I'll read it more closely later, but after a quick read it
seems quite complete and well thought out. I especially like
that sequences are being dealt with.

Thanks for putting the effort in, and making it a community project.

--------------------

Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com

Re: Proposal for a cascaded master-slave replication system

From

Jan Wieck

Date:

11 November 2003, 15:55:45

Joe Conway wrote:

> Jan Wieck wrote:
>> http://developer.postgresql.org/~wieck/slony1.html
>
> Very interesting read. Nice work!
>
>> We want to build this system as a community project. The plan was from
>> the beginning to release the product under the BSD license. And we think
>> it is best to start it as such and to ask for suggestions during the
>> design phase already.
>
> I couldn't quite tell from the design doc -- do you intend to support
> conditional replication at a row level?

If you mean to configure the system to replicate rows to different
destinations (slaves) based on arbitrary qualifications, no. I had
thought about it, but it does not really fit into the "datacenter and
failover" picture, so it is not required to meet the goals and adds
unnecessary complexity.

This sort of feature is much more important for a replication system
designed for hundreds or thousands of sporadic, asynchronous
multi-master systems, the typical "salesman on the street" kind of
replication.

>
> I'm also curious, with cascaded replication, how do you handle the case
> where a second level slave has a transaction failure for some reason, i.e.:
>
>              M
>             / \
>            /   \
>          Sa     Sb
>         /  \   /  \
>        Sc  Sd Se  Sf
>
> What happens if data is successfully replicated to Sa, Sb, Sc, and Sd,
> and then an exception/rollback occurs on Se?

First, it does not replicate single transactions. It replicates batches
of them together. Since the transactions are already committed on the
other nodes (and possibly others depending on them too), there is no way
to undo them: you lose Se.

If this is only a temporary failure, like a power failure, and the
database recovers fine on restart including the last confirmed SYNC
event, it will catch up. (SYNC events get confirmed after they commit
locally, but that is before the next checkpoint, so there is actually a
gap in which the slave could lose a committed transaction, and then it
is lost for sure.)
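A minimal sketch of the batch-and-catch-up behaviour described above, under stated assumptions: Python's `sqlite3` stands in for PostgreSQL, the origin's log is a hypothetical in-memory list of SYNC events (each carrying the row changes committed since the previous one), and the slave keeps the id of its last confirmed SYNC in a hypothetical one-row table `last_sync`. None of these names come from the Slony-I design. What it illustrates is that a batch and its confirmation commit together, so a slave that comes back with its last confirmed SYNC intact resumes right after it.

```python
import sqlite3

# Hypothetical origin log: (sync_id, row changes committed since the previous SYNC).
ORIGIN_LOG = [
    (1, [("INSERT", 1, "a"), ("INSERT", 2, "b")]),
    (2, [("UPDATE", 1, "a2")]),
    (3, [("DELETE", 2, None), ("INSERT", 3, "c")]),
]


def setup_slave(conn):
    """Create the replicated table and the hypothetical last_sync bookkeeping table."""
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute("CREATE TABLE last_sync (sync_id INTEGER NOT NULL)")
    conn.execute("INSERT INTO last_sync VALUES (0)")
    conn.commit()


def apply_change(conn, op, key, val):
    if op == "INSERT":
        conn.execute("INSERT INTO data (id, val) VALUES (?, ?)", (key, val))
    elif op == "UPDATE":
        conn.execute("UPDATE data SET val = ? WHERE id = ?", (val, key))
    elif op == "DELETE":
        conn.execute("DELETE FROM data WHERE id = ?", (key,))
    else:
        raise ValueError(f"unknown operation {op!r}")


def catch_up(conn, log, max_batches=None):
    """Apply every SYNC batch newer than the last confirmed one.

    Each batch and the new last_sync value commit in one local transaction,
    so a crash between batches leaves the slave at a clean SYNC boundary.
    """
    (last,) = conn.execute("SELECT sync_id FROM last_sync").fetchone()
    applied = 0
    for sync_id, changes in log:
        if sync_id <= last:
            continue  # already confirmed before the outage
        if max_batches is not None and applied >= max_batches:
            break  # simulate the slave going down at this point
        with conn:  # commit on success, roll back on any exception
            for op, key, val in changes:
                apply_change(conn, op, key, val)
            conn.execute("UPDATE last_sync SET sync_id = ?", (sync_id,))
        applied += 1
    (last,) = conn.execute("SELECT sync_id FROM last_sync").fetchone()
    return last
```

Here `catch_up(conn, ORIGIN_LOG, max_batches=1)` stops after SYNC 1 as if the slave lost power; a second `catch_up(conn, ORIGIN_LOG)` skips SYNC 1 and applies SYNCs 2 and 3. The sketch does not model the checkpoint gap mentioned above, in which a confirmed SYNC can itself be lost.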

Jan


Re: Proposal for a cascaded master-slave replication system

From

Joe Conway

Date:

11 November 2003, 18:06:38

Jan Wieck wrote:
> If you mean to configure the system to replicate rows to different
> destinations (slaves) based on arbitrary qualifications, no. I had
> thought about it, but it does not really fit into the "datacenter and
> failover" picture, so it is not required to meet the goals and adds
> unnecessary complexity.
>
> This sort of feature is much more important for a replication system
> designed for hundreds or thousands of sporadic, asynchronous
> multi-master systems, the typical "salesman on the street" kind of
> replication.

OK, thanks. This actually fits any kind of distributed application. We
have one that lives in our datacenters, but needs to replicate across
both fast LAN/MAN and slow WAN. It is multimaster in the sense that
individual data rows can be originated anywhere, but they are read-only
in nodes other than where they were originated. Anyway, I'm using a
hacked copy of dbmirror at the moment.

> First, it does not replicate single transactions. It replicates batches
> of them together. Since the transactions are already committed (and
> possibly some other depending on them too), there is no way - you lose Se.

OK, got it. Thanks.

Joe

Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Jan Wieck

Date:

12 November 2003, 11:47:58

Hans-Jürgen Schönig wrote:

> Jan,
>
> First of all we really appreciate that this is going to be an Open
> Source project.
> There is something I wanted to add from a marketing point of view: I
> have given many public talks in the last 2 years or so. There is one
> question people keep asking me: "How about the pgreplication project?".
> In every training course, at any conference people keep asking for
> synchronous replication. We have offered these people some async
> solutions which are already out there but nobody seems to be interested
> in them (my personal impression). People keep asking for a sync approach
> via email but nobody seems to care about an async approach. This does
> not mean that async is bad but we can see a strong demand for
> synchronous replication.
>
> Meanwhile we seem to be in a situation where PostgreSQL is competing
> against Oracle rather than against MySQL. In our case there are more
> people asking for Oracle -> Pg migration than for MySQL -> Pg. MySQL
> does not seem to be the great enemy because most people know that it is
> an inferior product anyway. What I want to point out is that some people
> want an alternative to Oracle's Real Application Clusters. They want load
> balancing and hot failover. Even data centers asking for replication did
> not want to have an async approach in the past.

Hans-Jürgen,

we are well aware of the high demand for multi-master replication
addressing load balancing and clustering. We have that need ourselves as
well, and I plan to work on a follow-up project as soon as Slony-I is
released. But as of now, we see a higher priority for a reliable master
slave system that includes the cascading and backup features described
in my concept. There are a couple of similar products out there, I know.
But show me one of them where you can fail over without the new master
becoming a single point of failure. We just recently saw ... or rather
"were not able to see anything any more" ... how failures tend to ripple
through systems: half of the US East Coast was dark. So where is the
replication system in which a slave becomes the new "master", and not a
standalone server? Show me one that has a clear concept of failback, one
that has hot-join as a primary design goal. These are the features that
I expect if something is labeled "Enterprise Level".

As far as my ideas for multi-master go, it will be a synchronous
solution using group communication. My idea is "group commit" instead of
2-Phase Commit ... and an early-stage test hack replicated some updates 3
weeks ago. The big challenge will be to integrate the two systems so
that a node can start as an asynchronous Slony-I slave, catch up ... and
switch over to synchronous multimaster without stopping the cluster. I
have no clue yet how to do that, but I refuse to think smaller.

Jan


Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Jordan Henderson

Date:

12 November 2003, 15:57:36

Jan,

I am wondering if you are familiar with the work covered in 'Recovery in
Parallel Database Systems' by Svein-Olaf Hvasshovd (Vieweg)? The book is an
excellent, detailed description covering high availability DB
implementations.

I think you're right on by not thinking smaller!!

Jordan Henderson
On Wednesday 12 November 2003 10:45, Jan Wieck wrote:
> [...] I have no clue yet how to do that, but I refuse to think smaller.

Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Hans-Jürgen Schönig

Date:

12 November 2003, 15:57:41

Jan,

This is EXACTLY what we have been waiting for (years) :) :) :).
If you need somebody for testing or documentation just drop me a line.

    Cheers,

        Hans





--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

Re: Proposal for a cascaded master-slave replication system

From

Christopher Browne

Date:

12 November 2003, 15:59:31

In the last exciting episode, JanWieck@Yahoo.com (Jan Wieck) wrote:
> I look forward to your comments.

It is not evident from the paper what approach is taken to deal
with duplicate key conflicts.

The example:

  UPDATE table SET col1 = 'temp' where col1 = 'A';
  UPDATE table SET col1 = 'A' where col1 = 'B';
  UPDATE table SET col1 = 'B' where col1 = 'temp';

I can think of several approaches to this:

1.  The present eRserv code reads what is in the table at the time of
the 'snapshot', and so tries to pass on:

  update table set col1 = 'B' where otherkey = 123;
  update table set col1 = 'A' where otherkey = 456;

which breaks because at some point, col1 is not unique, irrespective
of what order we apply the changes in.

2.  If the contents as at the time of the COMMIT are stored in the log
table, then we would do all three updates in the destination DB, in
order, as shown above.

Either we have to:
 a) Store the updated fields in the replication tables somewhere, or
 b) Make the third UPDATE wait for the updates to be stored in a
    file somewhere.

3.  The replication code requires that any given key only be updated
once in a 'snapshot', so that the updates may be unambiguously
partitioned:

  UPDATE table SET col1 = 'temp' where col1 = 'A';   -- and otherkey = 123
  UPDATE table SET col1 = 'A' where col1 = 'B';      -- and otherkey = 456
--   Must partition here before hitting #123 again  --
  UPDATE table SET col1 = 'B' where col1 = 'temp';   -- and otherkey = 123

The third UPDATE may have to be held up until the "partition" is set
up, right?

4.  I seem to recall a recent discussion about the possibility of
deferring the UNIQUE constraint check until the end of the transaction
(i.e. until COMMIT), with the result that we could simplify to

  update table set col1 = 'B' where otherkey = 123;
  update table set col1 = 'A' where otherkey = 456;

and discover that the UNIQUE constraint was relaxed just long enough
for us to make the TWO changes that, in the end, combined to produce
unique values.

None of these look like they turn out totally happily, or am I missing
an approach?
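The conflict is easy to reproduce on a small scale. The sketch below assumes a table `t(otherkey, col1 UNIQUE)` holding rows (123, 'A') and (456, 'B') and uses Python's `sqlite3` as a stand-in for PostgreSQL; like a non-deferred UNIQUE constraint in PostgreSQL, SQLite rejects the duplicate as soon as the offending single-row UPDATE runs. The table and helper names are made up for illustration. It shows approach 1 (shipping only final row values) failing in either order, approach 2 (replaying the statements in order) succeeding, and approach 3 (cutting a new batch whenever a key repeats).

```python
import sqlite3

# The three statements from the example as (otherkey of the row touched,
# new col1 value), in the order the application ran them.
STATEMENTS = [(123, "temp"), (456, "A"), (123, "B")]


def make_slave():
    """Slave copy of the table as it was before the transaction ran."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (otherkey INTEGER PRIMARY KEY, col1 TEXT UNIQUE)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(123, "A"), (456, "B")])
    conn.commit()
    return conn


def apply_updates(conn, changes):
    """Apply (otherkey, value) updates in order inside one transaction.
    Returns False if a UNIQUE violation forced a rollback."""
    try:
        with conn:
            for key, val in changes:
                conn.execute("UPDATE t SET col1 = ? WHERE otherkey = ?", (val, key))
        return True
    except sqlite3.IntegrityError:
        return False


def snapshot(stmts):
    """Approach 1: keep only the last value written to each row."""
    return list(dict(stmts).items())


def partition(stmts):
    """Approach 3: start a new batch whenever a row would be touched twice."""
    batches, current, seen = [], [], set()
    for key, val in stmts:
        if key in seen:
            batches.append(current)
            current, seen = [], set()
        current.append((key, val))
        seen.add(key)
    if current:
        batches.append(current)
    return batches


def rows(conn):
    return conn.execute("SELECT otherkey, col1 FROM t ORDER BY otherkey").fetchall()
```

Applying the batches from `partition(STATEMENTS)` one after another, each in statement order, ends in the same state as approach 2; the price is that one snapshot may have to be split into several batches.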
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','ntlug.org').
http://www.ntlug.org/~cbbrowne/languages.html
"Java and C++ make you think that the new ideas are like the old ones.
Java is the most distressing thing to hit computing since MS-DOS."
-- Alan Kay

Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Hans-Jürgen Schönig

Date:

12 November 2003, 16:00:46

Jan Wieck wrote:
> Slony-I will be the first of 2 distinct replication systems designed
> with the 24/7 datacenter in mind.
>
> We want to build this system as a community project. The plan was from
> the beginning to release the product under the BSD license. And we think
> it is best to start it as such and to ask for suggestions during the
> design phase already.

Jan,

First of all we really appreciate that this is going to be an Open
Source project.
There is something I wanted to add from a marketing point of view: I
have given many public talks in the last 2 years or so. There is one
question people keep asking me: "How about the pgreplication project?".
In every training course, at any conference people keep asking for
synchronous replication. We have offered these people some async
solutions which are already out there but nobody seems to be interested
in them (my personal impression). People keep asking for a sync approach
via email but nobody seems to care about an async approach. This does
not mean that async is bad but we can see a strong demand for
synchronous replication.

Meanwhile we seem to be in a situation where PostgreSQL is competing
against Oracle rather than against MySQL. In our case there are more
people asking for Oracle -> Pg migration than for MySQL -> Pg. MySQL
does not seem to be the great enemy because most people know that it is
an inferior product anyway. What I want to point out is that some people
want an alternative to Oracle's Real Application Clusters. They want load
balancing and hot failover. Even data centers asking for replication did
not want to have an async approach in the past.

I just wanted to mention that because personally I don't have the
impression that an additional async project is worth the effort.

Note: This does not mean that it is bad to have one more product ;).

    Cheers,

        Hans


Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Jan Wieck

Date:

12 November 2003, 16:31:01

Jordan Henderson wrote:

> Jan,
>
> I am wondering if you are familar with the work covered in 'Recovery in
> Parallel Database Systems' by Svein-Olaf Hvasshovd (Vieweg) ? The book is an
> excellent detailed description covering high availablility DB
> implementations.

No, but it sounds like something I always wanted to have.

>
> I think your right on by not thinking smaller!!

Thanks

Jan


Re: Proposal for a cascaded master-slave replication system

From

Jan Wieck

Date:

12 November 2003, 16:51:38

Christopher Browne wrote:

> In the last exciting episode, JanWieck@Yahoo.com (Jan Wieck) wrote:
>> I look forward to your comments.
>
> It is not evident from the paper what approach is taken to dealing
> with the duplicate key conflicts.
>
> The example:
>
>   UPDATE table SET col1 = 'temp' where col1 = 'A';
>   UPDATE table SET col1 = 'A' where col1 = 'B';
>   UPDATE table SET col1 = 'B' where col1 = 'temp';
>
> I can think of several approaches to this:

One fundamental flaw in eRServer is that it tries to "combine" multiple
updates into one update at snapshot time in the first place. The
application can do these three steps in one single transaction; how do
you split that?

You can develop an automatic recovery for that. When you get a dupkey
error, you roll back but remember the _rserv_ts and table_id that caused
the dupkey. In the next sync attempt, you fetch the row with that
_rserv_ts, delete all rows with that primary key from the slave table,
and fake INSERT log rows for the same on the master. Then you prepare
and apply, and cross your fingers that nobody touched the same row again
between your last attempt and now ... which was how many hours ago? And
since you can only find one dupkey per round, you might do this a few
times with longer and longer lists of _rserv_ts,table_id pairs.
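A toy model of that recovery loop, to show why it can take several rounds. It is not eRServer code: row ids stand in for the `_rserv_ts`/table_id pairs, the "snapshot" is just the final value of each row, Python's `sqlite3` stands in for the slave database, and rows on the recovery list are deleted first and re-inserted last while everything else is shipped as an UPDATE (an assumed ordering). Each failed round is rolled back and reveals exactly one offending row.

```python
import sqlite3


def make_slave(initial):
    """Slave table in its pre-sync state; col1 is UNIQUE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT UNIQUE)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", initial.items())
    conn.commit()
    return conn


def sync_attempt(conn, final, reinsert):
    """One sync round. Returns None on success, or the id whose statement
    hit a duplicate-key error (the whole round is rolled back then)."""
    current = None
    try:
        with conn:
            for key in reinsert:  # rows on the recovery list: delete first
                conn.execute("DELETE FROM t WHERE id = ?", (key,))
            for key, val in final.items():  # everything else: plain UPDATE
                if key not in reinsert:
                    current = key
                    conn.execute("UPDATE t SET col1 = ? WHERE id = ?", (val, key))
            for key in reinsert:  # ... and re-insert the recovery rows last
                current = key
                conn.execute("INSERT INTO t VALUES (?, ?)", (key, final[key]))
        return None
    except sqlite3.IntegrityError:
        return current


def sync_with_recovery(conn, final, max_rounds=10):
    """Retry until a round succeeds; each failure reveals only one bad row."""
    reinsert = []
    for rounds in range(1, max_rounds + 1):
        bad = sync_attempt(conn, final, reinsert)
        if bad is None:
            return rounds, reinsert
        reinsert.append(bad)
    raise RuntimeError("sync still failing after max_rounds")
```

Rotating three unique values (A, B, C to B, C, A) takes three rounds, with the recovery list growing to `[1, 2]`. On a live master, rows can change again between rounds, which is what makes this approach fragile.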

The idea of not accumulating a log forever, but just holding this status
table (the name "log" is misleading in eRServer; it holds flags saying
"the row with _rserv_ts=nnnn got INS|UPD|DEL'd"), has one big advantage:
however long your slave goes without syncing, your master will not run
out of space.
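To make the space argument concrete, a simplified sketch contrasting an append-only change log with a status table that keeps only the latest INS/UPD/DEL flag per row. The row keys and the workload are invented, and this is not eRServer's actual schema. The status table grows with the number of distinct rows touched rather than with the number of changes, but it also forgets the intermediate values, which is exactly the information the duplicate-key example above would need.

```python
def record(log, status, row_key, op):
    """Record one row change in both structures.

    log    -- append-only list: one entry per change, grows without bound
    status -- dict keyed by row: only the most recent flag survives
    """
    log.append((row_key, op))
    status[row_key] = op


def simulate(n_updates):
    """Hypothetical workload: one row updated n_updates times, another
    inserted and then deleted."""
    log, status = [], {}
    record(log, status, "row-1", "INS")
    for _ in range(n_updates):
        record(log, status, "row-1", "UPD")
    record(log, status, "row-2", "INS")
    record(log, status, "row-2", "DEL")
    return log, status
```

With `simulate(1000)` the log holds 1003 entries while the status table holds two.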

But I don't think that there is value in trying to let a slave catch up
on the last 4 days at once anyway. Drop it and use COPY. If your slave
does not come back up before you have modified half your database, that
will be faster anyway.

Jan


Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Andrew Sullivan

Date:

12 November 2003, 17:43:48

On Wed, Nov 12, 2003 at 02:08:23PM +0100, Hans-Jürgen Schönig wrote:

> an inferior product anyway. What I want to point out is that some people
> want an alternative to Oracle's Real Application Clusters. They want load
> balancing and hot failover. Even data centers asking for replication did
> not want to have an async approach in the past.

I think Jan has already outlined his more-distant-future idea, but
I'd also like to know whether the people who are asking for a
replacement for RAC are willing to invest in it.  You could buy some
_awfully_ good development time for even a year's worth of licensing
for RAC.  I get the impression from the Postgres-R list that their
biggest obstacle is development resources.

<rant> People often like to say they need hot-fail-capable, five
nines, 24/7/365 systems.  For most applications, I just do not
believe that, and the truth is that the cost of getting from three
nines to four (never mind five) is so great that people cheat: one
paragraph has the "five nines" clause, and the next paragraph talks
about scheduled downtime.  In a real "five nines" system (the phone
company, say, or the air traffic control system), the time for
scheduled downtime is just the cumulative possible outage at any node
when it is being switched with its replacement.  Five minutes a year
is a pretty high bar to jump, and most people long ago concluded that
you don't actually need it for most applications. </rant>
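For reference, the "five minutes a year" figure falls out of simple arithmetic. This is only a minimal Python sketch, assuming a 365-day year and counting scheduled and unscheduled outages against the same budget, as argued above; the helper name is made up for illustration.

```python
# Downtime budget per year for an availability target of N "nines"
# (e.g. 5 nines = 99.999%). Assumption: a 365-day year, with scheduled
# and unscheduled outages counted against the same budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes per year a system may be down and still meet N nines.

    Hypothetical helper, for illustration only.
    """
    unavailability = 10.0 ** -nines  # e.g. 0.00001 for five nines
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
```

Each extra nine divides the budget by ten: about 525.6 minutes (8.76 hours) a year at three nines, about 52.6 minutes at four, and about 5.3 minutes at five.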

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Bruce Momjian

Date:

13 November 2003, 00:37:26

Hans-Jürgen Schönig wrote:
> Meanwhile we seem to be in a situation where PostgreSQL is competing
> against Oracle rather than against MySQL. In our case there are more
> people asking for Oracle -> Pg migration than for MySQL -> Pg. MySQL
> does not seem to be the great enemy because most people know that it is
> an inferior product anyway.

I can confirm Hans' impressions --- I get very few questions about MySQL
vs. PostgreSQL, at least in the past few years.  People still using
MySQL at this point know they are using something inferior to
PostgreSQL, and if they didn't, the new MySQL licensing has made it
abundantly clear.  MySQL just isn't in the same league, and probably
will never be.  What people want is Informix/Oracle/MS-SQL => PostgreSQL.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Proposal for a cascaded master-slave replication system

From

Andrew Sullivan

Date:

13 November 2003, 11:08:45

On Wed, Nov 12, 2003 at 07:46:11PM -0500, James Robinson wrote:
> Speaking from a non-profit whose enterprise data sits inside postgres,
> we would be willing to invest a few thousand dollars into the pot of
> synchronous multi-master replication. Postgres-R sounded absolutely
> marvelous to us back in the day that it was rumored to be one of the
> possible deliverables of 7.4.

As far as I know, Postgres-R hackers are eager for funding.  There is
a foundation which was established to fund such activities, in an
effort to pool the resources that people had to donate.  You could
ask on the Postgres-R mailing list about it.

A

Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Andrew Sullivan

Date:

13 November 2003, 12:16:59

On Wed, Nov 12, 2003 at 04:43:03PM -0500, Andrew Sullivan wrote:
> <rant> People often like to say they need hot-fail-capable, five

BTW, this was not a rant at the person posting -- he was just
reporting what he has heard.  I've heard it plenty, too, and the
people from whom I've heard it are the rant targets.  Since hot-failover
replication really is indistinguishable from magic in the eyes of the
correctly-shaped-hair crowd, they ask for it all over the place,
figuring it'll be free.

A

Re: Proposal for a cascaded master-slave replication system

From

Andrew Sullivan

Date:

13 November 2003, 12:20:07

On Tue, Nov 11, 2003 at 03:38:53PM -0500, Christopher Browne wrote:
> In the last exciting episode, JanWieck@Yahoo.com (Jan Wieck) wrote:
> > I look forward to your comments.
>
> It is not evident from the paper what approach is taken to dealing
> with the duplicate key conflicts.
>
> The example:
>
>   UPDATE table SET col1 = 'temp' where col1 = 'A';
>   UPDATE table SET col1 = 'A' where col1 = 'B';
>   UPDATE table SET col1 = 'B' where col1 = 'temp';

It's not a problem because, as the proposal states, the actual SQL is
to be sent to the slave in order.  That is, only consistent sets are
sent: you can't have a condition on the slave that never could have
obtained on the master.  This means greater overhead for cases where
the same row is altered repeatedly, but it's safe.
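To make the point concrete, here is a minimal sketch, not Slony-I code, that uses Python's built-in sqlite3 module as a stand-in slave database. It assumes a hypothetical table `t` with `otherkey` as the primary key and a UNIQUE `col1`, starting from the rows in Christopher's swap example, and it assumes each logged change is a row-level update keyed by `otherkey`. Replaying the three logged changes in master order succeeds; applying only each row's final value fails on the UNIQUE constraint whichever row goes first.

```python
import sqlite3

# Hypothetical schema for the swap example: otherkey identifies the row,
# col1 carries a UNIQUE constraint.
SCHEMA = "CREATE TABLE t (otherkey INTEGER PRIMARY KEY, col1 TEXT UNIQUE)"
INITIAL_ROWS = [(123, "A"), (456, "B")]

# Row-level changes as they happened on the master, in commit order,
# as (new col1 value, otherkey). Assumed log format, for illustration only.
MASTER_LOG = [("temp", 123), ("A", 456), ("B", 123)]

# Net effect per row once the master's transaction is done.
FINAL_VALUES = {123: "B", 456: "A"}

UPDATE_SQL = "UPDATE t SET col1 = ? WHERE otherkey = ?"


def new_slave() -> sqlite3.Connection:
    """Create an in-memory 'slave' holding the pre-swap rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO t VALUES (?, ?)", INITIAL_ROWS)
    conn.commit()
    return conn


def snapshot(conn: sqlite3.Connection) -> dict:
    """Return {otherkey: col1} for every row."""
    return dict(conn.execute("SELECT otherkey, col1 FROM t ORDER BY otherkey"))


def replay_in_order(conn: sqlite3.Connection) -> None:
    """Apply every logged change, one at a time, in master order."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.executemany(UPDATE_SQL, MASTER_LOG)


def net_changes_apply_cleanly(order: list) -> bool:
    """Apply only each row's final value in the given order; report success."""
    conn = new_slave()
    try:
        with conn:
            for key in order:
                conn.execute(UPDATE_SQL, (FINAL_VALUES[key], key))
        return True
    except sqlite3.IntegrityError as exc:
        print(f"order {order}: {exc}")
        return False


slave = new_slave()
replay_in_order(slave)
print(snapshot(slave))  # {123: 'B', 456: 'A'}

# Both orders hit the UNIQUE constraint on their first update.
print(net_changes_apply_cleanly([123, 456]), net_changes_apply_cleanly([456, 123]))
```

Replaying the log reproduces the master's end state, while shipping only the net per-row result cannot, which is the trade-off described above: more work when a row changes repeatedly, but always safe. SQLite is used here only because it ships with Python.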

A

Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

Jan Wieck

Date:

13 November 2003, 12:31:22

Bruce Momjian wrote:

> Hans-Jürgen Schönig wrote:
>> Meanwhile we seem to be in a situation where PostgreSQL is competing
>> against Oracle rather than against MySQL. In our case there are more
>> people asking for Oracle -> Pg migration than for MySQL -> Pg. MySQL
>> does not seem to be the great enemy because most people know that it is
>> an inferior product anyway.
>
> I can confirm Hans' impressions --- I get very few questions about MySQL
> vs. PostgreSQL, at least in the past few years.  People still using
> MySQL at this point know they are using something inferior to
> PostgreSQL, and if they didn't, the new MySQL licensing has made it
> abundantly clear.  MySQL just isn't in the same league, and probably
> will never be.  What people want is Informix/Oracle/MS-SQL => PostgreSQL.
>

I would like to add that there is a good reason why they aren't in the
same league. As a rule of thumb, the smaller a software company, the
faster its development must turn into revenue. That is why Oracle and
Microsoft have the "time" to do things right. They can throw 20
man-years at a project and, if it turns out that wasn't enough, double
down on that. I include MS on purpose here, because they gain that time
from some products and then spend it on others like SQL Server. MySQL,
on the other hand, didn't have that "time" in the past, and look what
they do as soon as they have 19.5 million seconds more "time": the only
thing that is right, replacing the whole architecture. Or what else is
that MaxSQL move? Honestly, I hope 19.5 million seconds are enough,
because nobody will double down in their case.

PostgreSQL does not have that problem because the base project itself
does not depend on any company's success. Time is relative. Our time is
very patient compared to their time. PostgreSQL gets the time it needs
for free.

Jan

Re: Proposal for a cascaded master-slave replication system

From

James Robinson

Date:

14 November 2003, 14:17:56

Speaking from a non-profit whose enterprise data sits inside postgres,
we would be willing to invest a few thousand dollars into the pot of
synchronous multi-master replication. Postgres-R sounded absolutely
marvelous to us back in the day that it was rumored to be one of the
possible deliverables of 7.4.

Not so much for nine nines of uptime, but for being able to take a full
hit on a DB box in production and still remain running w/o any data
loss. Our application servers are JBoss and will be highly available
(clustered / fully mirrored), but even with RAID on the DB box one bad
thing could take it down, and any data written since the last hourly
backup would go down with it. We have experimented in-house with C-JDBC
[ being 'lucky' enough to have all DB writes go through JDBC ], but
would feel more confident w/o another service in between the
application and the DB layers, especially since C-JDBC is not yet fully
highly available -- it currently just shifts the single point of failure
from the DB layer to the C-JDBC controller. It is reported to get HA via
group communication 'soon', but you never can tell. Read up on it at
http://c-jdbc.objectweb.org/ , but in the end it did not feel nearly as
warm and cozy as having the problem solved in the right place -- the
Postgres-R way felt much more robust / speedy.

We won't ever have parallel Oracle dollars, but we would have dollars
to bring higher availability to postgres. 'Cause it's our butt on the
line hosting our client's data.

----
James Robinson
Socialserve.com

Re: [HACKERS] Proposal for a cascaded master-slave replication system

From

dalgoda@ix.netcom.com (Mike Castle)

Date:

14 November 2003, 17:46:31

In article <200311131624.hADGO7824904@candle.pha.pa.us>,
Bruce Momjian  <pgman@candle.pha.pa.us> wrote:
>Yes, I noticed that we have a much longer view of our software lifecycle
>than most other open source projects.

I think the only other things comparable are the OSes themselves.  The
Linux kernel and the releases of the various *BSDs seem to be on similar
scales.

mrc
--
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc