Home > mailing lists

Re: Proposal for a cascaded master-slave replication system - Mailing list pgsql-general

From	Jan Wieck
Subject	Re: Proposal for a cascaded master-slave replication system
Date	November 12, 2003 16:51:38
Msg-id	3FB29D21.4050308@Yahoo.com Whole thread Raw
In response to	Re: Proposal for a cascaded master-slave replication system (Christopher Browne <cbbrowne@acm.org>)
List	pgsql-general

Tree view

Christopher Browne wrote:

> In the last exciting episode, JanWieck@Yahoo.com (Jan Wieck) wrote:
>> I look forward to your comments.
>
> It is not evident from the paper what approach is taken to dealing
> with the duplicate key conflicts.
>
> The example:
>
>   UPDATE table SET col1 = 'temp' where col = 'A';
>   UPDATE table SET col1 = 'A' where col = 'B';
>   UPDATE table SET col1 = 'B' where col = 'temp';
>
> I can think of several approaches to this:

One fundamental flaw in eRServer is that it tries to "combine" multiple
updates into one update at snapshot-time in the first place. The
application can do these three steps in one single transaction, how do
you split that?

You can develop an automatic recovery for that. At the time you got a
dupkey error, you rollback but remember the _rserv_ts and table_id that
caused the dupkey. In the next sync attempt, you fetch the row with that
_rserv_ts and delete all rows from the slave table with that primary key
plus fake INSERT log rows on the master for the same. Then you prepare
and apply and cross fingers that nobody touched the same row again
already between your last attempt and now ... which was how many hours
ago? And since you can only find one dupkey per round, you might do this
a few times with larger and larger lists of _rserv_ts,table_id.

The idea of not accumulating log forever, but just holding this status
table (the name log is misleading in eRServer, it holds flags telling
"the row with _rserv_ts=nnnn got INS|UPD|DEL'd") has one big advantage.
However long your slave does not sync, your master will not run out of
space.

But I don't think that there is value in the attempt to let a slave
catch up the last 4 days at once anyway. Drop it and use COPY. When your
slave does not come up before you have modified half your database, it
will be faster this way anyway.

Jan

>
> 1.  The present eRserv code reads what is in the table at the time of
> the 'snapshot', and so tries to pass on:
>
>   update table set col1 = 'B' where otherkey = 123;
>   update table set col1 = 'A' where otherkey = 456;
>
> which breaks because at some point, col1 is not unique, irrespective
> of what order we apply the changes in.
>
> 2.  If the contents as at the time of the COMMIT are stored in the log
> table, then we would do all three updates in the destination DB, in
> order, as shown above.
>
> Either we have to:
>  a) Store the updated fields in the replication tables somewhere, or
>  b) Make the third UPDATE wait for the updates to be stored in a
>     file somewhere.
>
> 3.  The replication code requires that any given key only be updated
> once in a 'snapshot', so that the updates may be unambiguously
> partitioned:
>
>   UPDATE table SET col1 = 'temp' where col = 'A' ;  -- and otherkey = 123
>   UPDATE table SET col1 = 'A' where col = 'B';      -- and otherkey = 456
> --   Must partition here before hitting #123 again  --
>   UPDATE table SET col1 = 'B' where col = 'temp';   -- and otherkey = 123
>
> The third UPDATE may have to be held up until the "partition" is set
> up, right?
>
> 4.  I seem to recall a recent discussion about the possibility of
> deferring the UNIQUE constraint 'til the END of a commit, with the
> result that we could simplify to
>
>   update table set col1 = 'B' where otherkey = 123;
>   update table set col1 = 'A' where otherkey = 456;
>
> and discover that the UNIQUE constraint was relaxed just long enough
> for us to make the TWO changes that in the end combined to being
> unique.
>
> None of these look like they turn out totally happily, or am I missing
> an approach?

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

pgsql-general by date:

From: "Joshua D. Drake"
Date: 12 November 2003, 16:45:37
Subject: Re: pl/perl function life and variable scope - concurrency

From: Suchandra Thapa
Date: 12 November 2003, 17:04:23
Subject: serial type vs. sequences

Re: Proposal for a cascaded master-slave replication system - Mailing list pgsql-general

Previous

Next