Re: Big 7.4 items - Replication - Mailing list pgsql-hackers

From Al Sutton
Subject Re: Big 7.4 items - Replication
Date
Msg-id 01d101c2a36f$7c529740$0100a8c0@cloud
In response to Re: Big 7.4 items  (<darren@up.hrcoxmail.com>)
Responses Re: Big 7.4 items - Replication
List pgsql-hackers
For live replication, could I propose that we consider the systems A, B, and C
connected to each other independently (i.e. A has links to B and C, B has
links to A and C, and C has links to A and B), and that replication is
handled by the node receiving the write-based transaction.

If we consider a write transaction that arrives at A (called WT(A)), system
A will then send WT(A) to systems B and C via its direct connections.
From each of them, system A will receive back either an OK response if there
are no conflicts, a NOT_OK response if there are conflicts, or no response if
that system is unavailable.

If system A receives a NOT_OK response from any other node, it begins the
process of rolling back the transaction on all nodes which previously
issued an OK, and the transaction returns a failure code to the client which
submitted WT(A). The other systems (B and C) would track recent transactions,
and there would be a specified timeout after which a transaction is
considered safe and can no longer be rolled back.
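
As a rough illustration of the timeout rule on B and C, here is a minimal
Python sketch. The class name, the method names, and the idea of passing the
current time in explicitly are all assumptions made for the example, not part
of any existing code.

```python
class RecentTransactions:
    """Tracks transactions a peer has accepted but may still be asked to undo."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds; hypothetical configuration value
        self.accepted_at = {}    # transaction id -> time it was accepted

    def accept(self, txn_id, now):
        """Record that this peer answered OK for txn_id at time `now`."""
        self.accepted_at[txn_id] = now

    def expire(self, now):
        """Forget transactions older than the timeout; they are now safe."""
        for txn_id, accepted in list(self.accepted_at.items()):
            if now - accepted >= self.timeout:
                del self.accepted_at[txn_id]

    def rollback(self, txn_id, now):
        """Return True if txn_id was still inside the window and was undone."""
        self.expire(now)
        return self.accepted_at.pop(txn_id, None) is not None
```

A rollback request that arrives after the timeout is refused, which is what
makes the transaction "safe" from that point on.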

Any system not returning an OK or NOT_OK state is assumed to be down. Error
messages are logged to state that the transaction could not be sent to
the system due to its unavailability, and any monitoring system would
alert the administrator that a replicant is faulty.
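
To make the three outcomes (OK, NOT_OK, no response) concrete, here is a small
Python sketch of what the receiving node might do. It is only an in-memory
model: Peer, Reply and replicate are hypothetical names, there is no real
networking, and "no response" is represented by None.

```python
from enum import Enum


class Reply(Enum):
    OK = "OK"
    NOT_OK = "NOT_OK"


class Peer:
    """Hypothetical stand-in for a remote node such as B or C."""

    def __init__(self, name, reply):
        self.name = name
        self.reply = reply   # Reply.OK, Reply.NOT_OK, or None if unreachable
        self.applied = []    # transactions currently held by this peer

    def send(self, txn):
        if self.reply is Reply.OK:
            self.applied.append(txn)
        return self.reply

    def rollback(self, txn):
        self.applied.remove(txn)


def replicate(txn, peers, log):
    """Send txn to every peer; return True if the write succeeds."""
    accepted = []
    for peer in peers:
        reply = peer.send(txn)
        if reply is Reply.OK:
            accepted.append(peer)
        elif reply is Reply.NOT_OK:
            # Conflict: undo on every peer that already said OK, then fail.
            for done in accepted:
                done.rollback(txn)
            return False
        else:
            # No response: assume the peer is down and log it.
            log.append(f"{peer.name} unavailable, {txn} not sent")
    return True
```

Note that an unavailable peer does not fail the write in this sketch; it is
only logged, so the peer would have to be resynchronised later.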

There would also need to be code developed to ensure that a system could be
brought into sync with the current state of the other systems within the
group, in order to allow new databases to be added and faulty databases to
rejoin the group. This code could also be used for non-realtime
replication to allow databases to be synchronised with the live master.

This would give a multi-master solution whereby a write transaction to any
one node would guarantee that all available replicants would also hold the
data once it is completed, and would also provide the code to handle
scenarios where non-realtime data replication is required.

This system assumes that a majority of transactions will be successful (which
should be the case for a well-designed system).

Comments?

Al.






----- Original Message -----
From: "Darren Johnson" <darren@up.hrcoxmail.com>
To: "Jan Wieck" <JanWieck@Yahoo.com>
Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>;
<shridhar_daithankar@persistent.co.in>; "PostgreSQL-development"
<pgsql-hackers@postgresql.org>
Sent: Saturday, December 14, 2002 1:28 AM
Subject: [mail] Re: [HACKERS] Big 7.4 items


> >
> >
> >>
> >>Let's say we have systems A, B and C.  Each one has some
> >>changes and sends a writeset to the group communication
> >>system (GCS).  The total order dictates WS(A), WS(B), and
> >>WS(C) and the write sets are received in that order at
> >>each system.  Now C gets WS(A) no conflict, gets WS(B) no
> >>conflict, and receives WS(C).  Now C can commit WS(C) even
> >>before the commit messages C(A) or C(B), because there is no
> >>conflict.
> >>
> >
> >And that is IMHO not synchronous. C does not have to wait for A and B to
> >finish the same tasks. If now at this very moment two new transactions
> >query system A and system C (assuming A has not yet committed WS(C)
> >while C has), they will get different data back (thanks to non-blocking
> >reads). I think this is pretty asynchronous.
> >
>
> So if we hold WS(C) until we receive commit messages for WS(A) and
> WS(B), will that meet your synchronous expectations, or do all the
> systems need to commit the WS in the same order and at the same exact
> time?
>
> >
> >
> >It doesn't lead to inconsistencies, because the transaction on A cannot
> >do something that is in conflict with the changes made by WS(C), since
> >its WS(A)2 will come back after WS(C) arrived at A and thus WS(C)
> >arriving at A will cause WS(A)2 to rollback (WS used synonymously with
> >Xact in this context).
> >
> Right
>
> >
> >Hope this doesn't add too much confusion :-)
> >
> No, however I guess I need to adjust my slides to include your
> definition of synchronous replication.  ;-)
>
> Darren
>
> >



