Re: [MLIST] Re: [mail] Re: Big 7.4 items - Replication - Mailing list pgsql-hackers

From	Al Sutton
Subject	Re: [MLIST] Re: [mail] Re: Big 7.4 items - Replication
Date	December 15, 2002 10:06:24
Msg-id	00e801c2a44b$8af8d160$0100a8c0@cloud
In response to	Re: Big 7.4 items - Replication (Bruce Momjian <pgman@candle.pha.pa.us>)
List	pgsql-hackers

David,

This can be resolved by requiring that for any transaction to succeed the
entrypoint database must receive acknowledgements from n/2 + 0.5 (rounded up
to the nearest integer) databases, where n is the total number in the
replicant set. The following cases are shown as examples:

Total Number of databases: 2
Number required to accept transaction: 2

Total Number of databases: 3
Number required to accept transaction: 2

Total Number of databases: 4
Number required to accept transaction: 3

Total Number of databases: 5
Number required to accept transaction: 3

Total Number of databases: 6
Number required to accept transaction: 4

Total Number of databases: 7
Number required to accept transaction: 4

Total Number of databases: 8
Number required to accept transaction: 5

This would prevent two replicant sub-sets forming, because it is impossible
for both sets to have over 50% of the databases.
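
The rule above can be sketched in a few lines of Python. This is only an
illustration: the function names are made up, and it assumes the entrypoint
database counts towards the total (which is what the n = 2 case implies).
For whole numbers, n/2 + 0.5 rounded up is the same as n // 2 + 1.

```python
def required_acks(n: int) -> int:
    """Smallest count that is strictly more than half of n databases."""
    if n < 1:
        raise ValueError("replicant set must contain at least one database")
    # Equivalent to n/2 + 0.5 rounded up to the nearest integer.
    return n // 2 + 1


def can_commit(acks: int, n: int) -> bool:
    """True if `acks` acknowledgements are enough to accept a transaction."""
    return acks >= required_acks(n)


def split_can_double_commit(n: int) -> bool:
    """True if some split of n databases lets both halves accept commits."""
    return any(can_commit(k, n) and can_commit(n - k, n) for k in range(n + 1))


if __name__ == "__main__":
    for n in range(2, 9):
        print(f"Total: {n}  Required: {required_acks(n)}")
```

However the set is split, split_can_double_commit(n) comes back False,
because two disjoint sub-sets cannot both hold more than half of the
databases.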

Applications would be able to detect when a database has dropped out of the
replicant set because the database could report a state of "Unable to obtain
majority consensus". This would allow applications to differentiate between a
database out of the set, where writing to other databases in the set could
yield a successful result, and "Unable to commit due to conflict", where
trying other databases is pointless.
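
As a rough sketch of how an application might act on those two states (the
exception classes, the function and the stand-in databases below are all
hypothetical, nothing here is an existing PostgreSQL interface):

```python
class NoMajorityError(Exception):
    """Hypothetical error for "Unable to obtain majority consensus"."""


class ConflictError(Exception):
    """Hypothetical error for "Unable to commit due to conflict"."""


def commit_with_failover(databases, transaction):
    """Try each (name, commit) pair in turn; return the name that committed.

    A database that cannot reach a majority is skipped, because another
    member of the set may still succeed. A conflict is raised straight
    away, because every other database would reject the write too.
    """
    for name, commit in databases:
        try:
            commit(transaction)
            return name
        except NoMajorityError:
            continue  # this database is cut off; try the next one
    raise NoMajorityError("no database could obtain a majority")


# Stand-in databases used only to exercise the sketch.
def isolated(transaction):
    raise NoMajorityError("Unable to obtain majority consensus")


def healthy(transaction):
    return None  # commit accepted


def conflicting(transaction):
    raise ConflictError("Unable to commit due to conflict")


if __name__ == "__main__":
    print(commit_with_failover([("A", isolated), ("B", healthy)], "T1(X)"))
```

With the stand-ins above, the write fails over from A to B, whereas a
conflict reported by any database stops the loop at once.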

Al

----- Original Message -----
From: "David Walker" <pgsql@grax.com>
To: "Al Sutton" <al@alsutton.com>; "Darren Johnson"
<darren@up.hrcoxmail.com>
Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>; "Jan Wieck"
<JanWieck@Yahoo.com>; <shridhar_daithankar@persistent.co.in>;
"PostgreSQL-development" <pgsql-hackers@postgresql.org>
Sent: Sunday, December 15, 2002 2:29 PM
Subject: Re: [MLIST] Re: [mail] Re: [HACKERS] Big 7.4 items - Replication


> Another concern I have with multi-master systems is what happens if the
> network splits in 2 so that 2 master systems are taking commits for 2
> separate sets of clients.  It seems to me that to re-sync the 2 databases
> upon the network healing would be a very complex task or impossible task.
>
> On Sunday 15 December 2002 04:16 am, Al Sutton wrote:
> > Many thanks for the explanation. Could you explain how the order of the
> > writesets is determined in the following scenario:
> >
> > If a transaction takes 50ms to reach one database from another, for a
> > specific data element (called X), the following timeline occurs:
> >
> > at 0ms, T1(X) is written to system A.
> > at 10ms, T2(X) is written to system B.
> >
> > Where T1(X) and T2(X) conflict.
> >
> > My concern is that if the Group Communication Daemon (gcd) is operating on
> > each database, a successful result for T1(X) will be returned to the client
> > talking to database A because T2(X) has not reached it, and thus no
> > conflict is known about, and a successful result is returned to the client
> > submitting T2(X) to database B because it is not aware of T1(X). This would
> > mean that the two clients believe both T1(X) and T2(X) completed
> > successfully, yet they cannot due to the conflict.
> >
> > Thanks,
> >
> > Al.
> >
> > ----- Original Message -----
> > From: "Darren Johnson" <darren@up.hrcoxmail.com>
> > To: "Al Sutton" <al@alsutton.com>
> > Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>; "Jan Wieck"
> > <JanWieck@Yahoo.com>; <shridhar_daithankar@persistent.co.in>;
> > "PostgreSQL-development" <pgsql-hackers@postgresql.org>
> > Sent: Saturday, December 14, 2002 6:48 PM
> > Subject: Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
> >
> > > >b) The Group Communication blob will consist of a number of processes which
> > > >need to talk to all of the others to interrogate them for changes which may
> > > >conflict with the current write that is being handled and then issue the
> > > >transaction response. This is basically the two phase commit solution with
> > > >phases moved into the group communication process.
> > > >
> > > >I can see the possibility of using solution b and having fewer group
> > > >communication processes than databases as an attempt to simplify things,
> > > >but this would mean the loss of a number of databases if the machine
> > > >running the group communication process for the set of databases is lost.
> > >
> > > The group communication system doesn't just run on one system.  For
> > > postgres-r using spread there is actually a spread daemon that runs on
> > > each database server.  It has nothing to do with detecting the
> > > conflicts.  Its job is to deliver messages in a total order for
> > > writesets or simple order for commits, aborts, joins, etc.
> > >
> > > The detection of conflicts will be done at the database level, by a
> > > backend process.  The basic concept is "if all databases get the
> > > writesets (changes) in the exact same order, apply them in a consistent
> > > order, avoid conflicts, then one copy serialization is achieved" (one
> > > copy of the database replicated across all databases in the replica).
> > >
> > > I hope that explains the group communication system's responsibility.
> > >
> > > Darren
> > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 5: Have you checked our extensive FAQ?
> > >
> > > http://www.postgresql.org/users-lounge/docs/faq.html
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> > http://archives.postgresql.org
>
>
