Re: [MLIST] Re: [mail] Re: Big 7.4 items - Replication - Mailing list pgsql-hackers
From | Al Sutton |
---|---|
Subject | Re: [MLIST] Re: [mail] Re: Big 7.4 items - Replication |
Date | |
Msg-id | 00e801c2a44b$8af8d160$0100a8c0@cloud |
In response to | Re: Big 7.4 items - Replication (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
David,

This can be resolved by requiring that, for any transaction to succeed, the
entrypoint database must receive acknowledgements from n/2 + 0.5 (rounded up
to the nearest integer) databases, where n is the total number in the
replicant set. The following cases are shown as an example:

Total Number of databases: 2
Number required to accept transaction: 2

Total Number of databases: 3
Number required to accept transaction: 2

Total Number of databases: 4
Number required to accept transaction: 3

Total Number of databases: 5
Number required to accept transaction: 3

Total Number of databases: 6
Number required to accept transaction: 4

Total Number of databases: 7
Number required to accept transaction: 4

Total Number of databases: 8
Number required to accept transaction: 5

This would prevent two replicant sub-sets forming, because it is impossible
for both sets to have over 50% of the databases.

Applications would be able to detect when a database has dropped out of the
replicant set because the database could report a state of "Unable to obtain
majority consensus". This would allow applications to differentiate between a
database out of the set, where writing to other databases in the set could
yield a successful result, and "Unable to commit due to conflict", where
trying other databases is pointless.

Al

----- Original Message -----
From: "David Walker" <pgsql@grax.com>
To: "Al Sutton" <al@alsutton.com>; "Darren Johnson" <darren@up.hrcoxmail.com>
Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>; "Jan Wieck"
<JanWieck@Yahoo.com>; <shridhar_daithankar@persistent.co.in>;
"PostgreSQL-development" <pgsql-hackers@postgresql.org>
Sent: Sunday, December 15, 2002 2:29 PM
Subject: Re: [MLIST] Re: [mail] Re: [HACKERS] Big 7.4 items - Replication

> Another concern I have with multi-master systems is what happens if the
> network splits in 2 so that 2 master systems are taking commits for 2
> separate sets of clients.
> It seems to me that to re-sync the 2 databases
> upon the network healing would be a very complex task or impossible task.
>
> On Sunday 15 December 2002 04:16 am, Al Sutton wrote:
> > Many thanks for the explanation. Could you explain to me where the order or
> > the writeset for the following scenario;
> >
> > If a transaction takes 50ms to reach one database from another, for a
> > specific data element (called X), the following timeline occurs
> >
> > at 0ms, T1(X) is written to system A.
> > at 10ms, T2(X) is written to system B.
> >
> > Where T1(X) and T2(X) conflict.
> >
> > My concern is that if the Group Communication Daemon (gcd) is operating on
> > each database, a successful result for T1(X) will be returned to the client
> > talking to database A because T2(X) has not reached it, and thus no
> > conflict is known about, and a successful result is returned to the client
> > submitting T2(X) to database B because it is not aware of T1(X). This would
> > mean that the two clients believe both T1(X) and T2(X) completed
> > successfully, yet they can not due to the conflict.
> >
> > Thanks,
> >
> > Al.
> >
> > ----- Original Message -----
> > From: "Darren Johnson" <darren@up.hrcoxmail.com>
> > To: "Al Sutton" <al@alsutton.com>
> > Cc: "Bruce Momjian" <pgman@candle.pha.pa.us>; "Jan Wieck"
> > <JanWieck@Yahoo.com>; <shridhar_daithankar@persistent.co.in>;
> > "PostgreSQL-development" <pgsql-hackers@postgresql.org>
> > Sent: Saturday, December 14, 2002 6:48 PM
> > Subject: Re: [mail] Re: [HACKERS] Big 7.4 items - Replication
> >
> > > >b) The Group Communication blob will consist of a number of processes
> > > >which need to talk to all of the others to interrogate them for changes
> > > >which may conflict with the current write that is being handled and then
> > > >issue the transaction response. This is basically the two phase commit
> > > >solution with phases moved into the group communication process.
> > > >
> > > >I can see the possibility of using solution b and having fewer group
> > > >communication processes than databases as an attempt to simplify things,
> > > >but this would mean the loss of a number of databases if the machine
> > > >running the group communication process for the set of databases is lost.
> > >
> > > The group communication system doesn't just run on one system. For
> > > postgres-r using spread there is actually a spread daemon that runs on
> > > each database server. It has nothing to do with detecting the conflicts.
> > > Its job is to deliver messages in a total order for writesets or simple
> > > order for commits, aborts, joins, etc.
> > >
> > > The detection of conflicts will be done at the database level, by a
> > > backend process. The basic concept is "if all databases get the
> > > writesets (changes) in the exact same order, apply them in a consistent
> > > order, avoid conflicts, then one copy serialization is achieved." (one
> > > copy of the database replicated across all databases in the replica)
> > >
> > > I hope that explains the group communication system's responsibility.
> > >
> > > Darren
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 5: Have you checked our extensive FAQ?
> > >
> > > http://www.postgresql.org/users-lounge/docs/faq.html
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> > http://archives.postgresql.org
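
For anyone who wants to check the majority rule from the top of this message, here is a minimal sketch in Python. It is only an illustration, not code from postgres-r or spread. The names required_acks and commit_state, and the "Committed" state, are made up for the example. It also assumes the acknowledgement count includes the entrypoint database itself, since that is the only reading under which 2 out of 2 is reachable.

```python
import math


def required_acks(n: int) -> int:
    """Acknowledgements needed for a replicant set of n databases.

    Assumption: the count includes the entrypoint database itself.
    """
    if n < 1:
        raise ValueError("replicant set must contain at least one database")
    # n/2 + 0.5, rounded up to the nearest integer, as stated in the message.
    needed = math.ceil(n / 2 + 0.5)
    # Equivalent integer form: a strict majority of n.
    assert needed == n // 2 + 1
    return needed


def commit_state(n: int, acks: int, conflict: bool = False) -> str:
    """Map an outcome to the states named in the message (hypothetical helper)."""
    if conflict:
        # Another database would report the same conflict, so retrying is pointless.
        return "Unable to commit due to conflict"
    if acks < required_acks(n):
        # This database cannot reach a majority; another database in the set might.
        return "Unable to obtain majority consensus"
    return "Committed"  # hypothetical success state, not named in the message


if __name__ == "__main__":
    for n in range(2, 9):
        print(f"Total Number of databases: {n}  "
              f"Number required to accept transaction: {required_acks(n)}")
```

Because twice the required count is always greater than n, two disjoint sub-sets can never both reach it, which is the property relied on above for the network split case.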