Thread: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Simon Riggs
Date:
On Sun, 2011-03-06 at 18:09 -0500, Andrew Dunstan wrote:
>
> On 03/06/2011 05:51 PM, Simon Riggs wrote:
> > Efficient transaction-controlled synchronous replication.
> >
>
> I'm glad this is in, but I thought we agreed NOT to call it "synchronous
> replication".

The discussion on the thread was that it's not sync rep unless we have the strictest guarantees. We have the strictest guarantees, so it qualifies as sync rep. Relaxations are possible and, to some people, desirable.

Perhaps there is a more marketable term, and if so, we can rebrand. It wouldn't be the first time things got renamed in beta.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Heikki Linnakangas
Date:
On 07.03.2011 01:28, Simon Riggs wrote:
> On Sun, 2011-03-06 at 18:09 -0500, Andrew Dunstan wrote:
>>
>> On 03/06/2011 05:51 PM, Simon Riggs wrote:
>>> Efficient transaction-controlled synchronous replication.
>>
>> I'm glad this is in, but I thought we agreed NOT to call it "synchronous
>> replication".
>
> The discussion on the thread was that it's not sync rep unless we have
> the strictest guarantees. We have the strictest guarantees, so it
> qualifies as sync rep.

What do you mean by "strictest guarantees"?

I don't see an allow_synchronous_standby setting in the committed patch. I presume you didn't make allow_synchronous_standby=off the default behavior. Also, the documentation that describes this as two-safe replication and claims that "the only possibility that data can be lost is if both the primary and the standby suffer crashes at the same time" needs big fat caveats to clarify that this doesn't actually achieve those guarantees.

Please change the name.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Simon Riggs
Date:
On Mon, 2011-03-07 at 09:29 +0200, Heikki Linnakangas wrote: > I presume you didn't make allow_synchronous_standby=off the default > behavior. You presume incorrectly. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Heikki Linnakangas
Date:
On 07.03.2011 09:48, Simon Riggs wrote: > On Mon, 2011-03-07 at 09:29 +0200, Heikki Linnakangas wrote: > >> I presume you didn't make allow_synchronous_standby=off the default >> behavior. Sorry, s/allow_synchronous_standby/allow_standalone_master > You presume incorrectly. Ok, ok then. Thank you! Looks like I need to git pull and get myself up-to-speed with these latest developments :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Andrew Dunstan
Date:
On 03/07/2011 02:29 AM, Heikki Linnakangas wrote:
> On 07.03.2011 01:28, Simon Riggs wrote:
>> On Sun, 2011-03-06 at 18:09 -0500, Andrew Dunstan wrote:
>>>
>>> On 03/06/2011 05:51 PM, Simon Riggs wrote:
>>>> Efficient transaction-controlled synchronous replication.
>>>
>>> I'm glad this is in, but I thought we agreed NOT to call it
>>> "synchronous replication".
>>
>> The discussion on the thread was that it's not sync rep unless we have
>> the strictest guarantees. We have the strictest guarantees, so it
>> qualifies as sync rep.
>
> What do you mean by "strictest guarantees"?
>
> I don't see allow_synchronous_standby setting in the committed patch.
> I presume you didn't make allow_synchronous_standby=off the default
> behavior. Also, the documentation that describes this as two-safe
> replication and claims that "the only possibility that data can be
> lost is if both the primary and the standby suffer crashes at the same
> time" needs big fat caveats to clarify that this doesn't actually
> achieve those guarantees.
>
> Please change the name.
>

Previously, Simon said:

> Truly "synchronous" requires two-phase commit, which this never was.

So I too am confused about how it's now become "truly synchronous". Are we saying this gives the same or better guarantees than a 2PC setup?

cheers

andrew
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Heikki Linnakangas
Date:
On 07.03.2011 15:30, Andrew Dunstan wrote: > Previously, Simon said: > >> Truly "synchronous" requires two-phase commit, which this never was. > > So I too am confused about how it's now become "truly synchronous". Are > we saying this give the same or better guarantees than a 2PC setup? The guarantee we have now with synchronous_replication=on is that when the server acknowledges a commit to the client (ie. when COMMIT command returns), the transaction is safely flushed to disk on the master and at least one synchronous standby server. What you don't get is a guarantee on what happens to transactions that were not acknowledged to the client. For example, if you pull the power plug, the transaction that was just being committed might be committed on the master, but not yet on the standby. For me, that's enough to call it "synchronous replication". It provides a useful guarantee to the client. But you could argue for an even stricter definition, requiring atomicity so that if a transaction is not successfully replicated for any reason, including crash, it is rolled back in the master too. That would require 2PC. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
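The guarantee described in this message can be illustrated with a small toy model. This is only a sketch, not PostgreSQL code: `Node`, `flush`, and `commit` are hypothetical names made up for illustration, and a real server would keep waiting for a standby instead of returning `False`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy server; 'flushed' holds the transaction ids that are safely on disk."""
    name: str
    up: bool = True
    flushed: set = field(default_factory=set)

    def flush(self, txid: int) -> bool:
        # A node that is down cannot flush, so it cannot acknowledge.
        if not self.up:
            return False
        self.flushed.add(txid)
        return True


def commit(master: Node, standbys: list, txid: int) -> bool:
    """Acknowledge the commit to the client (return True) only when the
    transaction is flushed on the master AND on at least one standby."""
    if not master.flush(txid):
        return False
    # Every standby is asked to flush; one acknowledgement is enough.
    acks = [standby.flush(txid) for standby in standbys]
    return any(acks)


master = Node("master")
standby_a = Node("standby_a", up=False)
standby_b = Node("standby_b")

print(commit(master, [standby_a, standby_b], txid=1))  # acknowledged

standby_b.up = False
print(commit(master, [standby_a, standby_b], txid=2))  # not acknowledged
print(2 in master.flushed)  # but already on the master's disk
```

The second commit shows the window discussed above: the client never receives an acknowledgement, yet the transaction is already on the master's disk and on no standby.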
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Andrew Dunstan
Date:
On 03/07/2011 09:02 AM, Heikki Linnakangas wrote: > On 07.03.2011 15:30, Andrew Dunstan wrote: >> Previously, Simon said: >> >>> Truly "synchronous" requires two-phase commit, which this never was. >> >> So I too am confused about how it's now become "truly synchronous". Are >> we saying this give the same or better guarantees than a 2PC setup? > > The guarantee we have now with synchronous_replication=on is that when > the server acknowledges a commit to the client (ie. when COMMIT > command returns), the transaction is safely flushed to disk on the > master and at least one synchronous standby server. > > What you don't get is a guarantee on what happens to transactions that > were not acknowledged to the client. For example, if you pull the > power plug, the transaction that was just being committed might be > committed on the master, but not yet on the standby. > > For me, that's enough to call it "synchronous replication". It > provides a useful guarantee to the client. But you could argue for an > even stricter definition, requiring atomicity so that if a transaction > is not successfully replicated for any reason, including crash, it is > rolled back in the master too. That would require 2PC. > My worry is that the stricter definition is what many people will expect, without reading the fine print. cheers andrew
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Aidan Van Dyk
Date:
On Mon, Mar 7, 2011 at 2:21 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> For me, that's enough to call it "synchronous replication". It provides a
>> useful guarantee to the client. But you could argue for an even stricter
>> definition, requiring atomicity so that if a transaction is not successfully
>> replicated for any reason, including crash, it is rolled back in the master
>> too. That would require 2PC.
>>
>
> My worry is that the stricter definition is what many people will expect,
> without reading the fine print.

Then they are either already hosed or already using 2PC.

a.

--
Aidan Van Dyk                 Create like a god,
aidan@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> if you pull the power plug, the transaction that was just being
> committed might be committed on the master, but not yet on the
> standby.

> For me, that's enough to call it "synchronous replication". It
> provides a useful guarantee to the client.

I don't think most people would expect full 2PC behavior from something called "synchronous replication" -- I agree that a guarantee that a successful commit means it has been written to the master and at least one replica is sufficient.

> you could argue for an even stricter definition, requiring
> atomicity so that if a transaction is not successfully replicated
> for any reason, including crash, it is rolled back in the master
> too. That would require 2PC.

I'm not sure you can say it breaks atomicity; if proper procedures are followed on recovery, all servers will either reflect the transaction or not, right? It seems to me what you lose is the ability to know whether a transaction for which commit was requested and for which there had not yet been a reply at the time of failure is going to be in your recovered database. In this particular regard it is no different from a standalone or async replication, and you would need 2PC with a proper transaction manager to do better. Getting that additional guarantee may not be worth the performance hit for most people.

We train our users to save (or make) a paper copy of what they were entering if a crash occurs (which, of course, is very rare, but does happen), so they can check the state of it on recovery. It is, of course, important for the programmers to use appropriate database transaction boundaries so that the database is always in a state with internal integrity and from which users can determine the state and proceed on their own.

I think we should document the issues, of course.

If there is really a demand for a stricter "sync rep" feature, I think it must be built on top of 2PC and some particular transaction manager, which seems to me to make it pgfoundry material.

-Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Andrew Dunstan
Date:
On 03/07/2011 09:29 AM, Aidan Van Dyk wrote:
> On Mon, Mar 7, 2011 at 2:21 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>>> For me, that's enough to call it "synchronous replication". It provides a
>>> useful guarantee to the client. But you could argue for an even stricter
>>> definition, requiring atomicity so that if a transaction is not successfully
>>> replicated for any reason, including crash, it is rolled back in the master
>>> too. That would require 2PC.
>>>
>> My worry is that the stricter definition is what many people will expect,
>> without reading the fine print.
> Then they are either already hosed or already using 2PC.
>
>

This is about expectations. The thing that worries me is that the use of this term might cause some people NOT to use 2PC because they think they are getting an equivalent guarantee, when in fact they are not. And that's hardly unreasonable. Here for example is what wikipedia says <http://en.wikipedia.org/wiki/Replication_%28computer_science%29>:

    Synchronous replication - guarantees "zero data loss" by the means
    of atomic write operation, i.e. write either completes on both sides
    or not at all. Write is not considered complete until
    acknowledgement by both local and remote storage.

cheers

andrew
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Aidan Van Dyk
Date:
On Mon, Mar 7, 2011 at 2:29 PM, Aidan Van Dyk <aidan@highrise.ca> wrote:

> Then they are either already hosed or already using 2PC.

Sorry, to expand on my all too brief comment, even *without* replication, they are hosed. Once you issue commit, you have no knowledge of whether the commit is durable (or possibly even seen by someone else) until you get the acknowledgement of the commit. That's already a possibility with a single-machine database. Adding replication to it just increases the period for which that window exists (and the possibility of something "Bad" hitting that window).

a.

--
Aidan Van Dyk                 Create like a god,
aidan@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
Andrew Dunstan <andrew@dunslane.net> wrote: > Synchronous replication - guarantees "zero data loss" by the > means of atomic write operation, i.e. write either completes on > both sides or not at all. So far, so good. > Write is not considered complete until acknowledgement by both > local and remote storage. OK, *if* we want to live up to this definition, we don't seem to have that part covered. Of course, since the connection is broken during the hypothetical crash, it seems hard to acknowledge it on recovery, and short of 2PC I don't see how we roll it back. About the best we could do is somehow have explicit logging of the disposition of unacknowledged commit requests upon recovery, and consider logging of success to be "acknowledgement". Is this logging provided by other databases with "synchronous replication" features? -Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Heikki Linnakangas
Date:
On 07.03.2011 17:03, Andrew Dunstan wrote: > This is about expectations. The thing that worries me is that the use of > this term might cause some people NOT to use 2PC because they think they > are getting an equivalent guarantee, when in fact they are not. And > that's hardly unreasonable. Here for example is what wikipedia says > <http://en.wikipedia.org/wiki/Replication_%28computer_science%29>: > > Synchronous replication - guarantees "zero data loss" by the means > of atomic write operation, i.e. write either completes on both sides > or not at all. Write is not considered complete until > acknowledgement by both local and remote storage. Hmm, I've read that wikipedia definition before, but the "atomic" part never caught my eye. You do get zero data loss with what we have; if a meteor strikes the master, no acknowledged transaction is lost. I find that definition a bit confusing. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Andrew Dunstan
Date:
On 03/07/2011 10:46 AM, Heikki Linnakangas wrote: > On 07.03.2011 17:03, Andrew Dunstan wrote: >> This is about expectations. The thing that worries me is that the use of >> this term might cause some people NOT to use 2PC because they think they >> are getting an equivalent guarantee, when in fact they are not. And >> that's hardly unreasonable. Here for example is what wikipedia says >> <http://en.wikipedia.org/wiki/Replication_%28computer_science%29>: >> >> Synchronous replication - guarantees "zero data loss" by the means >> of atomic write operation, i.e. write either completes on both sides >> or not at all. Write is not considered complete until >> acknowledgement by both local and remote storage. > > Hmm, I've read that wikipedia definition before, but the "atomic" part > never caught my eye. You do get zero data loss with what we have; if a > meteor strikes the master, no acknowledged transaction is lost. I find > that definition a bit confusing. Maybe it is - I agree the difference might be small. I'm just trying to make sure we don't use a term that could mislead reasonable people about what we're providing. If we're satisfied that we aren't, then keep it. cheers andrew
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Alvaro Herrera
Date:
Excerpts from Andrew Dunstan's message of Mon Mar 07 12:51:49 -0300 2011:
>
> On 03/07/2011 10:46 AM, Heikki Linnakangas wrote:

> > Hmm, I've read that wikipedia definition before, but the "atomic" part
> > never caught my eye. You do get zero data loss with what we have; if a
> > meteor strikes the master, no acknowledged transaction is lost. I find
> > that definition a bit confusing.
>
> Maybe it is - I agree the difference might be small. I'm just trying to
> make sure we don't use a term that could mislead reasonable people about
> what we're providing. If we're satisfied that we aren't, then keep it.

I think these terms are used inconsistently enough across the industry that what would make the most sense would be to use the common term and document accurately what we mean by it, rather than relying on some external entity's definition, which could change (like wikipedia's).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
Hi,

sorry for being late to join that bike-shedding discussion.

On 03/07/2011 05:09 PM, Alvaro Herrera wrote:
> I think these terms are used inconsistently enough across the industry
> that what would make the most sense would be to use the common term and
> document accurately what we mean by it, rather than relying on some
> external entity's definition, which could change (like wikipedia's).

I absolutely agree with Alvaro here.

The Wikipedia definition seems to only speak about one local and one remote node. Requiring an ack from "at least one" remote node seems to cover that. Not even Wikipedia goes further in their definition and tries to explain what 'synchronous replication' could mean in case we have more than two nodes.

A somewhat common expectation is that all nodes would have to ack. However, with such a requirement a single node failure brings your cluster to a full stop. So this isn't a practical option.

Google invented the term "semi-synchronous" for something that's essentially the same as what we have now, I think. However, I wholeheartedly hate that term (based on the reasoning that there's no semi-pregnant, either).

Others (like me) use "synchronous" or (lately rather) "eager" to mean that only a majority of nodes need to send an ACK. I have to explain what I mean every time.

In the end, I don't have a strong opinion either way, anymore. I'm happy to think of the replication between the master and the one standby that's sending an ACK first as "synchronous". (Even if those may well be different standbys for different transactions.)

Hope to have brought some light into this discussion.

Regards

Markus Wanner
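The three acknowledgement requirements mentioned in this message (at least one standby, a majority of nodes, all nodes) can be compared with a short sketch. The function name and policy labels are hypothetical, and counting the master as one vote in the majority case is an assumption made here for illustration, not something stated in the thread.

```python
def commit_may_return(acks: int, standbys: int, policy: str) -> bool:
    """Decide whether the master may acknowledge a commit to the client.

    acks     -- number of standbys that have confirmed the flush
    standbys -- number of standbys configured
    policy   -- 'at_least_one', 'majority' or 'all' (hypothetical labels)
    """
    if not 0 <= acks <= standbys:
        raise ValueError("acks must be between 0 and the number of standbys")
    if policy == "at_least_one":
        return acks >= 1
    if policy == "majority":
        # Assumption: the master counts as one vote among standbys + 1 nodes.
        return 2 * (acks + 1) > standbys + 1
    if policy == "all":
        return acks == standbys
    raise ValueError(f"unknown policy: {policy}")


# Three standbys, one of which has failed and will never acknowledge.
for policy in ("at_least_one", "majority", "all"):
    print(policy, commit_may_return(acks=2, standbys=3, policy=policy))
```

With one of three standbys down, only the "all" policy blocks the commit, which is the full-stop problem described above.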
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
MARK CALLAGHAN
Date:
On Fri, Mar 18, 2011 at 9:27 AM, Markus Wanner <markus@bluegap.ch> wrote:
> Google invented the term "semi-synchronous" for something that's
> essentially the same as what we have now, I think. However, I
> wholeheartedly hate that term (based on the reasoning that there's no
> semi-pregnant, either).

We didn't invent the term, we just implemented something that Heikki Tuuri briefly described, for example: http://bugs.mysql.com/bug.php?id=7440

In the Google patch and official MySQL version, the sequence is:
1) commit on master
2) wait for slave to ack
3) return to user

After step 1 another user on the master can observe the commit and the following is possible:
1) commit on master
2) other user observes that commit on master
3) master blows up and a user observed a commit that never made it to a slave

I do not think this sequence should be possible in a sync replication system. But it is possible in what has been implemented for MySQL. Thus it was named semi-sync rather than sync.

--
Mark Callaghan
mdcallag@gmail.com
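The anomaly described in this message depends only on whether the commit becomes visible before or after the slave's acknowledgement. A minimal sketch of the two orderings follows; the function and its parameters are hypothetical and model neither MySQL nor PostgreSQL internals.

```python
def outcome(visible_before_ack: bool, master_dies_before_ack: bool):
    """Return (seen_by_other_users, stored_on_slave) for one transaction."""
    seen_by_others = False
    stored_on_slave = False

    # Step 1: commit on the master. In the ordering described above, the
    # commit can be observed by other sessions from this point on.
    if visible_before_ack:
        seen_by_others = True

    # The master may fail while it is still waiting for the slave.
    if master_dies_before_ack:
        return seen_by_others, stored_on_slave

    # Step 2: the slave acknowledges. Step 3: return to the user.
    stored_on_slave = True
    seen_by_others = True
    return seen_by_others, stored_on_slave


# Visible before the ack, master dies: observed, but never reached a slave.
print(outcome(visible_before_ack=True, master_dies_before_ack=True))
# Visible only after the ack, master dies: nobody observed it.
print(outcome(visible_before_ack=False, master_dies_before_ack=True))
```

Only the first combination yields a commit that someone observed but that exists on no slave.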
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Fri, Mar 18, 2011 at 9:16 AM, MARK CALLAGHAN <mdcallag@gmail.com> wrote: > On Fri, Mar 18, 2011 at 9:27 AM, Markus Wanner <markus@bluegap.ch> wrote: >> Google invented the term "semi-syncronous" for something that's >> essentially the same that we have, now, I think. However, I full >> heartedly hate that term (based on the reasoning that there's no >> semi-pregnant, either). > > We didn't invent the term, we just implemented something that Heikki > Tuuri briefly described, for example: > http://bugs.mysql.com/bug.php?id=7440 > > In the Google patch and official MySQL version, the sequence is: > 1) commit on master > 2) wait for slave to ack > 3) return to user > > After step 1 another user on the master can observe the commit and the > following is possible: > 1) commit on master > 2) other user observes that commit on master > 3) master blows up and a user observed a commit that never made it to a slave > > I do not think this sequence should be possible in a sync replication > system. But it is possible in what has been implemented for MySQL. > Thus it was named semi-sync rather than sync. Thanks for the insight. That can't happen with our implementation, I believe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
MARK CALLAGHAN <mdcallag@gmail.com> wrote: > Markus Wanner <markus@bluegap.ch> wrote: >> Google invented the term "semi-syncronous" for something that's >> essentially the same that we have, now, I think. However, I full >> heartedly hate that term (based on the reasoning that there's no >> semi-pregnant, either). To be fair, what we're considering calling semi-synchronous is something which tries to stay in synchronous mode but switches out of it when necessary to meet availability targets. Your analogy doesn't match up at all well -- at least without getting really ugly. > We didn't invent the term, we just implemented something that > Heikki Tuuri briefly described, for example: > http://bugs.mysql.com/bug.php?id=7440 > > In the Google patch and official MySQL version, the sequence is: > 1) commit on master > 2) wait for slave to ack > 3) return to user > > After step 1 another user on the master can observe the commit and > the following is possible: > 1) commit on master > 2) other user observes that commit on master > 3) master blows up and a user observed a commit that never made it > to a slave > > I do not think this sequence should be possible in a sync > replication system. Then the only thing you would consider sync replication, as far as I can see, is two phase commit, which we already have. So your use case seems to be covered already, and we're trying to address other people's needs. The guarantee that some people are looking for is that a successful commit means that the data has been persisted on two separate servers. Others want to try for that, but are willing to compromise it for HA; in general I think they want to know when the guarantee is not there so they can take action to get back to a safer condition. -Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Simon Riggs
Date:
On Fri, 2011-03-18 at 13:16 +0000, MARK CALLAGHAN wrote: > On Fri, Mar 18, 2011 at 9:27 AM, Markus Wanner <markus@bluegap.ch> wrote: > > Google invented the term "semi-syncronous" for something that's > > essentially the same that we have, now, I think. However, I full > > heartedly hate that term (based on the reasoning that there's no > > semi-pregnant, either). > > We didn't invent the term, we just implemented something that Heikki > Tuuri briefly described, for example: > http://bugs.mysql.com/bug.php?id=7440 > > In the Google patch and official MySQL version, the sequence is: > 1) commit on master > 2) wait for slave to ack > 3) return to user > > After step 1 another user on the master can observe the commit and the > following is possible: > 1) commit on master > 2) other user observes that commit on master > 3) master blows up and a user observed a commit that never made it to a slave > > I do not think this sequence should be possible in a sync replication > system. But it is possible in what has been implemented for MySQL. > Thus it was named semi-sync rather than sync. Thanks for clearing it up Mark. We should definitely not be calling what we have "semi-sync". The semantics are very different. In PostgreSQL other users cannot observe the commit until an acknowledgement has been received. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
Mark, On 03/18/2011 02:16 PM, MARK CALLAGHAN wrote: > We didn't invent the term, we just implemented something that Heikki > Tuuri briefly described, for example: > http://bugs.mysql.com/bug.php?id=7440 Oh, okay, good to know who to blame ;-) However, I didn't mean to offend anybody. > I do not think this sequence should be possible in a sync replication > system. But it is possible in what has been implemented for MySQL. > Thus it was named semi-sync rather than sync. Sure? Their documentation [1] isn't entirely clear on that first: "the master blocks after the commit is done and waits until at least one semisynchronous slave acknowledges that it has received all events for the transaction" and the "slave acknowledges receipt of a transaction's events only after the events have been written to its relay log and flushed to disk". But then continues to say that "[the master is] waiting for acknowledgment from a slave after having performed a commit", so this indeed sounds like the transaction is visible to other sessions before the slave ACKs. So, semi-sync may show temporary inconsistencies in case of a master failure. Wow! Regards Markus Wanner [1] MySQL 5.5 reference manual, 17.3.8. Semisynchronous Replication: http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
Hi, On 03/18/2011 02:40 PM, Kevin Grittner wrote: > Then the only thing you would consider sync replication, as far as I > can see, is two phase commit I think waiting for the ACK before actually making the changes from the transaction visible (COMMIT) would suffice for disallowing such an inconsistency to manifest. But obviously, MySQL decided it's not worth doing that, as it's such a rare event and a short period of time that may show inconsistencies... > people's needs. The guarantee that some people are looking for is > that a successful commit means that the data has been persisted on > two separate servers. Well, MySQL's semi-sync also seems to guarantee that WRT the client confirmation. And transactions always appear committed *before* the client receives the COMMIT acknowledgement, due to the time it takes for the ACK to arrive at the client. It's just the commit *before* receiving the slave's ACK, which might make a transaction visible that's not durable, yet. But I guess that simplified implementation for them... Regards Markus Wanner
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote: > In PostgreSQL other users cannot observe the commit until an > acknowledgement has been received. Really? I hadn't picked up on that. That makes for a lot of complication on crash-and-recovery of a master, but if we can pull it off, that's really cool. If we do that and MySQL doesn't, we definitely don't want to use the same terminology they do, which would imply the same behavior. Apologies for not picking up on that aspect of the implementation. -Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
On 03/18/2011 03:52 PM, Kevin Grittner wrote: > Really? I hadn't picked up on that. That makes for a lot of > complication on crash-and-recovery of a master What complication do you have in mind here? I think of it the opposite way (at least for Postgres, that is): committing a transaction that's not acknowledged means having to revert a (locally only) committed transaction if you want to use the current data to recover to some cluster-agreed state. (Of course, you can always simply transfer the whole If you don't commit the transaction before the ACK in the first place, you don't have anything special to do upon recovery. Regards Markus Wanner
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Heikki Linnakangas
Date:
On 18.03.2011 16:52, Kevin Grittner wrote: > Simon Riggs<simon@2ndQuadrant.com> wrote: > >> In PostgreSQL other users cannot observe the commit until an >> acknowledgement has been received. > > Really? I hadn't picked up on that. That makes for a lot of > complication on crash-and-recovery of a master, but if we can pull > it off, that's really cool. If we do that and MySQL doesn't, we > definitely don't want to use the same terminology they do, which > would imply the same behavior. To be clear: other users cannot observe the commit until standby acknowledges it - unless the master crashes while waiting for the acknowledgment. If that happens, the commit will be visible to everyone after recovery. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
MARK CALLAGHAN
Date:
On Fri, Mar 18, 2011 at 2:19 PM, Markus Wanner <markus@bluegap.ch> wrote: > Their documentation [1] isn't entirely clear on that first: "the master > blocks after the commit is done and waits until at least one > semisynchronous slave acknowledges that it has received all events for > the transaction" and the "slave acknowledges receipt of a transaction's > events only after the events have been written to its relay log and > flushed to disk". > > But then continues to say that "[the master is] waiting for > acknowledgment from a slave after having performed a commit", so this > indeed sounds like the transaction is visible to other sessions before > the slave ACKs. Yes, their docs are not clear on this. -- Mark Callaghan mdcallag@gmail.com
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
MARK CALLAGHAN
Date:
On Fri, Mar 18, 2011 at 2:37 PM, Markus Wanner <markus@bluegap.ch> wrote:
> Hi,
>
> On 03/18/2011 02:40 PM, Kevin Grittner wrote:
>> Then the only thing you would consider sync replication, as far as I
>> can see, is two phase commit
>
> I think waiting for the ACK before actually making the changes from the
> transaction visible (COMMIT) would suffice for disallowing such an
> inconsistency to manifest. But obviously, MySQL decided it's not worth
> doing that, as it's such a rare event and a short period of time that
> may show inconsistencies...

There are fewer options for implementing this in MySQL because replication requires a binlog on the master and that requires the internal use of XA to keep the binlog and InnoDB in sync as they are separate resource managers. In theory, this can be changed so that commit is only forced for the binlog and then on a crash missing transactions could be copied from the binlog to InnoDB but I don't think this will ever change.

By "fewer options" I mean that commit in MySQL with InnoDB and the binlog requires:
1) prepare to InnoDB (force transaction log to disk for changes from this transaction)
2) write binlog events from this transaction to the binlog
3) write XID event to the binlog (at this point transaction commit is official, will survive a crash)
4) force binlog to disk
5) release row locks held by transaction in InnoDB
6) write commit record to InnoDB transaction log
7) force write of commit record to disk

Group commit is done for the fsyncs from steps 1 and 7. It is not done for the fsync done in step 4. Regardless, the processing above is complicated even without semi-sync. AFAIK, semi-sync code occurs after step 7 but I have not looked at the official version of semi-sync code in MySQL and my memory of the work we did at Google is vague.

It is great if Postgres doesn't have this issue. It wasn't clear to me from lurking on this list. I hope your docs highlight the behavior as not having the issue is a big deal.
-- Mark Callaghan mdcallag@gmail.com
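[Editorial sketch] The seven-step MySQL commit sequence described in the message above can be written down as data, which makes it easier to see which fsyncs are shared. This is only a toy model in Python, not MySQL code: the names `CommitStep`, `unshared_fsyncs` and `recovery_outcome` are invented for illustration, and the recovery rule simply takes the message at its word that the commit is "official" once step 3 has completed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitStep:
    number: int
    action: str
    fsync: bool         # does this step force data to disk?
    group_commit: bool  # can that fsync be shared across transactions?


# The steps as listed in the message; the flags follow its last paragraph.
MYSQL_COMMIT_STEPS = [
    CommitStep(1, "prepare to InnoDB (force transaction log)", True, True),
    CommitStep(2, "write binlog events", False, False),
    CommitStep(3, "write XID event to binlog (commit is official)", False, False),
    CommitStep(4, "force binlog to disk", True, False),
    CommitStep(5, "release row locks", False, False),
    CommitStep(6, "write commit record to InnoDB log", False, False),
    CommitStep(7, "force commit record to disk", True, True),
]


def unshared_fsyncs(steps):
    """Step numbers whose fsync is paid by every transaction individually."""
    return [s.number for s in steps if s.fsync and not s.group_commit]


def recovery_outcome(last_completed_step):
    """Assumed rule: the transaction survives a crash once the XID event exists."""
    return "commit" if last_completed_step >= 3 else "rollback"


print(unshared_fsyncs(MYSQL_COMMIT_STEPS))
print(recovery_outcome(2), recovery_outcome(3))
```

In this model the only fsync that cannot be amortised across transactions is the binlog flush in step 4, which is the bottleneck the message points at.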
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Simon Riggs
Date:
On Fri, 2011-03-18 at 17:47 +0200, Heikki Linnakangas wrote: > On 18.03.2011 16:52, Kevin Grittner wrote: > > Simon Riggs<simon@2ndQuadrant.com> wrote: > > > >> In PostgreSQL other users cannot observe the commit until an > >> acknowledgement has been received. > > > > Really? I hadn't picked up on that. That makes for a lot of > > complication on crash-and-recovery of a master, but if we can pull > > it off, that's really cool. If we do that and MySQL doesn't, we > > definitely don't want to use the same terminology they do, which > > would imply the same behavior. > > To be clear: other users cannot observe the commit until standby > acknowledges it - unless the master crashes while waiting for the > acknowledgment. If that happens, the commit will be visible to everyone > after recovery. No, only in the case where you choose not to failover to the standby when you crash, which would be a fairly strange choice after the effort to set up the standby. In a correctly configured and operated cluster what I say above is fully correct and needs no addendum. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
> On 18.03.2011 16:52, Kevin Grittner wrote: >> Simon Riggs<simon@2ndQuadrant.com> wrote: >> >>> In PostgreSQL other users cannot observe the commit until an >>> acknowledgement has been received. >> >> Really? I hadn't picked up on that. That makes for a lot of >> complication on crash-and-recovery of a master, but if we can >> pull it off, that's really cool. Markus Wanner <markus@bluegap.ch> wrote: > What complication do you have in mind here? Basically, what Heikki addresses. It has to be committed after crash and recovery, and deal with replicas which may or may not have been notified and may or may not have applied the transaction. Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > To be clear: other users cannot observe the commit until standby > acknowledges it - unless the master crashes while waiting for the > acknowledgment. If that happens, the commit will be visible to > everyone after recovery. Right. If other transactions cannot see the transaction before the COMMIT returns, I was kinda assuming that this was the behavior, because otherwise one or more replicas could be ahead of the master after recovery, which would be horribly broken. I agree that the behavior which you describe is much better than allowing other transactions to see the work of the pending COMMIT. In fact, on further reflection, allowing other transactions to see work before the committing transaction returns could lead to broken behavior if that viewing transaction took some action based on that, the master crashed, recovery was done using a standby, and that standby hadn't persisted the transaction. So this behavior is necessary for good behavior. Even though that "perfect storm" of events might be fairly rare, the difference in the level of confidence in correctness is significant, and certainly something to brag about. -Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Fri, Mar 18, 2011 at 12:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, 2011-03-18 at 17:47 +0200, Heikki Linnakangas wrote: >> On 18.03.2011 16:52, Kevin Grittner wrote: >> > Simon Riggs<simon@2ndQuadrant.com> wrote: >> > >> >> In PostgreSQL other users cannot observe the commit until an >> >> acknowledgement has been received. >> > >> > Really? I hadn't picked up on that. That makes for a lot of >> > complication on crash-and-recovery of a master, but if we can pull >> > it off, that's really cool. If we do that and MySQL doesn't, we >> > definitely don't want to use the same terminology they do, which >> > would imply the same behavior. >> >> To be clear: other users cannot observe the commit until standby >> acknowledges it - unless the master crashes while waiting for the >> acknowledgment. If that happens, the commit will be visible to everyone >> after recovery. > > No, only in the case where you choose not to failover to the standby > when you crash, which would be a fairly strange choice after the effort > to set up the standby. In a correctly configured and operated cluster > what I say above is fully correct and needs no addendum. Except it doesn't work that way. If, say, a backend on the master core dumps, the system will perform a crash and restart cycle, and the transaction will become visible whether it's yet been replicated or not. Since we now have a GUC to suppress restart after a backend crash, it's theoretically possible to set up the system so that this doesn't occur, but it'd take quite a bit of work to make it robust and automatic, and it's certainly not the default out of the box. The fundamental problem here is that once you update CLOG and flush the corresponding WAL record, there is no going backward. You can hold the system in some intermediate state where the transaction still holds locks and is excluded from MVCC snapshots, but there's no way to back up. 
So there are bound to be corner cases where the wait doesn't last as long as you want, and stuff leaks out around the edges. It's fundamentally impossible to guarantee that you'll remain in that intermediate state forever - what do you do if a meteor hits the synchronous standby and at the same time you lose power to the master? No amount of configuration will save you from coming back on line with a visible-but-unreplicated transaction. I'm not knocking the system; I think what we have is impressively good. But pretending that corner cases can't happen gets us nowhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
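[Editorial sketch] The argument in this message is that the commit itself is durable on the master, while the "hidden until acknowledged" state exists only in memory. A toy Python model makes the corner case concrete. It is an illustration under stated assumptions, not PostgreSQL code: the class `Primary` and its methods are invented names, and the two sets stand in for "WAL flushed and CLOG updated" and "still waiting for the standby".

```python
class Primary:
    """Toy model of a master that hides a commit until the standby acks."""

    def __init__(self):
        self.durable_commits = set()  # survives a crash (WAL flushed, CLOG updated)
        self.awaiting_ack = set()     # in-memory only; lost on restart

    def commit(self, xid):
        # The commit record is made durable locally first: no going backward.
        self.durable_commits.add(xid)
        # The transaction is then held out of snapshots until the standby acks.
        self.awaiting_ack.add(xid)

    def standby_ack(self, xid):
        self.awaiting_ack.discard(xid)

    def crash_and_restart(self):
        # Recovery replays WAL; nothing remembers who was still waiting.
        self.awaiting_ack.clear()

    def visible(self, xid):
        return xid in self.durable_commits and xid not in self.awaiting_ack


p = Primary()
p.commit(100)
print(p.visible(100))   # hidden while waiting for the standby
p.crash_and_restart()
print(p.visible(100))   # visible after recovery, replicated or not
```

The second check is the "visible-but-unreplicated transaction": after the restart, nothing in the durable state distinguishes an acknowledged commit from one that was still waiting.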
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: > Simon Riggs <simon@2ndquadrant.com> wrote: >> No, only in the case where you choose not to failover to the >> standby when you crash, which would be a fairly strange choice >> after the effort to set up the standby. In a correctly configured >> and operated cluster what I say above is fully correct and needs >> no addendum. > what do you do if a meteor hits the synchronous standby and at the > same time you lose power to the master? No amount of > configuration will save you from coming back on line with a > visible-but-unreplicated transaction. You don't even need to postulate an extreme condition like that; we prefer to have a DBA pull the trigger on a failover, rather than trust the STONITH call to software. This is particularly true when the master is local to its primary users and the replica is remote to them. -Kevin
Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Greg Stark
Date:
On Fri, Mar 18, 2011 at 4:33 PM, Robert Haas <robertmhaas@gmail.com> wrote: > The fundamental problem here is that once you update CLOG and flush > the corresponding WAL record, there is no going backward. You can > hold the system in some intermediate state where the transaction still > holds locks and is excluded from MVCC snapshots, but there's no way to > back up. So there are bound to be corner cases where the where the > wait doesn't last as long as you want, and stuff leaks out around the > edges. I'm finding this whole idea of hiding the committed transaction until the slave acks it kind of strange. It means there are times when the slave is actually *ahead* of the master which would actually be kind of hard to code against if you're trying to use the slave as a possibly-not-up-to-date mirror. I think promising that the COMMIT doesn't return until the transaction and all previous transactions are replicated is enough. We don't have to promise that nobody else will see it either. Those same transactions eventually have to commit as well and if they want that level of protection they can block waiting until they're replicated as well which will imply that anything they depended on will be replicated. This is akin to the synchronous_commit=off case where other transactions can see your data as soon as you commit even before the xlog is fsynced. If you have synchronous_commit mode enabled then you'll block until your xlog is fsynced and that will implicitly mean the other transactions you saw were also fsynced. -- greg
Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
On 03/18/2011 06:35 PM, Greg Stark wrote: > I think promising that the COMMIT doesn't return until the transaction > and all previous transactions are replicated is enough. We don't have > to promise that nobody else will see it either. Those same > transactions eventually have to commit as well No, they don't have to. They can ROLLBACK, get aborted, lose connection to the master, etc. The issue here is that, given the MySQL scheme, these transactions see a snapshot that's not durable, because at that point in time, no standby guarantees to have stored the transaction to be committed, yet. So in case of a failover, you'd suddenly see a different snapshot (and lose changes of that transaction). > This is akin to the synchronous_commit=off case where other > transactions can see your data as soon as you commit even before the > xlog is fsynced. If you have synchronous_commit mode enabled then > you'll block until your xlog is fsynced and that will implicitly mean > the other transactions you saw were also fsynced. Somewhat, yes. And for exactly that reason, most users run with synchronous_commit enabled. They don't want to lose committed transactions. Regards Markus Wanner
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
Simon, On 03/18/2011 05:19 PM, Simon Riggs wrote: >>> Simon Riggs<simon@2ndQuadrant.com> wrote: >>>> In PostgreSQL other users cannot observe the commit until an >>>> acknowledgement has been received. On other nodes as well? To me that means the standby needs to hold back COMMIT of an ACKed transaction until it receives a re-ACK from the master that it committed the transaction there. How else could the slave know when to commit its ACKed transactions? > No, only in the case where you choose not to failover to the standby > when you crash, which would be a fairly strange choice after the effort > to set up the standby. In a correctly configured and operated cluster > what I say above is fully correct and needs no addendum. If you don't failover, how can the standby be ahead of the master, given it takes measures not to be during normal operation? Eager to understand... ;-) Regards Markus
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
On 03/18/2011 05:27 PM, Kevin Grittner wrote: > Basically, what Heikki addresses. It has to be committed after > crash and recovery, and deal with replicas which may or may not have > been notified and may or may not have applied the transaction. Huh? I'm not quite following here. Committing additional transactions isn't a problem, reverting committed transactions is. And yes, given that we only wait for ACK from a single standby, you'd have to failover to exactly *that* standby to guarantee consistency. > In fact, on further reflection, allowing other transactions to see > work before the committing transaction returns could lead to broken > behavior if that viewing transaction took some action based on the > that, the master crashed, recovery was done using a standby, and > that standby hadn't persisted the transaction. So this behavior is > necessary for good behavior. I fully agree to that. Regards Markus
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Simon Riggs
Date:
On Fri, 2011-03-18 at 20:19 +0100, Markus Wanner wrote: > Simon, > > On 03/18/2011 05:19 PM, Simon Riggs wrote: > >>> Simon Riggs<simon@2ndQuadrant.com> wrote: > >>>> In PostgreSQL other users cannot observe the commit until an > >>>> acknowledgement has been received. > > On other nodes as well? To me that means the standby needs to hold back > COMMIT of an ACKed transaction, until receives a re-ACK from the master, > that it committed the transaction there. How else could the slave know > when to commit its ACKed transactions? We could do that easily enough, actually, if we wished. Do we wish? > > No, only in the case where you choose not to failover to the standby > > when you crash, which would be a fairly strange choice after the effort > > to set up the standby. In a correctly configured and operated cluster > > what I say above is fully correct and needs no addendum. > > If you don't failover, how can the standby be ahead of the master, given > it takes measures not to be during normal operation? > > Eager to understand... ;-) > > Regards > > Markus -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote: > On Fri, 2011-03-18 at 20:19 +0100, Markus Wanner wrote: >> >>> Simon Riggs<simon@2ndQuadrant.com> wrote: >> >>>> In PostgreSQL other users cannot observe the commit until an >> >>>> acknowledgement has been received. >> >> On other nodes as well? To me that means the standby needs to >> hold back COMMIT of an ACKed transaction, until receives a re-ACK >> from the master, that it committed the transaction there. How >> else could the slave know when to commit its ACKed transactions? > > We could do that easily enough, actually, if we wished. > > Do we wish? +1 If we're going out of our way to suppress it on the master until the COMMIT returns, it shouldn't be showing on the replicas before that. -Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
On 03/18/2011 08:29 PM, Simon Riggs wrote: > We could do that easily enough, actually, if we wished. > > Do we wish? I personally don't see any problem letting a standby show a snapshot before the master. I'd consider it unneeded network traffic. But then again, I'm completely biased. Regards Markus Wanner
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Fri, Mar 18, 2011 at 3:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, 2011-03-18 at 20:19 +0100, Markus Wanner wrote: >> Simon, >> >> On 03/18/2011 05:19 PM, Simon Riggs wrote: >> >>> Simon Riggs<simon@2ndQuadrant.com> wrote: >> >>>> In PostgreSQL other users cannot observe the commit until an >> >>>> acknowledgement has been received. >> >> On other nodes as well? To me that means the standby needs to hold back >> COMMIT of an ACKed transaction, until receives a re-ACK from the master, >> that it committed the transaction there. How else could the slave know >> when to commit its ACKed transactions? > > We could do that easily enough, actually, if we wished. > > Do we wish? Seems like it would be nice, but isn't it dreadfully expensive? Wouldn't you need to prevent the slave from applying the WAL until the master has released the sync rep waiters? You'd need a whole new series of messages back and forth. Since the current solution is intended to support data-loss-free failover, but NOT to guarantee a consistent view of the world from a SQL level, I doubt it's worth paying any price for this. Certainly in the hot_standby=off case it's a nonissue. We might need to think harder about it when and if someone implements an 'apply' level though, because this would seem more of a concern in that case (though I haven't thought through all the details). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: > Since the current solution is intended to support data-loss-free > failover, but NOT to guarantee a consistent view of the world from > a SQL level, I doubt it's worth paying any price for this. Well, that brings us back to the question of why we would want to suppress the view of the data on the master until the replica acknowledges the commit. It *is* committed on the master, we're just holding off on telling the committer about it until we can honor the guarantee of replication. If it can be seen on the replica before the committer gets such acknowledgment, why not on the master? -Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Simon Riggs
Date:
On Fri, 2011-03-18 at 17:08 -0400, Aidan Van Dyk wrote: > On Fri, Mar 18, 2011 at 3:41 PM, Markus Wanner <markus@bluegap.ch> wrote: > > On 03/18/2011 08:29 PM, Simon Riggs wrote: > >> We could do that easily enough, actually, if we wished. > >> > >> Do we wish? > > > > I personally don't see any problem letting a standby show a snapshot > > before the master. I'd consider it unneeded network traffic. But then > > again, I'm completely biased. > > In fact, we *need* to have standbys show a snapshot before the master. > > By the time the master acks the commit to the client, the snapshot > must be visible to all client connected to both the master and the > syncronous slave. > > Even with just a single server postgresql cluster, other > clients(backends) can see the commit before the commiting client > receives the ACK. Just that on a single server, the time period for > that is small. > > Sync rep increases that time period by the length of time from when > the slave reaches the commit point in the WAL stream to when it's ack > of that point get's back to the wal sender. Ideally, that ACK time is > small. > > Adding another round trip in there just for a "go almost to $COMIT, > ok, now go to $COMMIT" type of WAL/ack is going to be pessimal for > performance, and still not improve the *guarentees* it can make. > > It can only slightly reduce, but not eliminated that window where them > master has WAL that the slave doesn't, and without a complete > elimination (where you just switch the problem to be the slave has the > data that the master doesn't), you haven't changed any of the > guarantees sync rep can make (or not). Well explained observation. Agreed. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Simon Riggs
Date:
On Fri, 2011-03-18 at 16:24 -0500, Kevin Grittner wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > > > Since the current solution is intended to support data-loss-free > > failover, but NOT to guarantee a consistent view of the world from > > a SQL level, I doubt it's worth paying any price for this. > > Well, that brings us back to the question of why we would want to > suppress the view of the data on the master until the replica > acknowledges the commit. It *is* committed on the master, we're > just holding off on telling the committer about it until we can > honor the guarantee of replication. If it can be seen on the > replica before the committer get such acknowledgment, why not on the > master? I think the issue is explicit acknowledgement, not visibility. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Fri, Mar 18, 2011 at 5:24 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> Since the current solution is intended to support data-loss-free >> failover, but NOT to guarantee a consistent view of the world from >> a SQL level, I doubt it's worth paying any price for this. > > Well, that brings us back to the question of why we would want to > suppress the view of the data on the master until the replica > acknowledges the commit. It *is* committed on the master, we're > just holding off on telling the committer about it until we can > honor the guarantee of replication. If it can be seen on the > replica before the committer get such acknowledgment, why not on the > master? Well, the idea is that we don't want to let people depend on the value until it's guaranteed to be durably committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: > Well, the idea is that we don't want to let people depend on the > value until it's guaranteed to be durably committed. OK, so if you see it on the replica, you know it is in at least two places. I guess that makes sense. It kinda "feels" wrong to see a view of the replica which is ahead of the master, but I guess it's the least of the evils. I guess we should document it, though, so nobody has a false expectation that seeing something on the replica means that a connection looking at the master will see something that current. -Kevin
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Fri, Mar 18, 2011 at 5:48 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> Well, the idea is that we don't want to let people depend on the >> value until it's guaranteed to be durably committed. > > OK, so if you see it on the replica, you know it is in at least two > places. I guess that makes sense. It kinda "feels" wrong to see a > view of the replica which is ahead of the master, but I guess it's > the least of the evils. I guess we should document it, though, so > nobody has a false expectation that seeing something on the replica > means that a connection looking at the master will see something > that current. Yeah, it can go both ways: a snapshot taken on the standby can be either earlier or later in the commit ordering than the master. That's counterintuitive, but I see no reason to stress about it. It's perfectly reasonable to set up a server with synchronous replication for enhanced durability and also enable hot standby just for convenience, but without actually relying on it all that heavily, or only for non-critical reporting purposes. Synchronous replication, like asynchronous replication, is basically a high-availability tool. As long as it does that well, I'm not going to get worked up about the fact that it doesn't address every other use case someone might want. We can always add more frammishes in future releases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
On 03/18/2011 10:48 PM, Kevin Grittner wrote: > the least of the evils. I guess we should document it, though, so > nobody has a false expectation that seeing something on the replica > means that a connection looking at the master will see something > that current. Agreed. Note, however, that even if there's no such guarantee, it's highly unlikely for a user (or application) to ever notice this during normal operation. Regards Markus Wanner
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Aidan Van Dyk
Date:
On Fri, Mar 18, 2011 at 3:41 PM, Markus Wanner <markus@bluegap.ch> wrote: > On 03/18/2011 08:29 PM, Simon Riggs wrote: >> We could do that easily enough, actually, if we wished. >> >> Do we wish? > > I personally don't see any problem letting a standby show a snapshot > before the master. I'd consider it unneeded network traffic. But then > again, I'm completely biased. In fact, we *need* to have standbys show a snapshot before the master. By the time the master acks the commit to the client, the snapshot must be visible to all clients connected to both the master and the synchronous slave. Even with just a single-server PostgreSQL cluster, other clients (backends) can see the commit before the committing client receives the ACK. It's just that on a single server, the time period for that is small. Sync rep increases that time period by the length of time from when the slave reaches the commit point in the WAL stream to when its ack of that point gets back to the WAL sender. Ideally, that ACK time is small. Adding another round trip in there just for a "go almost to $COMMIT, ok, now go to $COMMIT" type of WAL/ack is going to be pessimal for performance, and still not improve the *guarantees* it can make. It can only slightly reduce, but not eliminate, that window where the master has WAL that the slave doesn't, and without a complete elimination (where you just switch the problem to be the slave has the data that the master doesn't), you haven't changed any of the guarantees sync rep can make (or not). a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Fri, Mar 18, 2011 at 5:08 PM, Aidan Van Dyk <aidan@highrise.ca> wrote: > On Fri, Mar 18, 2011 at 3:41 PM, Markus Wanner <markus@bluegap.ch> wrote: >> On 03/18/2011 08:29 PM, Simon Riggs wrote: >>> We could do that easily enough, actually, if we wished. >>> >>> Do we wish? >> >> I personally don't see any problem letting a standby show a snapshot >> before the master. I'd consider it unneeded network traffic. But then >> again, I'm completely biased. > > In fact, we *need* to have standbys show a snapshot before the master. > > By the time the master acks the commit to the client, the snapshot > must be visible to all client connected to both the master and the > syncronous slave. We might have a version of synchronous replication that works this way some day, but it's not the version we're shipping with 9.1. The slave acknowledges the WAL records when they hit the disk (i.e. fsync), not when they are applied; WAL apply can lag arbitrarily. The point is to guarantee clients that the WAL is on disk somewhere and that it will be replayed in the event of a failover. Despite the fact that this doesn't work as you're describing, it's a useful feature in its own right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
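[Editorial sketch] The distinction drawn in this message, acknowledging WAL when it is flushed rather than when it is applied, can be shown with two counters on the standby. The Python below is a toy model only: `Standby`, `commit_may_return` and the integer "LSNs" are invented for illustration and do not correspond to PostgreSQL internals.

```python
class Standby:
    """Toy standby that tracks how far WAL is flushed versus applied."""

    def __init__(self):
        self.flushed_lsn = 0  # WAL position safely on the standby's disk
        self.applied_lsn = 0  # WAL position replayed and visible to queries

    def receive_and_flush(self, lsn):
        # The ack reports the flush position; replay has not happened yet.
        self.flushed_lsn = max(self.flushed_lsn, lsn)
        return self.flushed_lsn

    def replay_up_to(self, lsn):
        # Replay can never run ahead of what has been flushed.
        self.applied_lsn = max(self.applied_lsn, min(lsn, self.flushed_lsn))

    def can_see(self, commit_lsn):
        return self.applied_lsn >= commit_lsn


def commit_may_return(commit_lsn, acked_lsn):
    # WAL is ordered, so an ack at or past this LSN covers all earlier commits too.
    return acked_lsn >= commit_lsn


standby = Standby()
ack = standby.receive_and_flush(50)
print(commit_may_return(50, ack))  # the master can release the waiting COMMIT
print(standby.can_see(50))         # but a hot-standby query does not see it yet
standby.replay_up_to(50)
print(standby.can_see(50))
```

Between the ack and the replay, the client has been told the commit succeeded while a query on the standby still cannot see it, which is the behavior the message describes for 9.1.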
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
On 03/22/2011 09:33 PM, Robert Haas wrote: > We might have a version of synchronous replication that works this way > some day, but it's not the version were shipping with 9.1. The slave > acknowledges the WAL records when they hit the disk (i.e. fsync) not > when they are applied; WAL apply can lag arbitrarily. The point is to > guarantee clients that the WAL is on disk somewhere and that it will > be replayed in the event of a failover. Despite the fact that this > doesn't work as you're describing, it's a useful feature in its own > right. In that sense, our approach may be more synchronous than most others, because after the ACK is sent from the slave, the slave still needs to apply the transaction data from WAL before it gets visible, while the master needs to wait for the ACK to arrive at its side, before making it visible there. Ideally, these two latencies (disk seek and network induced) are just about equal. But of course, there's no such guarantee. So whenever one of the two is off by an order of magnitude or two (by use case or due to a temporary overload), either the master or the slave may lag behind the other machine. What pleases me is that the guarantee from the slave is somewhat similar to Postgres-R's: with its ACK, the receiving node doesn't guarantee the transaction *is* applied locally, it just guarantees that it *will* be able to do so sometime in the future. Kind of a mind twister, though... Regards Markus
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Wed, Mar 23, 2011 at 3:27 AM, Markus Wanner <markus@bluegap.ch> wrote: > On 03/22/2011 09:33 PM, Robert Haas wrote: >> We might have a version of synchronous replication that works this way >> some day, but it's not the version were shipping with 9.1. The slave >> acknowledges the WAL records when they hit the disk (i.e. fsync) not >> when they are applied; WAL apply can lag arbitrarily. The point is to >> guarantee clients that the WAL is on disk somewhere and that it will >> be replayed in the event of a failover. Despite the fact that this >> doesn't work as you're describing, it's a useful feature in its own >> right. > > In that sense, our approach may be more synchronous than most others, > because after the ACK is sent from the slave, the slave still needs to > apply the transaction data from WAL before it gets visible, while the > master needs to wait for the ACK to arrive at its side, before making it > visible there. > > Ideally, these two latencies (disk seek and network induced) are just > about equal. But of course, there's no such guarantee. So whenever one > of the two is off by an order of magnitude or two (by use case or due to > a temporary overload), either the master or the slave may lag behind the > other machine. > > What pleases me is that the guarantee from the slave is somewhat similar > to Postgres-R's: with its ACK, the receiving node doesn't guarantee the > transaction *is* applied locally, it just guarantees that it *will* be > able to do so sometime in the future. Kind of a mind twister, though... Yes. What this won't do is let you build a big load-balancing network (at least not without great caution about what you assume). What it will do is make it really, really hard to lose committed transactions. Both good things, but different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Markus Wanner
Date:
On 03/23/2011 12:52 PM, Robert Haas wrote: > Yes. What this won't do is let you build a big load-balancing network > (at least not without great caution about what you assume). This sounds too strong to me. Session-aware load balancing is pretty common these days. It's the default mode of PgBouncer, for example. Not much caution required there, IMO. Or what pitfalls did you have in mind? > What it > will do is make it really, really hard to lose committed transactions. > Both good things, but different. ..you can still get both at the same time. At least as long as you are happy with session-aware load balancing. And who really needs finer grained balancing? (Note that no matter how fine-grained you balance, you are still bound to a (single core of a) single node. That changes with distributed querying, and things really start to get interesting there... but we are far from that, yet). Regards Markus
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Robert Haas
Date:
On Wed, Mar 23, 2011 at 8:16 AM, Markus Wanner <markus@bluegap.ch> wrote: > On 03/23/2011 12:52 PM, Robert Haas wrote: >> Yes. What this won't do is let you build a big load-balancing network >> (at least not without great caution about what you assume). > > This sounds too strong to me. Session-aware load balancing is pretty > common these days. It's the default mode of PgBouncer, for example. > Not much caution required there, IMO. Or what pitfalls did you have in > mind? Well, just the one we were talking about: a COMMIT on one node doesn't guarantee that the transaction is visible on the other node, just that it will become visible there eventually, even if a crash happens. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
From
Fujii Masao
Date:
On Sat, Mar 19, 2011 at 4:29 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, 2011-03-18 at 20:19 +0100, Markus Wanner wrote: >> Simon, >> >> On 03/18/2011 05:19 PM, Simon Riggs wrote: >> >>> Simon Riggs<simon@2ndQuadrant.com> wrote: >> >>>> In PostgreSQL other users cannot observe the commit until an >> >>>> acknowledgement has been received. >> >> On other nodes as well? To me that means the standby needs to hold back >> COMMIT of an ACKed transaction, until receives a re-ACK from the master, >> that it committed the transaction there. How else could the slave know >> when to commit its ACKed transactions? > > We could do that easily enough, actually, if we wished. > > Do we wish? No. I'm not sure what the problem is with seeing, from the standby, data which is not yet visible on the master. And I'm really not sure whether that problem can be solved by making the data visible on the master before the standby. If we really want to see consistent data from each node, we should implement and use a cluster-wide snapshot, as Postgres-XC does. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center