Thread: Standalone synchronous master
This patch implements the following TODO item:
Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period
This would require some type of command to be executed to alert administrators of this change.
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
This patch follows the same approach as was outlined in the earlier thread.
Some of the additional important changes are:
1. Added two GUC variables that take user-supplied commands to be executed:
a. Master_to_standalone_cmd: To be executed before master switches to standalone mode.
b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.
2. The master's mode switch will happen only if the corresponding command executes successfully.
3. The replication timeout is used to decide whether the synchronous standby has gone down, i.e. only after wal_sender_timeout expires will the master switch from sync mode to standalone mode.
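A rough sketch of how this could look in postgresql.conf (GUC names as proposed in this patch; syntax, values, and the alert script are illustrative only, not committed behavior):

    # proposed in this patch (illustrative)
    enable_standalone_master = on              # allow degrading to standalone mode
    master_to_standalone_cmd = '/usr/local/bin/alert_dba.sh degraded'
    master_to_sync_cmd = '/usr/local/bin/alert_dba.sh resynced'

    # existing settings the patch builds on
    synchronous_standby_names = 's1'
    wal_sender_timeout = 60s                   # degrade only after this expires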
Please provide your opinion, or any other expectations you have of this patch.
I will add it to the November CommitFest.
Thanks and Regards,
Kumar Rajeev Rastogi
On Wed, Nov 13, 2013 at 6:39 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
> 1. Have added two GUC variable to take commands from user to be executed
>
> a. Master_to_standalone_cmd: To be executed before master switches to standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.

In the description of both switches (a & b), you are saying that it will switch to standalone mode; I think by your point 1b you mean to say it the other way around (switch from standalone to sync mode).

Instead of getting commands, why can't we just log such actions? Adding 3 new GUC variables for this functionality seems a bit much.

Also, what will happen when it switches to standalone mode in case there are some async standbys already connected to it before going to standalone mode? If it continues to send data, then I think naming it 'enable_standalone_master' is not good.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 8th Jan, 2014, Amit Kapila wrote:
> In the description of both switches (a & b), you are saying that it will
> switch to standalone mode; I think by your point 1b you mean to say it
> the other way around (switch from standalone to sync mode).

Yes, you are right. It's a typo.

> Instead of getting commands, why can't we just log such actions? Adding
> 3 new GUC variables for this functionality seems a bit much.

Actually, in the earlier discussion as well as in the TODO item, it is mentioned that some kind of command should be executed to alert the administrator: http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

In my current patch, I have kept the LOG along with the command.

> Also, what will happen when it switches to standalone mode in case there
> are some async standbys already connected to it before going to
> standalone mode? If it continues to send data, then I think naming it
> 'enable_standalone_master' is not good.

Yes, we can change the name to something more appropriate; some options are:

1. enable_async_master
2. sync_standalone_master
3. enable_nowait_master
4. enable_nowait_resp_master

Please provide your suggestion on the above names, or any other.

Thanks and Regards,
Kumar Rajeev Rastogi
On 11/13/2013 03:09 PM, Rajeev rastogi wrote:
> This patch implements the following TODO item:
>
> Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period
>
> Please provide your opinion or any other expectation out of this patch.

I'm going to say right off the bat that I think the whole notion of automatically disabling synchronous replication when the standby goes down is completely bonkers. If you don't need the strong guarantee that your transaction is safe on at least two servers before it's acknowledged to the client, there's no point enabling synchronous replication in the first place. If you do need it, then you shouldn't fall back to a degraded mode, at least not automatically. It's an idea that keeps coming back, but I have not heard a convincing argument why it makes sense. It's been discussed many times before, most recently in that thread you linked to.

Now that I've got that out of the way, I concur that some sort of hooks or commands that fire when a standby goes down or comes back up make sense, for monitoring purposes. I don't much like this particular design, though. If you just want to write a log entry when all the standbys are disconnected, running a shell command seems like an awkward interface. It's OK for raising an alarm, but there are many other situations where you might want to raise alarms, so I'd rather have us implement some sort of generic trap system instead of adding this one particular extra config option. What do people usually use to monitor replication?

There are two things we're trying to solve here: raising an alarm when something interesting happens, and changing the configuration to temporarily disable synchronous replication. What would be a good API to disable synchronous replication? Editing the config file and SIGHUPing is not very nice. There's been talk of an ALTER command to change the config, but I'm not sure that's a very good API either. Perhaps expose the sync_master_in_standalone_mode variable you have in your patch through new SQL-callable functions. Something like:

pg_disable_synchronous_replication()
pg_enable_synchronous_replication()
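As a rough sketch of how such an interface might be used (these functions do not exist; the names are just the ones suggested above, and the behavior is illustrative only):

    -- on the master, once the sync standby is known to be down and
    -- you have decided to accept the reduced durability:
    SELECT pg_disable_synchronous_replication();
    -- commits no longer wait for a standby acknowledgment

    -- later, once the standby is back and has caught up:
    SELECT pg_enable_synchronous_replication();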
I'm not sure where that state would be stored. Should it persist restarts? And you should probably get some sort of warning in the log when synchronous replication is disabled.

In summary, more work is required to design a good user/admin/programming interface. Let's hear a solid proposal for that before writing patches.

BTW, calling an external command with system(), while holding SyncRepLock in exclusive mode, seems like a bad idea. For starters, holding the lock will prevent a new WAL sender from starting up and becoming a synchronous standby, and the external command might take a long time to return.

- Heikki
On 2014-01-08 11:07:48 +0200, Heikki Linnakangas wrote:
> I'm going to say right off the bat that I think the whole notion of
> automatically disabling synchronous replication when the standby goes
> down is completely bonkers. If you don't need the strong guarantee that
> your transaction is safe on at least two servers before it's
> acknowledged to the client, there's no point enabling synchronous
> replication in the first place.

I think that's likely caused by the misconception that synchronous replication is synchronous in apply, not just in remote write/fsync. I have now seen several sites that assumed that and set up sync rep to maintain that goal, to then query standbys instead of the primary after the commit finished. If that assumption were true, supporting a timeout that way would possibly be helpful, but it is not atm...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 January 2014 09:07, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> I'm going to say right off the bat that I think the whole notion of
> automatically disabling synchronous replication when the standby goes
> down is completely bonkers.

Agreed.

We had this discussion across 3 months and we don't want it again. This should not have been added as a TODO item.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 8, 2014 at 05:39:23PM +0000, Simon Riggs wrote:
> Agreed
>
> We had this discussion across 3 months and we don't want it again.
> This should not have been added as a TODO item.

I am glad Heikki and Simon agree, but I don't. ;-)

The way that I understand it is that you might want durability, but might not want to sacrifice availability. Phrased that way, it makes sense, and notifying the administrator seems the appropriate action.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
On Jan 8, 2014, at 9:27 PM, Bruce Momjian wrote:
> The way that I understand it is that you might want durability, but
> might not want to sacrifice availability. Phrased that way, it makes
> sense, and notifying the administrator seems the appropriate action.

Technically and conceptually I agree with Andres and Simon, but from daily experience I would say that we should make it configurable. Some people have had nasty experiences when their systems stopped working.

+1 for a GUC to control this one.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On 01/08/2014 10:27 PM, Bruce Momjian wrote:
> The way that I understand it is that you might want durability, but
> might not want to sacrifice availability. Phrased that way, it makes
> sense, and notifying the administrator seems the appropriate action.

They want to have the cake and eat it too. But they're not actually getting that. What they actually get is extra latency when things work, with no gain in durability.

- Heikki
On Wed, Jan 8, 2014 at 10:46:51PM +0200, Heikki Linnakangas wrote:
> They want to have the cake and eat it too. But they're not actually
> getting that. What they actually get is extra latency when things
> work, with no gain in durability.

They are getting guaranteed durability until they get a notification --- that seems valuable. When they get the notification, they can reevaluate whether they want that tradeoff.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> wrote:
> They are getting guaranteed durability until they get a
> notification --- that seems valuable. When they get the
> notification, they can reevaluate if they want that tradeoff.

My first reaction to this has been that if you want synchronous replication without having the system wait if the synchronous target goes down, you should configure an alternate target.

With the requested change, we can no longer state that when a COMMIT returns with an indication of success, the data has been persisted to multiple clusters. We would be moving to a situation where the difference between synchronous and asynchronous is subtle -- either way the data may or may not be on a second cluster by the time the committer is notified of success. We wait up to some threshold time to try to make the success indication mean that, but then return success even if the guarantee has not been provided, without any way for the committer to know the difference.

On the other hand, we keep getting people saying they want the database to make the promise of synchronous replication, and to tell applications that it has been successful even when it hasn't been, as long as there's a line in the server log to record the lie. Or, more likely, to record the boundaries of time blocks where it has been a lie. This appears to be requested because other products behave that way.

I'm torn on whether we should cave to popular demand on this; but if we do, we sure need to be very clear in the documentation about what a successful return from a commit request means. Sooner or later, Murphy's Law being what it is, if we do this someone will lose the primary and blame us because the synchronous replica is missing gobs of transactions that were successfully committed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:
> I'm torn on whether we should cave to popular demand on this; but
> if we do, we sure need to be very clear in the documentation about
> what a successful return from a commit request means. Sooner or
> later, Murphy's Law being what it is, if we do this someone will
> lose the primary and blame us because the synchronous replica is
> missing gobs of transactions that were successfully committed.

I'm for not caving. I think people who are asking for this don't actually understand what they'd be getting.

regards, tom lane
On 2014-01-08 13:34:08 -0800, Kevin Grittner wrote:
> On the other hand, we keep getting people saying they want the
> database to make the promise of synchronous replication, and tell
> applications that it has been successful even when it hasn't been,
> as long as there's a line in the server log to record the lie.

Most people I've talked to who hold such a position have held it because they thought synchronous replication would mean that apply (and thus visibility) would also be synchronous. Is that different from your experience?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/08/2014 12:27 PM, Bruce Momjian wrote:
> I am glad Heikki and Simon agree, but I don't. ;-)
>
> The way that I understand it is that you might want durability, but
> might not want to sacrifice availability. Phrased that way, it makes
> sense, and notifying the administrator seems the appropriate action.

I think there's a valid argument for wanting things the other way, but I don't find it persuasive. In general, people who want auto-degrade for sync rep either:

a) don't understand what sync rep actually does (lots of folks confuse synchronous with simultaneous), or
b) want more infrastructure than we actually have around managing sync replicas.

Now, the folks who want (b) have a legitimate need, and I'll point out that we always planned to have more features around sync rep; it's just that we never actually worked on any. For example, "quorum sync" was extensively discussed and originally projected for 9.2, except that certain hackers changed jobs and interests.

If we just did the minimal change, that is, added an "auto-degrade" GUC and an alert to the logs each time the master server went into degraded mode, then as Heikki says we'd be loading a big foot-gun for a bunch of ill-informed DBAs. People who want that are really much better off with async rep in the first place.

If we really want auto-degrading sync rep, then we'd (at a minimum) need a way to determine *from the replica* whether or not it was in degraded mode when the master died. What good do messages in the master's log do you if the master no longer exists?

Mind you, being able to determine on the replica whether it was synchronous or not when it lost communication with the master would be a great feature to have for sync rep groups as well, and would make them practical (right now, they're pretty useless). However, I seriously doubt that someone is going to code that up in the next 5 days.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 01/08/2014 01:34 PM, Kevin Grittner wrote:
> I'm torn on whether we should cave to popular demand on this; but
> if we do, we sure need to be very clear in the documentation about
> what a successful return from a commit request means.

I am trying to follow this thread, and perhaps I am just being dense, but it seems to me that:

If you are running synchronous replication, then as long as the target (subscriber) is up, synchronous replication operates as it should. That is, the origin will wait for a notification from the subscriber that the write has been successful before continuing.

However, if the subscriber is down, the origin should NEVER wait. That is just silly behavior and makes synchronous replication pretty much useless. Machines go down; that is the nature of things. Yes, we should log, and log loudly, if the subscriber is down:

ERROR: target xyz is non-communicative: switching to async replication

We then should store the WAL logs up to wal_keep_segments. When the subscriber comes back up, it will replicate in async mode until the two are back in sync, and then switch (perhaps by hand) to sync mode. This of course assumes that we have a valid database on the subscriber and we have not overrun wal_keep_segments.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On 01/08/2014 11:37 PM, Andres Freund wrote:
> Most people I've talked to who hold such a position have held it
> because they thought synchronous replication would mean that apply
> (and thus visibility) would also be synchronous.

And I totally agree that it would be a useful mode if apply were synchronous. You could then build a master-standby pair where it's guaranteed that when you commit a transaction on the master, it's thereafter always seen as committed in the standby too. In that usage, if the link between the two is broken, you could set up timeouts, e.g. so that the standby stops accepting new queries after 20 seconds, and the master proceeds without the standby after 25 seconds. Then the guarantee would still hold.

I don't know if the people asking for the fallback mode are thinking that synchronous replication means synchronous apply, or if they're trying to have the cake and eat it too wrt. durability and availability.

Synchronous apply would be cool..

- Heikki
Josh Berkus <josh@agliodbs.com> writes:
> If we really want auto-degrading sync rep, then we'd (at a minimum) need
> a way to determine *from the replica* whether or not it was in degraded
> mode when the master died. What good do messages in the master's log do
> you if the master no longer exists?

How would it be possible for a replica to know whether the master had committed more transactions while communication was lost, if the master dies without ever restoring communication? It sounds like pie in the sky from here ...

regards, tom lane
"Joshua D. Drake" <jd@commandprompt.com> writes: > However, if the subscriber is down, the origin should NEVER wait. That > is just silly behavior and makes synchronous replication pretty much > useless. Machines go down, that is the nature of things. Yes, we should > log and log loudly if the subscriber is down: > ERROR: target xyz is non-communicative: switching to async replication. > We then should store the wal logs up to wal_keep_segments. > When the subscriber comes back up, it will then replicate in async mode > until the two are back in sync and then switch (perhaps by hand) to sync > mode. This of course assumes that we have a valid database on the > subscriber and we have not overrun wal_keep_segments. It sounds to me like you are describing the existing behavior of async mode, with the possible exception of exactly what shows up in the postmaster log. Sync mode is about providing a guarantee that the data exists on more than one server *before* we tell the client it's committed. If you don't need that guarantee, you shouldn't be using sync mode. If you do need it, it's not clear to me why you'd suddenly not need it the moment the going actually gets tough. regards, tom lane
Andres Freund <andres@2ndquadrant.com> wrote:
> Most people I've talked to who hold such a position have held it
> because they thought synchronous replication would mean that apply
> (and thus visibility) would also be synchronous. Is that different
> from your experience?

I haven't pursued it that far, because we don't have a maybe-synchronous mode yet and seem unlikely to ever support it. I'm not sure why that use-case is any better than any other. You still would never really know whether the data read is current. If we were to implement this, the supposedly synchronous replica could be out-of-date by any arbitrary amount of time (from milliseconds to months). (Consider what could happen if the replication connection authorizations got messed up while application connections to the replica were fine.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 01/08/2014 01:55 PM, Tom Lane wrote:
> Sync mode is about providing a guarantee that the data exists on more than
> one server *before* we tell the client it's committed. If you don't need
> that guarantee, you shouldn't be using sync mode. If you do need it,
> it's not clear to me why you'd suddenly not need it the moment the going
> actually gets tough.

As I understand it, what is being suggested is that if a subscriber or target goes down, then the master will just sit there and wait. When I read that, I read that the master will no longer process write transactions. If I am wrong in that understanding, then cool. If I am not, then that is a serious problem in a production scenario. There is an expectation that a master will continue to function if the target is down, synchronous or not.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
On 2014-01-08 14:23:34 -0800, Joshua D. Drake wrote:
> As I understand it, what is being suggested is that if a subscriber or
> target goes down, then the master will just sit there and wait. ... There
> is an expectation that a master will continue to function if the target
> is down, synchronous or not.

I don't think you've understood synchronous replication. There wouldn't be *any* benefit to using it if it worked the way you wish, since there wouldn't be any additional guarantees. A single reconnect of the streaming rep connection, without any permanent outage, would potentially lead to data loss if the primary crashed at the wrong moment. So you'd buy no guarantees with a noticeable loss in performance.

Just use async mode if you want things to work like that.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/08/2014 02:34 PM, Andres Freund wrote:
> I don't think you've understood synchronous replication. There wouldn't
> be *any* benefit to using it if it worked the way you wish since there
> wouldn't be any additional guarantees.
>
> Just use async mode if you want things to work like that.

Well no, that isn't what I am saying. Consider the following scenario:

db0->db1 in synchronous mode

The idea is that we know the data on db0 is not written until we know for a fact that db1 also has that data. That is great, and a guarantee of data integrity between the two nodes.

If we have the following:

db0->db1:down

Using the model (as I understand it) that is being discussed, we have increased our failure rate, because the moment db1 is down we also lose db0. The node db0 may be up, but if it isn't going to process transactions it is useless. I can tell you that I have exactly 0 customers that would want that model, because a single node failure would cause a double node failure.

All the other stuff with wal_keep_segments is just idea throwing; I don't care about that at this point. What I care about specifically is that a single node failure, regardless of replication mode, should not be able to (automatically) stop the operation of the master node.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
> The idea is that we know the data on db0 is not written until we know
> for a fact that db1 also has that data. That is great, and a guarantee
> of data integrity between the two nodes.

That guarantee is never there. The only thing guaranteed is that the client isn't notified of the commit until db1 has received the data.

> Using the model (as I understand it) that is being discussed, we have
> increased our failure rate, because the moment db1 is down we also lose
> db0.

That's why you should configure a second standby as another (candidate) synchronous replica, also listed in synchronous_standby_names.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
"Joshua D. Drake" <jd@commandprompt.com> writes: > On 01/08/2014 01:55 PM, Tom Lane wrote: >> Sync mode is about providing a guarantee that the data exists on more than >> one server *before* we tell the client it's committed. If you don't need >> that guarantee, you shouldn't be using sync mode. If you do need it, >> it's not clear to me why you'd suddenly not need it the moment the going >> actually gets tough. > As I understand it what is being suggested is that if a subscriber or > target goes down, then the master will just sit there and wait. When I > read that, I read that the master will no longer process write > transactions. If I am wrong in that understanding then cool. If I am not > then that is a serious problem with a production scenario. There is an > expectation that a master will continue to function if the target is > down, synchronous or not. Then you don't understand the point of sync mode, and you shouldn't be using it. The point is *exactly* to refuse to commit transactions unless we can guarantee the data's been replicated. There might be other interpretations of "synchronous replication" in which it makes sense to continue accepting transactions whether or not there are any up-to-date replicas; but in the meaning Postgres ascribes to the term, it does not make sense. You should just use async mode if that behavior is what you want. Possibly we need to rename "synchronous replication", or document it better. And I don't have any objection in principle to developing additional replication modes that offer different sets of guarantees and performance tradeoffs. But for the synchronous mode that we've got, the proposed switch is insane, and asking for it merely proves that you don't understand the difference between async and sync modes. regards, tom lane
On 01/08/2014 02:46 PM, Andres Freund wrote:
> That guarantee is never there. The only thing guaranteed is that the
> client isn't notified of the commit until db1 has received the data.

Well, ugh on that... but that is for another reply.

> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

I don't have a response to this that does not involve a great deal of sarcasm.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
Andres Freund <andres@2ndquadrant.com> writes:
> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

Right. If you want to tolerate one node failure *and* have a guarantee that committed data is on at least two nodes, you need at least three nodes. Simple arithmetic. If you only have two nodes, you only get to have one of those properties.

regards, tom lane
* Andres Freund (andres@2ndquadrant.com) wrote:
> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

Perhaps we should stress in the docs that this is, in fact, the *only* reasonable mode in which to run with sync rep on? Where there are multiple replicas, because otherwise Drake is correct that you'll just end up having both nodes go offline if the slave fails.

Thanks,

Stephen
On 01/08/2014 02:49 PM, Tom Lane wrote:
> Then you don't understand the point of sync mode, and you shouldn't be
> using it. The point is *exactly* to refuse to commit transactions unless
> we can guarantee the data's been replicated.

I understand exactly that, and I don't disagree, except in the case where it is going to bring down the master (see my further reply). I now remember arguing about this a few years ago when we started down the sync path.

Anyway, perhaps this is just something of a knob that can be turned. We don't have to continue the argument. Thank you for considering what I was saying.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
> Perhaps we should stress in the docs that this is, in fact, the *only*
> reasonable mode in which to run with sync rep on? Where there are
> multiple replicas, because otherwise Drake is correct that you'll just
> end up having both nodes go offline if the slave fails.

Which, as it happens, is actually documented:
http://www.postgresql.org/docs/devel/static/warm-standby.html#SYNCHRONOUS-REPLICATION

25.2.7.3. Planning for High Availability

"Commits made when synchronous_commit is set to on or remote_write will wait until the synchronous standby responds. The response may never occur if the last, or only, standby should crash. The best solution for avoiding data loss is to ensure you don't lose your last remaining synchronous standby. This can be achieved by naming multiple potential synchronous standbys using synchronous_standby_names. The first named standby will be used as the synchronous standby. Standbys listed after this will take over the role of synchronous standby if the first one should fail."

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
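For concreteness, a minimal sketch of the documented three-node setup (the standby names 's1' and 's2' and the host names are hypothetical; each name must match the application_name the standby sets in its primary_conninfo):

    # postgresql.conf on the primary
    synchronous_commit = on
    synchronous_standby_names = 's1, s2'   # s1 is sync; s2 takes over if s1 fails

    # recovery.conf on the first standby
    standby_mode = 'on'
    primary_conninfo = 'host=primary port=5432 application_name=s1'

    # recovery.conf on the second standby: same, with application_name=s2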
On 01/08/2014 01:49 PM, Tom Lane wrote:
> How would it be possible for a replica to know whether the master had
> committed more transactions while communication was lost, if the master
> dies without ever restoring communication? It sounds like pie in the
> sky from here ...

Oh, right. Because the main reason for a sync replica degrading is that it's down, in which case it isn't going to record anything. This would still be useful for sync rep candidates, though, and I'll explain why below. But first, lemme demolish the case for auto-degrade.

So here's the case that we can't possibly solve for auto-degrade. Anyone who wants auto-degrade needs to come up with a solution for this case as a first requirement:

1. A data center network/power event starts.
2. The sync replica goes down.
3. A short time later, the master goes down.
4. Data center power is restored.
5. The master is fried and is a permanent loss. The replica is OK, though.

Question: how does the DBA know whether data has been lost or not?

With current sync rep, it's easy: no data was lost, because the master stopped accepting writes once the replica went down. If we support auto-degrade, though, there's no way to know; the replica doesn't have that information, and anything which was on the master is permanently lost. And the point several people have made is: if you can live with indeterminacy, then you're better off with async rep in the first place.

Now, what we COULD definitely use is a single-command way of degrading the master when the sync replica is down, something like "ALTER SYSTEM DEGRADE SYNC". Right now you have to push a change to the conf file and reload (a sketch of that manual procedure is below), and there's no way to salvage the transaction which triggered the sync failure. This would be a nice 9.5 feature.

HOWEVER, we've already kind of set up an indeterminate situation by allowing sync rep groups and candidate sync rep servers. Consider this:

1. Master server A is configured with sync replica B and candidate sync replica C.
2. A rolling power/network failure event occurs, which causes B and C to go down sometime before A, and all of them to go down before the application does.
3. On restore, only C is restorable; both A and B are a total loss.

Again, we have no way to know whether or not C was in sync replication when it went down. If C went down before B, then we've lost data; if B went down before C, we haven't. But we can't find out. *This* is where it would be useful to have C log whenever it went into (or out of) synchronous mode.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
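The manual degrade procedure referred to above, as it stands today (the standby name 's1' is hypothetical; emptying synchronous_standby_names and reloading releases any backends waiting for a sync acknowledgment):

    # postgresql.conf on the master: empty the list to stop waiting
    #synchronous_standby_names = 's1'
    synchronous_standby_names = ''

    -- then make the postmaster re-read the config:
    SELECT pg_reload_conf();
    -- commits no longer wait for any standby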
On 2014-01-08 14:52:07 -0800, Joshua D. Drake wrote:
> > That guarantee is never there. The only thing guaranteed is that the
> > client isn't notified of the commit until db1 has received the data.
>
> Well, ugh on that... but that is for another reply.

You do realize that locally you have the same guarantees? If the client didn't receive a reply to a COMMIT, you won't know whether the tx committed or not. If that's not sufficient, you need to use 2PC and a transaction manager.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
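For illustration, the two-phase-commit building blocks a transaction manager would drive (this requires max_prepared_transactions > 0; the table and the transaction identifier 'tx1' are hypothetical):

    CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    PREPARE TRANSACTION 'tx1';   -- transaction is now durably prepared
    -- the transaction manager records the outcome decision, then:
    COMMIT PREPARED 'tx1';       -- or ROLLBACK PREPARED 'tx1'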
* Andres Freund (andres@2ndquadrant.com) wrote:
> Which, as it happens, is actually documented.

I'm aware; my point was simply that we should state, up-front in 25.2.7.3 *and* where we document synchronous_standby_names, that it requires at least three servers to be involved to be a workable solution.

Perhaps we should even log a warning if only one value is found in synchronous_standby_names...

Thanks,

Stephen
Stephen,

> I'm aware; my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.

It's a workable solution with 2 servers. That's a "low-availability, high-integrity" solution; the user has chosen to double their risk of not accepting writes against never losing a write. That's a perfectly valid configuration, and I believe that NTT runs several applications this way.

In fact, that can already be looked at as a kind of "auto-degrade" mode: if there aren't two nodes, then the database goes read-only.

Might I also point out that transactions are synchronous or not individually? The sensible configuration is for only the important writes to be synchronous (see the example after this message) -- in which case auto-degrade makes even less sense.

I really think that demand for auto-degrade is coming from users who don't know what sync rep is for in the first place. The fact that other vendors are offering auto-degrade as a feature instead of the ginormous foot-gun it is adds to the confusion, but we can't help that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
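For concreteness, per-transaction control uses the existing synchronous_commit setting, which can be changed inside a transaction (the table is hypothetical):

    CREATE TABLE payments (id int, amount numeric);

    -- important write: wait for the synchronous standby to acknowledge
    BEGIN;
    SET LOCAL synchronous_commit = on;
    INSERT INTO payments VALUES (1, 100.00);
    COMMIT;

    -- low-value write: wait only for local WAL flush, not the standby
    BEGIN;
    SET LOCAL synchronous_commit = local;   -- or 'off' to not wait at all
    INSERT INTO payments VALUES (2, 0.01);
    COMMIT;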
Stephen Frost <sfrost@snowman.net> writes:
> I'm aware; my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.

It only requires that if your requirements include both redundant data storage and tolerating single-node failure. Now admittedly, most people who want replication want it so they can have failure tolerance, but I don't think it's insane to say that you want to stop accepting writes if either node of a 2-node setup drops out. If you can only afford two nodes, and you need guaranteed redundancy for business reasons, then that's where you end up.

Or in short, I'm against throwing warnings for this kind of setup. I do agree that we need some doc improvements, since this is evidently not clear enough yet.

regards, tom lane
Josh,

* Josh Berkus (josh@agliodbs.com) wrote:
> It's a workable solution with 2 servers. That's a "low-availability,
> high-integrity" solution; the user has chosen to double their risk of
> not accepting writes against never losing a write. That's a perfectly
> valid configuration, and I believe that NTT runs several applications
> this way.

I really don't agree with that when the standby going offline can take out the master. Note that I didn't say we shouldn't allow it, but I don't think we should accept that it's a real-world solution.

> I really think that demand for auto-degrade is coming from users who
> don't know what sync rep is for in the first place. The fact that other
> vendors are offering auto-degrade as a feature instead of the ginormous
> foot-gun it is adds to the confusion, but we can't help that.

Do you really feel that a WARNING, and expanding the docs to point out that three systems are necessary, particularly under the 'high availability' documentation and options, is a bad idea? I fail to see how that does anything but clarify the use-case for our users.

Thanks,

Stephen
On 01/08/2014 03:18 PM, Stephen Frost wrote:
> Do you really feel that a WARNING, and expanding the docs to point
> out that three systems are necessary, particularly under the 'high
> availability' documentation and options, is a bad idea? I fail to see
> how that does anything but clarify the use-case for our users.

I think the warning is dumb, and the suggested documentation change is insufficient. If we're going to clarify things, then we need a full-on, several-page doc showing several examples of different sync rep configurations and explaining their tradeoffs (including the different sync modes and per-transaction sync). Anything short of that is just going to muddy the waters further.

Mind you, someone needs to take a machete to the HA section of the docs anyway.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> HOWEVER, we've already kind of set up an indeterminate situation by
> allowing sync rep groups and candidate sync rep servers. Consider this:
>
> 1. Master server A is configured with sync replica B and candidate sync
> replica C.
> 2. A rolling power/network failure event occurs, which causes B and C to
> go down sometime before A, and all of them to go down before the
> application does.
> 3. On restore, only C is restorable; both A and B are a total loss.
>
> Again, we have no way to know whether or not C was in sync replication
> when it went down. If C went down before B, then we've lost data; if B
> went down before C, we haven't. But we can't find out. *This* is where
> it would be useful to have C log whenever it went into (or out of)
> synchronous mode.

Good point, but C can't solve this for you just by logging. If C was the first to go down, it has no way to know whether A and B committed more transactions before dying; and it's unlikely to have logged its own crash, either.

More fundamentally, if you want to survive the failure of M out of N nodes, you need a sync configuration that guarantees data is on at least M+1 nodes before reporting commit. The above example doesn't meet that, so it's not surprising that you're screwed.

What we lack, and should work on, is a way for sync mode to have M larger than one. AFAICS, right now we'll report commit as soon as there's one up-to-date replica, and some high-reliability cases are going to want more.

regards, tom lane
On 01/08/2014 03:27 PM, Tom Lane wrote:
> Good point, but C can't solve this for you just by logging. If C was the
> first to go down, it has no way to know whether A and B committed more
> transactions before dying; and it's unlikely to have logged its own crash,
> either.

Sure. But if we *knew* that C was not in synchronous mode when it went down, then we'd expect some data loss. As you point out, though, the converse is not true; even if C was in sync mode, we don't know that there's been no data loss, since B could come back up as a sync replica before going down again.

> What we lack, and should work on, is a way for sync mode to have M larger
> than one. AFAICS, right now we'll report commit as soon as there's one
> up-to-date replica, and some high-reliability cases are going to want
> more.

Yeah, we talked about having this when sync rep originally went in. It involves a LOT more bookkeeping on the master, though, which is why nobody has been willing to attempt it -- and why we went with the single-replica solution in the first place. Especially since most people who want "quorum sync" really want MM replication anyway.

"Sync N times" is really just a guarantee against data loss as long as you lose N-1 servers or fewer. And it becomes an even lower-availability solution if you don't have at least N+1 replicas. For that reason, I'd like to see some realistic actual user demand before we take the idea seriously.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> "Sync N times" is really just a guarantee against data loss as long as
> you lose N-1 servers or fewer. And it becomes an even
> lower-availability solution if you don't have at least N+1 replicas.
> For that reason, I'd like to see some realistic actual user demand
> before we take the idea seriously.

Sure. I wasn't volunteering to implement it, just saying that what we've got now is not designed to guarantee data survival across failure of more than one server. Changing things around the margins isn't going to improve such scenarios very much.

It struck me after re-reading your example scenario that the most likely way to figure out what you had left would be to see if some additional system (think Nagios monitor, or monitors) had records of when the various database servers went down. This might be what you were getting at when you said "logging", but the key point is it has to be logging done on an external server that could survive failure of the database server. postmaster.log ain't gonna do it.

regards, tom lane
On 1/8/14, 6:05 PM, Tom Lane wrote:
> It struck me after re-reading your example scenario that the most
> likely way to figure out what you had left would be to see if some
> additional system (think Nagios monitor, or monitors) had records
> of when the various database servers went down. This might be
> what you were getting at when you said "logging", but the key point
> is it has to be logging done on an external server that could survive
> failure of the database server. postmaster.log ain't gonna do it.

Yeah, and I think that the logging command that was suggested allows for that *if configured correctly*.

Automatic degradation to async is useful for protecting you against all modes of a single failure: the master fails, you've got the replica; the replica fails, you've got the master. But fit hits the shan as soon as you get a double failure, and that double failure can be very subtle. Josh's case is not subtle: you lost power AND the master died; you KNOW you have two failures. But what happens if there's a network blip that's not large enough to notice (but large enough to degrade your replication) and the master dies? Now you have no clue whether you've lost data.

Compare this to async: if the master goes down (one failure), you have zero clue whether you lost data or not. At least with auto-degradation you know you have to have two failures to suffer data loss.

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
On Wed, Jan 8, 2014 at 6:15 PM, Josh Berkus <josh@agliodbs.com> wrote:
> I really think that demand for auto-degrade is coming from users who
> don't know what sync rep is for in the first place. The fact that other
> vendors are offering auto-degrade as a feature instead of the ginormous
> foot-gun it is adds to the confusion, but we can't help that.

I think the problem here is that we tend to have a limited view of "the right way to use sync rep". If I have 5 nodes, and I set 1 synchronous and the other 3 asynchronous, I've set up a "known successor" in the event that the leader fails. In this scenario though, if the "successor" fails, you actually probably want to keep accepting writes, since you weren't using synchronous for durability but for operational simplicity. I suspect there are probably other scenarios where users are willing to trade latency for improved and/or directed durability but not at the expense of availability, don't you?

In fact there are entire systems that provide that type of thing. It's worth mentioning that there's a nice primer on tunable consistency in the Riak docs; strongly recommended: http://docs.basho.com/riak/1.1.0/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/. I'm not entirely sure how well it maps into our problem space, but it at least gives you a sane working model to think about. If you were trying to explain the Postgres case in those terms, async is like the N value (I want the data to end up on this many nodes eventually) and sync is like the W value (it must be written to this many nodes, or it should fail). Of course, we only offer R = 1, W = 1 or 2, and N = all. And it's worse than that, because we have golden nodes.

This isn't to say there isn't a lot of confusion around the issue. Designing, implementing, and configuring different guarantees in the presence of node failures is a non-trivial problem. Still, I'd prefer to see Postgres head in the direction of providing more options in this area rather than drawing a firm line at being a CP-oriented system.

Robert Treat
play: xzilla.net
work: omniti.com
From: "Andres Freund" <andres@2ndquadrant.com> > On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote: >> If we have the following: >> >> db0->db1:down >> >> Using the model (as I understand it) that is being discussed we have >> increased our failure rate because the moment db1:down we also lose db0. >> The >> node db0 may be up but if it isn't going to process transactions it is >> useless. I can tell you that I have exactly 0 customers that would want >> that >> model because a single node failure would cause a double node failure. > > That's why you should configure a second standby as another (candidate) > synchronous replica, also listed in synchronous_standby_names. Let me ask a (probably) stupid question. How is the sync rep different from RAID-1? When I first saw sync rep, I expected that it would provide the same guarantees as RAID-1 in terms of durability (data is always mirrored on two servers) and availability (if one server goes down, another server continues full service). The cost is reasonable with RAID-1. The sync rep requires high cost to get both durability and availability --- three servers. Am I expecting too much? Regards MauMau
On 01/09/2014 05:09 AM, Robert Treat wrote: > On Wed, Jan 8, 2014 at 6:15 PM, Josh Berkus <josh@agliodbs.com> wrote: >> Stephen, >> >> >>> I'm aware, my point was simply that we should state, up-front in >>> 25.2.7.3 *and* where we document synchronous_standby_names, that it >>> requires at least three servers to be involved to be a workable >>> solution. >> It's a workable solution with 2 servers. That's a "low-availability, >> high-integrity" solution; the user has chosen to double their risk of >> not accepting writes against never losing a write. That's a perfectly >> valid configuration, and I believe that NTT runs several applications >> this way. >> >> In fact, that can already be looked at as a kind of "auto-degrade" mode: >> if there aren't two nodes, then the database goes read-only. >> >> Might I also point out that transactions are synchronous or not >> individually? The sensible configuration is for only the important >> writes being synchronous -- in which case auto-degrade makes even less >> sense. >> >> I really think that demand for auto-degrade is coming from users who >> don't know what sync rep is for in the first place. The fact that other >> vendors are offering auto-degrade as a feature instead of the ginormous >> foot-gun it is adds to the confusion, but we can't help that. >> > I think the problem here is that we tend to have a limited view of > "the right way to use synch rep". If I have 5 nodes, and I set 1 > synchronous and the other 3 asynchronous, I've set up a "known > successor" in the event that the leader fails. But there is no guarantee that the synchronous replica actually is ahead of async ones. > In this scenario > though, if the "successor" fails, you actually probably want to keep > accepting writes; since you weren't using synchronous for durability > but for operational simplicity. I suspect there are probably other > scenarios where users are willing to trade latency for improved and/or > directed durability but not at the extent of availability, don't you? > Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 01/09/2014 12:05 AM, Stephen Frost wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: >> On 2014-01-08 17:56:37 -0500, Stephen Frost wrote: >>> * Andres Freund (andres@2ndquadrant.com) wrote: >>>> That's why you should configure a second standby as another (candidate) >>>> synchronous replica, also listed in synchronous_standby_names. >>> Perhaps we should stress in the docs that this is, in fact, the *only* >>> reasonable mode in which to run with sync rep on? Where there are >>> multiple replicas, because otherwise Drake is correct that you'll just >>> end up having both nodes go offline if the slave fails. >> Which, as it happens, is actually documented. > I'm aware, my point was simply that we should state, up-front in > 25.2.7.3 *and* where we document synchronous_standby_names, that it > requires at least three servers to be involved to be a workable > solution. > > Perhaps we should even log a warning if only one value is found in > synchronous_standby_names... You can have only one name in synchronous_standby_names and have multiple slaves connecting with that name Also, I can attest that I have had clients who want exactly that - a system stop until admin intervention in case of a designated sync standby failing. And they actually run more than one standby, they just want to make sure that sync rep to 2nd data center always happens. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
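For reference, a minimal sketch of the setup Hannu describes -- several standbys in the second data center all connecting under a single name, so whichever one is connected can satisfy the sync requirement (host and application names are invented):

    # postgresql.conf on the primary
    synchronous_standby_names = 'dc2'

    # recovery.conf on each standby in the second data center
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com application_name=dc2'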
On 01/09/2014 01:57 PM, MauMau wrote: > From: "Andres Freund" <andres@2ndquadrant.com> >> On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote: >>> If we have the following: >>> >>> db0->db1:down >>> >>> Using the model (as I understand it) that is being discussed we have >>> increased our failure rate because the moment db1:down we also lose >>> db0. The >>> node db0 may be up but if it isn't going to process transactions it is >>> useless. I can tell you that I have exactly 0 customers that would >>> want that >>> model because a single node failure would cause a double node failure. >> >> That's why you should configure a second standby as another (candidate) >> synchronous replica, also listed in synchronous_standby_names. > > Let me ask a (probably) stupid question. How is the sync rep > different from RAID-1? > > When I first saw sync rep, I expected that it would provide the same > guarantees as RAID-1 in terms of durability (data is always mirrored > on two servers) and availability (if one server goes down, another > server continues full service). What you describe is most like A-sync rep. Sync rep makes sure that data is always replicated before confirming to writer. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 01/08/2014 11:49 PM, Tom Lane wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: >> On 01/08/2014 01:55 PM, Tom Lane wrote: >>> Sync mode is about providing a guarantee that the data exists on more than >>> one server *before* we tell the client it's committed. If you don't need >>> that guarantee, you shouldn't be using sync mode. If you do need it, >>> it's not clear to me why you'd suddenly not need it the moment the going >>> actually gets tough. >> As I understand it what is being suggested is that if a subscriber or >> target goes down, then the master will just sit there and wait. When I >> read that, I read that the master will no longer process write >> transactions. If I am wrong in that understanding then cool. If I am not >> then that is a serious problem with a production scenario. There is an >> expectation that a master will continue to function if the target is >> down, synchronous or not. > Then you don't understand the point of sync mode, and you shouldn't be > using it. The point is *exactly* to refuse to commit transactions unless > we can guarantee the data's been replicated. For single host scenario this would be similar to asking for a mode which turns fsync=off in case of disk failure :) Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 01/09/2014 02:01 AM, Jim Nasby wrote: > On 1/8/14, 6:05 PM, Tom Lane wrote: >> Josh Berkus<josh@agliodbs.com> writes: >>> >On 01/08/2014 03:27 PM, Tom Lane wrote: >>>> >>What we lack, and should work on, is a way for sync mode to have >>>> M larger >>>> >>than one. AFAICS, right now we'll report commit as soon as >>>> there's one >>>> >>up-to-date replica, and some high-reliability cases are going to >>>> want >>>> >>more. >>> >"Sync N times" is really just a guarantee against data loss as long as >>> >you lose N-1 servers or fewer. And it becomes an even >>> >lower-availability solution if you don't have at least N+1 replicas. >>> >For that reason, I'd like to see some realistic actual user demand >>> >before we take the idea seriously. >> Sure. I wasn't volunteering to implement it, just saying that what >> we've got now is not designed to guarantee data survival across failure >> of more than one server. Changing things around the margins isn't >> going to improve such scenarios very much. >> >> It struck me after re-reading your example scenario that the most >> likely way to figure out what you had left would be to see if some >> additional system (think Nagios monitor, or monitors) had records >> of when the various database servers went down. This might be >> what you were getting at when you said "logging", but the key point >> is it has to be logging done on an external server that could survive >> failure of the database server. postmaster.log ain't gonna do it. > > Yeah, and I think that the logging command that was suggested allows > for that *if configured correctly*. *But* for relying on this, we would also need to make logging *synchronous*, which would probably not go down well with many people, as it makes things even more fragile from availability viewpoint (and slower as well). Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
From: "Hannu Krosing" <hannu@2ndQuadrant.com> > On 01/09/2014 01:57 PM, MauMau wrote: >> Let me ask a (probably) stupid question. How is the sync rep >> different from RAID-1? >> >> When I first saw sync rep, I expected that it would provide the same >> guarantees as RAID-1 in terms of durability (data is always mirrored >> on two servers) and availability (if one server goes down, another >> server continues full service). > What you describe is most like A-sync rep. > > Sync rep makes sure that data is always replicated before confirming to > writer. Really? RAID-1 is a-sync? Regards MauMau
On 01/09/2014 04:15 PM, MauMau wrote: > From: "Hannu Krosing" <hannu@2ndQuadrant.com> >> On 01/09/2014 01:57 PM, MauMau wrote: >>> Let me ask a (probably) stupid question. How is the sync rep >>> different from RAID-1? >>> >>> When I first saw sync rep, I expected that it would provide the same >>> guarantees as RAID-1 in terms of durability (data is always mirrored >>> on two servers) and availability (if one server goes down, another >>> server continues full service). >> What you describe is most like A-sync rep. >> >> Sync rep makes sure that data is always replicated before confirming to >> writer. > > Really? RAID-1 is a-sync? Not exactly, as there is no "master" just controller writing to two equal disks. But having a "degraded" mode makes it more like async - it continues even with single disk and syncs later if and when the 2nd disk comes back. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On Thu, Jan 9, 2014 at 04:55:22PM +0100, Hannu Krosing wrote: > On 01/09/2014 04:15 PM, MauMau wrote: > > From: "Hannu Krosing" <hannu@2ndQuadrant.com> > >> On 01/09/2014 01:57 PM, MauMau wrote: > >>> Let me ask a (probably) stupid question. How is the sync rep > >>> different from RAID-1? > >>> > >>> When I first saw sync rep, I expected that it would provide the same > >>> guarantees as RAID-1 in terms of durability (data is always mirrored > >>> on two servers) and availability (if one server goes down, another > >>> server continues full service). > >> What you describe is most like A-sync rep. > >> > >> Sync rep makes sure that data is always replicated before confirming to > >> writer. > > > > Really? RAID-1 is a-sync? > Not exactly, as there is no "master" just controller writing to two > equal disks. > > But having a "degraded" mode makes it > more like async - it continues even with single disk and syncs later if > and when the 2nd disk comes back. I think RAID-1 is a very good comparison because it is successful technology and has similar issues. RAID-1 is like Postgres synchronous_standby_names mode in the sense that the RAID-1 controller will not return success until writes have happened on both mirrors, but it is unlike synchronous_standby_names in that it will degrade and continue writes even when it can't write to both mirrors. What is being discussed is to allow the RAID-1 behavior in Postgres. One issue that came up in discussions is the insufficiency of writing a degrade notice in a server log file because the log file isn't durable from server failures, meaning you don't know if a fail-over to the slave lost commits. The degrade message has to be stored durably against a server failure, e.g. on a pager, probably using a command like we do for archive_command, and has to return success before the server continues in degrade mode. I assume degraded RAID-1 controllers inform administrators in the same way. I think RAID-1 controllers operate successfully with this behavior because they are seen as durable and authoritative in reporting the status of mirrors, while with Postgres, there is no central authority that can report that degrade status of master/slaves. Another concern with degrade mode is that once Postgres enters degrade mode, how does it get back to synchronous_standby_names mode? We could have each commit wait for the timeout before continuing, but that is going to make degrade mode unusably slow. Would there be an admin command? With a timeout to force degrade mode, a temporary network outage could cause degrade mode, while our current behavior would recover synchronous_standby_names mode once the network was repaired. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 01/08/2014 01:49 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> If we really want auto-degrading sync rep, then we'd (at a minimum) need
>> a way to determine *from the replica* whether or not it was in degraded
>> mode when the master died. What good do messages to the master log do
>> you if the master no longer exists?
>
> How would it be possible for a replica to know whether the master had
> committed more transactions while communication was lost, if the master
> dies without ever restoring communication? It sounds like pie in the
> sky from here ...

Oh, right. Because the main reason for a sync replica degrading is that
it's down. In which case it isn't going to record anything. This would
still be useful for sync rep candidates, though, and I'll document why
below. But first, lemme demolish the case for auto-degrade.
So here's the case that we can't possibly solve for auto-degrade.
Anyone who wants auto-degrade needs to come up with a solution for this
case as a first requirement:
1. A data center network/power event starts.
2. The sync replica goes down.
3. A short time later, the master goes down.
4. Data center power is restored.
5. The master is fried and is a permanent loss. The replica is ok, though.
Question: how does the DBA know whether data has been lost or not?
On Thu, Jan 9, 2014 at 09:36:47AM -0800, Jeff Janes wrote: > Oh, right. Because the main reason for a sync replica degrading is that > it's down. In which case it isn't going to record anything. This would > still be useful for sync rep candidates, though, and I'll document why > below. But first, lemme demolish the case for auto-degrade. > > So here's the case that we can't possibly solve for auto-degrade. > Anyone who wants auto-degrade needs to come up with a solution for this > case as a first requirement: > > > It seems like the only deterministically useful thing to do is to send a NOTICE > to the *client* that the commit has succeeded, but in degraded mode, so keep > your receipts and have your lawyer's number handy. Whether anyone is willing > to add code to the client to process that message is doubtful, as well as > whether the client will even ever receive it if we are in the middle of a major > disruption. I don't think clients are the right place for notification. Clients running on a single server could have fsync=off set by the admin or lying drives and never know it. I can't imagine a client only willing to run if synchronous_standby_names is set. The synchronous slave is something the administrator has set up and is responsible for, so the administrator should be notified. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Robert, > I think the problem here is that we tend to have a limited view of > "the right way to use synch rep". If I have 5 nodes, and I set 1 > synchronous and the other 3 asynchronous, I've set up a "known > successor" in the event that the leader fails. In this scenario > though, if the "successor" fails, you actually probably want to keep > accepting writes; since you weren't using synchronous for durability > but for operational simplicity. I suspect there are probably other > scenarios where users are willing to trade latency for improved and/or > directed durability but not at the extent of availability, don't you? That's a workaround for a completely different limitation though; the inability to designate a specific async replica as "first". That is, if there were some way to do so, you would be using that rather than sync rep. Extending the capabilities of that workaround is not something I would gladly do until I had exhausted other options. The other problem is that *many* users think they can get improved availability, consistency AND durability on two nodes somehow, and to heck with the CAP theorem (certain companies are happy to foster this illusion). Having a simple, easily-accessible auto-degrade without treating degrade as a major monitoring event will feed this self-deception. I know I already have to explain the difference between "synchronous" and "simultaneous" to practically every one of my clients for whom I set up replication. Realistically, degrade shouldn't be something that happens inside a single PostgreSQL node, either the master or the replica. It should be controlled by some external controller which is capable of deciding on degrade or not based on a more complex set of circumstances (e.g. "Is the replica actually down or just slow?"). Certainly this is the case with Cassandra, VoltDB, Riak, and the other "serious" multinode databases. > This isn't to say there isn't a lot of confusion around the issue. > Designing, implementing, and configuring different guarantees in the > presence of node failures is a non-trivial problem. Still, I'd prefer > to see Postgres head in the direction of providing more options in > this area rather than drawing a firm line at being a CP-oriented > system. I'm not categorically opposed to having any form of auto-degrade at all; what I'm opposed to is a patch which adds auto-degrade **without adding any additional monitoring or management infrastructure at all**. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 8 January 2014 21:40, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> I'm torn on whether we should cave to popular demand on this; but >> if we do, we sure need to be very clear in the documentation about >> what a successful return from a commit request means. Sooner or >> later, Murphy's Law being what it is, if we do this someone will >> lose the primary and blame us because the synchronous replica is >> missing gobs of transactions that were successfully committed. > > I'm for not caving. I think people who are asking for this don't > actually understand what they'd be getting. Agreed. Just to be clear, I made this mistake initially. Now I realise Heikki was right and if you think about it long enough, you will too. If you still disagree, think hard, read the archives until you do. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 1/9/14, 9:01 AM, Hannu Krosing wrote: >> Yeah, and I think that the logging command that was suggested allows >> >for that *if configured correctly*. > *But* for relying on this, we would also need to make logging > *synchronous*, > which would probably not go down well with many people, as it makes things > even more fragile from availability viewpoint (and slower as well). Not really... you only care about monitoring performance when the standby has gone AWOL *and* you haven't sent a notification yet. Once you've notified once, you're done. So in this case the master won't go down unless you have a double fault: standby goes down AND you can't get to your monitoring. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian <bruce@momjian.us> wrote: > > I think RAID-1 is a very good comparison because it is successful > technology and has similar issues. > > RAID-1 is like Postgres synchronous_standby_names mode in the sense that > the RAID-1 controller will not return success until writes have happened > on both mirrors, but it is unlike synchronous_standby_names in that it > will degrade and continue writes even when it can't write to both > mirrors. What is being discussed is to allow the RAID-1 behavior in > Postgres. > > One issue that came up in discussions is the insufficiency of writing a > degrade notice in a server log file because the log file isn't durable > from server failures, meaning you don't know if a fail-over to the slave > lost commits. The degrade message has to be stored durably against a > server failure, e.g. on a pager, probably using a command like we do for > archive_command, and has to return success before the server continues > in degrade mode. I assume degraded RAID-1 controllers inform > administrators in the same way. Here I think if user is aware from beginning that this is the behaviour, then maybe the importance of message is not very high. What I want to say is that if we provide a UI in such a way that user decides during setup of server the behavior that is required by him. For example, if we provide a new parameter available_synchronous_standby_names along with current parameter and ask user to use this new parameter, if he wishes to synchronously commit transactions on another server when it is available, else it will operate as a standalone sync master. > I think RAID-1 controllers operate successfully with this behavior > because they are seen as durable and authoritative in reporting the > status of mirrors, while with Postgres, there is no central authority > that can report that degrade status of master/slaves. > > Another concern with degrade mode is that once Postgres enters degrade > mode, how does it get back to synchronous_standby_names mode? It will get back to the mode where it will commit the transactions to another server before commit completes when all the gap in WAL is resolved. I think in the new mode it will operate as if there is no synchronous_standby_names. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
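As a sketch only -- this parameter is the proposal above, not an existing GUC -- the configuration might read:

    # hypothetical parameter proposed upthread; does not exist today
    available_synchronous_standby_names = 'standby1'
    # commit synchronously to standby1 while it is connected;
    # operate as a standalone master while it is not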
On Fri, Jan 10, 2014 at 3:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 8 January 2014 21:40, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Kevin Grittner <kgrittn@ymail.com> writes: >>> I'm torn on whether we should cave to popular demand on this; but >>> if we do, we sure need to be very clear in the documentation about >>> what a successful return from a commit request means. Sooner or >>> later, Murphy's Law being what it is, if we do this someone will >>> lose the primary and blame us because the synchronous replica is >>> missing gobs of transactions that were successfully committed. >> >> I'm for not caving. I think people who are asking for this don't >> actually understand what they'd be getting. > > Agreed. > > > Just to be clear, I made this mistake initially. Now I realise Heikki > was right and if you think about it long enough, you will too. If you > still disagree, think hard, read the archives until you do. +1. I see far more potential in having an N-sync solution from the usability viewpoint, and consistency with the existing mechanisms in place. A synchronous apply mode would be nice as well. -- Michael
On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote: > On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > > I think RAID-1 is a very good comparison because it is successful > > technology and has similar issues. > > > > RAID-1 is like Postgres synchronous_standby_names mode in the sense that > > the RAID-1 controller will not return success until writes have happened > > on both mirrors, but it is unlike synchronous_standby_names in that it > > will degrade and continue writes even when it can't write to both > > mirrors. What is being discussed is to allow the RAID-1 behavior in > > Postgres. > > > > One issue that came up in discussions is the insufficiency of writing a > > degrade notice in a server log file because the log file isn't durable > > from server failures, meaning you don't know if a fail-over to the slave > > lost commits. The degrade message has to be stored durably against a > > server failure, e.g. on a pager, probably using a command like we do for > > archive_command, and has to return success before the server continues > > in degrade mode. I assume degraded RAID-1 controllers inform > > administrators in the same way. > > Here I think if user is aware from beginning that this is the behaviour, > then may be the importance of message is not very high. > What I want to say is that if we provide a UI in such a way that user > decides during setup of server the behavior that is required by him. > > For example, if we provide a new parameter > available_synchronous_standby_names along with current parameter > and ask user to use this new parameter, if he wishes to synchronously > commit transactions on another server when it is available, else it will > operate as a standalone sync master. I know there was a desire to remove this TODO item, but I think we have brought up enough new issues that we can keep it to see if we can come up with a solution. I have added a link to this discussion on the TODO item. I think we will need at least four new GUC variables: * timeout control for degraded mode * command to run during switch to degraded mode * command to run during switch from degraded mode * read-only variable to report degraded mode -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
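Sketching what those four variables might look like (all names and syntax invented for illustration; none of these GUCs exists):

    # hypothetical postgresql.conf fragment
    synchronous_degrade_timeout = '30s'   # wait this long before degrading
    degrade_start_command = '/usr/local/bin/notify-admins entering-degraded-mode'
    degrade_stop_command  = '/usr/local/bin/notify-admins leaving-degraded-mode'
    # plus a read-only reporting variable, e.g.:
    #   SHOW synchronous_replication_degraded;   -- on/off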
On 10 January 2014 15:47, Bruce Momjian <bruce@momjian.us> wrote: > I know there was a desire to remove this TODO item, but I think we have > brought up enough new issues that we can keep it to see if we can come > up with a solution. Can you summarise what you think the new issues are? All I see is some further rehashing of old discussions. There is already a solution to the "problem" because the docs are already very clear that you need multiple standbys to achieve commit guarantees AND high availability. RTFM is usually used as some form of put down, but that is what needs to happen here. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/10/2014 05:09 PM, Simon Riggs wrote: > On 10 January 2014 15:47, Bruce Momjian <bruce@momjian.us> wrote: > >> I know there was a desire to remove this TODO item, but I think we have >> brought up enough new issues that we can keep it to see if we can come >> up with a solution. > Can you summarise what you think the new issues are? All I see is some > further rehashing of old discussions. > > There is already a solution to the "problem" because the docs are > already very clear that you need multiple standbys to achieve commit > guarantees AND high availability. RTFM is usually used as some form of > put down, but that is what needs to happen here. If we want to get the guarantees that often come up in "sync rep" discussions - namely that you can assume that your change is applied on standby when commit returns - then we could implement this by returning LSN from commit at protocol level and having an option in queries on standby to wait for this LSN (again passed on wire below the level of query) to be applied. This can be mostly hidden in drivers and would need very little effort from end user to use. Basically, you tell the driver that one connection is bound as "the slave" of another and the driver can manage using the right LSNs. That is, the last LSN received from master is always attached to queries on slaves. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
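A rough sketch of the driver-side bookkeeping Hannu describes, using functions that already exist (the LSN value shown is illustrative; the polling loop would live in the driver):

    -- on the master, in the committing session, right after COMMIT:
    SELECT pg_current_xlog_location();      -- returns e.g. '0/3000060'

    -- on the bound standby connection, before each read, the driver waits
    -- until replay has passed the remembered LSN:
    SELECT pg_last_xlog_replay_location();  -- poll until it reaches '0/3000060'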
On 01/10/2014 07:47 AM, Bruce Momjian wrote: > I know there was a desire to remove this TODO item, but I think we have > brought up enough new issues that we can keep it to see if we can come > up with a solution. I have added a link to this discussion on the TODO > item. > > I think we will need at least four new GUC variables: > > * timeout control for degraded mode > * command to run during switch to degraded mode > * command to run during switch from degraded mode > * read-only variable to report degraded mode > I know I am the one that instigated all of this so I want to be very clear on what I and what I am confident that my customers would expect. If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g; slave goes down and we have to tell the master to continue). I have read through this thread more than once, and I have also gone back to the docs. I understand why we do it the way we do it. I also understand that from a business requirement for 99% of CMD's customers, it's wrong. At least in the sense of providing continuity of service. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 1/10/14, 12:59 PM, Joshua D. Drake wrote: > I know I am the one that instigated all of this so I want to be very clear on what I and what I am confident that my customers would expect. > > If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g; slave goes down and we have to tell the master to continue). > > I have read through this thread more than once, and I have also gone back to the docs. I understand why we do it the way we do it. I also understand that from a business requirement for 99% of CMD's customers, it's wrong. At least in the sense of providing continuity of service. +1 I understand that this is a degradation of full-on sync rep. But there is definite value added with sync-rep that can automatically (or at least easily) degrade over async; it protects you from single failures. I fully understand that it will not protect you from a double failure. That's OK in many cases. Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: > > On 01/10/2014 07:47 AM, Bruce Momjian wrote: > > >I know there was a desire to remove this TODO item, but I think we have > >brought up enough new issues that we can keep it to see if we can come > >up with a solution. I have added a link to this discussion on the TODO > >item. > > > >I think we will need at least four new GUC variables: > > > >* timeout control for degraded mode > >* command to run during switch to degraded mode > >* command to run during switch from degraded mode > >* read-only variable to report degraded mode > > > > I know I am the one that instigated all of this so I want to be very clear > on what I and what I am confident that my customers would expect. > > If a synchronous slave goes down, the master continues to operate. That is > all. I don't care if it is configurable (I would be fine with that). I don't > care if it is not automatic (e.g; slave goes down and we have to tell the > master to continue). Would you please explain, as precise as possible, what the advantages of using a synchronous standby would be in such a scenario? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Andres Freund (andres@2ndquadrant.com) wrote: > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: > > If a synchronous slave goes down, the master continues to operate. That is > > all. I don't care if it is configurable (I would be fine with that). I don't > > care if it is not automatic (e.g; slave goes down and we have to tell the > > master to continue). > > Would you please explain, as precise as possible, what the advantages of > using a synchronous standby would be in such a scenario? In a degraded/failure state, things continue to *work*. In a non-degraded/failure state, you're able to handle a system failure and know that you didn't lose any transactions. Tom's point is correct, that you will fail on the "have two copies of everything" in this mode, but that could certainly be acceptable in the case where there is a system failure. As pointed out by someone previously, that's how RAID-1 works (which I imagine quite a few of us use). I've been thinking about this a fair bit and I've come to like the RAID1 analogy. Stinks that we can't keep things going (automatically) if either side fails, but perhaps we will one day... Thanks, Stephen
On 2014-01-10 17:02:08 -0500, Stephen Frost wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: > > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: > > > If a synchronous slave goes down, the master continues to operate. That is > > > all. I don't care if it is configurable (I would be fine with that). I don't > > > care if it is not automatic (e.g; slave goes down and we have to tell the > > > master to continue). > > > > Would you please explain, as precise as possible, what the advantages of > > using a synchronous standby would be in such a scenario? > > In a degraded/failure state, things continue to *work*. In a > non-degraded/failure state, you're able to handle a system failure and > know that you didn't lose any transactions. Why do you know that you didn't lose any transactions? Trivial network hiccups, a restart of a standby, IO overload on the standby all can cause very short interruptions in the walsender connection - leading to degradation. > As pointed out by someone > previously, that's how RAID-1 works (which I imagine quite a few of us > use). I don't think that argument makes much sense. Raid-1 isn't safe as-is. It's only safe if you use some sort of journaling or similar on top. If you issued a write during a crash you normally will just get either the version from before or the version after the last write back, depending on the state on the individual disks and which disk is treated as authoritative by the raid software. And even if you disregard that, there's not much outside influence that can lead to losing connection to a disk drive inside a raid outside an actually broken drive. Any network connection is normally kept *outside* the level at which you build raids. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/10/2014 01:49 PM, Andres Freund wrote: >> >> I know I am the one that instigated all of this so I want to be very clear >> on what I and what I am confident that my customers would expect. >> >> If a synchronous slave goes down, the master continues to operate. That is >> all. I don't care if it is configurable (I would be fine with that). I don't >> care if it is not automatic (e.g; slave goes down and we have to tell the >> master to continue). > > Would you please explain, as precise as possible, what the advantages of > using a synchronous standby would be in such a scenario? Current behavior: db01->sync->db02 Transactions are happening. Everything is happy. Website is up. Orders are being made. db02 goes down. It doesn't matter why. It is down. Because it is down, db01 for all intents and purposes is also down because we are using sync replication. We have just lost continuity of service, we can no longer accept orders, we can no longer allow people to log into the website, we can no longer service accounts. In short, we are out of business. Proposed behavior: db01->sync->db02 Transactions are happening. Everything is happy. Website is up. Orders are being made. db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website and we can still service accounts. The continuity of service continues. Yes, there are all kinds of things that need to be considered when that happens, that isn't the point. The point is, PostgreSQL continues its uptime guarantee and allows the business to continue to function as (if) nothing has happened. For many and I dare say the majority of businesses, this is enough. They know that if the slave goes down they can continue to operate. They know if the master goes down they can fail over. They know that while both are up they are using sync rep (with various caveats). They are happy. They like that it is simple and just works. They continue to use PostgreSQL. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: > db02 goes down. It doesn't matter why. It is down. db01 continues to accept > orders, allow people to log into the website and we can still service > accounts. The continuity of service continues. Why that configuration is advantageous over an async configuration is the question. Why, with those requirements, are you using a synchronous standby at all? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/10/2014 02:33 PM, Andres Freund wrote: > On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: >> db02 goes down. It doesn't matter why. It is down. db01 continues to accept >> orders, allow people to log into the website and we can still service >> accounts. The continuity of service continues. > > Why that configuration is advantageous over an async configuration is the > question. Why, with those requirements, are you using a synchronous > standby at all? +1 > > Greetings, > > Andres Freund > -- Adrian Klaver adrian.klaver@gmail.com
On 01/10/2014 02:33 PM, Andres Freund wrote: > > On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: >> db02 goes down. It doesn't matter why. It is down. db01 continues to accept >> orders, allow people to log into the website and we can still service >> accounts. The continuity of service continues. > > Why that configuration is advantageous over an async configuration is the > question. Why, with those requirements, are you using a synchronous > standby at all? If the master goes down, I can fail over knowing that as many of my transactions as possible have been replicated. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
Hi, On 2014-01-10 17:28:55 -0500, Stephen Frost wrote: > > Why do you know that you didn't lose any transactions? Trivial network > > hiccups, a restart of a standby, IO overload on the standby all can > > cause very short interruptions in the walsender connection - leading > > to degradation. > You know that you haven't *lost* any by virtue of the master still being > up. The case you describe is a double-failure scenario- the link between > the master and slave has to go away AND the master must accept a > transaction and then fail independently. Unfortunately network outages do correlate with other system faults. What you're wishing for really is the "I like the world to be friendly to me" mode. Even if you have only disk problems, quite often if your disks die, you can continue to write (especially with a BBU), but uncached reads fail. So the walsender connection errors out because a read failed, and you're degrading into async mode. *Because* your primary is about to die. > > > As pointed out by someone > > > previously, that's how RAID-1 works (which I imagine quite a few of us > > > use). > > > > I don't think that argument makes much sense. Raid-1 isn't safe > > as-is. It's only safe if you use some sort of journaling or similar > > on top. If you issued a write during a crash you normally will just get > > either the version from before or the version after the last write back, > > depending on the state on the individual disks and which disk is treated > > as authoritative by the raid software. > Uh, you need a decent raid controller then and we're talking about after a > transaction commit/sync. Yes, if you have a BBU that memory is authoritative in most cases. But in that case the argument of having two disks is pretty much pointless, the SPOF suddenly became the battery + ram. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-01-10 14:44:28 -0800, Joshua D. Drake wrote: > > On 01/10/2014 02:33 PM, Andres Freund wrote: > > > >On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote: > >>db02 goes down. It doesn't matter why. It is down. db01 continues to accept > >>orders, allow people to log into the website and we can still service > >>accounts. The continuity of service continues. > > > >Why that configuration is advantageous over an async configuration is the > >question. Why, with those requirements, are you using a synchronous > >standby at all? > > If the master goes down, I can fail over knowing that as many of my > transactions as possible have been replicated. It's not like async replication mode delays sending data to the standby in any way. Really, the commits themselves are sent to the server at exactly the same speed independent of sync/async. The only thing that's delayed is the *notification* of the client that sent the commit. Not the commit itself. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/10/2014 02:47 PM, Andres Freund wrote: > Really, the commits themselves are sent to the server at exactly the > same speed independent of sync/async. The only thing that's delayed is > the *notificiation* of the client that sent the commit. Not the commit > itself. Which is irrelevant to the point that if the standby goes down, we are now out of business. Any continuous replication should not be a SPOF. The current behavior guarantees that a two node sync cluster is a SPOF. The proposed behavior removes that. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 01/10/2014 02:57 PM, Stephen Frost wrote: > Yes, if you have a BBU that memory is authoritative in most cases. But > in that case the argument of having two disks is pretty much pointless, > the SPOF suddenly became the battery + ram. > > > If that is a concern then use multiple controllers. Certainly not > unheard of- look at SANs... > And in PostgreSQL we obviously have the option of having a third or fourth standby but that isn't the problem we are trying to solve. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 01/10/2014 11:59 PM, Joshua D. Drake wrote: > > On 01/10/2014 02:57 PM, Stephen Frost wrote: > >> Yes, if you have a BBU that memory is authoritative in most >> cases. But >> in that case the argument of having two disks is pretty much >> pointless, >> the SPOF suddenly became the battery + ram. >> >> >> If that is a concern then use multiple controllers. Certainly not >> unheard of- look at SANs... >> > > And in PostgreSQL we obviously have the option of having a third or > fourth standby but that isn't the problem we are trying to solve. The problem you are trying to solve is a controller with enough Battery Backed Cache RAM to cache the entire database but with write-through mode. And you want it to degrade to write-back in case of disk failure so that you can continue while the disk is broken. People here are telling you that it would not be safe; use at least RAID-1 if you want availability. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 01/10/2014 02:59 PM, Joshua D. Drake wrote: > > On 01/10/2014 02:47 PM, Andres Freund wrote: > >> Really, the commits themselves are sent to the server at exactly the >> same speed independent of sync/async. The only thing that's delayed is >> the *notification* of the client that sent the commit. Not the commit >> itself. > > Which is irrelevant to the point that if the standby goes down, we are > now out of business. > > Any continuous replication should not be a SPOF. The current behavior > guarantees that a two node sync cluster is a SPOF. The proposed behavior > removes that. Again, if that's your goal, then use async replication. I really don't understand the use-case here. The purpose of sync rep is to know determinatively whether or not you have lost data when disaster strikes. If knowing for certain isn't important to you, then use async. BTW, people are using RAID1 as an analogy to 2-node sync replication. That's a very bad analogy, because in RAID1 you have a *single* controller which is capable of determining if the disks are in a failed state or not, and this is all happening on a single node where things like network outages aren't a consideration. It's really not the same situation at all. Also, frankly, I absolutely can't count the number of times I've had to rescue a customer or family member who had RAID1 but wasn't monitoring syslog, and so one of their disks had been down for months without them knowing it. Heck, I've done this myself. So ... the Filesystem geeks have already been through this. Filesystem clustering started out with systems like DRBD, which includes an auto-degrade option. However, DRBD with auto-degrade is widely considered untrustworthy and is a significant portion of why DRBD isn't trusted today. From here, clustered filesystems went in two directions: RHCS added layers of monitoring and management to make auto-degrade a safer option than it is with DRBD (and still not the default option). Scalable clustered filesystems added N(M) quorum commit in order to support more than 2 nodes. Either of these courses is reasonable for us to pursue. What's a bad idea is adding an auto-degrade option without any tools to manage and monitor it, which is what this patch does by my reading. If I'm wrong, then someone can point it out to me. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 01/10/2014 01:49 PM, Andres Freund wrote: > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: >> >> On 01/10/2014 07:47 AM, Bruce Momjian wrote: >> >>> I know there was a desire to remove this TODO item, but I think we have >>> brought up enough new issues that we can keep it to see if we can come >>> up with a solution. I have added a link to this discussion on the TODO >>> item. >>> >>> I think we will need at least four new GUC variables: >>> >>> * timeout control for degraded mode >>> * command to run during switch to degraded mode >>> * command to run during switch from degraded mode >>> * read-only variable to report degraded mode I would argue that we don't need the first. We just want a command to switch synchronous/degraded, and a variable (or function) to report on degraded mode. If we have those things, then it becomes completely possible to have an external monitoring framework, which is capable of answering questions like "is the replica down or just slow?", control degrade. Oh, wait! We DO have such a command. It's called ALTER SYSTEM SET! Recently committed. So this is really a solvable issue if one is willing to use an external utility. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
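For example, an external monitor that has concluded the standby is really down (and not just slow) could degrade, and later re-sync, with nothing more than the following (the standby name is invented):

    -- degrade: stop waiting for a sync standby; takes effect on reload
    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();

    -- once the standby is healthy again, restore sync mode
    ALTER SYSTEM SET synchronous_standby_names = 'standby1';
    SELECT pg_reload_conf();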
On 01/10/2014 03:17 PM, Josh Berkus wrote: >> Any continuous replication should not be a SPOF. The current behavior >> guarantees that a two node sync cluster is a SPOF. The proposed behavior >> removes that. > > Again, if that's your goal, then use async replication. I think I have gone about this the wrong way. Async does not meet the technical or business requirements that I have. Sync does except that it increases the possibility of an outage. That is the requirement I am trying to address. > > The purpose of sync rep is to know determinatively whether or not you > have lost data when disaster strikes. If knowing for certain isn't > important to you, then use async. PostgreSQL Sync replication increases the possibility of an outage. That is incorrect behavior. I want sync because on the chance that the master goes down, I have as much data as possible to fail over to. However, I can't use sync because it increases the possibility that my business will not be able to function on the chance that the standby goes down. > > What's a bad idea is adding an auto-degrade option without any tools to > manage and monitor it, which is what this patch does by my reading. If This we absolutely agree on. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 01/10/2014 03:38 PM, Joshua D. Drake wrote: > > On 01/10/2014 03:17 PM, Josh Berkus wrote: > >>> Any continuous replication should not be a SPOF. The current behavior >>> guarantees that a two node sync cluster is a SPOF. The proposed behavior >>> removes that. >> >> Again, if that's your goal, then use async replication. > > I think I have gone about this the wrong way. Async does not meet the > technical or business requirements that I have. Sync does except that it > increases the possibility of an outage. That is the requirement I am > trying to address. > >> >> The purpose of sync rep is to know determinatively whether or not you >> have lost data when disaster strikes. If knowing for certain isn't >> important to you, then use async. > > PostgreSQL Sync replication increases the possibility of an outage. That > is incorrect behavior. > > I want sync because on the chance that the master goes down, I have as > much data as possible to fail over to. However, I can't use sync because > it increases the possibility that my business will not be able to > function on the chance that the standby goes down. > >> >> What's a bad idea is adding an auto-degrade option without any tools to >> manage and monitor it, which is what this patch does by my reading. If > > This we absolutely agree on. As I see it the state of replication in Postgres is as follows. 1) Async. Runs at the speed of the master as it does not have to wait on the standby to signal a successful commit. There is some degree of offset between master and standby(s) due to latency. 2) Sync. Runs at the speed of the standby + latency between master and standby. This is counter balanced by knowledge that the master and standby are in the same state. As Josh Berkus pointed out there is a loop hole in this when multiple standbys are involved. The topic under discussion is an intermediate mode between 1 and 2. There seems to be a consensus that this is not unreasonable. The issue seems to be how to achieve this with ideas falling into roughly two camps. A) Change the existing sync mode to allow the master and standby fall out of sync should a standby fall over. B) Create a new mode that does this without changing the existing sync mode. My two cents would be to implement B. Sync to me is a contract that master and standby are in sync at any point in time. Anything else should be called something else. Then it is up to the documentation to clearly point out the benefits/pitfalls. If you want to implement something as important as replication without reading the docs then the results are on you. > > JD > > -- Adrian Klaver adrian.klaver@gmail.com
Adrian, * Adrian Klaver (adrian.klaver@gmail.com) wrote: > A) Change the existing sync mode to allow the master and standby > to fall out of sync should a standby fall over. I'm not sure that anyone is arguing for this.. > B) Create a new mode that does this without changing the existing sync mode. > > My two cents would be to implement B. Sync to me is a contract that > master and standby are in sync at any point in time. Anything else > should be called something else. Then it is up to the documentation > to clearly point out the benefits/pitfalls. If you want to implement > something as important as replication without reading the docs, then > the results are on you. The issue is that there are folks who are arguing, essentially, that "B" is worthless, wrong, and no one should want it and therefore we shouldn't have it. Thanks, Stephen
On 01/10/2014 04:25 PM, Stephen Frost wrote: > Adrian, > > > * Adrian Klaver (adrian.klaver@gmail.com) wrote: >> A) Change the existing sync mode to allow the master and standby >> to fall out of sync should a standby fall over. > > I'm not sure that anyone is arguing for this.. Looks like here, unless I am really missing the point: http://www.postgresql.org/message-id/52D07466.6070005@commandprompt.com "Proposed behavior: db01->sync->db02 Transactions are happening. Everything is happy. Website is up. Orders are being made. db02 goes down. It doesn't matter why. It is down. db01 continues to accept orders, allow people to log into the website and we can still service accounts. The continuity of service continues. Yes, there are all kinds of things that need to be considered when that happens, that isn't the point. The point is, PostgreSQL continues its uptime guarantee and allows the business to continue to function as (if) nothing has happened. For many and I dare say the majority of businesses, this is enough. They know that if the slave goes down they can continue to operate. They know if the master goes down they can fail over. They know that while both are up they are using sync rep (with various caveats). They are happy. They like that it is simple and just works. They continue to use PostgreSQL. " > >> B) Create a new mode that does this without changing the existing sync mode. >> >> My two cents would be to implement B. Sync to me is a contract that >> master and standby are in sync at any point in time. Anything else >> should be called something else. Then it is up to the documentation >> to clearly point out the benefits/pitfalls. If you want to implement >> something as important as replication without reading the docs, then >> the results are on you. > The issue is that there are folks who are arguing, essentially, that > "B" is worthless, wrong, and no one should want it and therefore we > shouldn't have it. Well, you will not please everyone, just displease the least. > > Thanks, > > Stephen > -- Adrian Klaver adrian.klaver@gmail.com
Adrian, * Adrian Klaver (adrian.klaver@gmail.com) wrote: > On 01/10/2014 04:25 PM, Stephen Frost wrote: > >* Adrian Klaver (adrian.klaver@gmail.com) wrote: > >>A) Change the existing sync mode to allow the master and standby > >>to fall out of sync should a standby fall over. > > > >I'm not sure that anyone is arguing for this.. > > Looks like here, unless I am really missing the point: Elsewhere in the thread, JD agreed that having it as an independent option was fine. > Well, you will not please everyone, just displease the least. Well, sure, but we do generally try to reach consensus. :) Thanks, Stephen
On 01/10/2014 04:38 PM, Stephen Frost wrote: > Adrian, > > * Adrian Klaver (adrian.klaver@gmail.com) wrote: >> On 01/10/2014 04:25 PM, Stephen Frost wrote: >>> * Adrian Klaver (adrian.klaver@gmail.com) wrote: >>>> A) Change the existing sync mode to allow the master and standby >>>> to fall out of sync should a standby fall over. >>> >>> I'm not sure that anyone is arguing for this.. >> >> Looks like here, unless I am really missing the point: > > Elsewhere in the thread, JD agreed that having it as an independent > option was fine. Yes. I am fine with an independent option. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 1/10/14, 6:19 PM, Adrian Klaver wrote: > 1) Async. Runs at the speed of the master as it does not have to wait on the standby to signal a successful commit. There is some degree of offset between master and standby(s) due to latency. > > 2) Sync. Runs at the speed of the standby + latency between master and standby. This is counterbalanced by knowledge that the master and standby are in the same state. As Josh Berkus pointed out, there is a loophole in this when multiple standbys are involved. > > The topic under discussion is an intermediate mode between 1 and 2. There seems to be a consensus that this is not unreasonable. That's not what's actually under debate; allow me to restate as option 3: 3) Sync. Everything you said, plus: "If for ANY reason the master cannot talk to the slave it becomes read-only." That's the current state. What many people want is something along the lines of what you said in 2: The slave ALWAYS has everything the master does (at least on disk) unless the connection between master and slave fails. The reason people want this is it protects you against a *single* fault. If just the master blows up, you have a 100% reliable slave. If the connection (or the slave itself) blows up, the master is still working. I agree that there's a non-obvious gotcha here: in the case of a master failure you might also have experienced a connection failure, and without some kind of 3rd party involved you have no way to know that. We should make best efforts to make that gotcha as clear to users as we can. But just because some users will blindly ignore that doesn't mean we flat-out shouldn't support those that will understand the gotcha and accept its limitations. BTW, if ALTER SYSTEM SET actually does make it possible to implement automated failover without directly adding it to Postgres then I think a good compromise would be to have an external project that does just that and have the docs reference that project and explain why we haven't built it in. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 01/10/2014 04:48 PM, Joshua D. Drake wrote: > > On 01/10/2014 04:38 PM, Stephen Frost wrote: >> Adrian, >> >> * Adrian Klaver (adrian.klaver@gmail.com) wrote: >>> On 01/10/2014 04:25 PM, Stephen Frost wrote: >>>> * Adrian Klaver (adrian.klaver@gmail.com) wrote: >>>>> A) Change the existing sync mode to allow the master and standby >>>>> to fall out of sync should a standby fall over. >>>> >>>> I'm not sure that anyone is arguing for this.. >>> >>> Looks like here, unless I am really missing the point: >> >> Elsewhere in the thread, JD agreed that having it as an independent >> option was fine. > > Yes. I am fine with an independent option. I missed that. What confused me, and seems to be generally confusing, is the overloading of the term sync: "Proposed behavior: db01->sync->db02 " In my mind, if that is an independent option it should have a different name. I propose Schrödinger :) > > JD > > > -- Adrian Klaver adrian.klaver@gmail.com
On Wed, 2014-01-08 at 17:56 -0500, Stephen Frost wrote: > * Andres Freund (andres@2ndquadrant.com) wrote: > > That's why you should configure a second standby as another (candidate) > > synchronous replica, also listed in synchronous_standby_names. > > Perhaps we should stress in the docs that this is, in fact, the *only* > reasonable mode in which to run with sync rep on? Where there are > multiple replicas, because otherwise Drake is correct that you'll just > end up having both nodes go offline if the slave fails. It's not unreasonable to run with only two if the writers are consuming from a reliable message queue (or another system that maintains its own reliable persistence). Then you can just continue processing messages after you have repaired your replication pair.
On Fri, Jan 10, 2014 at 03:17:34PM -0800, Josh Berkus wrote: > The purpose of sync rep is to know determinatively whether or not you > have lost data when disaster strikes. If knowing for certain isn't > important to you, then use async. > > BTW, people are using RAID1 as an analogy to 2-node sync replication. > That's a very bad analogy, because in RAID1 you have a *single* > controller which is capable of determining if the disks are in a failed > state or not, and this is all happening on a single node where things > like network outages aren't a consideration. It's really not the same > situation at all. > > Also, frankly, I absolutely can't count the number of times I've had to > rescue a customer or family member who had RAID1 but wasn't monitoring > syslog, and so one of their disks had been down for months without them > knowing it. Heck, I've done this myself. > > So ... the Filesystem geeks have already been through this. Filesystem > clustering started out with systems like DRBD, which includes an > auto-degrade option. However, DRBD with auto-degrade is widely > considered untrustworthy and is a significant portion of why DRBD isn't > trusted today. > > From here, clustered filesystems went in two directions: RHCS added > layers of monitoring and management to make auto-degrade a safer option > than it is with DRBD (and still not the default option). Scalable > clustered filesystems added N(M) quorum commit in order to support more > than 2 nodes. Either of these courses is reasonable for us to pursue. > > What's a bad idea is adding an auto-degrade option without any tools to > manage and monitor it, which is what this patch does by my reading. If > I'm wrong, then someone can point it out to me. Yes, my big take-away from the discussion is that informing the admin in a durable way is a requirement for this degraded mode. You are right that many ignore RAID degradation warnings, but with the warnings heeded, degraded functionality can be useful. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Fri, Jan 10, 2014 at 03:27:10PM -0800, Josh Berkus wrote: > On 01/10/2014 01:49 PM, Andres Freund wrote: > > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote: > >> > >> On 01/10/2014 07:47 AM, Bruce Momjian wrote: > >> > >>> I know there was a desire to remove this TODO item, but I think we have > >>> brought up enough new issues that we can keep it to see if we can come > >>> up with a solution. I have added a link to this discussion on the TODO > >>> item. > >>> > >>> I think we will need at least four new GUC variables: > >>> > >>> * timeout control for degraded mode > >>> * command to run during switch to degraded mode > >>> * command to run during switch from degraded mode > >>> * read-only variable to report degraded mode > > I would argue that we don't need the first. We just want a command to > switch synchronous/degraded, and a variable (or function) to report on > degraded mode. If we have those things, then it becomes completely > possible to have an external monitoring framework, which is capable of > answering questions like "is the replica down or just slow?", control > degrade. > > Oh, wait! We DO have such a command. It's called ALTER SYSTEM SET! > Recently committed. So this is really a solvable issue if one is > willing to use an external utility. How would that work? Would it be a tool in contrib? There already is a timeout, so if a tool checked more frequently than the timeout, it should work. The durable notification of the admin would happen in the tool, right? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Fri, Jan 10, 2014 at 9:17 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote: >> Here I think if user is aware from beginning that this is the behaviour, >> then maybe the importance of message is not very high. >> What I want to say is that if we provide a UI in such a way that user >> decides during setup of server the behavior that is required by him. >> >> For example, if we provide a new parameter >> available_synchronous_standby_names along with current parameter >> and ask user to use this new parameter, if he wishes to synchronously >> commit transactions on another server when it is available, else it will >> operate as a standalone sync master. > > I know there was a desire to remove this TODO item, but I think we have > brought up enough new issues that we can keep it to see if we can come > up with a solution. I am not saying any such thing; rather, I am suggesting another way of providing this new mode. > I have added a link to this discussion on the TODO > item. > > I think we will need at least four new GUC variables: > > * timeout control for degraded mode > * command to run during switch to degraded mode > * command to run during switch from degraded mode > * read-only variable to report degraded mode Okay, this is one way of providing this new mode, others could be: a. Have just one GUC sync_standalone_mode = true|false and make this a PGC_POSTMASTER parameter, so that the user is only allowed to set this mode at startup. Even if we don't want it as a Postmaster parameter, we can mention to users that they can change this parameter only before the server reaches this situation. I understand that without any alarm or some other way, it is difficult for the user to know and change it, but I think in that case he should set it before server startup. b. On the above lines, instead of a boolean parameter, provide a parameter similar to the current one, such as available_synchronous_standby_names; the setting of this should follow what I said in point a. The benefit of this as compared to 'a' is that it appears to be more like what we currently have. I think if we try to solve this problem by providing a way so that the user can change it at runtime or when the problem actually occurs, it can make the UI more complex and make it difficult for us to provide a way for the user to be alerted in such a situation. We can keep our options open so that if tomorrow we find any reasonable way, we can provide the user a mechanism for changing this at runtime, but I don't think that stops us from letting the user get the benefit of this mode via a start-time parameter. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 11, 2014 at 01:29:23PM +0530, Amit Kapila wrote: > Okay, this is one way of providing this new mode, others could be: > > a. > Have just one GUC sync_standalone_mode = true|false and make > this a PGC_POSTMASTER parameter, so that the user is only > allowed to set this mode at startup. Even if we don't want it as > a Postmaster parameter, we can mention to users that they can > change this parameter only before the server reaches this situation. > I understand that without any alarm or some other way, it is difficult > for the user to know and change it, but I think in that case he should > set it before server startup. > > b. > On the above lines, instead of a boolean parameter, provide a parameter > similar to the current one, such as available_synchronous_standby_names; > the setting of this should follow what I said in point a. The benefit of this > as compared to 'a' is that it appears to be more like what we currently have. > > I think if we try to solve this problem by providing a way so that the user > can change it at runtime or when the problem actually occurs, it can > make the UI more complex and make it difficult for us to provide a way > for the user to be alerted in such a situation. We can keep our options open > so that if tomorrow we find any reasonable way, we can provide the user > a mechanism for changing this at runtime, but I don't think that stops us > from letting the user get the benefit of this mode via a start-time parameter. I am not sure how this would work. Right now we wait for one of the synchronous_standby_names servers to verify the writes. We need some way of telling the system how long to wait before continuing in degraded mode. Without a timeout and admin notification, it doesn't seem much better than our async mode, which is what many people were complaining about. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Jan11, 2014, at 01:48 , Joshua D. Drake <jd@commandprompt.com> wrote: > On 01/10/2014 04:38 PM, Stephen Frost wrote: >> Adrian, >> >> * Adrian Klaver (adrian.klaver@gmail.com) wrote: >>> On 01/10/2014 04:25 PM, Stephen Frost wrote: >>>> * Adrian Klaver (adrian.klaver@gmail.com) wrote: >>>>> A) Change the existing sync mode to allow the master and standby >>>>> to fall out of sync should a standby fall over. >>>> >>>> I'm not sure that anyone is arguing for this.. >>> >>> Looks like here, unless I am really missing the point: >> >> Elsewhere in the thread, JD agreed that having it as an independent >> option was fine. > > Yes. I am fine with an independent option. Hm, I was about to suggest that you can set statement_timeout before doing COMMIT to limit the amount of time you want to wait for the standby to respond. Interestingly, however, that doesn't seem to work, which is weird, since AFAICS statement_timeout simply generates a query cancel request after the timeout has elapsed, and cancelling the COMMIT with Ctrl-C in psql *does* work. I'm quite probably missing something, but what? best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > Hm, I was about to suggest that you can set statement_timeout before > doing COMMIT to limit the amount of time you want to wait for the > standby to respond. Interestingly, however, that doesn't seem to work, > which is weird, since AFAICS statement_timeout simply generates a > query cancel requester after the timeout has elapsed, and cancelling > the COMMIT with Ctrl-C in psql *does* work. > I'm quite probably missing something, but what? finish_xact_command() disables statement timeout before committing. Not sure about the pros and cons of doing that later in the sequence. regards, tom lane
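To make the behavior Tom describes concrete, a sketch of the experiment (assuming a placeholder table t and a master whose only sync standby is down; illustrative, not a verified transcript):

    -- in psql, on the master
    SET statement_timeout = '2s';
    BEGIN;
    INSERT INTO t VALUES (1);   -- t is a hypothetical table
    COMMIT;
    -- the COMMIT blocks waiting for the standby: the timeout never fires,
    -- because finish_xact_command() disables it before the sync-rep wait,
    -- while a manual Ctrl-C (query cancel) does interrupt the wait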
On 2014-01-11 18:28:31 +0100, Florian Pflug wrote: > Hm, I was about to suggest that you can set statement_timeout before > doing COMMIT to limit the amount of time you want to wait for the > standby to respond. Interestingly, however, that doesn't seem to work, > which is weird, since AFAICS statement_timeout simply generates a > query cancel request after the timeout has elapsed, and cancelling > the COMMIT with Ctrl-C in psql *does* work. I think that'd be a pretty bad API since you won't know whether the commit failed, or succeeded but replication timed out. There very well might have been long-running constraint triggers or such taking a long time. So it really would need a separate GUC. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11/01/14 13:25, Stephen Frost wrote: > Adrian, > > > * Adrian Klaver (adrian.klaver@gmail.com) wrote: >> A) Change the existing sync mode to allow the master and standby >> to fall out of sync should a standby fall over. > > I'm not sure that anyone is arguing for this.. > >> B) Create a new mode that does this without changing the existing sync mode. >> >> My two cents would be to implement B. Sync to me is a contract that >> master and standby are in sync at any point in time. Anything else >> should be called something else. Then it is up to the documentation >> to clearly point out the benefits/pitfalls. If you want to implement >> something as important as replication without reading the docs, then >> the results are on you. > > The issue is that there are folks who are arguing, essentially, that > "B" is worthless, wrong, and no one should want it and therefore we > shouldn't have it. > We have some people who clearly do want it (and who seem to have provided sensible arguments about why it might be worthwhile), and others who say they should not have it. My 2c is: The current behavior in CAP theorem speak is 'Cap' - i.e. focused on consistency at the expense of availability. A reasonable thing to want. The other behavior being asked for is 'cAp' - i.e. focused on availability. Also a reasonable configuration to want. Now the desire to use sync rather than async is to achieve as much consistency as possible, which is also reasonable. I think an option to control whether we operate 'Cap' or 'cAp' (defaulting to the current 'Cap' I guess) is probably the best solution. Regards Mark
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes [slightly rearranged] > My 2c is: > The current behavior in CAP theorem speak is 'Cap' - i.e. focused on > consistency at the expense of availability. A reasonable thing to want. > The other behavior being asked for is 'cAp' - i.e. focused on > availability. Also a reasonable configuration to want. > I think an option to control whether we operate 'Cap' or 'cAp' > (defaulting to the current 'Cap' I guess) is probably the best solution. The above is all perfectly reasonable. The argument that's not been made to my satisfaction is that the proposed patch is a good implementation of 'cAp'-optimized behavior. In particular, > ... Now the desire to > use sync rather than async is to achieve as much consistency as > possible, which is also reasonable. I don't think that the existing sync mode is designed to do that, and simply lobotomizing it as proposed doesn't get you there. I think we need a replication mode that's been designed *from the ground up* with cAp priorities in mind. There may end up being only a few actual differences in behavior --- but I fear that some of those differences will be crucial. regards, tom lane
On 01/10/2014 06:27 PM, Bruce Momjian wrote: > How would that work? Would it be a tool in contrib? There already is a > timeout, so if a tool checked more frequently than the timeout, it > should work. The durable notification of the admin would happen in the > tool, right? Well, you know what tool *I'm* planning to use. Thing is, when we talk about auto-degrade, we need to determine things like "Is the replica down or is this just a network blip"? and take action according to the user's desired configuration. This is not something, realistically, that we can do on a single request. Whereas it would be fairly simple for an external monitoring utility to do: 1. decide replica is offline for the duration (several poll attempts have failed) 2. Send ALTER SYSTEM SET to the master and change/disable the synch_replicas. Such a tool would *also* be capable of detecting when the synchronous replica was back up and operating, and switch back to sync mode, something we simply can't do inside Postgres. And it would be a lot easier to configure an external tool with monitoring system integration so that it can alert the DBA to degradation in a way which the DBA was liable to actually see (which is NOT the Postgres log). In other words, if we're going to have auto-degrade, the most intelligent place for it is in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we do *inside* Postgres is going to have a really, really hard time determining when to degrade. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
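As a sketch of step 2, the statements such an external tool might issue (ALTER SYSTEM is the recently committed feature mentioned above; the standby name 'standby1' is a placeholder):

    -- degrade: stop requiring synchronous confirmation
    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();    -- synchronous_standby_names is reloadable

    -- later, once the tool has verified the replica is healthy and caught up
    ALTER SYSTEM SET synchronous_standby_names = 'standby1';
    SELECT pg_reload_conf();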
On Sun, Jan 12, 2014 at 8:48 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 01/10/2014 06:27 PM, Bruce Momjian wrote: >> How would that work? Would it be a tool in contrib? There already is a >> timeout, so if a tool checked more frequently than the timeout, it >> should work. The durable notification of the admin would happen in the >> tool, right? > > Well, you know what tool *I'm* planning to use. > > Thing is, when we talk about auto-degrade, we need to determine things > like "Is the replica down or is this just a network blip"? and take > action according to the user's desired configuration. This is not > something, realistically, that we can do on a single request. Whereas > it would be fairly simple for an external monitoring utility to do: > > 1. decide replica is offline for the duration (several poll attempts > have failed) > > 2. Send ALTER SYSTEM SET to the master and change/disable the > synch_replicas. Will it be possible with the current mechanism, because presently the master will not accept any new command when the sync replica is not available? Or is there something else which needs to be done along with the above 2 points to make it possible? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote: > In other words, if we're going to have auto-degrade, the most > intelligent place for it is in > RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* > place. Anything we do *inside* Postgres is going to have a really, > really hard time determining when to degrade. Well, one goal I was considering is that if a commit is hung waiting for slave sync confirmation, and the timeout happens, then the mode is changed to degraded and the commit returns success. I am not sure how you would do that in an external tool, meaning there is going to be a period where commits fail, unless you think there is a way that when the external tool changes the mode to degrade that all hung commits complete. That would be nice. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sat, Jan 11, 2014 at 9:41 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Sat, Jan 11, 2014 at 01:29:23PM +0530, Amit Kapila wrote: >> Okay, this is one way of providing this new mode, others could be: >> >> a. >> Have just one GUC sync_standalone_mode = true|false and make >> this a PGC_POSTMASTER parameter, so that the user is only >> allowed to set this mode at startup. Even if we don't want it as >> a Postmaster parameter, we can mention to users that they can >> change this parameter only before the server reaches this situation. >> I understand that without any alarm or some other way, it is difficult >> for the user to know and change it, but I think in that case he should >> set it before server startup. >> >> b. >> On the above lines, instead of a boolean parameter, provide a parameter >> similar to the current one, such as available_synchronous_standby_names; >> the setting of this should follow what I said in point a. The benefit of this >> as compared to 'a' is that it appears to be more like what we currently have. >> >> I think if we try to solve this problem by providing a way so that the user >> can change it at runtime or when the problem actually occurs, it can >> make the UI more complex and make it difficult for us to provide a way >> for the user to be alerted in such a situation. We can keep our options open >> so that if tomorrow we find any reasonable way, we can provide the user >> a mechanism for changing this at runtime, but I don't think that stops us >> from letting the user get the benefit of this mode via a start-time parameter. > > I am not sure how this would work. Right now we wait for one of the > synchronous_standby_names servers to verify the writes. We need some > way of telling the system how long to wait before continuing in degraded > mode. Without a timeout and admin notification, it doesn't seem much > better than our async mode, which is what many people were complaining > about. It is better than async mode in the sense that async mode never waits for commits to be written to the standby, but this new mode will do so unless it is not possible (all sync standbys are down). Can't we use the existing wal_sender_timeout; or, if the user expects a different timeout for this new mode because he expects the master to wait longer before it starts operating like a standalone sync master, we can provide a new parameter. With this, the definition of the new mode is to provide maximum availability. We can define the behavior in this new mode as: a. It will operate like the current synchronous master till one of the standbys mentioned in available_synchronous_standby_names is available. b. If none is available, then it will start operating like the current async master, which means that if any async standby is configured, it will start sending WAL to that standby asynchronously; else, if none is configured, it will start operating as a standalone master. c. We can even provide a new parameter replication_mode here (non-persistent), which will tell the user that the master has switched its mode; this can be made available via a view. Update the value of the parameter when the server switches to the new mode. d. When one of the standbys mentioned in available_synchronous_standby_names comes back and is able to resolve all WAL differences, then it will again switch back to sync mode, where it will write to that standby before Commit finishes. After the switch, it will update the replication_mode parameter.
Now I think with the above definition and behaviour, it can switch to the new mode and will be able to provide information if the user wants it, via the view. In the above behaviour, the tricky part would be point 'd', where it has to switch back to sync mode when one of the sync standbys becomes available, but I think we can work out a design for that if you are positive about the above definition and behaviour as defined by the 4 points. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
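A sketch of how this proposal might look in postgresql.conf; note that available_synchronous_standby_names and replication_mode are hypothetical parameters from this thread, not existing PostgreSQL GUCs, and 'standby1' is a placeholder:

    # postgresql.conf (hypothetical parameters from this proposal)
    synchronous_standby_names = 'standby1'            # wait for this standby...
    available_synchronous_standby_names = 'standby1'  # ...but degrade if it is gone

    -- proposed, non-persistent report of the current state:
    SHOW replication_mode;   -- hypothetical; e.g. 'sync' or 'standalone'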
On Jan11, 2014, at 18:53 , Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-01-11 18:28:31 +0100, Florian Pflug wrote: >> Hm, I was about to suggest that you can set statement_timeout before >> doing COMMIT to limit the amount of time you want to wait for the >> standby to respond. Interestingly, however, that doesn't seem to work, >> which is weird, since AFAICS statement_timeout simply generates a >> query cancel request after the timeout has elapsed, and cancelling >> the COMMIT with Ctrl-C in psql *does* work. > > I think that'd be a pretty bad API since you won't know whether the > commit failed, or succeeded but replication timed out. There very well > might have been long-running constraint triggers or such taking a long > time. You could still distinguish these cases because the COMMIT would succeed with a WARNING if the timeout elapses while waiting for the standby, just as it does for query cancellations already. I'm not saying that this is a great API, though - I brought it up only because accepting cancellation requests but ignoring timeouts seems a bit inconsistent to me. best regards, Florian Pflug
All, I'm leading this off with a review of the features offered by the actual patch submitted. My general discussion of the issues of Sync Degrade, which justifies my specific suggestions below, follows that. Rajeev, please be aware that other hackers may have different opinions than me on what needs to change about the patch, so you should collect all opinions before changing code. ======================= > Add a new parameter : > synchronous_standalone_master = on | off I think this is a TERRIBLE name for any such parameter. What does "synchronous standalone" even mean? A better name for the parameter would be "auto_degrade_sync_replication" or "synchronous_timeout_action = error | degrade", or something similar. It would be even better for this to be a mode of synchronous_commit, except that synchronous_commit is heavily overloaded already. Some issues raised by this log script: LOG: standby "tx0113" is now the synchronous standby with priority 1 LOG: waiting for standby synchronization <-- standby wal receiver on the standby is killed (SIGKILL) LOG: unexpected EOF on standby connection LOG: not waiting for standby synchronization <-- restart standby so that it connects again LOG: standby "tx0113" is now the synchronous standby with priority 1 LOG: waiting for standby synchronization <-- standby wal receiver is first stopped (SIGSTOP) to make sure The "not waiting for standby synchronization" message should be marked something stronger than LOG. I'd like ERROR. Second, you have the master resuming sync rep when the standby reconnects. How do you determine when it's safe to do that? You're making the assumption that you have a failing sync standby instead of one which simply can't keep up with the master, or a flaky network connection (see discussion below). > a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. > > b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode. I'm not at all clear what the difference between these two commands is. When would one be executed, and when would the other be executed? Also, renaming ... Missing features: a) we should at least send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode. I know others have dismissed this idea as too "talky", but from my perspective, the agreement with the client for each synchronous commit is being violated, so each and every synchronous commit should report failure to sync. Also, having a warning on every commit would make it easier to troubleshoot degraded mode for users who have ignored the other warnings we give them. b) pg_stat_replication needs to show degraded mode in some way, or we need pg_sync_rep_degraded(), or (ideally) both. I'm also wondering if we need a more sophisticated approach to wal_sender_timeout to go with all this. ======================= On 01/11/2014 08:33 PM, Bruce Momjian wrote: > On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote: >> In other words, if we're going to have auto-degrade, the most >> intelligent place for it is in >> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* >> place. Anything we do *inside* Postgres is going to have a really, >> really hard time determining when to degrade. > > Well, one goal I was considering is that if a commit is hung waiting for > slave sync confirmation, and the timeout happens, then the mode is > changed to degraded and the commit returns success.
> I am not sure how you would do that in an external tool, meaning there > is going to be a period where commits fail, unless you think there is a > way that when the external tool changes the mode to degrade that all > hung commits complete. That would be nice. Realistically, though, that's pretty unavoidable. Any technique which waits a reasonable interval to determine that the replica isn't going to respond is liable to go beyond the application's timeout threshold anyway. There are undoubtedly exceptions to that, but it will be the case a lot of the time -- how many applications are willing to wait *minutes* for a COMMIT? I also don't see any way to allow the hung transactions to commit without allowing the walsender to make a decision on degrading. As I've outlined elsewhere (and below), the walsender just doesn't have enough information to make a good decision. On 01/11/2014 08:52 PM, Amit Kapila wrote: > It is better than async mode in the sense that async mode never waits > for commits to be written to the standby, but this new mode will do so > unless it is not possible (all sync standbys are down). > Can't we use the existing wal_sender_timeout; or, if the user expects a > different timeout for this new mode because he expects the master to wait > longer before it starts operating like a standalone sync master, we can > provide a new parameter. One of the reasons that there's so much disagreement about this feature is that most of the folks strongly in favor of auto-degrade are thinking *only* of the case that the standby is completely down. There are many other reasons for a sync transaction to hang, and the walsender has absolutely no way of knowing which is the case. For example: * Transient network issues * Standby can't keep up with master * Postgres bug * Storage/IO issues (think EBS) * Standby is restarting You don't want to handle all of those issues the same way as far as sync rep is concerned. For example, if the standby is restarting, you probably want to wait instead of degrading. There's also the issue that this patch, and necessarily any walsender-level auto-degrade, has IMHO no safe way to resume sync replication. This means that any user who has a network or storage blip once a day (again, think AWS) would be constantly in degraded mode, even though both the master and the replica are up and running -- and it will come as a complete surprise to them when they lose the master and discover that they've lost data. This is why, as I've said, any auto-degrade patch needs to treat auto-degrade as a major event, and alert users in all ways reasonable. See my concrete proposals at the beginning of this email for what I mean. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
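As a reference point for missing feature (b): pg_stat_replication already exposes a sync_state column, so the closest check available today is something like this sketch:

    -- returns zero rows when no connected standby is currently synchronous
    SELECT application_name, state, sync_state
      FROM pg_stat_replication
     WHERE sync_state = 'sync';
    -- the pg_sync_rep_degraded() function suggested above remains hypothetical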
* Josh Berkus (josh@agliodbs.com) wrote: > On 01/11/2014 08:52 PM, Amit Kapila wrote: >> It is better than async mode in the sense that async mode never waits >> for commits to be written to the standby, but this new mode will do so >> unless it is not possible (all sync standbys are down). >> Can't we use the existing wal_sender_timeout; or, if the user expects a >> different timeout for this new mode because he expects the master to wait >> longer before it starts operating like a standalone sync master, we can >> provide a new parameter. > > One of the reasons that there's so much disagreement about this feature > is that most of the folks strongly in favor of auto-degrade are thinking > *only* of the case that the standby is completely down. There are many > other reasons for a sync transaction to hang, and the walsender has > absolutely no way of knowing which is the case. For example: Uhh, yea, no, I'm pretty sure those in favor of auto-degrade are very specifically thinking of cases like "Standby is restarting", which is not a reason for the master to fall over. > * Transient network issues > * Standby can't keep up with master > * Postgres bug > * Storage/IO issues (think EBS) > * Standby is restarting > > You don't want to handle all of those issues the same way as far as sync > rep is concerned. For example, if the standby is restarting, you > probably want to wait instead of degrading. *What*?! Certainly not in any kind of OLTP-type system; a system restart can easily take minutes. Clearly, you want to resume once the standby is back up, which I feel like the people against an auto-degrade mode are missing, but holding up a commit until the standby finishes rebooting isn't practical. > There's also the issue that this patch, and necessarily any > walsender-level auto-degrade, has IMHO no safe way to resume sync > replication. This means that any user who has a network or storage blip > once a day (again, think AWS) would be constantly in degraded mode, even > though both the master and the replica are up and running -- and it will > come as a complete surprise to them when they lose the master and > discover that they've lost data. I don't follow this logic at all; why is there no safe way to resume? You wait til the slave is caught up fully and then go back to sync mode. If that turns out to be an extended problem then an alarm needs to be raised, of course. Thanks, Stephen
Josh Berkus <josh@agliodbs.com> wrote: >> Add a new parameter : >> synchronous_standalone_master = on | off > > I think this is a TERRIBLE name for any such parameter. What does > "synchronous standalone" even mean? A better name for the parameter > would be "auto_degrade_sync_replication" or > "synchronous_timeout_action = error | degrade", or something similar. > It would be even better for this to be a mode of synchronous_commit, > except that synchronous_commit is heavily overloaded already. +1 > a) we should at least send committing clients a WARNING if they have > committed a synchronous transaction and we are in degraded mode. > > I know others have dismissed this idea as too "talky", but from my > perspective, the agreement with the client for each synchronous commit > is being violated, so each and every synchronous commit should report > failure to sync. Also, having a warning on every commit would make it > easier to troubleshoot degraded mode for users who have ignored the > other warnings we give them. I agree that every synchronous commit on a master which is configured for synchronous replication which returns without persisting the work of the transaction on both the (local) primary and a synchronous replica should issue a WARNING. That said, the API for some connectors (like JDBC) puts the burden on the application or its framework to check for warnings each time and do something reasonable if found; I fear that a Venn diagram of those shops which would use this new feature and those shops that don't rigorously look for and reasonably deal with warnings would have significant overlap. > b) pg_stat_replication needs to show degraded mode in some way, or we > need pg_sync_rep_degraded(), or (ideally) both. +1 Since this new feature, where enabled, would cause synchronous replication to provide no guarantees beyond what asynchronous replication does [1], but would tend to cause people to have an *expectation* that they have some additional protection, I think proper documentation will be a big challenge. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] If I understand correctly, this is what the feature is intended to provide: - A transaction successfully committed on the primary is guaranteed to be visible on the replica? No, in all modes. - A transaction successfully committed on the primary is guaranteed *not* to be visible on the replica? No, in all modes. - The work of a transaction which has not returned from a commit request may be visible on the primary and/or the standby? Yes, in all modes. - A failure of the primary is guaranteed not to lose successfully committed transactions when failing over to the replica? Yes for sync rep without this feature; no for async or when this feature is used. If things are going well up to the moment of primary failure, the feature improves the odds (versus async) that successfully committed transactions will not be lost, or may reduce the number of successfully committed transactions lost. - A failure of the replica allows transactions on the primary to continue? Read-only for sync rep without this feature if the last sync standby has failed; read-only for some interval and then read-write with this feature or if there is still another working sync rep target; all transactions without interruption with async.
On 01/12/2014 12:35 PM, Stephen Frost wrote: > * Josh Berkus (josh@agliodbs.com) wrote: >> You don't want to handle all of those issues the same way as far as sync >> rep is concerned. For example, if the standby is restarting, you >> probably want to wait instead of degrading. > > *What*?! Certainly not in any kind of OLTP-type system; a system > restart can easily take minutes. Clearly, you want to resume once the > standby is back up, which I feel like the people against an auto-degrade > mode are missing, but holding up a commit until the standby finishes > rebooting isn't practical. Well, then that becomes a reason to want better/more configurability. In the couple of sync rep sites I admin, I *would* want to wait. >> There's also the issue that this patch, and necessarily any >> walsender-level auto-degrade, has IMHO no safe way to resume sync >> replication. This means that any user who has a network or storage blip >> once a day (again, think AWS) would be constantly in degraded mode, even >> though both the master and the replica are up and running -- and it will >> come as a complete surprise to them when they lose the master and >> discover that they've lost data. > > I don't follow this logic at all; why is there no safe way to resume? > You wait til the slave is caught up fully and then go back to sync mode. > If that turns out to be an extended problem then an alarm needs to be > raised, of course. So, if you have auto-resume, how do you handle the "flaky network" case? And how would an alarm be raised? On 01/12/2014 12:51 PM, Kevin Grittner wrote: > Josh Berkus <josh@agliodbs.com> wrote: >> I know others have dismissed this idea as too "talky", but from my >> perspective, the agreement with the client for each synchronous >> commit is being violated, so each and every synchronous commit >> should report failure to sync. Also, having a warning on every >> commit would make it easier to troubleshoot degraded mode for users >> who have ignored the other warnings we give them. > > I agree that every synchronous commit on a master which is configured > for synchronous replication which returns without persisting the work > of the transaction on both the (local) primary and a synchronous > replica should issue a WARNING. That said, the API for some > connectors (like JDBC) puts the burden on the application or its > framework to check for warnings each time and do something reasonable > if found; I fear that a Venn diagram of those shops which would use > this new feature and those shops that don't rigorously look for and > reasonably deal with warnings would have significant overlap. Oh, no question. However, having such a WARNING would help with interactive troubleshooting once a problem has been identified, and that's my main reason for wanting it. Imagine the case where you have auto-degrade and a flaky network. The user would experience problems as performance problems; that is, some commits take minutes on-again, off-again. They wouldn't necessarily even LOOK at the sync rep settings. So next step is to try walking through a sample transaction on the command line, and then the DBA/consultant gets WARNING messages, which gives an idea where the real problem lies. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote: > Well, then that becomes a reason to want better/more configurability. I agree with this; the challenge is figuring out what those options should be and how we should document them. > In the couple of sync rep sites I admin, I *would* want to wait. That's certainly an interesting data point. One of the specific use-cases that I'm thinking of is to auto-degrade on a graceful shutdown of the slave for upgrades and/or maintenance. Perhaps we don't need *auto* degrade in that case, but then an actual failure of the slave will also bring down the master. > > I don't follow this logic at all; why is there no safe way to resume? > > You wait til the slave is caught up fully and then go back to sync mode. > > If that turns out to be an extended problem then an alarm needs to be > > raised, of course. > > So, if you have auto-resume, how do you handle the "flaky network" case? > And how would an alarm be raised? Ideally, every time there is an auto-degrade, messages are logged to log files which are monitored, and notices are sent to admins about it happening, who, upon getting repeated such notices, would realize there's a problem and work to fix it. > On 01/12/2014 12:51 PM, Kevin Grittner wrote: > > Josh Berkus <josh@agliodbs.com> wrote: > >> I know others have dismissed this idea as too "talky", but from my > >> perspective, the agreement with the client for each synchronous > >> commit is being violated, so each and every synchronous commit > >> should report failure to sync. Also, having a warning on every > >> commit would make it easier to troubleshoot degraded mode for users > >> who have ignored the other warnings we give them. > > > > I agree that every synchronous commit on a master which is configured > > for synchronous replication which returns without persisting the work > > of the transaction on both the (local) primary and a synchronous > > replica should issue a WARNING. That said, the API for some > > connectors (like JDBC) puts the burden on the application or its > > framework to check for warnings each time and do something reasonable > > if found; I fear that a Venn diagram of those shops which would use > > this new feature and those shops that don't rigorously look for and > > reasonably deal with warnings would have significant overlap. > > Oh, no question. However, having such a WARNING would help with > interactive troubleshooting once a problem has been identified, and > that's my main reason for wanting it. I'm in the camp of this being too 'talky'. > Imagine the case where you have auto-degrade and a flaky network. The > user would experience problems as performance problems; that is, some > commits take minutes on-again, off-again. They wouldn't necessarily > even LOOK at the sync rep settings. So next step is to try walking > through a sample transaction on the command line, and then the > DBA/consultant gets WARNING messages, which gives an idea where the real > problem lies. Or they look in the logs which hopefully say that their slave keeps getting disconnected... Thanks, Stephen
> On 01/11/2014 08:52 PM, Amit Kapila wrote: >> It is better than async mode in the sense that async mode never waits >> for commits to be written to the standby, but this new mode will do so >> unless it is not possible (all sync standbys are down). >> Can't we use the existing wal_sender_timeout; or, if the user expects a >> different timeout for this new mode because he expects the master to wait >> longer before it starts operating like a standalone sync master, we can >> provide a new parameter. > > One of the reasons that there's so much disagreement about this feature > is that most of the folks strongly in favor of auto-degrade are thinking > *only* of the case that the standby is completely down. There are many > other reasons for a sync transaction to hang, and the walsender has > absolutely no way of knowing which is the case. For example: > > * Transient network issues > * Standby can't keep up with master > * Postgres bug > * Storage/IO issues (think EBS) > * Standby is restarting > > You don't want to handle all of those issues the same way as far as sync > rep is concerned. For example, if the standby is restarting, you > probably want to wait instead of degrading. I think it might be difficult to differentiate the cases, except maybe by having a separate timeout for this mode, so that it can wait longer when the server runs in this mode. OTOH, why can't we define this new mode such that it behaves the same for all cases? Basically, we can say that whenever the sync standby is not available (n/w issue or m/c down), the master will behave as it does in async mode. Here I think the important point would be to gracefully allow the sync standby to resume when it tries to reconnect (we can allow it to reconnect if it can resolve all WAL differences). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
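For reference, the existing timeout Amit refers to is an ordinary reloadable parameter (shown here with its default); it governs how long the master keeps an unresponsive replication connection before dropping it:

    # postgresql.conf -- existing parameter, not part of the proposed feature
    wal_sender_timeout = 60s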
On 13th January 2014, Josh Berkus Wrote: > I'm leading this off with a review of the features offered by the > actual patch submitted. My general discussion of the issues of Sync > Degrade, which justifies my specific suggestions below, follows that. > Rajeev, please be aware that other hackers may have different opinions > than me on what needs to change about the patch, so you should collect > all opinions before changing code. Thanks for reviewing and providing the first level of comments. We will surely collect all feedback to improve this patch. > > > Add a new parameter : > > > synchronous_standalone_master = on | off > > I think this is a TERRIBLE name for any such parameter. What does > "synchronous standalone" even mean? A better name for the parameter > would be "auto_degrade_sync_replication" or > "synchronous_timeout_action = error | degrade", or something similar. > It would be even better for this to be a mode of synchronous_commit, > except that synchronous_commit is heavily overloaded already. Yes, we can change this parameter name. Some of the suggestions for degrading the mode: 1. Auto-degrade using some sort of configuration parameter, as done in the current patch. 2. Expose the configuration variable via new SQL-callable functions, as suggested by Heikki. 3. Use ALTER SYSTEM SET, as suggested by others. > Some issues raised by this log script: > > LOG: standby "tx0113" is now the synchronous standby with priority 1 > LOG: waiting for standby synchronization > <-- standby wal receiver on the standby is killed (SIGKILL) > LOG: unexpected EOF on standby connection > LOG: not waiting for standby synchronization > <-- restart standby so that it connects again > LOG: standby "tx0113" is now the synchronous standby with priority 1 > LOG: waiting for standby synchronization > <-- standby wal receiver is first stopped (SIGSTOP) to make sure > > The "not waiting for standby synchronization" message should be marked > something stronger than LOG. I'd like ERROR. Yes, we can change this to ERROR. > Second, you have the master resuming sync rep when the standby > reconnects. How do you determine when it's safe to do that? You're > making the assumption that you have a failing sync standby instead of > one which simply can't keep up with the master, or a flaky network > connection (see discussion below). Yes, this can be further improved so that the master is upgraded to synchronous mode, by one of the methods discussed above, only once we have made sure that the synchronous standby has caught up with the master node (which may require a better design). > > a. Master_to_standalone_cmd: To be executed before master > switches to standalone mode. > > > > b. Master_to_sync_cmd: To be executed before master switches > from > sync mode to standalone mode. > > I'm not at all clear what the difference between these two commands is. > When would one be executed, and when would the other be executed? Also, > renaming ... There was a typo in the above explanation; the meaning of the two commands is: a. Master_to_standalone_cmd: To be executed during degradation of sync mode. b. Master_to_sync_cmd: To be executed before upgrade or restoration of sync mode. These two commands are per the TODO item, to inform the DBA. But as per Heikki's suggestion, we should not use this mechanism to inform the DBA; rather, we should have some sort of generic trap system instead of adding this one particular extra config option specifically for this feature. This looks to be the better idea, so we can have further discussion to come up with a proper design.
> Missing features: > > a) we should at least send committing clients a WARNING if they have > committed a synchronous transaction and we are in degraded mode. Yes, that is a great idea. > One of the reasons that there's so much disagreement about this feature > is that most of the folks strongly in favor of auto-degrade are > thinking *only* of the case that the standby is completely down. There > are many other reasons for a sync transaction to hang, and the walsender > has absolutely no way of knowing which is the case. For example: > > * Transient network issues > * Standby can't keep up with master > * Postgres bug > * Storage/IO issues (think EBS) > * Standby is restarting > > You don't want to handle all of those issues the same way as far as > sync rep is concerned. For example, if the standby is restarting, you > probably want to wait instead of degrading. I think if we support some external SQL-callable functions to degrade, as Heikki suggested, instead of auto-degrading, then the user can handle at least some of the above scenarios, if not all, based on their experience and observation. Thanks and Regards, Kumar Rajeev Rastogi
> On Sun, Jan 12, Amit Kapila wrote: > >> How would that work? Would it be a tool in contrib? There already > >> is a timeout, so if a tool checked more frequently than the timeout, > >> it should work. The durable notification of the admin would happen > >> in the tool, right? > > > > Well, you know what tool *I'm* planning to use. > > > > Thing is, when we talk about auto-degrade, we need to determine > things > > like "Is the replica down or is this just a network blip"? and take > > action according to the user's desired configuration. This is not > > something, realistically, that we can do on a single request. > Whereas > > it would be fairly simple for an external monitoring utility to do: > > > > 1. decide replica is offline for the duration (several poll attempts > > have failed) > > > > 2. Send ALTER SYSTEM SET to the master and change/disable the > > synch_replicas. > > Will it be possible with the current mechanism, because presently the > master will not accept any new command when the sync replica is not > available? > Or is there something else which needs to be done along with the > above 2 points to make it possible? Since no WAL is written for the ALTER SYSTEM SET command, the master should be able to handle this command even though the sync replica is not available. Thanks and Regards, Kumar Rajeev Rastogi
On Jan12, 2014, at 04:18 , Josh Berkus <josh@agliodbs.com> wrote:
> Thing is, when we talk about auto-degrade, we need to determine things
> like "Is the replica down or is this just a network blip"? and take
> action according to the user's desired configuration. This is not
> something, realistically, that we can do on a single request. Whereas
> it would be fairly simple for an external monitoring utility to do:
>
> 1. decide replica is offline for the duration (several poll attempts
> have failed)
>
> 2. Send ALTER SYSTEM SET to the master and change/disable the
> synch_replicas.
>
> In other words, if we're going to have auto-degrade, the most
> intelligent place for it is in
> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
> place. Anything we do *inside* Postgres is going to have a really,
> really hard time determining when to degrade.

+1

This is also how 2PC works, btw - the database provides the building blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager to deal with issues that require a whole-cluster perspective.

best regards,
Florian Pflug
On 01/13/2014 04:12 PM, Florian Pflug wrote:
> On Jan12, 2014, at 04:18 , Josh Berkus <josh@agliodbs.com> wrote:
>> Thing is, when we talk about auto-degrade, we need to determine things
>> like "Is the replica down or is this just a network blip"? and take
>> action according to the user's desired configuration. This is not
>> something, realistically, that we can do on a single request. Whereas
>> it would be fairly simple for an external monitoring utility to do:
>>
>> 1. decide replica is offline for the duration (several poll attempts
>> have failed)
>>
>> 2. Send ALTER SYSTEM SET to the master and change/disable the
>> synch_replicas.
>>
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>> place. Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
>
> +1
>
> This is also how 2PC works, btw - the database provides the building
> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
> to deal with issues that require a whole-cluster perspective.

++1

I like Simon's idea to have a pg_xxx function for switching between replication modes, which should be enough to support a monitor daemon doing the switching.

Maybe we could have a 'syncrep_taking_too_long_command' GUC which could be used to alert such a monitoring daemon, so it can immediately check whether to

a) switch master to async rep or standalone mode (in case of the sync slave becoming unavailable)

or

b) fail over to the slave (in the almost equally likely case that it was the master which became disconnected from the world and the slave is available)

or

c) do something else depending on circumstances/policy :)

NB! Note that in case of b) 'syncrep_taking_too_long_command' will very likely also not reach the monitor daemon, so it cannot rely on this as the main trigger!

Cheers
--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>> In other words, if we're going to have auto-degrade, the most
>>> intelligent place for it is in
>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>> place. Anything we do *inside* Postgres is going to have a really,
>>> really hard time determining when to degrade.
>> +1
>>
>> This is also how 2PC works, btw - the database provides the building
>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>> to deal with issues that require a whole-cluster perspective.
>
> ++1

+1

> I like Simon's idea to have a pg_xxx function for switching between
> replication modes, which should be enough to support a monitor
> daemon doing the switching.
>
> Maybe we could have a 'syncrep_taking_too_long_command' GUC
> which could be used to alert such a monitoring daemon, so it can
> immediately check whether to

I would think that would be a column in pg_stat_replication. Basically last_ack or something like that.

> a) switch master to async rep or standalone mode (in case of the sync slave
> becoming unavailable)

Yep.

> or
>
> b) fail over to the slave (in the almost equally likely case that it was the
> master which became disconnected from the world and the slave is available)
>
> or

I think this should be left to external tools.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
>
> On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>>> In other words, if we're going to have auto-degrade, the most
>>>> intelligent place for it is in
>>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>>> place. Anything we do *inside* Postgres is going to have a really,
>>>> really hard time determining when to degrade.
>>> +1
>>>
>>> This is also how 2PC works, btw - the database provides the building
>>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>>> to deal with issues that require a whole-cluster perspective.
>>
>> ++1
>
> +1

Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally.

--
Jim C. Nasby, Data Architect
jim@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
On 2014-01-13 15:14:21 -0600, Jim Nasby wrote:
> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
> >
> >On 01/13/2014 10:12 AM, Hannu Krosing wrote:
> >>>>In other words, if we're going to have auto-degrade, the most
> >>>>intelligent place for it is in
> >>>>RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
> >>>>place. Anything we do *inside* Postgres is going to have a really,
> >>>>really hard time determining when to degrade.
> >>>+1
> >>>
> >>>This is also how 2PC works, btw - the database provides the building
> >>>blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
> >>>to deal with issues that require a whole-cluster perspective.
> >>
> >>++1
> >
> >+1
>
> Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally.

I think it'd be a fairly simple patch to re-check the state of the syncrep config in SyncRepWaitForLsn(). Alternatively you can just write code to iterate over the procarray and set Proc->syncRepState to SYNC_REP_WAIT_CANCELLED or such.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
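For readers without syncrep.c in their heads: the wait loop in question parks the committing backend on its latch until the walsender marks its commit LSN acknowledged. A rough sketch of the first suggestion, with the guard condition and WARNING wording being my assumptions rather than an actual patch, might be:

/*
 * Sketch only: inside the for (;;) wait loop of SyncRepWaitForLsn() in
 * src/backend/replication/syncrep.c. Assumes the loop also processes
 * pending config reloads so that SyncRepStandbyNames is current.
 */
for (;;)
{
    /* Reset the latch before checking state, as the real loop does. */
    ResetLatch(&MyProc->procLatch);

    /* Walsender has acknowledged our commit LSN; we are done. */
    if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
        break;

    /*
     * Proposed addition: if a reload has turned sync rep off (e.g.
     * synchronous_standby_names was emptied), stop waiting and warn
     * instead of blocking forever.
     */
    if (!SyncRepRequested() || !SyncStandbysDefined())
    {
        ereport(WARNING,
                (errmsg("canceling wait for synchronous replication"),
                 errdetail("The transaction has committed locally, but might not have been replicated to the standby.")));
        SyncRepCancelWait();    /* take ourselves off the wait queue */
        break;
    }

    /* ... existing ProcDiePending/QueryCancelPending checks elided ... */

    WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1);
}

(If memory serves, the checkpointer already wakes queued waiters via SyncRepUpdateSyncStandbysDefined() when a reload empties synchronous_standby_names, so a backend-side recheck like this would mostly be belt and braces.)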
On 01/13/2014 01:14 PM, Jim Nasby wrote:
>
> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
>>
>> On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>>>> In other words, if we're going to have auto-degrade, the most
>>>>> intelligent place for it is in
>>>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>>>> place. Anything we do *inside* Postgres is going to have a really,
>>>>> really hard time determining when to degrade.
>>>> +1
>>>>
>>>> This is also how 2PC works, btw - the database provides the building
>>>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>>>> to deal with issues that require a whole-cluster perspective.
>>>
>>> ++1
>>
>> +1
>
> Josh, what do you think of the upthread idea of being able to recover
> in-progress transactions that are waiting when we turn off sync rep? I'm
> thinking that would be a very good feature to have... and it's not
> something you can easily do externally.

I think it is extremely valuable; otherwise we have lost those transactions, which is exactly what we don't want.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
On Jan13, 2014, at 22:30 , "Joshua D. Drake" <jd@commandprompt.com> wrote:
> On 01/13/2014 01:14 PM, Jim Nasby wrote:
>> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
>>> On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>>>>> In other words, if we're going to have auto-degrade, the most
>>>>>> intelligent place for it is in
>>>>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>>>>> place. Anything we do *inside* Postgres is going to have a really,
>>>>>> really hard time determining when to degrade.
>>>>> +1
>>>>>
>>>>> This is also how 2PC works, btw - the database provides the building
>>>>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>>>>> to deal with issues that require a whole-cluster perspective.
>>>>
>>>> ++1
>>>
>>> +1
>>
>> Josh, what do you think of the upthread idea of being able to recover
>> in-progress transactions that are waiting when we turn off sync rep? I'm
>> thinking that would be a very good feature to have... and it's not
>> something you can easily do externally.
>
> I think it is extremely valuable; otherwise we have lost those transactions,
> which is exactly what we don't want.

We *have* to "recover" waiting transactions upon switching off sync rep. A transaction that waits for a sync standby to respond has already committed locally (i.e., updated the clog), it just hasn't updated the proc array yet, and thus is still seen as in-progress by the rest of the system. But rolling back the transaction is nevertheless *impossible* at that point (except by PITR, and hence the quotes around "recover"). So the only alternative to "recovering" them, i.e. having them abort their waiting, is to let them linger indefinitely, still holding their locks, preventing xmin from advancing, etc., until either the client disconnects or the server is restarted.

best regards,
Florian Pflug
ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app.

- Heikki
On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
> ISTM the consensus is that we need better monitoring/administration
> interfaces so that people can script the behavior they want in external
> tools. Also, a new synchronous apply replication mode would be handy,
> but that'd be a whole different patch. We don't have a patch on the
> table that we could consider committing any time soon, so I'm going to
> mark this as rejected in the commitfest app.

I don't feel that "we'll never do auto-degrade" is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*.

I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Jan24, 2014, at 22:29 , Josh Berkus <josh@agliodbs.com> wrote:
> On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
>> ISTM the consensus is that we need better monitoring/administration
>> interfaces so that people can script the behavior they want in external
>> tools. Also, a new synchronous apply replication mode would be handy,
>> but that'd be a whole different patch. We don't have a patch on the
>> table that we could consider committing any time soon, so I'm going to
>> mark this as rejected in the commitfest app.
>
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument. However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
>
> I encourage the submitter to resubmit an improved version of this patch
> (one with more monitorability) for 9.5 CF1. That'll give us a whole
> dev cycle to argue about it.

There seemed to be at least some support for having a way to manually degrade from sync rep to async rep via something like

  ALTER SYSTEM SET synchronous_commit='local';

Doing that seems unlikely to meet much resistance on grounds of principle, so it seems to me that working on that would be the best way forward for the submitter. I don't know how hard it would be to pull this off, though.

best regards,
Florian Pflug
On 01/24/2014 10:29 PM, Josh Berkus wrote:
> On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
>> ISTM the consensus is that we need better monitoring/administration
>> interfaces so that people can script the behavior they want in external
>> tools. Also, a new synchronous apply replication mode would be handy,
>> but that'd be a whole different patch. We don't have a patch on the
>> table that we could consider committing any time soon, so I'm going to
>> mark this as rejected in the commitfest app.
>
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument.

Auto-degrade may make sense together with the synchronous apply mode mentioned by Heikki. I do not see much use for a synchronous-(noapply)-if-you-can mode, though it may make some sense in some scenarios if sync failure is accompanied by loud screaming ("hey DBA, we are writing checks with no money in the bank, do something fast!").

Perhaps some kind of sync-with-timeout mode, where timing out results in a "weak error" (something between the current warning and error) returned to the client and/or where it causes an external command to be run which could then be used to flood the admin's mailbox :)

> However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
>
> I encourage the submitter to resubmit an improved version of this patch
> (one with more monitorability) for 9.5 CF1. That'll give us a whole
> dev cycle to argue about it.

Cheers
--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 01/25/2014, Josh Berkus wrote:
> > ISTM the consensus is that we need better monitoring/administration
> > interfaces so that people can script the behavior they want in
> > external tools. Also, a new synchronous apply replication mode would
> > be handy, but that'd be a whole different patch. We don't have a patch
> > on the table that we could consider committing any time soon, so I'm
> > going to mark this as rejected in the commitfest app.
>
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument. However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
>
> I encourage the submitter to resubmit an improved version of this
> patch (one with more monitorability) for 9.5 CF1. That'll give us a
> whole dev cycle to argue about it.

I shall rework to improve this patch. Below is a summarization of all the discussions, which will be used as input for improving the patch:

1. Method of degrading the synchronous mode:
   a. Expose the configuration variable to new SQL-callable functions.
   b. Using ALTER SYSTEM SET.
   c. Auto-degrade using some sort of configuration parameter as done in the current patch.
   d. Or maybe a combination of the above, which the DBA can use depending on their use-cases.

   We can discuss further to decide on one of these approaches.

2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.

3. A better monitoring/administration interface, which can be even better if it is made as a generic trap system. I shall propose a better approach for this.

4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode.

5. Please add more if I am missing something.

Thanks and Regards,
Kumar Rajeev Rastogi
On Sun, Jan 26, 2014 at 10:56 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
> On 01/25/2014, Josh Berkus wrote:
>> > ISTM the consensus is that we need better monitoring/administration
>> > interfaces so that people can script the behavior they want in
>> > external tools. Also, a new synchronous apply replication mode would
>> > be handy, but that'd be a whole different patch. We don't have a patch
>> > on the table that we could consider committing any time soon, so I'm
>> > going to mark this as rejected in the commitfest app.
>>
>> I don't feel that "we'll never do auto-degrade" is determinative;
>> several hackers were for auto-degrade, and they have a good use-case
>> argument. However, we do have consensus that we need more scaffolding
>> than this patch supplies in order to make auto-degrade *safe*.
>>
>> I encourage the submitter to resubmit an improved version of this
>> patch (one with more monitorability) for 9.5 CF1. That'll give us a
>> whole dev cycle to argue about it.
>
> I shall rework to improve this patch. Below is a summarization of all the
> discussions, which will be used as input for improving the patch:
>
> 1. Method of degrading the synchronous mode:
>    a. Expose the configuration variable to new SQL-callable functions.
>    b. Using ALTER SYSTEM SET.
>    c. Auto-degrade using some sort of configuration parameter as done in the current patch.
>    d. Or maybe a combination of the above, which the DBA can use depending on their use-cases.
>
>    We can discuss further to decide on one of these approaches.
>
> 2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.
>
> 3. A better monitoring/administration interface, which can be even better if it is made as a generic trap system.
>    I shall propose a better approach for this.
>
> 4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode.
>
> 5. Please add more if I am missing something.

All of those things have been mentioned, but I'm not sure we have consensus on which of them we actually want to do, or how. Figuring that out seems like the next step.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 01/26/2014 07:56 PM, Rajeev rastogi wrote:
> I shall rework to improve this patch. Below is a summarization of all the
> discussions, which will be used as input for improving the patch:
>
> 1. Method of degrading the synchronous mode:
>    a. Expose the configuration variable to new SQL-callable functions.
>    b. Using ALTER SYSTEM SET.
>    c. Auto-degrade using some sort of configuration parameter as done in the current patch.
>    d. Or maybe a combination of the above, which the DBA can use depending on their use-cases.
>
>    We can discuss further to decide on one of these approaches.
>
> 2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.
>
> 3. A better monitoring/administration interface, which can be even better if it is made as a generic trap system.
>    I shall propose a better approach for this.
>
> 4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode.
>
> 5. Please add more if I am missing something.

I think we actually need two degrade modes:

A. degrade once: if the sync standby connection is ever lost, degrade and do not resync.

B. reconnect: if the sync standby catches up again, return it to sync status.

The reason you'd want "degrade once" is to avoid the "flaky network" issue where you're constantly degrading then reattaching the sync standby, resulting in horrible performance. If we did offer "degrade once" though, we'd need some easy way to determine that the master was in a state of permanent degrade, and a command to make it resync.

Discuss?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
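To make "degrade once" concrete, a deliberately naive external monitor might look like the sketch below; the host, poll interval, and failure threshold are invented, and per Hannu's earlier point a real tool must also consider that the monitor itself may be the partitioned party:

#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

#define POLL_SECONDS 5      /* assumed poll interval */
#define MAX_FAILURES 3      /* consecutive bad polls before degrading */

/* Returns 1 if at least one connected standby is in sync_state 'sync'. */
static int
sync_standby_present(PGconn *conn)
{
    PGresult *res = PQexec(conn,
        "SELECT 1 FROM pg_stat_replication WHERE sync_state = 'sync'");
    int ok = (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0);

    PQclear(res);
    return ok;
}

int
main(void)
{
    PGconn *conn = PQconnectdb("host=master dbname=postgres user=postgres");
    int     failures = 0;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "cannot reach master: %s", PQerrorMessage(conn));
        return 1;
    }

    for (;;)
    {
        if (sync_standby_present(conn))
            failures = 0;               /* healthy: reset the counter */
        else if (++failures >= MAX_FAILURES)
        {
            /*
             * Degrade once: disable sync rep and exit. Resyncing is
             * deliberately left to the administrator, per the flaky
             * network concern above.
             */
            PQclear(PQexec(conn,
                "ALTER SYSTEM SET synchronous_standby_names = ''"));
            PQclear(PQexec(conn, "SELECT pg_reload_conf()"));
            fprintf(stderr, "sync rep degraded; manual resync required\n");
            break;
        }
        sleep(POLL_SECONDS);
    }

    PQfinish(conn);
    return 0;
}

Resync would then be an explicit administrator action: set synchronous_standby_names back, reload, and only trust the standby again once pg_stat_replication shows it in sync_state = 'sync' and caught up.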