Thread: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
Robert Haas
Date:
On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Make a hard state change from catchup to streaming mode. > More useful state change for monitoring purposes, plus a > required change for synchronous replication patch. As far as I can see, this patch was not posted or discussed before commit, and I'm not sure it's the behavior everyone wants. It has the effect of preventing the system from ever going backwards from "streaming" to "catchup". Is that what we want? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: > Simon Riggs <simon@2ndquadrant.com> wrote: >> Make a hard state change from catchup to streaming mode. >> More useful state change for monitoring purposes, plus a >> required change for synchronous replication patch. > > As far as I can see, this patch was not posted or discussed before > commit, and I'm not sure it's the behavior everyone wants. It has > the effect of preventing the system from ever going backwards from > "streaming" to "catchup". Is that what we want? We are looking at moving to streaming replication instead of WAL file shipping, but we often have WAN outages. These can last minutes, hours, or even a few days. What would be the impact of this patch on us during and after such outages? I don't know how well such experience generalizes, but my personal experience with replication technology is that "hard state changes" tend to make things more "clunky" and introduce odd issues at the state transitions. Where different message types are intermingled without such hard state changes, I've seen more graceful behavior. Of course, take that with a grain of salt -- I haven't read the patch and am talking in generalities based on having written a couple serious replication tools in the past, and having used a few others. -Kevin
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Make a hard state change from catchup to streaming mode. >> More useful state change for monitoring purposes, plus a >> required change for synchronous replication patch. > As far as I can see, this patch was not posted or discussed before > commit, and I'm not sure it's the behavior everyone wants. It has the > effect of preventing the system from ever going backwards from > "streaming" to "catchup". Is that what we want? That seems like a very bad idea from here. Being able to go back to catchup after loss of the streaming connection is essential for robustness. If we now have to restart the slave for that to happen, it's not an improvement. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
Simon Riggs
Date:
On Fri, 2011-02-18 at 10:41 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> Make a hard state change from catchup to streaming mode. > >> More useful state change for monitoring purposes, plus a > >> required change for synchronous replication patch. > > > As far as I can see, this patch was not posted or discussed before > > commit, and I'm not sure it's the behavior everyone wants. It has the > > effect of preventing the system from ever going backwards from > > "streaming" to "catchup". Is that what we want? > > That seems like a very bad idea from here. Being able to go back to > catchup after loss of the streaming connection is essential for > robustness. If we now have to restart the slave for that to happen, > it's not an improvement. If we lose the connection then we start another walsender process, so that is a non-issue. We need a hard state change at one particular point to avoid sync rep requestors from hanging for hours when the standby is connected but a long way from being caught up. This was a part of my sync rep patch that it was easier to break out and commit early. There has never been any comment or objection to this concept and the same feature has existed in my patches for months. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
Robert Haas
Date:
On Fri, Feb 18, 2011 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Make a hard state change from catchup to streaming mode. >>> More useful state change for monitoring purposes, plus a >>> required change for synchronous replication patch. > >> As far as I can see, this patch was not posted or discussed before >> commit, and I'm not sure it's the behavior everyone wants. It has the >> effect of preventing the system from ever going backwards from >> "streaming" to "catchup". Is that what we want? > > That seems like a very bad idea from here. Being able to go back to > catchup after loss of the streaming connection is essential for > robustness. If we now have to restart the slave for that to happen, > it's not an improvement. No, that's not the case where it matters. The state would get reset on reconnect. The problem is when, say, the master server is generating WAL at a rate which exceeds the network bandwidth of the link between the master and the standby. The previous coding will make us flip back into the catchup state when that happens. Actually, the old code is awfully sensitive, and knowing that you are not caught up is not really enough information: you need to know how far behind you are, and how long you've been behind for. I'm guessing that Simon intended this patch to deal with that problem, but it's not the right fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
Robert Haas
Date:
On Fri, Feb 18, 2011 at 10:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > We need a hard state change at one particular point to avoid sync rep > requestors from hanging for hours when the standby is connected but a > long way from being caught up. That's a matter of opinion. I think there are a number of people here who would say that what you need is a good way to know when you've caught up, and that you shouldn't enable sync rep until that happens. What you're proposing would be useful too, if it didn't break other cases, but it does. This is precisely why it's a bad idea for us to be trying to do this kind of engineering at the very last minute. > This was a part of my sync rep patch that it was easier to break out and > commit early. There has never been any comment or objection to this > concept and the same feature has existed in my patches for months. You posted the latest version of your sync rep patch on January 15th, after months of inactivity. Heikki reviewed it on the 21st, and there it sat un-updated for three weeks. If your expectation is that any portion of that patch to which nobody specifically objected is fair game to commit without further discussion, I don't think that's the way it works around here. Post your patches and we'll review them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
Simon Riggs
Date:
On Fri, 2011-02-18 at 09:41 -0600, Kevin Grittner wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > > Simon Riggs <simon@2ndquadrant.com> wrote: > > >> Make a hard state change from catchup to streaming mode. > >> More useful state change for monitoring purposes, plus a > >> required change for synchronous replication patch. > > > > As far as I can see, this patch was not posted or discussed before > > commit, and I'm not sure it's the behavior everyone wants. It has > > the effect of preventing the system from ever going backwards from > > "streaming" to "catchup". Is that what we want? > > We are looking at moving to streaming replication instead of WAL > file shipping, but we often have WAN outages. These can last > minutes, hours, or even a few days. What would be the impact of > this patch on us during and after such outages? None at all. The patch introduces no behavioural changes, only a useful, but minor re-categorisation of what is already happening so that its easier to monitor what happens following startup of a standby. > I don't know how well such experience generalizes, but my personal > experience with replication technology is that "hard state changes" > tend to make things more "clunky" and introduce odd issues at the > state transitions. Where different message types are intermingled > without such hard state changes, I've seen more graceful behavior. > > Of course, take that with a grain of salt -- I haven't read the > patch and am talking in generalities based on having written a > couple serious replication tools in the past, and having used a few > others. I respect your experience. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Re: Re: [COMMITTERS] pgsql: Make a hard state change from catchup to streaming mode.
From
Robert Haas
Date:
On Fri, Feb 18, 2011 at 11:50 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 18, 2011 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Fri, Feb 18, 2011 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> Make a hard state change from catchup to streaming mode. >>>> More useful state change for monitoring purposes, plus a >>>> required change for synchronous replication patch. >> >>> As far as I can see, this patch was not posted or discussed before >>> commit, and I'm not sure it's the behavior everyone wants. It has the >>> effect of preventing the system from ever going backwards from >>> "streaming" to "catchup". Is that what we want? >> >> That seems like a very bad idea from here. Being able to go back to >> catchup after loss of the streaming connection is essential for >> robustness. If we now have to restart the slave for that to happen, >> it's not an improvement. > > No, that's not the case where it matters. The state would get reset > on reconnect. The problem is when, say, the master server is > generating WAL at a rate which exceeds the network bandwidth of the > link between the master and the standby. The previous coding will > make us flip back into the catchup state when that happens. > > Actually, the old code is awfully sensitive, and knowing that you are > not caught up is not really enough information: you need to know how > far behind you are, and how long you've been behind for. I'm guessing > that Simon intended this patch to deal with that problem, but it's not > the right fix. So am I the only one who thinks this needs to be reverted? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company