Thread: max_standby_delay considered harmful
I've finally wrapped my head around exactly what the max_standby_delay code is doing, and I'm not happy with it. The way that code is designed is that the setting represents a maximum allowed difference between the standby server's system clock and the commit timestamps it is reading from the WAL log; whenever this difference exceeds the setting, we'll kill standby queries in hopes of catching up faster. Now, I can see the attraction of defining it that way, for certain use-cases. However, I think it is too fragile and too badly implemented to be usable in the real world; and it certainly can't be the default operating mode.

There are three really fundamental problems with it:

1. The timestamps we are reading from the log might be historical, if we are replaying from archive rather than reading a live SR stream. In the current implementation that means zero grace period for standby queries. Now if your only interest is catching up as fast as possible, that could be a sane behavior, but this is clearly not the only possible interest --- in fact, if that's all you care about, why did you allow standby queries at all?

2. There could be clock skew between the master and slave servers. If the master's clock is a minute or so ahead of the slave's, again we get into a situation where standby queries have zero grace period, even though killing them won't do a darn thing to permit catchup. If the master is behind the slave then we have an artificially inflated grace period, which is going to slow down the slave.

3. There could be significant propagation delay from master to slave, if the WAL stream is being transmitted with pg_standby or some such. Again this results in cutting into the standby queries' grace period, for no defensible reason.

In addition to these fundamental problems there's a fatal implementation problem: the actual comparison is not to the master's current clock reading, but to the latest commit, abort, or checkpoint timestamp read from the WAL. Thus, if the last commit was more than max_standby_delay seconds ago, zero grace time. Now if the master is really idle then there aren't going to be any conflicts anyway, but what if it's running only long-running queries? Or what happens when it was idle for awhile and then starts new queries? Zero grace period, that's what.

We could possibly improve matters for the SR case by having walsender transmit the master's current clock reading every so often (probably once per activity cycle), outside the WAL stream proper. The receiver could subtract off its own clock reading in order to measure the skew, and then we could cancel queries if the de-skewed transmission time falls too far behind. However this doesn't do anything to fix the cases where we aren't reading (and caught up to) a live SR broadcast.

I'm inclined to think that we should throw away all this logic and just have the slave cancel competing queries if the replay process waits more than max_standby_delay seconds to acquire a lock. This is simple, understandable, and behaves the same whether we're reading live data or not. Putting in something that tries to maintain a closed-loop maximum delay between master and slave seems like a topic for future research rather than a feature we have to have in 9.0. And in any case we'd still want the plain max delay for non-SR cases, AFAICS, because there's no sane way to use closed-loop logic in other cases.

Comments?

regards, tom lane
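[To make the proposed rule concrete, here is a toy sketch of the behaviour being described; the helper names, the 30-second setting, and the 45-second blocker are all invented for illustration, and this is not the actual recovery code. Replay blocks on its lock as usual, and only once it has waited max_standby_delay does it start cancelling whatever is in its way.]

    /* Toy illustration of the proposed rule, not PostgreSQL source code. */
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_STANDBY_DELAY 30            /* seconds, example setting */

    static int  lock_held_until = 45;       /* pretend a query holds the lock until t=45 */
    static bool lock_is_free(int now) { return now >= lock_held_until; }
    static void cancel_conflicting_queries(int now)
    {
        printf("t=%ds: cancelling queries that block replay\n", now);
        lock_held_until = now;              /* the blocker is gone, lock is released */
    }

    int main(void)
    {
        for (int now = 0; !lock_is_free(now); now++)
        {
            if (now >= MAX_STANDBY_DELAY)   /* grace period exhausted */
                cancel_conflicting_queries(now);
        }
        printf("replay acquired its lock\n");
        return 0;
    }

[The point being illustrated: the clock only matters while replay is actually blocked, so the archive-replay, clock-skew, and propagation-delay issues listed above never enter into it.]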
On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote: > I've finally wrapped my head around exactly what the max_standby_delay > code is doing, and I'm not happy with it. Yes, I don't think I'd call it perfect yet. > have the slave cancel competing queries if the replay process waits > more than max_standby_delay seconds to acquire a lock. This is simple, > understandable, and behaves the same whether we're reading live data or > not. I have no objection, and would welcome, adding another behaviour, since that just gives us a better chance of having this feature do something useful. > I'm inclined to think that we should throw away all this logic HS has been through 2 Alphas with the current behaviour and it will go through 0 Alphas with the newly proposed behaviour. At this stage of proceedings, that is extremely dangerous and I don't wish to do that. The likelihood that we replace it with something worse seems fairly high/certain: snap decision making never quite considers all angles. Phrases like "throw away all this logic" don't give me confidence that people that agree with that perspective would understand what they are signing up to. > Putting in something that tries to maintain a closed-loop maximum > delay between master and slave seems like a topic for future research > rather than a feature we have to have in 9.0. And in any case we'd > still want the plain max delay for non-SR cases, AFAICS, because there's > no sane way to use closed-loop logic in other cases. I will be looking for ways to improve this over time. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote: > >> I've finally wrapped my head around exactly what the max_standby_delay >> code is doing, and I'm not happy with it. > > Yes, I don't think I'd call it perfect yet. > >> have the slave cancel competing queries if the replay process waits >> more than max_standby_delay seconds to acquire a lock. This is simple, >> understandable, and behaves the same whether we're reading live data or >> not. > > I have no objection, and would welcome, adding another behaviour, since > that just gives us a better chance of having this feature do something > useful. > >> I'm inclined to think that we should throw away all this logic > > HS has been through 2 Alphas with the current behaviour and it will go > through 0 Alphas with the newly proposed behaviour. At this stage of > proceedings, that is extremely dangerous and I don't wish to do that. > The likelihood that we replace it with something worse seems fairly > high/certain: snap decision making never quite considers all angles. > Phrases like "throw away all this logic" don't give me confidence that > people that agree with that perspective would understand what they are > signing up to. I'm not really sure how much serious testing outside of the small set of people mostly interested in one or another specific aspect of HS/SR has been actually done with the alphas to be honest. I just started testing HS yesterday and I already ran twice into the general issue tom is complaining about with max_standby_delay... Stefan
On Mon, 2010-05-03 at 18:54 +0200, Stefan Kaltenbrunner wrote: > I'm not really sure how much serious testing outside of the small set of > people mostly interested in one or another specific aspect of HS/SR has > been actually done with the alphas to be honest. > I just started testing HS yesterday and I already ran twice into the > general issue tom is complaining about with max_standby_delay... I guarantee that if that proposal goes in, people will complain about that also. Last minute behaviour changes are bad news. I don't object to adding something, just don't take anything away. It's not like the code for it is pages long or anything. The trade off is HA or queries and two modes make sense for user choice. -- Simon Riggs www.2ndQuadrant.com
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> I guarantee that if that proposal goes in, people will complain about
> that also. Last minute behaviour changes are bad news. I don't object to
> adding something, just don't take anything away. It's not like the code
> for it is pages long or anything.

I have to disagree with this. If it goes into 9.0 this way then we're signing up to support it for *years*. With something as fragile as the existing setup (as outlined by Tom), that's probably not a good idea. We've not signed up to support the existing behaviour at all yet - alphas aren't a guarantee of what we're going to release.

> The trade off is HA or queries and two modes make sense for user choice.

The option isn't being thrown out, it's just being made to depend on something which is a lot easier to measure while still being very useful for the trade-off you're talking about. I don't really see a downside to this, to be honest. Perhaps you could speak to the specific user experience difference that you think there would be from this change?

+1 from me on Tom's proposal.

Thanks, Stephen
On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm inclined to think that we should throw away all this logic and just > have the slave cancel competing queries if the replay process waits > more than max_standby_delay seconds to acquire a lock. What if we somehow get into a situation where the replay process is waiting for a lock over and over and over again, because it keeps killing conflicting processes but something restarts them and they take locks over again? It seems hard to ensure that replay will make adequate progress with any substantially non-zero value of max_standby_delay under this definition. ...Robert
On Mon, 2010-05-03 at 13:13 -0400, Stephen Frost wrote: > * Simon Riggs (simon@2ndQuadrant.com) wrote: > > I guarantee that if that proposal goes in, people will complain about > > that also. Last minute behaviour changes are bad news. I don't object to > > adding something, just don't take anything away. It's not like the code > > for it is pages long or anything. > > I have to disagree with this. If it goes into 9.0 this way then we're > signing up to support it for *years*. With something as fragile as the > existing setup (as outlined by Tom), that's probably not a good idea. > We've not signed up to support the existing behaviour at all yet- > alpha's aren't a guarentee of what we're going to release. That's a great argument, either way. We will have to live with 9.0 for many years and so that's why I mention having both. Make a choice either way and we take a risk. Why? > > The trade off is HA or queries and two modes make sense for user choice. > > The option isn't being thrown out, it's just being made to depend on > something which is alot easier to measure while still being very useful > for the trade-off you're talking about. I don't really see a downside > to this, to be honest. Perhaps you could speak to the specific user > experience difference that you think there would be from this change? > > +1 from me on Tom's proposal. -- Simon Riggs www.2ndQuadrant.com
On Mon, 2010-05-03 at 13:21 -0400, Robert Haas wrote: > On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I'm inclined to think that we should throw away all this logic and just > > have the slave cancel competing queries if the replay process waits > > more than max_standby_delay seconds to acquire a lock. > > What if we somehow get into a situation where the replay process is > waiting for a lock over and over and over again, because it keeps > killing conflicting processes but something restarts them and they > take locks over again? It seems hard to ensure that replay will make > adequate progress with any substantially non-zero value of > max_standby_delay under this definition. That is one argument against, and a reason why just one route is bad. We already have more than one way, so another option is useful -- Simon Riggs www.2ndQuadrant.com
On Mon, 2010-05-03 at 13:13 -0400, Stephen Frost wrote: > Perhaps you could speak to the specific user > experience difference that you think there would be from this change? The difference is really to do with the weight you give to two different considerations * avoid query cancellations * avoid having recovery fall behind, so that failover time is minimised Some people recognise the trade-offs and are planning multiple standby servers dedicated to different roles/objectives. Some people envisage Hot Standby as a platform for running very fast SELECTs, for which retrying the query is a reasonable possibility and for whom keeping the standby as up-to-date as possible is an important consideration from a data freshness perspective. Others view HS as a weapon against long running queries. My initial view was that the High Availability goal/role should be the default or most likely mode of operation. I would say that the current max_standby_delay favours the HA route since it specifically limits the amount by which server can fall behind. Tom's proposed behaviour (has also been proposed before) favours the avoid query cancellation route though could lead to huge amounts of lag. I'm happy to have both options because I know this is a trade-off that solution engineers want to have control of, not something we as developers can choose ahead of time. -- Simon Riggs www.2ndQuadrant.com
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'm inclined to think that we should throw away all this logic and just
> > have the slave cancel competing queries if the replay process waits
> > more than max_standby_delay seconds to acquire a lock.
>
> What if we somehow get into a situation where the replay process is
> waiting for a lock over and over and over again, because it keeps
> killing conflicting processes but something restarts them and they
> take locks over again? It seems hard to ensure that replay will make
> adequate progress with any substantially non-zero value of
> max_standby_delay under this definition.

That was my first question too - but I reread what Tom wrote and came to a different conclusion: if the replay process waits more than max_standby_delay to acquire a lock, then it will kill off *everything* it runs into from that point forward, until it's done with whatever is currently available. At that point, the 'timer' would reset back to zero.

When/how that timer gets reset was a question I had, but I feel like "until nothing is available" makes sense and is what I assumed Tom was thinking.

Thanks, Stephen
Simon, * Simon Riggs (simon@2ndQuadrant.com) wrote: > Tom's proposed behaviour (has also been proposed before) favours the > avoid query cancellation route though could lead to huge amounts of lag. My impression of Tom's suggestion was that it would also be a maximum amount of delay which would be allowed before killing off queries- not that it would be able to wait indefinitely until no one is blocking. Based on that, I don't know that there's really much user-seen behaviour between the two, except in 'oddball' situations, where there's a time skew between the servers, or a large lag, etc, in which case I think Tom's proposal would be more likely what's 'expected', whereas what you would get with the existing implementation (zero time delay, or far too much) would be a 'gotcha'.. Thanks, Stephen
Simon,

> My initial view was that the High Availability goal/role should be the
> default or most likely mode of operation. I would say that the current
> max_standby_delay favours the HA route since it specifically limits the
> amount by which server can fall behind.

I don't understand how Tom's approach would cause the slave to be further behind than the current max_standby_delay code, and I can see ways in which it would result in less delay. So, explain?

The main issue with Tom's list which struck me was that max_standby_delay was linked to the system clock. HS is going to get used by a lot of PG users who aren't running time sync on their servers, or who let it get out of whack without fixing it. I'd thought that the delay was somehow based on transaction timestamps coming from the master. Keep in mind that there will be a *lot* of people using this feature, including ones without competent & available sysadmins.

The lock method appeals to me simply because it would eliminate the "mass cancel" issues which Greg Smith was reporting every time the timer runs down. That is, it seems to me that only the oldest queries would be cancelled and not any new ones.

The biggest drawback I can see to Tom's approach is possible blocking on the slave due to the lock wait from the recovery process. However, this could be managed with the new lock-waits GUC, as well as statement timeout.

Overall, I think Tom's proposal gives me what I would prefer, which is degraded performance on the slave but in ways which users are used to, rather than a lot of query cancel, which will interfere with user application porting.

Would the recovery lock show up in pg_locks? That would also be a good diagnostic tool.

I am happy to test some of this on Amazon or GoGrid, which is what I was planning on doing anyway.

P.S. can we avoid the "considered harmful" phrase? It carries a lot of baggage ...

-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm inclined to think that we should throw away all this logic and just >> have the slave cancel competing queries if the replay process waits >> more than max_standby_delay seconds to acquire a lock. > What if we somehow get into a situation where the replay process is > waiting for a lock over and over and over again, because it keeps > killing conflicting processes but something restarts them and they > take locks over again? They won't be able to take locks "over again", because the lock manager won't allow requests to pass a pending previous request, except in very limited circumstances that shouldn't hold here. They'll queue up behind the replay process's lock request, not in front of it. (If that isn't the case, it needs to be fixed, quite independently of this concern.) regards, tom lane
> Based on that, I don't know that there's really much user-seen behaviour
> between the two, except in 'oddball' situations, where there's a time
> skew between the servers, or a large lag, etc, in which case I think

Certainly that one particular case can be solved by making the servers be in time sync a prereq for HS working (in the traditional way). And by "prereq" I mean a "user beware" documentation warning.

-- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/
On Mon, 2010-05-03 at 15:32 -0400, Stephen Frost wrote: > Simon, > > * Simon Riggs (simon@2ndQuadrant.com) wrote: > > Tom's proposed behaviour (has also been proposed before) favours the > > avoid query cancellation route though could lead to huge amounts of lag. > > My impression of Tom's suggestion was that it would also be a maximum > amount of delay which would be allowed before killing off queries- not > that it would be able to wait indefinitely until no one is blocking. > Based on that, I don't know that there's really much user-seen behaviour > between the two, except in 'oddball' situations, where there's a time > skew between the servers, or a large lag, etc, in which case I think > Tom's proposal would be more likely what's 'expected', whereas what you > would get with the existing implementation (zero time delay, or far too > much) would be a 'gotcha'.. If recovery waits for max_standby_delay every time something gets in its way, it should be clear that if many things get in its way it will progressively fall behind. There is no limit to this and it can always fall further behind. It does result in fewer cancelled queries and I do understand many may like that. That is *significantly* different from how it works now. (Plus: If there really was no difference, why not leave it as is?) The bottom line is this is about conflict resolution. There is simply no way to resolve conflicts without favouring one or other of the protagonists. Whatever mechanism you come up with that favours one will, disfavour the other. I'm happy to give choices, but I'm not happy to force just one kind of conflict resolution. -- Simon Riggs www.2ndQuadrant.com
On Mon, 2010-05-03 at 15:39 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I'm inclined to think that we should throw away all this logic and just > >> have the slave cancel competing queries if the replay process waits > >> more than max_standby_delay seconds to acquire a lock. > > > What if we somehow get into a situation where the replay process is > > waiting for a lock over and over and over again, because it keeps > > killing conflicting processes but something restarts them and they > > take locks over again? > > They won't be able to take locks "over again", because the lock manager > won't allow requests to pass a pending previous request, except in > very limited circumstances that shouldn't hold here. They'll queue > up behind the replay process's lock request, not in front of it. > (If that isn't the case, it needs to be fixed, quite independently > of this concern.) Most conflicts aren't lock-manager locks, they are snapshot conflicts, though clearly different workloads will have different characteristics. Some conflicts are buffer conflicts and the semantics of buffer cleanup locks and many other internal locks are that shared locks queue jump past exclusive lock requests. Not something we should touch, now at least. I understand that you aren't impressed by everything about the current patch but rushed changes may not help either. -- Simon Riggs www.2ndQuadrant.com
On Mon, May 3, 2010 at 3:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'm inclined to think that we should throw away all this logic and just >>> have the slave cancel competing queries if the replay process waits >>> more than max_standby_delay seconds to acquire a lock. > >> What if we somehow get into a situation where the replay process is >> waiting for a lock over and over and over again, because it keeps >> killing conflicting processes but something restarts them and they >> take locks over again? > > They won't be able to take locks "over again", because the lock manager > won't allow requests to pass a pending previous request, except in > very limited circumstances that shouldn't hold here. They'll queue > up behind the replay process's lock request, not in front of it. > (If that isn't the case, it needs to be fixed, quite independently > of this concern.) Well, the new backends needn't try to take "the same" locks as the existing backends - the point is that in the worst case this proposal means waiting max_standby_delay for EACH replay that requires taking a lock. And that might be a LONG time. One idea I had while thinking this over was to bound the maximum amount of unapplied WAL rather than the absolute amount of time lag. Now, that's a little fruity, because your WAL volume might fluctuate considerably, so you wouldn't really know how far the slave was behind the master chronologically. However, it would avoid all the time skew issues, and it would also more accurately model the idea of a bound on recovery time should we need to promote the standby to master, so maybe it works out to a win. You could still end up stuck semi-permanently behind, but never by more than N segments. Stephen's idea of a mode where we wait up to max_standby_delay for a lock but then kill everything in our path until we've caught up again is another possible way of approaching this problem, although it may lead to "kill storms". Some of that may be inevitable, though: a bound on WAL lag has the same issue - if the primary is generating WAL faster than the standby can apply it, the standby will eventually decide to slaughter everything in its path. ...Robert
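[The segment-bound variant is easy to picture. A rough sketch of the check it implies follows; MAX_LAG_SEGMENTS is an invented knob, the LSN byte positions are toy values, and 16 MB is the default WAL segment size.]

    /* Toy sketch of bounding unapplied WAL volume instead of elapsed time. */
    #include <stdint.h>
    #include <stdio.h>

    #define WAL_SEGMENT_SIZE  (16 * 1024 * 1024)   /* 16 MB default segment */
    #define MAX_LAG_SEGMENTS  8                    /* invented knob */

    /* Compare WAL received against WAL replayed, both as byte positions. */
    static int should_cancel_queries(uint64_t received_lsn, uint64_t replayed_lsn)
    {
        uint64_t lag_segments = (received_lsn - replayed_lsn) / WAL_SEGMENT_SIZE;
        return lag_segments > MAX_LAG_SEGMENTS;
    }

    int main(void)
    {
        uint64_t seg = WAL_SEGMENT_SIZE;
        printf("%d\n", should_cancel_queries(12 * seg, 10 * seg));  /* 0: 2 segments behind */
        printf("%d\n", should_cancel_queries(30 * seg, 10 * seg));  /* 1: 20 segments behind */
        return 0;
    }

[Because the bound is on volume rather than wall-clock time, it behaves the same whether the WAL being applied is live or historical.]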
Greg, Robert, > Certainly that one particular case can be solved by making the > servers be in time sync a prereq for HS working (in the traditional way). > And by "prereq" I mean a "user beware" documentation warning. > Last I checked, you work with *lots* of web developers and web companies. I'm sure you can see the issue with the above. > Stephen's idea of a mode where we wait up to max_standby_delay for a > lock but then kill everything in our path until we've caught up again > is another possible way of approaching this problem, although it may > lead to "kill storms". Personally, I thought that the kill storms were exactly what was wrong with max_standby_delay. That is, with MSD, no matter *what* your settings or traffic are, you're going to get query cancel occasionally. I don't see the issue with Tom's approach from a wait perspective. The max wait becomes 1.001X max_standby_delay; there's no way I can think of that replay would wait longer than that. I've yet to see an explanation why it would be longer. Simon's assertion that not all operations take a conventional lock is a much more serious potential flaw. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Simon Riggs wrote:
> On Mon, 2010-05-03 at 13:13 -0400, Stephen Frost wrote:
> > Perhaps you could speak to the specific user
> > experience difference that you think there would be from this change?
>
> The difference is really to do with the weight you give to two different
> considerations
>
> * avoid query cancellations
> * avoid having recovery fall behind, so that failover time is minimised
>
> Some people recognise the trade-offs and are planning multiple standby
> servers dedicated to different roles/objectives.

I understand Simon's point that the two behaviors have different benefits. However, I believe few users will be able to understand when to use which.

As I remember, 9.0 has two behaviors:

 o master delays vacuum cleanup
 o slave delays WAL application

and in 9.1 we will be adding:

 o slave communicates snapshots to master

How would this figure into what we ultimately want in 9.1?

-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, 2010-05-03 at 15:04 -0700, Josh Berkus wrote:
> I don't see the issue with Tom's approach from a wait perspective. The
> max wait becomes 1.001X max_standby_delay; there's no way I can think of
> that replay would wait longer than that. I've yet to see an explanation
> why it would be longer.

Yes, the max wait on any *one* blocker will be max_standby_delay. But if you wait for two blockers, then the total time by which the standby lags will now be 2*max_standby_delay. Add a third, fourth etc and the standby lag keeps rising.

We need to avoid confusing these two measurables:

* standby lag - defined as the total delay from when a WAL record is written to the time the WAL record is applied. This includes both transfer time and any delays imposed by Hot Standby.

* standby query delay - defined as the time that recovery will wait for a query to complete before a cancellation takes place. (We could complicate this by asking what happens when recovery is blocked twice by the same query? Would it wait twice, or does it have to track how much it has waited for each query in total so far?)

Currently max_standby_delay seeks to constrain the standby lag to a particular value, as a way of providing a bounded time for failover, and also to constrain the amount of WAL that needs to be stored as the lag increases. Currently, there is no guaranteed minimum query delay given to each query.

If every query is guaranteed its requested query delay then the standby lag will be unbounded. Fewer cancellations, higher lag. Some people do want this, though it is not currently available. We can do this with two new GUCs:

* standby_query_delay - USERSET parameter that allows the user to specify a guaranteed query delay, anywhere from 0 to max_standby_query_delay

* max_standby_query_delay - SIGHUP parameter - exists to provide the DBA with a limit on the USERSET standby_query_delay, though I can see some would say this is optional

Current behaviour is the same as global settings of

 standby_query_delay = 0
 max_standby_query_delay = 0
 max_standby_delay = X

So if people want minimal cancellations they would specify

 standby_query_delay = Y (e.g. 30)
 max_standby_query_delay = Z (e.g. 300)
 max_standby_delay = -1

-- Simon Riggs www.2ndQuadrant.com
On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote: > As I remember, 9.0 has two behaviors: > > o master delays vacuum cleanup > o slave delays WAL application > > and in 9.1 we will be adding: > > o slave communicates snapshots to master > How would this figure into what we ultimately want in 9.1? We would still want all options, since "slave communicates snapshot to master" doesn't solve the problem it just moves the problem elsewhere. It's a question of which factors the user wishes to emphasise for their specific use. > I understand Simon's point that the two behaviors have different > benefits. However, I believe few users will be able to understand when > to use which. If users can understand how to set NDISTINCT for a column, they can understand this. It's not about complexity of UI, its about solving problems. When people hit an issue, I don't want to be telling people "we thought you wouldn't understand it, so we removed the parachute". They might not understand it *before* they hit a problem, so what? But users certainly will afterwards and won't say "thanks" if you prevent an option for them, especially for the stated reason. (My point about ndistinct: 99% of users have no idea that exists or when to use it, but it still exists as an option because it solves a known issue, just like this.) -- Simon Riggs www.2ndQuadrant.com
On Tue, May 4, 2010 at 4:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> option for them, especially for the stated reason. (My point about
> ndistinct: 99% of users have no idea that exists or when to use it, but
> it still exists as an option because it solves a known issue, just like
> this.)

Slightly OT, but funnily enough, when I was up in New York a couple of weeks ago with Bruce and a couple of other folks, I started talking with a DBA up there about his frustrations with PostgreSQL, and - I'm not making this up - the first example he gave me of something he wished he could do in PG to improve query planning was manually override ndistinct estimates. He was pleased to hear that we'll have that in 9.0 and I was pleased to be able to tell him it was my patch. If you'd asked me what the odds that someone picking a missing feature would have come up with that one were, I'd have said a billion-to-one against. But I'm not making this up.

To be honest, I am far from convinced that the existing behavior is a good one and I'm in favor of modifying it or ripping it out altogether if we can think of something better. But it has to really be better, of course, not just trading one set of pain points for another.

...Robert
On Tue, 2010-05-04 at 07:13 -0400, Robert Haas wrote:
> On Tue, May 4, 2010 at 4:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > option for them, especially for the stated reason. (My point about
> > ndistinct: 99% of users have no idea that exists or when to use it, but
> > it still exists as an option because it solves a known issue, just like
> > this.)
>
> Slightly OT, but funnily enough, when I was up in New York a couple of
> weeks ago with Bruce and a couple of other folks, I started talking
> with a DBA up there about his frustrations with PostgreSQL, and - I'm
> not making this up - the first example he gave me of something he
> wished he could do in PG to improve query planning was manually
> override ndistinct estimates. He was pleased to hear that we'll have
> that in 9.0 and I was pleased to be able to tell him it was my patch.
> If you'd asked me what the odds that someone picking a missing feature
> would have come up with that one were, I'd have said a billion-to-one
> against. But I'm not making this up.

It matches my experience. I think it's a testament to the expertise of our users, as well as to the hackers that have done so much to make that the top of users' lists for change.

> To be honest, I am far from convinced that the existing behavior is a
> good one and I'm in favor of modifying it or ripping it out altogether
> if we can think of something better. But it has to really be better,
> of course, not just trading one set of pain points for another.

The only way I see as genuinely better, rather than just a different mix of trade-offs, is to come up with ways where there are no conflicts. Hannu came up with one, using filesystem snapshots, but we haven't had time to implement that yet.

-- Simon Riggs www.2ndQuadrant.com
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> If recovery waits for max_standby_delay every time something gets in its
> way, it should be clear that if many things get in its way it will
> progressively fall behind. There is no limit to this and it can always
> fall further behind. It does result in fewer cancelled queries and I do
> understand many may like that.

Guess I wasn't very clear in my previous description of what I *think* the change would be (Tom, please jump in if I've got this wrong..). Recovery wouldn't wait max_standby_delay every time; I agree, that would be a big change in behaviour and could make it very difficult for the slave to keep up. Rather, recovery would proceed as normal until it encounters a lock, at which point it would start counting down from max_standby_delay; if the lock is released before it hits that, then it will move on, and if another lock is encountered, it would start counting down from where it left off last time. If it hits zero, it'll cancel the other query, and any other queries that get in the way, until it's caught up again completely. Once recovery is fully caught up, the counter would reset again to max_standby_delay.

> That is *significantly* different from how it works now. (Plus: If there
> really was no difference, why not leave it as is?)

Because it's much more complicated the way it is, it doesn't really work as one would expect in a number of situations, and it's trying to guarantee something that it probably can't.

> The bottom line is this is about conflict resolution. There is simply no
> way to resolve conflicts without favouring one or other of the
> protagonists. Whatever mechanism you come up with that favours one will,
> disfavour the other. I'm happy to give choices, but I'm not happy to
> force just one kind of conflict resolution.

I don't think anyone is trying to get rid of the knob entirely; you're right, you can't please everyone all the time, so there has to be some kind of knob there which people can adjust based on their particular use case and system. This is about what exactly the knob is and how it's implemented and documented.

Thanks, Stephen
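[Spelled out as a toy sketch, with invented names and not actual server code, that interpretation amounts to a single grace budget that is consumed across successive waits and refilled only when replay has nothing left to apply.]

    /* Toy sketch of the "shared grace budget" interpretation, not real code. */
    #include <stdio.h>

    #define MAX_STANDBY_DELAY 30.0          /* seconds, example setting */

    static double grace_left = MAX_STANDBY_DELAY;

    /* Called each time replay blocks on a conflicting query for 'waited' seconds. */
    static int blocked_for(double waited)
    {
        grace_left -= waited;
        return grace_left <= 0;             /* nonzero => start cancelling queries */
    }

    /* Called when replay has applied everything currently available. */
    static void caught_up(void)
    {
        grace_left = MAX_STANDBY_DELAY;     /* the budget refills only here */
    }

    int main(void)
    {
        printf("cancel after first wait?  %d\n", blocked_for(20));  /* 0: 10 s of budget left */
        printf("cancel after second wait? %d\n", blocked_for(15));  /* 1: budget exhausted */
        caught_up();
        printf("cancel after reset+wait?  %d\n", blocked_for(5));   /* 0: fresh budget */
        return 0;
    }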
On Tue, 2010-05-04 at 09:12 -0400, Stephen Frost wrote: > * Simon Riggs (simon@2ndQuadrant.com) wrote: > > If recovery waits for max_standby_delay every time something gets in its > > way, it should be clear that if many things get in its way it will > > progressively fall behind. There is no limit to this and it can always > > fall further behind. It does result in fewer cancelled queries and I do > > understand many may like that. > > Guess I wasn't very clear in my previous description of what I *think* > the change would be (Tom, please jump in if I've got this wrong..). > Recovery wouldn't wait max_standby_delay every time; I agree, that would > be a big change in behaviour and could make it very difficult for the > slave to keep up. Rather, recovery would proceed as normal until it > encounters a lock, at which point it would start a counting down from > max_standby_delay, if the lock is released before it hits that, then it > will move on, if another lock is encoutered, it would start counting > down from where it left off last time. If it hits zero, it'll cancel > the other query, and any other queries that get in the way, until it's > caught up again completely. Once recovery is fully caught up, the > counter would reset again to max_standby_delay. This new clarification is almost exactly how it works already. Sounds like the existing docs need some improvement. The only difference is that max_standby_delay is measured from log timestamp. Perhaps it should work from WAL receipt timestamp rather than from log timestamp? That would make some of the problems go away without significantly changing the definition. I'll look at that. (And that conflicts are caused by more situations than just locks, but that detail doesn't alter your point). > > The bottom line is this is about conflict resolution. There is simply no > > way to resolve conflicts without favouring one or other of the > > protagonists. Whatever mechanism you come up with that favours one will, > > disfavour the other. I'm happy to give choices, but I'm not happy to > > force just one kind of conflict resolution. > > I don't think anyone is trying to get rid of the knob entirely; you're > right, you can't please everyone all the time, so there has to be some > kind of knob there which people can adjust based on their particular use > case and system. This is about what exactly the knob is and how it's > implemented and documented. I'm happy with more than one way. It'd be nice if a single parameter, giving one dimension of tuning, suited all ways people have said they would like it to behave. I've not found a way of doing that. I have no problem at all with adding additional parameters or mechanisms to cater for the multiple dimensions of control people have asked for. So your original interpretation is also valid for some users. -- Simon Riggs www.2ndQuadrant.com
Downthread, I said:

On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:
> The only difference is that max_standby_delay is measured from log
> timestamp. Perhaps it should work from WAL receipt timestamp rather than
> from log timestamp? That would make some of the problems go away without
> significantly changing the definition. I'll look at that.

Patch to implement this idea attached: for discussion, not tested yet. No docs yet.

The attached patch redefines "standby delay" to be the amount of time elapsed from point of receipt to point of application. The "point of receipt" is reset every chunk of data when streaming, or every file when reading file by file. In all cases this new time is later than the latest log time we would have used previously. This addresses all of your points, as shown below.

On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote:
> There are three really fundamental problems with it:
>
> 1. The timestamps we are reading from the log might be historical,
> if we are replaying from archive rather than reading a live SR stream.
> In the current implementation that means zero grace period for standby
> queries. Now if your only interest is catching up as fast as possible,
> that could be a sane behavior, but this is clearly not the only possible
> interest --- in fact, if that's all you care about, why did you allow
> standby queries at all?

The delay used is from time of receipt of WAL, no longer from log date. So this would no longer apply.

> 2. There could be clock skew between the master and slave servers.
> If the master's clock is a minute or so ahead of the slave's, again we
> get into a situation where standby queries have zero grace period, even
> though killing them won't do a darn thing to permit catchup. If the
> master is behind the slave then we have an artificially inflated grace
> period, which is going to slow down the slave.

The timestamp is from the standby, not the master, so this would no longer apply.

> 3. There could be significant propagation delay from master to slave,
> if the WAL stream is being transmitted with pg_standby or some such.
> Again this results in cutting into the standby queries' grace period,
> for no defensible reason.

The timestamp is taken immediately at the point the WAL is ready for replay, so other timing overheads would not be included.

> In addition to these fundamental problems there's a fatal implementation
> problem: the actual comparison is not to the master's current clock
> reading, but to the latest commit, abort, or checkpoint timestamp read
> from the WAL. Thus, if the last commit was more than max_standby_delay
> seconds ago, zero grace time. Now if the master is really idle then
> there aren't going to be any conflicts anyway, but what if it's running
> only long-running queries? Or what happens when it was idle for awhile
> and then starts new queries? Zero grace period, that's what.
>
> We could possibly improve matters for the SR case by having walsender
> transmit the master's current clock reading every so often (probably
> once per activity cycle), outside the WAL stream proper. The receiver
> could subtract off its own clock reading in order to measure the skew,
> and then we could cancel queries if the de-skewed transmission time
> falls too far behind. However this doesn't do anything to fix the cases
> where we aren't reading (and caught up to) a live SR broadcast.

-- Simon Riggs www.2ndQuadrant.com
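[For readers who don't want to read the diff: this is not the patch itself, which touches the real recovery code, just a toy restatement of the idea. Each chunk or file is stamped with the standby's own clock on arrival and the grace period is measured from that; the constant and helper names are invented.]

    /* Concept sketch only; not the submitted patch and not PostgreSQL source. */
    #include <stdio.h>
    #include <time.h>

    #define MAX_STANDBY_DELAY 30            /* seconds, example setting */

    static time_t chunk_received_at;        /* standby's own clock when this WAL arrived */

    /* Called whenever a new chunk (streaming) or file (archive) becomes available. */
    static void wal_chunk_received(void)
    {
        chunk_received_at = time(NULL);
    }

    /* Conflict handling compares against receipt time, not timestamps found in
     * the WAL, so archive replay and master/standby clock skew stop mattering. */
    static int grace_period_expired(void)
    {
        return time(NULL) - chunk_received_at > MAX_STANDBY_DELAY;
    }

    int main(void)
    {
        wal_chunk_received();
        printf("expired immediately after receipt? %d\n", grace_period_expired());  /* 0 */
        return 0;
    }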
On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote: > The only difference is that max_standby_delay is measured from log > timestamp. Perhaps it should work from WAL receipt timestamp rather than > from log timestamp? That would make some of the problems go away without > significantly changing the definition. I'll look at that. Patch to implement this idea posted in response to OT, upthread, so I can respond to the original complaints directly. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote:
> On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:
> > As I remember, 9.0 has two behaviors:
> >
> > o master delays vacuum cleanup
> > o slave delays WAL application
> >
> > and in 9.1 we will be adding:
> >
> > o slave communicates snapshots to master
> >
> > How would this figure into what we ultimately want in 9.1?
>
> We would still want all options, since "slave communicates snapshot to
> master" doesn't solve the problem it just moves the problem elsewhere.
> It's a question of which factors the user wishes to emphasise for their
> specific use.
>
> > I understand Simon's point that the two behaviors have different
> > benefits. However, I believe few users will be able to understand when
> > to use which.
>
> If users can understand how to set NDISTINCT for a column, they can
> understand this. It's not about complexity of UI, its about solving
> problems. When people hit an issue, I don't want to be telling people
> "we thought you wouldn't understand it, so we removed the parachute".
> They might not understand it *before* they hit a problem, so what? But
> users certainly will afterwards and won't say "thanks" if you prevent an
> option for them, especially for the stated reason. (My point about
> ndistinct: 99% of users have no idea that exists or when to use it, but
> it still exists as an option because it solves a known issue, just like
> this.)

Well, this is kind of my point --- that if few people are going to need a parameter and it is going to take us to tell them to use it, it isn't a good parameter, because the other 99.9% are going to stare at the parameters and not know what it does or how it is different from other similar parameters. Adding another parameter might help 0.1% of our users, but it is going to confuse the other 99.9%. :-(

-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Tue, 2010-05-04 at 13:00 -0400, Bruce Momjian wrote: > Well, this is kind of my point --- that if few people are going to need > a parameter and it is going to take us to tell them to use it, it isn't > a good parameter because the other 99.9% are going to stare at the > parameters and not konw what it does or how it is different from other > similar parameters. Adding another parameter might help 0.1% of our > users, but it is going to confuse the other 99.9%. :-( You've missed my point. Most users of HS will need these parameters. There is no need to understand them immediately, nor do I expect them to do so. People won't understand why they exist until they've understood the actual behaviour, received some errors and *then* they will understand them, want them and need them. Just like deadlocks, ndistinct and loads of other features we provide and support. The current behaviour of max_standby_delay is designed to favour High Availability users, not query users. I doubt that users with HA concerns are only 0.1% of our users. I've accepted that some users may not put that consideration first and so adding some minor, easy to implement additional parameters will improve the behaviour for those people. Forcing just one behaviour will be bad for many people. -- Simon Riggs www.2ndQuadrant.com
Simon,

> Yes, the max wait on any *one* blocker will be max_standby_delay. But if
> you wait for two blockers, then the total time by which the standby lags
> will now be 2*max_standby_delay. Add a third, fourth etc and the standby
> lag keeps rising.

I still don't see how that works. If we're locking for applying log segments, then any query which came in after the recovery lock would, presumably, wait. So you'd have a lot of degraded query performance, but no more than max_standby_delay of waiting to apply logs.

I'm more interested in your assertion that there's a lot in the replication stream which doesn't take a lock; if that's the case, then implementing any part of Tom's proposal is hopeless.

> * standby query delay - defined as the time that recovery will wait for
> a query to complete before a cancellation takes place. (We could
> complicate this by asking what happens when recovery is blocked twice by
> the same query? Would it wait twice, or does it have to track how much
> it has waited for each query in total so far?)

Aha! Now I see the confusion. AFAIK, Tom was proposing that the pending recovery data would wait for max_standby_delay, total, then cancel *all* queries which conflicted with it. Now that we've talked this out, though, I can see that this can still result in "mass cancel" issues, just like the current max_standby_delay. The main advantage I can see to Tom's idea is that (presumably) it can be more discriminating about which queries it cancels.

I agree that waiting on *each* query for "up to # time" would be a completely different behavior, and as such, should be an option for DBAs. We might make it the default option, but we wouldn't make it the only option.

Speaking of which, was *your* more discriminating query cancel ever applied?

> Currently max_standby_delay seeks to constrain the standby lag to a
> particular value, as a way of providing a bounded time for failover, and
> also to constrain the amount of WAL that needs to be stored as the lag
> increases. Currently, there is no guaranteed minimum query delay given
> to each query.

Yeah, I can just see a lot of combinational issues with this. For example, what if the user's network changes in some way to retard delivery of log segments to the point where the delivery time is longer than max_standby_delay? To say nothing about system clock synch, which isn't perfect even if you have it set up.

I can see DBAs who are very focussed on HA wanting a standby-lag based control anyway, when HA is far more important than the ability to run queries on the slave. But I don't think that is the largest group; I think that far more people will want to balance the two considerations.

Ultimately, as you say, we would like to have all three knobs:

standby lag: max time measured from master timestamp to slave timestamp

application lag: max time measured from local receipt of WAL records (via log copy or recovery connection) to their application

query lag: max time any query which is blocking a recovery operation can run

These three, in combination, would let us cover most potential use cases. So I think you've assessed that's where we're going in the 9.1-9.2 timeframe.

However, I'd say for 9.0 that "application lag" is the least confusing option and the least dependent on the DBA's server room setup. So if we can only have one of these for 9.0 (and I think going out with more than one might be too complex, especially at this late date) I think that's the way to go.

-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Mon, May 3, 2010 at 4:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. The timestamps we are reading from the log might be historical,
> 2. There could be clock skew between the master and slave servers.
> 3. There could be significant propagation delay from master to slave,

So it sounds like what you're expecting is for max_standby_delay to represent not the maximum lag between server commit and standby commit but rather the maximum lag introduced by conflicts. Or perhaps the maximum lag introduced relative to the lag present at startup. I think it's possible to implement either of these and it would solve all three problems above:

The slave maintains a static measure of how far behind it is from the master. Every time it executes a recovery operation or waits on a conflict it adds the time it spent executing or waiting. Every time it executes a commit record it subtracts the *difference* between this commit record and the last. I assume we clip at 0 so it never goes negative, which has odd effects but seems to match what I would expect to happen.

In the face of a standby recovering historical logs, it would start with an assumed delay of 0. As long as the conflicts don't slow down execution of the logs so that they run slower than the server, the measured delay would stay near 0. The only time queries would be cancelled would be if the conflicts are causing problems replaying the logs.

In the face of clock skew, nothing changes as long as the clocks run at the same speed.

In the face of an environment where the master is idle, I think this scheme has the same problems you described, but I think this might be manageable. Perhaps we need more timestamps in the master's log stream aside from the commit timestamps. Or perhaps we don't care about standby delay except when reading a commit record, since any other record isn't actually delayed unless its commit is delayed.

-- greg
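[A toy version of that bookkeeping, with invented names, just to check the arithmetic behaves as described: time spent replaying or waiting is added, the gap between successive commit timestamps is subtracted, and the result is clipped at zero.]

    /* Toy version of the proposed lag accounting, not PostgreSQL code. */
    #include <stdio.h>

    static double lag = 0.0;                 /* conflict-induced delay, in seconds */

    /* Add time spent replaying a record or waiting on a conflict. */
    static void spent(double seconds)
    {
        lag += seconds;
    }

    /* On each commit record, subtract the gap to the previous commit's timestamp. */
    static void replayed_commit(double gap_to_previous_commit)
    {
        lag -= gap_to_previous_commit;
        if (lag < 0.0)
            lag = 0.0;                       /* clip at zero, as proposed */
    }

    int main(void)
    {
        spent(5.0);                          /* waited 5 s on a conflicting query */
        replayed_commit(2.0);                /* commits 2 s apart on the master */
        replayed_commit(8.0);
        printf("measured lag: %.1f s\n", lag);   /* 0.0: we kept up overall */
        spent(40.0);                         /* a long conflict wait */
        replayed_commit(1.0);
        printf("measured lag: %.1f s\n", lag);   /* 39.0: cancellations would kick in */
        return 0;
    }

[Under this scheme, replay from historical archives and clock skew both start and stay at zero as long as replay keeps pace, which is the behaviour described above.]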
On Tue, 2010-05-04 at 11:27 -0700, Josh Berkus wrote:
> I still don't see how that works. ...

The good news is we agree by the time we get to the bottom... ;-)

> I'm more interested in your assertion that there's a lot in the
> replication stream which doesn't take a lock; if that's the case, then
> implementing any part of Tom's proposal is hopeless.

(No, still valid, the idea is generic)

> > * standby query delay - defined as the time that recovery will wait for
> > a query to complete before a cancellation takes place. (We could
> > complicate this by asking what happens when recovery is blocked twice by
> > the same query? Would it wait twice, or does it have to track how much
> > it has waited for each query in total so far?)
>
> Aha! Now I see the confusion.

BTW, Tom's proposal was approx half a sentence long so that is the source of any confusion.

> AFAIK, Tom was proposing that the
> pending recovery data would wait for max_standby_delay, total, then
> cancel *all* queries which conflicted with it. Now that we've talked
> this out, though, I can see that this can still result in "mass cancel"
> issues, just like the current max_standby_delay. The main advantage I
> can see to Tom's idea is that (presumably) it can be more discriminating
> about which queries it cancels.

As I said to Stephen, this is exactly how it works already and wasn't what was proposed.

> I agree that waiting on *each* query for "up to # time" would be a
> completely different behavior, and as such, should be an option for DBAs.
> We might make it the default option, but we wouldn't make it the only
> option.

Glad to hear you say that.

> Speaking of which, was *your* more discriminating query cancel ever applied?
>
> > Currently max_standby_delay seeks to constrain the standby lag to a
> > particular value, as a way of providing a bounded time for failover, and
> > also to constrain the amount of WAL that needs to be stored as the lag
> > increases. Currently, there is no guaranteed minimum query delay given
> > to each query.
>
> Yeah, I can just see a lot of combinational issues with this. For
> example, what if the user's network changes in some way to retard
> delivery of log segments to the point where the delivery time is longer
> than max_standby_delay? To say nothing about system clock synch, which
> isn't perfect even if you have it set up.
>
> I can see DBAs who are very focussed on HA wanting a standby-lag based
> control anyway, when HA is far more important than the ability to run
> queries on the slave. But I don't think that is the largest group; I
> think that far more people will want to balance the two considerations.
>
> Ultimately, as you say, we would like to have all three knobs:
>
> standby lag: max time measured from master timestamp to slave timestamp
>
> application lag: max time measured from local receipt of WAL records
> (via log copy or recovery connection) to their application
>
> query lag: max time any query which is blocking a recovery operation can run
>
> These three, in combination, would let us cover most potential use
> cases. So I think you've assessed that's where we're going in the
> 9.1-9.2 timeframe.
>
> However, I'd say for 9.0 that "application lag" is the least confusing
> option and the least dependent on the DBA's server room setup. So if we
> can only have one of these for 9.0 (and I think going out with more than
> one might be too complex, especially at this late date) I think that's
> the way to go.
Before you posted, I submitted a patch on this thread to redefine max_standby_delay to depend upon the "application lag", as you've newly defined it here - though obviously I didn't call it that. That solves Tom's 3 issues. max_apply_delay might be technically more accurate term, though isn't sufficiently better parameter name as to be worth the change. That patch doesn't implement his proposal, but that can be done as well as (though IMHO not instead of). Given that two people have already misunderstood what Tom proposed, and various people are saying we need only one, I'm getting less inclined to have that at all. -- Simon Riggs www.2ndQuadrant.com
>> AFAIK, Tom was proposing that the >> pending recovery data would wait for max_standby_delay, total, then >> cancel *all* queries which conflicted with it. Now that we've talked >> this out, though, I can see that this can still result in "mass cancel" >> issues, just like the current max_standby_delay. The main advantage I >> can see to Tom's idea is that (presumably) it can be more discriminating >> about which queries it cancels. > > As I said to Stephen, this is exactly how it works already and wasn't > what was proposed. Well, it's not exactly how it works, as I understand it ... doesn't the timer running out on the slave currently cancel *all* running queries with old snapshots, regardless of what relations they touch? > Before you posted, I submitted a patch on this thread to redefine > max_standby_delay to depend upon the "application lag", as you've newly > defined it here - though obviously I didn't call it that. That solves > Tom's 3 issues. max_apply_delay might be technically more accurate term, > though isn't sufficiently better parameter name as to be worth the > change. Yeah, that looks less complicated for admins. Thanks. > That patch doesn't implement his proposal, but that can be done as well > as (though IMHO not instead of). Given that two people have already > misunderstood what Tom proposed, and various people are saying we need > only one, I'm getting less inclined to have that at all. Given your clarification on the whole set of behaviors, I'm highly dubious about the idea of implementing Tom's proposal when we're already Beta 1. It seems like a 9.1 thing. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Given your clarification on the whole set of behaviors, I'm highly > dubious about the idea of implementing Tom's proposal when we're already > Beta 1. It seems like a 9.1 thing. I think you missed the point: "do nothing" is not a viable option. I was proposing something that seemed simple enough to be safe to drop into 9.0 at this point. I'm less convinced that what Simon is proposing is safe enough. regards, tom lane
On Tue, 2010-05-04 at 18:53 -0400, Tom Lane wrote: > I think you missed the point: "do nothing" is not a viable option. > I was proposing something that seemed simple enough to be safe to > drop into 9.0 at this point. I've posted a patch that meets your stated objections. If you could review that, this could be done in an hour. There are other ways, but you'll need to explain a proposal in enough detail that we're clear what you actually mean. > I'm less convinced that what Simon is proposing is safe enough. Which proposal? -- Simon Riggs www.2ndQuadrant.com
Tom Lane wrote: > 1. The timestamps we are reading from the log might be historical, > if we are replaying from archive rather than reading a live SR stream. > In the current implementation that means zero grace period for standby > queries. Now if your only interest is catching up as fast as possible, > that could be a sane behavior, but this is clearly not the only possible > interest --- in fact, if that's all you care about, why did you allow > standby queries at all? > If the standby is not current, you may not want people to execute queries against it. In some situations, returning results against obsolete data is worse than not letting the query execute at all. As I see it, the current max_standby_delay implementation includes the expectation that the results you are getting are no more than max_standby_delay behind the master, presuming that new data is still coming in. If the standby has really fallen further behind than that, there are situations where you don't want it doing anything but catching up until that is no longer the case, and you especially don't want it returning stale query data. The fact that tuning in that direction could mean the standby never actually executes any queries is something you need to monitor for--it suggests the standby isn't powerful/well connected to the master enough to keep up--but that's not necessarily the wrong behavior. Saying "I only want the standby to execute queries if it's not too far behind the master" is the answer to "why did you allow standby queries at all?" when tuning for that use case. > 2. There could be clock skew between the master and slave servers. > Not the database's problem to worry about. Document that time should be carefully sync'd and move on. I'll add that. > 3. There could be significant propagation delay from master to slave, > if the WAL stream is being transmitted with pg_standby or some such. > Again this results in cutting into the standby queries' grace period, > for no defensible reason. > Then people should adjust their max_standby_delay upwards to account for that. For high availability purposes, it's vital that the delay number be referenced to the commit records on the master. If lag is eating a portion of that, again it's something people should be monitoring for, but not something we can correct. The whole idea here is that max_standby_delay is an upper bound on how stale the data on the standby can be, and whether or not lag is a component to that doesn't impact how the database is being asked to act. > In addition to these fundamental problems there's a fatal implementation > problem: the actual comparison is not to the master's current clock > reading, but to the latest commit, abort, or checkpoint timestamp read > from the WAL. Right; this has been documented for months at http://wiki.postgresql.org/wiki/Hot_Standby_TODO and on the list before that, i.e. "If there's little activity in the master, that can lead to surprising results." The suggested long-term fix has been adding keepalive timestamps into SR, which seems to get reinvented every time somebody plays with this for a bit. The HS documentation improvements I'm working on will suggest that you make sure this doesn't happen, that people have some sort of keepalive WAL-generating activity on the master regularly, if they expect max_standby_delay to work reasonably in the face of an idle master. It's not ideal, but it's straightforward to work around in user space. 
> I'm inclined to think that we should throw away all this logic and just > have the slave cancel competing queries if the replay process waits > more than max_standby_delay seconds to acquire a lock. This is simple, > understandable, and behaves the same whether we're reading live data or > not. I don't consider something that allows queries to execute when not playing recent "live" data to be necessarily a step forward, from the perspective of implementations preferring high-availability. It's reasonable for some people to request that the last thing a standby that's not current (<max_standby_delay behind the master, based on the last thing received) should be doing is answering any queries, when it doesn't have current data and it should be working on catchup instead. Discussion here obviously has wandered past your fundamental objections and onto implementation trivia, but I didn't think the difference between what you expected and what's actually committed already was properly addressed before doing that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
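The "keepalive WAL-generating activity" workaround Greg describes can be scripted very simply; as a minimal sketch (the one-minute interval and the use of cron are assumptions, not part of anything committed), a scheduled job on the master just commits a trivial transaction so the WAL keeps carrying fresh commit timestamps even when the master is otherwise idle:

    -- run periodically on the master, e.g. once a minute from cron;
    -- txid_current() forces an xid to be assigned, so the commit of this
    -- otherwise empty transaction writes a WAL commit record with a fresh
    -- timestamp for the standby to compare against.
    SELECT txid_current();

With something like this in place, the "idle master, zero grace period" behaviour discussed earlier in the thread is largely avoided.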
Greg Smith <greg@2ndquadrant.com> writes: > If the standby is not current, you may not want people to execute > queries against it. In some situations, returning results against > obsolete data is worse than not letting the query execute at all. As I > see it, the current max_standby_delay implementation includes the > expectation that the results you are getting are no more than > max_standby_delay behind the master, presuming that new data is still > coming in. If the standby has really fallen further behind than that, > there are situations where you don't want it doing anything but catching > up until that is no longer the case, and you especially don't want it > returning stale query data. That is very possibly a useful thing to be able to specify, but the current implementation has *nothing whatsoever* to do with making such a guarantee. It will only kill queries that are creating a lock conflict. I would even argue that it's a bad thing to have a parameter that looks like it might do that, when it doesn't. regards, tom lane
On 5/4/10 4:26 PM, Greg Smith wrote: > > Not the database's problem to worry about. Document that time should be > carefully sync'd and move on. I'll add that. Releasing a hot standby which *only* works for users with an operational ntp implementation is highly unrealistic. Having built-in replication in PostgreSQL was supposed to give the *majority* of users a *simple* option for 2-server failover, not cater only to the high end. Every administrative requirement we add to HS/SR eliminates another set of potential users, as well as adding another set of potential failure conditions which need to be monitored. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Tue, 2010-05-04 at 16:34 -0700, Josh Berkus wrote: > On 5/4/10 4:26 PM, Greg Smith wrote: > > > > Not the database's problem to worry about. Document that time should be > > carefully sync'd and move on. I'll add that. > > Releasing a hot standby which *only* works for users with an operational > ntp implementation is highly unrealistic. Having built-in replication > in PostgreSQL was supposed to give the *majority* of users a *simple* > option for 2-server failover, not cater only to the high end. Every > administrative requirement we add to HS/SR eliminates another set of > potential users, as well as adding another set of potential failure > conditions which need to be monitored. +1 Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
Josh Berkus wrote: > Having built-in replication in PostgreSQL was supposed to give the *majority* of users a *simple* > option for 2-server failover, not cater only to the high end. If that's your criteria, 9.0 has already failed that goal. Just the fact that you have to make your own base backup and manage that whole area alone excludes "simple" as a goal anyone can claim 9.0 meets with a straight face, long before you get to the mechanics of how HS handles query cancellation. The new replication-oriented features are functional, but neither is close to simple yet. Based on the complication level of replication in other database products, I wouldn't put money on that even being possible. You can make a simpler path the default one, but the minute you want to support more than one use case the complexity involved in setting up replication explodes. Anyway, I have no idea where the idea that recommending time synchronization is somehow a "high end" requirement comes from, given that every OS I'm aware of makes that trivial nowadays. Slave servers that drift too far away from the master time are going to cause all sorts of problems for user apps too. Any app that gauges how long ago something happened by comparing a database timestamp with now() is going to give misleading results for example, and I know I see those all the time. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
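To make that last point concrete with an entirely made-up example (the table and column names are invented): an application that estimates staleness by mixing a master-written timestamp with the standby's clock is off by exactly the clock skew, regardless of replication lag.

    -- run on the standby: last_updated was written with the master's clock,
    -- now() uses the standby's clock, so any skew between the two servers
    -- shifts apparent_age by that amount.
    SELECT now() - last_updated AS apparent_age
      FROM orders
     WHERE order_id = 42;

If the standby's clock is a minute ahead of the master's, every such query over-reports staleness by a minute, which is the same class of problem max_standby_delay faces.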
Greg Smith <greg@2ndquadrant.com> writes: > Anyway, I have no idea where the idea that recommending time > synchronization is a somehow a "high end" requirement, Considering that clock skew was only one of several scenarios in which the max_standby_delay code misbehaves, it's not that important whether you consider it highly probable or not. The code still needs a redesign, and we may as well eliminate the assumption of tight synchronization while we are at it. There's no really good reason to have that requirement in there. regards, tom lane
> Releasing a hot standby which *only* works for users with an operational > ntp implementation is highly unrealistic. Having built-in replication > in PostgreSQL was supposed to give the *majority* of users a *simple* > option for 2-server failover, not cater only to the high end. Every > administrative requirement we add to HS/SR eliminates another set of > potential users, as well as adding another set of potential failure > conditions which need to be monitored. To be completely practical, I'm saying that we should apply & test Simon's latest patch moving the delay calculation to be application lag instead of standby lag. I'm also suggesting that we should have a standby lag option for 9.1 (as well as, probably, a "wait forever" option ala Tom's suggestion). -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Tom Lane wrote: > Greg Smith <greg@2ndquadrant.com> writes: > > Anyway, I have no idea where the idea that recommending time > > synchronization is a somehow a "high end" requirement, > > Considering that clock skew was only one of several scenarios in which > the max_standby_delay code misbehaves, it's not that important whether > you consider it highly probable or not. The code still needs a > redesign, and we may as well eliminate the assumption of tight > synchronization while we are at it. There's no really good reason to > have that requirement in there. Should I be concerned that we are redesigning HS features at this stage in the release? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Tue, May 4, 2010 at 11:06 PM, Bruce Momjian <bruce@momjian.us> wrote: > Should I be concerned that we are redesigning HS features at this stage > in the release? Yep. You can decide whether you want to be concerned by the redesign itself, or by the concerns about the underlying code that are motivating the redesigns, but yes, you should definitely be concerned. ...Robert
Robert Haas wrote: > On Tue, May 4, 2010 at 11:06 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> Should I be concerned that we are redesigning HS features at this stage >> in the release? >> > > Yep. You can decide whether you want to be concerned by the redesign > itself, or by the concerns about the underlying code that are > motivating the redesigns, but yes, you should definitely be concerned. > > > Our process is shot to pieces. But then, we knew that, didn't we ;-) cheers andrew
On Wed, May 5, 2010 at 5:01 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > Our process is shot to pieces. But then, we knew that, didn't we ;-) > Frankly I think these kinds of usability questions are things that we'll never have great success getting feedback on without users banging on the system. There are solutions but none of them are perfect. We either a) let releases go out with functionality that nobody's sure users will be happy with and promise that the next release will polish it, b) commit big functionality changes only at the beginning of the release cycle and live with a two-release cycle latency for such changes, or c) make last-minute changes during betas. So far we've seen a combination of (a) and (c). I think (a) has worked out well; sure, we have some non-ideal behaviour in new features, but people forgive and forget when the later releases polish it. I've always advocated for (b) though; I think we've been lucky with the cases where we've needed last minute adjustments that they've worked better than we had a right to expect. -- greg
On Tue, 2010-05-04 at 23:06 -0400, Bruce Momjian wrote: > Should I be concerned that we are redesigning HS features at this stage > in the release? We knew we had to have one final discussion on HS snapshots. This is it. Tom has raised valid issues, all of which were already known. If we can address them, we should. A straightforward patch [walrcv_timestamp.patch] has been posted to address all of those points. (Posted 13 hours prior to your post. That it was ignored by all while debate continued is one point of concern, for me, though there seems to have been confusion as to what that patch actually was.) Tom has also raised a separate proposal, though that hasn't yet been properly explained and there has been much debate about what he actually meant. It is possible there is something worthwhile there, if that involves adding a new capability. Myself, Stephen, Josh and Greg say that changing max_standby_delay so there is no bounded startup time would be a bad thing, if that is its only behaviour in 9.0. I will tidy up walrcv_timestamp.patch and apply on Thu evening unless there are concise, rational objections to that patch, which I consider to be a bug fix and not blocked by beta. Tom raised 7 other main points, which, following detailed investigation, have resulted in 2 minor bugs, 2 unresolved questions on the patch and 1 further request for code comments. The 2 bugs affect corner cases only and so are minor. They will be fixed over the next few days since they are not instant fixes. Open items list updated with items mentioned here, plus performance query discussed on other thread. Nothing much here likely to cause a problem if we need to go beta immediately, IMO. I am mostly unavailable for next few days. (Repairing bikeshed.) Expect at least 3 commits from me over next few days. -- Simon Riggs www.2ndQuadrant.com
On Wed, May 5, 2010 at 3:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Expect at least 3 commits from me over next few days. I think you need to rethink the way that you decide when it's time to commit things. There is certainly no consensus on any of the things you are proposing to commit, nor have they been adequately (or, uh, at all) reviewed. Saying that your proposal addresses all of Tom's objections doesn't make it so. I am planning to read that patch and offer an opinion on it, but I haven't done so yet and I imagine Tom will weigh in at some point as well. Racing to commit a pile of code that nobody else has tested is not going to improve anything. ...Robert
Simon Riggs wrote: > The attached patch redefines "standby delay" to be the amount of time > elapsed from point of receipt to point of application. The "point of > receipt" is reset every chunk of data when streaming, or every file when > reading file by file. In all cases this new time is later than the > latest log time we would have used previously. This seems completely wrong to me. If the WAL receiver keeps receiving stuff, (current timestamp) - (last receive timestamp) would never be more than a few seconds. Regardless of how much applying the WAL has fallen behind. To accomplish what you're trying to accomplish, you would need to label each received WAL record with the timestamp when it was received, and compare the reception timestamp of the record you're applying against the current timestamp. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Comments? There's currently three ways to set max_standby_delay: max_standby_delay = -1 # Query wins max_standby_delay = 0 # Recovery wins max_standby_delay > X # Query wins until lag > X. As Tom points out, the 3rd option has all sorts of problems. I very much like the behavior that max_standby_delay tries to accomplish, but I have to agree that it's not very reliable as it is. I don't like Tom's proposal either; the standby can fall behind indefinitely, and queries get a varying grace period. Let's rip out the concept of a delay altogether, and make it a boolean. If you really want your query to finish, set it to -1 (using the current max_standby_delay nomenclature). If recovery is important to you, set it to 0. If you have the monitoring in place to sensibly monitor the delay between primary and standby, and you want a limit on that, you can put together a script to flip the switch in postgresql.conf if the standby falls too much behind. It would be nice to make that settable per-session, BTW. Though as soon as you have one session using -1, the standby could fall behind. Still, it might be useful if you run both kinds of queries on the same standby. Ok, now that we've gotten over that, here's another proposal for what a delay setting could look like. Let's have a setting similar to statement_timeout, that specifies how long a statement is allowed to run until it becomes subject to killing if it conflicts with recovery (actually, it would have to be a per-transaction setting, at least in serializable mode). This would be similar to Tom's proposal, and it would have the same drawback that it would give no guarantee on how much the standby can fall behind. However, it would be easier to understand: a query gets to run for X seconds, and after that it will be killed if it gets in the way. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> Comments? > > There's currently three ways to set max_standby_delay: > > max_standby_delay = -1 # Query wins > max_standby_delay = 0 # Recovery wins > max_standby_delay > X # Query wins until lag > X. > > As Tom points out, the 3rd option has all sorts of problems. I very much > like the behavior that max_standby_delay tries to accomplish, but I have > to agree that it's not very reliable as it is. I don't like Tom's > proposal either; the standby can fall behind indefinitely, and queries > get a varying grace period. > > Let's rip out the concept of a delay altogether, and make it a boolean. > If you really want your query to finish, set it to -1 (using the current > max_standby_delay nomenclature). If recovery is important to you, set it > to 0. I can't help but insist on it, sorry. But. The obvious solution to this problem, for me, is to either make the boolean reload-friendly or to have pause/resume recovery. Ideally, both. Then the default setting would be recovery wins, and you pause the standby's replay to ensure your query runs to completion. A very crude setting, but 9.0 would offer an easy-to-set-up slave for *either* HA *or* off-load, and a way to mitigate somehow. The automated educated conflict solving based on some sort of timeout running for one or all the current queries seems much harder to agree upon when compared to applying existing code we thought we wouldn't yet need. Let's revisit that decision: it seems to me we need it for 9.0. Regards, -- dim
Simon Riggs wrote: > I am mostly unavailable for next few days. (Repairing bikeshed.) Hey, you're supposed to do the bikeshedding on-list! ;-) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > To accomplish what you're trying to accomplish, you would need to label > each received WAL record with the timestamp when it was received, and > compare the reception timestamp of the record you're applying against > current timestamp. Yeah, this is why I thought that closed-loop lag control was a research project. In practice, we don't have to track it at the individual record level. The real behavior of walsender is that we get a "gob" of WAL each activity cycle, and so tracking the WAL start location and receipt time for each gob ought to be sufficient. (In fact trying to ascribe any finer-grain receipt time than that to individual WAL records is probably bogus anyway.) It might be enough to remember the start location and time for the latest gob, depending on exactly what control algorithm you want to use. But the whole thing requires significant thought and testing, which we really haven't got time for now. regards, tom lane
On Wed, May 5, 2010 at 9:58 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Tom Lane wrote: >> Comments? > > There's currently three ways to set max_standby_delay: > > max_standby_delay = -1 # Query wins > max_standby_delay = 0 # Recovery wins > max_standby_delay > X # Query wins until lag > X. > > As Tom points out, the 3rd option has all sorts of problems. I very much > like the behavior that max_standby_delay tries to accomplish, but I have > to agree that it's not very reliable as it is. I don't like Tom's > proposal either; the standby can fall behind indefinitely, and queries > get a varying grace period. > > Let's rip out the concept of a delay altogether, and make it a boolean. > If you really want your query to finish, set it to -1 (using the current > max_standby_delay nomenclature). If recovery is important to you, set it > to 0. Does my proposal (upthread) to limit this by quantity of WAL rather than time have any legs, or is that impractical and/or otherwise poor? ...Robert
Robert Haas wrote: > Does my proposal (upthread) to limit this by quantity of WAL rather > than time have any legs, or is that impractical and/or otherwise poor? That would certainly be easier to implement sanely than a time-based quantity. One problem is that we don't know how much unapplied WAL there is, when you're not using streaming replication. And I'm not sure how useful that is to users - it's very hard to estimate what to set it to. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, May 5, 2010 at 12:30 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Robert Haas wrote: >> Does my proposal (upthread) to limit this by quantity of WAL rather >> than time have any legs, or is that impractical and/or otherwise poor? > > That would certainly be easier to implement sanely than a time-based > quantity. One problem is that we don't know how much unapplied WAL there > is, when you're not using streaming replication. Hmm, that's a problem, likely fatally so. > And I'm not sure how > useful that is to users - it's very hard to estimate what to set it to. I'm not sure whether that's really an issue or not. I mean, if we say that the standby is allowed to be, say, 16MB behind the master, we know that recovery time is bounded by how long it takes to replay 16MB. Which is in some ways more defined than saying we're behind the primary by 30 min, which could take a long time to replay or not much at all. But, I guess it's moot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, 2010-05-05 at 16:46 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > The attached patch redefines "standby delay" to be the amount of time > > elapsed from point of receipt to point of application. The "point of > > receipt" is reset every chunk of data when streaming, or every file when > > reading file by file. In all cases this new time is later than the > > latest log time we would have used previously. > > This seems completely wrong to me. If the WAL receiver keeps receiving > stuff, (last receive timestamp) - (current timestamp) would never be > more than a few seconds. Regardless of how much applying the WAL has > fallen behind. I see your point. > To accomplish what you're trying to accomplish, you would need to label > each received WAL record with the timestamp when it was received, and > compare the reception timestamp of the record you're applying against > current timestamp. Yes, OK. Obviously doing it for every record would be silly, so sampling WAL records is the only way. If we save the timestamp every 16MB of WAL that would work for both file and streaming. Of course, if WAL was written regularly none of this would be a problem. Why not have WALSender write a new WAL record with a timestamp every X seconds? -- Simon Riggs www.2ndQuadrant.com
On Wed, 2010-05-05 at 16:58 +0300, Heikki Linnakangas wrote: > Let's have a setting similar to > statement_timeout, that specifies how long a statement is allowed to > run until it becomes subject to killing if it conflicts with recovery > (actually, it would have to be a per-transaction setting, at least in > serializable mode). This would be similar to Tom's proposal, and it > would have the same drawback that it would give no guarantee on how > much the standby can fall behind. However, it would be easier to > understand: > a query gets to run for X seconds, and after that it will be killed if > it gets in the way. If you want this, I have no problem with you getting this (though new feature alert sirens going off, presumably). I only have a problem with the suggestion that this replaces the current max_standby_delay. There is no good case for only a single option. -- Simon Riggs www.2ndQuadrant.com
On Wed, 2010-05-05 at 06:23 -0400, Robert Haas wrote: > On Wed, May 5, 2010 at 3:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Expect at least 3 commits from me over next few days. > > I think you need to rethink the way that you decide when it's time to > commit things. There is certainly no consensus on any of the things > you are proposing to commit, nor have they been adequately (or, uh, at > all) reviewed. Saying that your proposal addresses all of Tom's > objections doesn't make it so. I am planning to read that patch and > offer an opinion on it, but I haven't done so yet and I imagine Tom > will weigh in at some point as well. Racing to commit a pile of code > that nobody else has tested is not going to improve anything. Only you have spoken of a race to commit and I have not said I would refuse to listen to you or others. Reading your words, it would be easy to forget we are a team of people whose aim is software development. It's not the OK Corral. Yesterday you berated me for unstable software. Today you oppose my promise to fix that. Why is it, we all wonder, that you oppose everything I say and do? No doubt you will oppose other committers in the way you oppose me... -- Simon Riggs www.2ndQuadrant.com
On a somewhat disappointing correspondence (was: max_standby_delay considered harmful)
Simon Riggs <simon@2ndQuadrant.com> wrote: I've refrained from comment on max_standby_delay because I have neither read the patch nor am likely to be an early adopter of HS; however, as a potential eventual user I have to say that the semantics for this GUC proposed by Simon seem sane and useful to me. Certainly the documentation would need to be clear on the pitfalls of using something other than 0 or -1, and there were technical issues raised on the thread outside the scope of the semantics of the GUC, but the issues around clock sync and transfer time ring of FUD. We sync our central router to a bank of atomic clocks around the world, and sync every server to the router -- if a server drifts we would have much bigger problems than this GUC would pose, so we monitor that and make loud noises should something drift. Are there other controls that would be useful? Undoubtedly. Should they be added to 9.0? I'm not in a position to say. I don't see the point of ripping out one potentially useful control, which *might* be sufficient for 9.0 because someone might choose to use it inappropriately. Just make sure it's documented well enough. > Yesterday you berated me for unstable software. Today you oppose > my promise to fix that. Why is it, we all wonder, is it that you > oppose everything I say and do? Robert strikes me as a top-notch project manager, and his comments struck me as totally in line with someone wearing that hat. -Kevin
Heikki, all, > There's currently three ways to set max_standby_delay: > > max_standby_delay = -1 # Query wins > max_standby_delay = 0 # Recovery wins > max_standby_delay > X # Query wins until lag > X. > > As Tom points out, the 3rd option has all sorts of problems. I very much > like the behavior that max_standby_delay tries to accomplish, but I have > to agree that it's not very reliable as it is. Wow, thanks for the summary. Based on that, I take back what I said to Greg. Because I think getting 9.0 out *on time* is more important than any of these issues, I'm revising my opinion to be more in line with Greg Smith. So, proposed path forward: (1) We work on getting the specific bugs Tom reported fixed. (2) max_standby_delay default is 0 (3) documentation covers setting it to an integer, but warns extensively about the required sysadminning and query cancel. As in "for advanced users only". (4) discussion of other synch methods gets shifted to 9.1 Ultimately, I think we'll be going to something lock-based like what Tom suggested. However, I don't think that's doable without delaying 9.0 for 6 months, and I think that would be much worse than any current bug with 9.0. No matter how much we tinker with HS/SR, it's not going to be bulletproof until 9.1. Or, more likely, 9.2. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Heikki Linnakangas wrote: > Tom Lane wrote: > > Comments? > > There's currently three ways to set max_standby_delay: > > max_standby_delay = -1 # Query wins > max_standby_delay = 0 # Recovery wins > max_standby_delay > X # Query wins until lag > X. > > As Tom points out, the 3rd option has all sorts of problems. I very much > like the behavior that max_standby_delay tries to accomplish, but I have > to agree that it's not very reliable as it is. I don't like Tom's > proposal either; the standby can fall behind indefinitely, and queries > get a varying grace period. > > Let's rip out the concept of a delay altogether, and make it a boolean. > If you really want your query to finish, set it to -1 (using the current > max_standby_delay nomenclature). If recovery is important to you, set it > to 0. > > If you have the monitoring in place to sensibly monitor the delay > between primary and standby, and you want a limit on that, you can put > together a script to flip the switch in postgresql.conf if the standby > falls too much behind. > > It would be nice to make that settable per-session, BTW. Though as soon > as you have one session using -1, the standby could fall behind. Still, > it might be useful if you run both kinds of queries on the same standby. +1 for a boolean We are not supposed to be designing the behavior during beta, which is exactly what we are doing, and I don't think we even know what behavior we want, let alone have we implemented it. I think a boolean is very clear and it gives you the chance to optimize _one_ case, which is enough for 9.0. Let's revisit this for 9.1 when we will know a lot more than we do now. Once 9.1 reports slave snapshots back to the master, we might not need anything more than a boolean here anyway. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Re: On a somewhat disappointing correspondence (was: max_standby_delay considered harmful)
Kevin Grittner wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: > > I've refrained from comment on max_standby_delay because I have > neither read the patch nor am likely to be an early adopter of HS; > however, as a potential eventual user I have to say that the > semantics for this GUC proposed by Simon seem sane and useful to me. > > Certainly the documentation would need to be clear on the pitfalls > of using something other than 0 or -1, and there were technical > issues raised on the thread outside the scope of the semantics of > the GUC, but the issues around clock sync and transfer time ring of > FUD. We sync our central router to a bank of atomic clocks around > the world, and sync every server to the router -- if a server drifts > we would have much bigger problems than this GUC would pose, so we > monitor that and make loud noises should something drift. > > Are there other controls that would be useful? Undoubtedly. Should > they be added to 9.0? I'm not in a position to say. I don't see > the point of ripping out one potentially useful control, which > *might* be sufficient for 9.0 because someone might choose to use it > inappropriately. Just make sure it's documented well enough. We are not very good at _removing_ functionality/GUCs, and based on the discussion so far, I think there is a very slim chance we would get it right for 9.0, which is why I suggested converting it to a boolean and revisiting this for 9.1. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Wed, May 5, 2010 at 7:18 PM, Bruce Momjian <bruce@momjian.us> wrote: > Heikki Linnakangas wrote: >> Tom Lane wrote: >> > Comments? >> >> There's currently three ways to set max_standby_delay: >> >> max_standby_delay = -1 # Query wins >> max_standby_delay = 0 # Recovery wins >> max_standby_delay > X # Query wins until lag > X. >> >> As Tom points out, the 3rd option has all sorts of problems. I very much >> like the behavior that max_standby_delay tries to accomplish, but I have >> to agree that it's not very reliable as it is. I don't like Tom's >> proposal either; the standby can fall behind indefinitely, and queries >> get a varying grace period. >> >> Let's rip out the concept of a delay altogether, and make it a boolean. >> If you really want your query to finish, set it to -1 (using the current >> max_standby_delay nomenclature). If recovery is important to you, set it >> to 0. >> >> If you have the monitoring in place to sensibly monitor the delay >> between primary and standby, and you want a limit on that, you can put >> together a script to flip the switch in postgresql.conf if the standby >> falls too much behind. >> >> It would be nice to make that settable per-session, BTW. Though as soon >> as you have one session using -1, the standby could fall behind. Still, >> it might be useful if you run both kinds of queries on the same standby. > > +1 for a boolean > > We are not supposed to be designing the behavior during beta, which is > exactly what we are doing, and I don't think we even know what behavior > we want, let alone have we implemented it. I think a boolean is very > clear and it gives you the chance to optimize _one_ case, which is > enough for 9.0. Let's revisit this for 9.1 when we will know a lot more > than we do now. The existing behavior is probably not optimal, but I'm not seeing what benefit we get out of neutering it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > >> If you have the monitoring in place to sensibly monitor the delay > >> between primary and standby, and you want a limit on that, you can put > >> together a script to flip the switch in postgresql.conf if the standby > >> falls too much behind. > >> > >> It would be nice to make that settable per-session, BTW. Though as soon > >> as you have one session using -1, the standby could fall behind. Still, > >> it might be useful if you run both kinds of queries on the same standby. > > > > +1 for a boolean > > > > We are not supposed to be designing the behavior during beta, which is > > exactly what we are doing, and I don't think we even know what behavior > > we want, let alone have we implemented it. I think a boolean is very > > clear and it gives you the chance to optimize _one_ case, which is > > enough for 9.0. Let's revisit this for 9.1 when we will know a lot more > > than we do now. > > The existing behavior is probably not optimal, but I'm not seeing what > benefit we get out of neutering it. We get to design it right, or maybe not need it at all in 9.1. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes: > Robert Haas wrote: >> The existing behavior is probably not optimal, but I'm not seeing what >> benefit we get out of neutering it. > We get to design it right, or maybe not need it at all in 9.1. Yeah. The good thing about a boolean is that it covers the two noncontroversial cases (no-wait and wait forever), and doesn't lock us into supporting cases that we don't really know how to do well yet. regards, tom lane
Heikki Linnakangas wrote: > Let's rip out the concept of a delay altogether, and make it a boolean. > If you really want your query to finish, set it to -1 (using the current > max_standby_delay nomenclature). If recovery is important to you, set it > to 0. > So the only user options would be "allow long-running queries to block WAL application forever" and "always cancel queries on conflict?" That would be taking away the behavior I was going to suggest as the default to many customers I work with. I expect a non-trivial subset of people using this feature will set max_standby_delay to some small number of minutes, similarly to how archive_timeout is sized now. Enough time to get reasonably sized queries executed, not so long as to allow something that might try to run for hours on the standby to increase failover catchup time very much. The way the behavior works is admittedly limited, and certainly some people are going to want to set it to either 0 or -1. But taking it away altogether is going to cripple one category of potential Hot Standby use in the field. Consider this for a second: do you really think that Simon would have waded into this coding mess, or that I would have spent as much energy as I have highlighting issues with its use, if there wasn't demand for it? If it wouldn't hurt the usefulness of PostgreSQL 9.0 significantly to cut it, I'd have suggested that myself two months ago and saved everyone (especially myself) a lot of trouble. > If you have the monitoring in place to sensibly monitor the delay > between primary and standby, and you want a limit on that, you can put > together a script to flip the switch in postgresql.conf if the standby > falls too much behind. > There's a couple of things you should do in order for max_standby_delay to work as well as it can. Watching clock sync and forcing periodic activity are two of them that always come up. Those are both trivial to script for, and something I wouldn't expect any admin to object to. If you need a script that involves changing a server setting to do something, that translates into "you can't do that" for a typical DBA. The idea of a program regularly changing a server configuration setting on a production system is one you just can't sell. That makes this idea incredibly more difficult to use in the field than any of the workarounds that cope with the known max_standby_delay issues. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith <greg@2ndquadrant.com> writes: > Heikki Linnakangas wrote: >> Let's rip out the concept of a delay altogether, and make it a boolean. > So the only user options would be "allow long-running queries to block > WAL application forever" and "always cancel queries on conflict? Got it in one. Obviously, this is something that would be high priority to improve in some fashion in 9.1. That doesn't mean that it's reasonable to drop in a half-baked redesign now, nor to put in the amount of work that would be required to have a really well-designed implementation, and most certainly not to uncritically ship what we've got. We have a ton of other work that has to be done to get 9.0 out the door, and this feature is something that IMO we can live without for this release. One reason I believe this isn't so critical as all that is that it only matters for cases where the operation on the master took an exclusive lock. In high-performance production scenarios that's something you try hard to avoid anyway. When you succeed, the standby behavior is moot. Even if you can't avoid exclusive locks entirely, you may be able to confine them to maintenance windows where performance doesn't matter so much ... and then that goes for the standby performance as well. regards, tom lane
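For readers following along, here is a rough, non-exhaustive sketch of the master-side statements that take an ACCESS EXCLUSIVE lock and so generate the lock conflicts Tom is referring to (the table and index names are invented):

    LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE;
    TRUNCATE accounts;
    DROP TABLE accounts_old;
    ALTER TABLE accounts ADD COLUMN note text;
    VACUUM FULL accounts;
    CLUSTER accounts USING accounts_pkey;

As the follow-up below notes, lock conflicts are not the only reason standby queries get cancelled; vacuum and HOT cleanup create snapshot conflicts as well.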
On Thu, May 6, 2010 at 2:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One reason I believe this isn't so critical as all that is that it only > matters for cases where the operation on the master took an exclusive > lock. Uhm, or a vacuum ran. Or a HOT page cleanup occurred, or a btree page split deleted old tuples. -- greg
Bruce Momjian wrote: > We are not very good at _removing_ functionality/GUCs, and based on the > discussion so far, I think there is a very slim chance we would get it > right for 9.0, which is why I suggested converting it to a boolean and > revisiting this for 9.1. > There's some feedback you can only get by exposing a complicated feature to the users and seeing what they make of it. This one hasn't even had a full week to gather beta user reports. Given that it's easy to disable (just limiting the range on what is effectively a 3-way switch to two positions), I don't understand why you're pushing at this point for its removal. You could be encouraging testing instead, which I believe is needed to know exactly what the right thing to do in 9.1 is. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Stark wrote: > On Thu, May 6, 2010 at 2:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> One reason I believe this isn't so critical as all that is that it only >> matters for cases where the operation on the master took an exclusive >> lock. >> > > Uhm, or a vacuum ran. Or a HOT page cleanup occurred, or a btree page > split deleted old tuples. > Right; because there are so many regularly expected causes for query cancellation, the proposed boolean setup really hurts the ability of a server whose primary goal is high-availability to run queries of any useful duration. For years I've been hearing "my HA standby is idle, how can I put it to use?"; that's the back story of the users I thought everyone knew were the known audience waiting for this feature. If the UI for vacuum_defer_cleanup_age that prevented these things was good, I would agree that the cases where max_standby_delay does something useful are marginal. That's why I tried to get someone working on SR to provide a hook for that purpose months ago. But since the vacuum adjustment we have is in completely obtuse xid units, that leaves max_standby_delay as the only tunable here that you can even think about in terms of human time. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Wed, May 5, 2010 at 9:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Smith <greg@2ndquadrant.com> writes: >> Heikki Linnakangas wrote: >>> Let's rip out the concept of a delay altogether, and make it a boolean. > >> So the only user options would be "allow long-running queries to block >> WAL application forever" and "always cancel queries on conflict? > > Got it in one. > > Obviously, this is something that would be high priority to improve in > some fashion in 9.1. That doesn't mean that it's reasonable to drop in > a half-baked redesign now, nor to put in the amount of work that would > be required to have a really well-designed implementation, and most > certainly not to uncritically ship what we've got. If you had a genuinely better idea for how this should work, I would be the first to endorse it, but it's becoming clear that you don't, which makes me also skeptical of your contention that we will be better off with no knob at all. I find that position not very plausible. Nor do I really see how this is backing us into any kind of a corner. If we're really concerned that we're going to suddenly come up with a much better method of controlling this behavior (and so far nobody seems close to having such a brilliant insight), then let's just put a note in the documentation saying that the setting has problems X, Y, and Z and that if we develop a better method for controlling this behavior, the GUC may be modified or removed in a future release. Ripping it out seems like a drastic overreaction, particularly considering that we're already in beta. This feature has been in the tree since December 19th when the initial Hot Standby patch was committed, and the last significant code change was on February 13th. It is now May 5th. The fact that you didn't read the patch sooner is not a reason why we should rip it out now. Yes, the current implementation is a little crufty and has some limitations. See also work_mem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Greg Smith wrote: > Heikki Linnakangas wrote: > > Let's rip out the concept of a delay altogether, and make it a boolean. > > If you really want your query to finish, set it to -1 (using the current > > max_standby_delay nomenclature). If recovery is important to you, set it > > to 0. > > > > So the only user options would be "allow long-running queries to block > WAL application forever" and "always cancel queries on conflict?" That > would be taking away the behavior I was going to suggest as the default > to many customers I work with. I expect a non-trivial subset of people > using this feature will set max_standby_delay to some small number of > minutes, similarly to how archive_timeout is sized now. Enough time to > get reasonably sized queries executed, not so long as to allow something > that might try to run for hours on the standby to increase failover > catchup time very much. > > The way the behavior works is admittedly limited, and certainly some > people are going to want to set it to either 0 or -1. But taking it > away altogether is going to cripple one category of potential Hot > Standby use in the field. Consider this for a second: do you really > think that Simon would have waded into this coding mess, or that I would > have spent as much energy as I have highlighting issues with its use, if > there wasn't demand for it? If it wouldn't hurt the usefulness of > PostgreSQL 9.0 significantly to cut it, I'd have suggested that myself > two months ago and saved everyone (especially myself) a lot of trouble. We are not designing in a green field here. We have released beta1 and we are trying to get to 9.0 final in a few months. If this feature could have been designed easily months ago, it would have been done, but it doesn't seem to have any easy solution, and we have run out of time to fix it. As painful as it is, we need to cut our losses and move on. We have already cut features like sync replication and communicating the slave snapshot to the master; I don't see how removing this ability is any worse. We don't have time to develop this for every use case, even if those use cases are significant. If someone wants to suggest that HS is useless if max_standby_delay supports only boolean values, I am ready to suggest we remove HS as well and head to 9.0 because that would suggest that HS itself is going to be useless. The code will not be thrown away; we will bring it back for 9.1. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Robert Haas wrote: > If you had a genuinely better idea for how this should work, I would > be the first to endorse it, but it's becoming clear that you don't, > which makes me also skeptical of your contention that we will be > better off with no knob at all. I find that position not very > plausible. Nor do I really see how this is backing us into any kind > of a corner. If we're really concerned that we're going to suddenly > come up with a much better method of controlling this behavior (and so > far nobody seems close to having such a brilliant insight), then let's > just put a note in the documentation saying that the setting has > problems X, Y, and Z and that if we develop a better method for > controlling this behavior, the GUC may be modified or removed in a > future release. Ripping it out seems like a drastic overreaction, > particularly considering that we're already in beta. > > This feature has been in the tree since December 19th when the initial > Hot Standby patch was committed, and the last significant code change > was on February 13th. It is now May 5th. The fact that you didn't > read the patch sooner is not a reason why we should rip it out now. > Yes, the current implementation is a little crufty and has some > limitations. See also work_mem. I am afraid the current setting is tempting for users to enable, but will be so unpredictable that it will tarnish the reputation of HS and Postgres. We don't want to be thinking in 9 months, "Wow, we shouldn't have shipped that feature. It is causing all kinds of problems." We have done that before (rarely), and it isn't a good feeling. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Greg Smith wrote: > Bruce Momjian wrote: > > We are not very good at _removing_ functionality/GUCs, and based on the > > discussion so far, I think there is a very slim chance we would get it > > right for 9.0, which is why I suggested converting it to a boolean and > > revisiting this for 9.1. > > > > There's some feedback you can only get by exposing a complicated feature > to the users and seeing what they make of it. This one hasn't even had > a full week to gather beta user reports. Given that it's easy to > disable (just limiting the range on what is effectively a 3-way switch > to two positions), I don't understand why you're pushing at this point > for its removal. You could be encouraging testing instead, which I > believe is needed to know exactly what the right thing to do in 9.1 is. Our developers can't even figure out how it behaves; it is hard to see how beta users will ever figure it out. Now is the time to remove stuff, not late in beta. Pushing decisions into the future is how betas drag. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Wed, May 5, 2010 at 11:50 PM, Bruce Momjian <bruce@momjian.us> wrote: > If someone wants to suggest that HS is useless if max_standby_delay > supports only boolean values, I am ready to suggest we remove HS as well > and head to 9.0 because that would suggest that HS itself is going to be > useless. I think HS is going to be a lot less useful than many people think, at least in 9.0. But I think ripping out max_standby_delay will make it worse. > The code will not be thrown away; we will bring it back for 9.1. If that's the case, then taking it out makes no sense. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian <bruce@momjian.us> wrote: > I am afraid the current setting is tempting for users to enable, but > will be so unpredictable that it will tarnish the reputation of HS and > Postgres. We don't want to be thinking in 9 months, "Wow, we shouldn't > have shipped that feature. It is causing all kinds of problems." We > have done that before (rarely), and it isn't a good feeling. I am not convinced it will be unpredictable. The only caveats that I've seen so far are: - You need to run ntpd. - Queries will get cancelled like crazy if you're not using streaming replication. That just doesn't sound that bad to me, especially since the proposed alternative is: - Queries will get cancelled like crazy, period. Or else: - Replication can fall infinitely far behind and you can write a tedious and error-prone script to try to prevent it if you like. I think THAT is going to tarnish our reputation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian <bruce@momjian.us> wrote: >> I am afraid the current setting is tempting for users to enable, but >> will be so unpredictable that it will tarnish the reputation of HS and >> Postgres. We don't want to be thinking in 9 months, "Wow, we shouldn't >> have shipped that feature. It is causing all kinds of problems." We >> have done that before (rarely), and it isn't a good feeling. > > I am not convinced it will be unpredictable. The only caveats that > I've seen so far are: > > - You need to run ntpd. > - Queries will get cancelled like crazy if you're not using streaming > replication. And also in situations where the master is idle for a while and then starts doing stuff. That's the most significant source of confusion, IMHO; I wouldn't mind the requirement of ntpd so much. > That just doesn't sound that bad to me, especially since the proposed > alternative is: > > - Queries will get cancelled like crazy, period. > > Or else: > > - Replication can fall infinitely far behind and you can write a > tedious and error-prone script to try to prevent it if you like. > > I think THAT is going to tarnish our reputation. The difference is that that's easy to document and understand, so the behavior won't be a surprise to anyone. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Robert Haas wrote: > On Wed, May 5, 2010 at 11:50 PM, Bruce Momjian <bruce@momjian.us> wrote: >> The code will not be thrown away; we will bring it back for 9.1. > > If that's the case, then taking it out makes no sense. I doubt we're going to bring back the same code, because it still has the same issues. But we will do something better thought-out. Or people are happy with the boolean and no-one cares anymore; that's pretty likely too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2010-05-05 at 17:56 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > I am mostly unavailable for next few days. (Repairing bikeshed.) > > Hey, you're supposed to do the bikeshedding on-list! ;-) That was just a joke, I'm mostly unavailable for other reasons. -- Simon Riggs www.2ndQuadrant.com
On Wed, May 5, 2010 at 9:32 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, May 5, 2010 at 11:50 PM, Bruce Momjian <bruce@momjian.us> wrote: >> If someone wants to suggest that HS is useless if max_standby_delay >> supports only boolean values, I am ready to suggest we remove HS as well >> and head to 9.0 because that would suggest that HS itself is going to be >> useless. > > I think HS is going to be a lot less useful than many people think, at > least in 9.0. But I think ripping out max_standby_delay will make it > worse. > >> The code will not be thrown away; we will bring it back for 9.1. > > If that's the case, then taking it out makes no sense. > <mysql dba troll> I manage a bunch of different environments and I am pretty sure that in any of them if the db started seemingly randomly killing queries I would have application teams followed quickly by executives coming after me with torches and pitchforks. I cannot imagine setting this value to anything other than a bool and most of the time that bool would be -1. I would only be unleashing a kill storm in utter desperation and I would probably need to explain myself in detail after. Utter desperation means I am sure I am going to have to do an impactful failover at any moment and need a slave completely up to date NOW. It is good to have the option to automatically cancel queries, but I think it is a mistake to assume many people will use it. What I would really need for instrumentation is the ability to determine *easily* how much a slave is lagging in clock time. </mysql dba troll> -- Rob Wultsch wultsch@gmail.com
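For what it's worth, here is a sketch of the instrumentation available in 9.0 plus a common do-it-yourself workaround; the WAL-position functions are real, while the heartbeat table is a made-up convention, not a built-in feature:

    -- on the standby: how far receipt and replay have progressed, in WAL terms
    SELECT pg_last_xlog_receive_location() AS received,
           pg_last_xlog_replay_location()  AS replayed;

    -- approximate clock-time lag via a heartbeat row the master updates:
    --   on the master, e.g. once a minute:  UPDATE heartbeat SET ts = now();
    --   on the standby:                     SELECT now() - ts AS approx_lag FROM heartbeat;

The heartbeat approach, of course, depends on reasonably synchronized clocks, which loops back to the ntp discussion above.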
On Thu, 2010-05-06 at 00:47 -0400, Robert Haas wrote: > That just doesn't sound that bad to me, especially since the proposed > alternative is: > > - Queries will get cancelled like crazy, period. > > Or else: > > - Replication can fall infinitely far behind and you can write a > tedious and error-prone script to try to prevent it if you like. > > I think THAT is going to tarnish our reputation. Yes, that will. There is no consensus to remove max_standby_delay. It could be improved with minor adjustments and it makes more sense to allow a few of those, treating them as bugs. -- Simon Riggs www.2ndQuadrant.com
On Wed, 2010-05-05 at 23:15 -0700, Rob Wultsch wrote: > I manage a bunch of different environments and I am pretty sure that > in any of them if the db started seemingly randomly killing queries I > would have application teams followed quickly by executives coming > after me with torches and pitchforks. Fully understood and well argued, thanks for your input. HS doesn't randomly kill queries and there are documented work-arounds to control this behaviour. Removing the parameter won't help the situation at all, it will make the situation *worse* by removing control from where it's clearly needed and removing all hope of making the HS feature work in practice. There is no consensus to remove the parameter. -- Simon Riggs www.2ndQuadrant.com
Greg Smith <greg@2ndquadrant.com> writes: > If you need a script that involves changing a server setting to do > something, that translates into "you can't do that" for a typical DBA. The > idea of a program regularly changing a server configuration setting on a > production system is one you just can't sell. That makes this idea > incredibly more difficult to use in the field than any of the workarounds > that cope with the known max_standby_delay issues. I still think that the best API we can do in a timely fashion for 9.0 is: standby_conflict_winner = replay|queries pg_pause_recovery() / pg_resume_recovery() It seems to me those two functions are only exposing existing facilities in the code, so that's more an API change than a new feature inclusion. Of course I may be wrong. But the code has already been written. I don't think we'll find anything better to offer our users in the right time frame. Now I'll try to step back and stop repeating myself in the void :) Regards, -- dim
On May 6, 2010, at 11:26 , Dimitri Fontaine wrote: > Greg Smith <greg@2ndquadrant.com> writes: >> If you need a script that involves changing a server setting to do >> something, that translates into "you can't do that" for a typical DBA. The >> idea of a program regularly changing a server configuration setting on a >> production system is one you just can't sell. That makes this idea >> incredibly more difficult to use in the field than any of the workarounds >> that cope with the known max_standby_delay issues. > > I still think that the best API we can do in a timely fashion for 9.0 > is: > > standby_conflict_winner = replay|queries > > pg_pause_recovery() / pg_resume_recovery() > > It seems to me those two functions are only exposing existing facilities > in the code, so that's more an API change than a new feature > inclusion. Of course I may be wrong. But the code has already been > written. If there was an additional SQL-callable function that returned the backends the recovery process is currently waiting for, plus one that reported the last timestamp seen in the WAL, then all those different cancellation policies could be implemented as daemons that monitor recovery and kill backends as needed, no? That would allow people to experiment with different cancellation policies, and maybe shed some light on what the useful policies are in practice. best regards, Florian Pflug
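For concreteness, a rough sketch of what such an external cancellation daemon could look like, assuming a hypothetical pg_recovery_conflict_pids() function that exposes which PIDs the startup process is blocked behind and for how long -- no such function exists today; only pg_terminate_backend() is real (8.4 and later):

    /*
     * Sketch of an out-of-server cancellation policy: every few seconds, ask
     * the standby (via a hypothetical pg_recovery_conflict_pids() function)
     * which backends have been blocking WAL replay too long, and terminate
     * them with the existing pg_terminate_backend().
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("dbname=postgres");  /* connect to the standby */

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        for (;;)
        {
            /* hypothetical function; waiting_seconds is how long replay has waited */
            PGresult *res = PQexec(conn,
                "SELECT pid FROM pg_recovery_conflict_pids() WHERE waiting_seconds > 30");

            if (PQresultStatus(res) == PGRES_TUPLES_OK)
            {
                for (int i = 0; i < PQntuples(res); i++)
                {
                    char kill_query[128];

                    /* pg_terminate_backend() does exist; this part is real */
                    snprintf(kill_query, sizeof(kill_query),
                             "SELECT pg_terminate_backend(%s)", PQgetvalue(res, i, 0));
                    PQclear(PQexec(conn, kill_query));
                }
            }
            PQclear(res);
            sleep(5);           /* the poll loop itself is the "policy" */
        }
    }

The point of the sketch is only that the policy (here: kill anything blocking replay for more than 30 seconds) would live entirely outside the server, so it could be changed without touching backend code.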
Rob Wultsch wrote: > I manage a bunch of different environments and I am pretty sure that > in any of them if the db started seemingly randomly killing queries I > would have application teams followed quickly by executives coming > after me with torches and pitchforks. > > I can not imagine setting this value to anything other than a bool and > most of the time that bool would be -1. I would only be unleashing a > kill storm in utter desperation and I would probably need to explain > myself in detail after. Utter desperation means I am sure I am going > to have to do a impactful failover at any moment and need a slave > completely up to date NOW. > That's funny because when I was reading this thread, I was thinking the exact opposite: having max_standby_delay always set to 0 so I know the standby server is as up-to-date as possible. The application that accesses the hot standby has to be 'special' anyway because it might deliver not-up-to-date data. If that information about specialties regarding querying the standby server includes the warning that queries might get cancelled, they can opt for a retry themselves (is there a special return code to catch that case? like PGRES_RETRY_LATER) or a message to the user that their report is currently unavailable and they should retry in a few minutes. regards, Yeb Havinga
On Thu, May 6, 2010 at 1:35 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Robert Haas wrote: >> On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian <bruce@momjian.us> wrote: >>> I am afraid the current setting is tempting for users to enable, but >>> will be so unpredictable that it will tarnish the repuation of HS and >>> Postgres. We don't want to be thinking in 9 months, "Wow, we shouldn't >>> have shipped that features. It is causing all kinds of problems." We >>> have done that before (rarely), and it isn't a good feeling. >> >> I am not convinced it will be unpredictable. The only caveats that >> I've seen so far are: >> >> - You need to run ntpd. >> - Queries will get cancelled like crazy if you're not using steaming >> replication. > > And also in situations where the master is idle for a while and then > starts doing stuff. That's the most significant source of confusion, > IMHO, I wouldn't mind the requirement of ntpd so much. Oh. Ouch. OK, sorry, I missed that part. Wow, that's awful. OK, I agree: we can't ship that as-is. /me feels embarrassed for completely failing to understand the root of the issue until 84 emails into the thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Hi, On Thursday 06 May 2010 07:35:49 Heikki Linnakangas wrote: > Robert Haas wrote: > > On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> I am afraid the current setting is tempting for users to enable, but > >> will be so unpredictable that it will tarnish the reputation of HS and > >> Postgres. We don't want to be thinking in 9 months, "Wow, we shouldn't > >> have shipped that feature. It is causing all kinds of problems." We > >> have done that before (rarely), and it isn't a good feeling. > > > > I am not convinced it will be unpredictable. The only caveats that > > I've seen so far are: > > > > - You need to run ntpd. > > - Queries will get cancelled like crazy if you're not using streaming > > replication. > > And also in situations where the master is idle for a while and then > starts doing stuff. That's the most significant source of confusion, > IMHO; I wouldn't mind the requirement of ntpd so much. Personally I would much rather keep that configurability and manually generate a WAL record every second. Or possibly do something akin to archive_timeout... That may not be as important once there are fewer sources of conflict resolution - but that's something *definitely* not going to happen for 9.0... Andres
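The workaround Andres describes can be as small as a trivial keepalive client pointed at the master. A minimal sketch (the connection string and the one-second interval are just assumptions): it calls txid_current(), which assigns an XID, so each iteration commits a transaction and puts a fresh commit timestamp into the WAL stream.

    /*
     * Keepalive sketch: force a commit record (and therefore a recent commit
     * timestamp) onto the master roughly once per second, so an idle master
     * does not eat into the standby's max_standby_delay grace period.
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("host=master dbname=postgres");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        for (;;)
        {
            /* txid_current() forces an XID assignment; the commit is WAL-logged */
            PGresult *res = PQexec(conn, "SELECT txid_current()");

            if (PQresultStatus(res) != PGRES_TUPLES_OK)
                fprintf(stderr, "keepalive failed: %s", PQerrorMessage(conn));
            PQclear(res);
            sleep(1);           /* "a record a second" */
        }
    }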
On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote: > If there was an additional SQL-callable function that returned the backends the recovery process is currently waiting for, plus one that reported the last timestamp seen in the WAL, then all those different cancellation policies could be implemented as daemons that monitor recovery and kill backends as needed, no? > > That would allow people to experiment with different cancellation policies, and maybe shed some light on what the useful policies are in practice. It would be easier to implement a conflict resolution plugin that is called when a conflict occurs, allowing users to have a customisable mechanism. Again, I have no objection to that proposal. -- Simon Riggs www.2ndQuadrant.com
On May 6, 2010, at 12:48 , Simon Riggs wrote: > On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote: >> If there was an additional SQL-callable function that returned the backends the recovery process is currently waiting for, plus one that reported the last timestamp seen in the WAL, then all those different cancellation policies could be implemented as daemons that monitor recovery and kill backends as needed, no? >> >> That would allow people to experiment with different cancellation policies, and maybe shed some light on what the useful policies are in practice. > > It would be easier to implement a conflict resolution plugin that is > called when a conflict occurs, allowing users to have a customisable > mechanism. Again, I have no objection to that proposal. True, providing a plugin API would be even better, since no SQL-callable API would have to be devised, and possible algorithms wouldn't be constrained by such an API's limitations. The existing max_standby_delay logic could be moved to such a plugin, living in contrib. Since it was already established (I believe) that the existing max_standby_delay logic is sufficiently fragile to require significant knowledge on the user's side about potential pitfalls, asking those users to install the plugin from contrib shouldn't be too much to ask for. This way, users who really need something more sophisticated than recovery wins always or standby wins always are given the tools they need *if* they're willing to put in the extra effort. For those who don't, offering max_standby_delay probably does more harm than good anyway, so nothing is lost by not offering it in the first place. best regards, Florian Pflug
On Thu, 2010-05-06 at 13:46 +0200, Florian Pflug wrote: > On May 6, 2010, at 12:48 , Simon Riggs wrote: > > On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote: > >> If there was an additional SQL-callable function that returned the backends the recovery process is currently waiting for, plus one that reported the last timestamp seen in the WAL, then all those different cancellation policies could be implemented as daemons that monitor recovery and kill backends as needed, no? > >> > >> That would allow people to experiment with different cancellation policies, and maybe shed some light on what the useful policies are in practice. > > > > It would be easier to implement a conflict resolution plugin that is > > called when a conflict occurs, allowing users to have a customisable > > mechanism. Again, I have no objection to that proposal. > > True, providing a plugin API would be even better, since no SQL-callable API would have to be devised, and possible algorithms wouldn't be constrained by such an API's limitations. > > The existing max_standby_delay logic could be moved to such a plugin, living in contrib. Since it was already established (I believe) that the existing max_standby_delay logic is sufficiently fragile to require significant knowledge on the user's side about potential pitfalls, asking those users to install the plugin from contrib shouldn't be too much to ask for. > > This way, users who really need something more sophisticated than recovery wins always or standby wins always are given the tools they need *if* they're willing to put in the extra effort. For those who don't, offering max_standby_delay probably does more harm than good anyway, so nothing is lost by not offering it in the first place. No problem from me with that approach. As long as 9.0 ships with the current capability to enforce max_standby_delay, I have no problem. -- Simon Riggs www.2ndQuadrant.com
Heikki Linnakangas wrote: > Robert Haas wrote: > >> I am not convinced it will be unpredictable. The only caveats that >> I've seen so far are: >> >> - You need to run ntpd. >> - Queries will get cancelled like crazy if you're not using streaming >> replication. >> > > And also in situations where the master is idle for a while and then > starts doing stuff. That's the most significant source of confusion, > IMHO; I wouldn't mind the requirement of ntpd so much. > I consider it mandatory to include a documentation update here that says "if you set max_standby_delay > 0, and do not run something that regularly generates activity on the master like [example], you will get unnecessary query cancellation on the standby". As well as something like what Josh was suggesting, adding warnings that this is "for advanced users only", to borrow his wording. This is why my name has been on the open items list for a while now--to make sure I follow through on that. I haven't written it yet because there were still changes to the underlying code being made up until moments before beta started, then this discussion started without a break between. There is a clear set of userland things that can be done to make up the deficiencies in the state of the server code, but we won't even get to see how they work out in the field (feedback needed to improve the 9.1 design) if this capability goes away altogether. Is it not clear that there are some people who consider the occasional bit of cancellation OK, because they can correct for it at the application layer and they're willing to factor it into their design if it allows using the otherwise idle HA standby? I'm fine with expanding that section of the documentation too, to make it more obvious that's the only situation this aspect of HS is aimed at and suitable for. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Yeb Havinga wrote: > Rob Wultsch wrote: >> I can not imagine setting this value to anything other than a bool and >> most of the time that bool would be -1. > That's funny because when I was reading this thread, I was thinking > the exact opposite: having max_standby_delay always set to 0 so I know > the standby server is as up-to-date as possible. If you ask one person about this, you'll discover they only consider one behavior here sane, and any other setting is crazy. Ask five people, and you'll likely find someone who believes the complete opposite. Ask ten and carefully work out the trade-offs they're willing to make given the fundamental limitations of replication, and you'll arrive at the range of behaviors available right now, plus some more that haven't been built yet. There are a lot of different types of database applications out there, each with their own reliability and speed requirements to balance. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Thu, 2010-05-06 at 12:08 +0200, Yeb Havinga wrote: > That's funny because when I was reading this thread, I was thinking the > exact opposite: having max_standby_delay always set to 0 so I know the > standby server is as up-to-date as possible. The application that > accesses the hot standby has to be 'special' anyway because it might > deliver not-up-to-date data. If that information about specialties > regarding querying the standby server includes the warning that queries > might get cancelled, they can opt for a retry themselves (is there a > special return code to catch that case? like PGRES_RETRY_LATER) or a > message to the user that their report is currently unavailable and they > should retry in a few minutes. Very sensible. Kevin Grittner already asked for that and I already agreed, yet it has not been implemented that way http://archives.postgresql.org/pgsql-hackers/2008-12/msg01691.php Can anyone remember a specific objection to that 'cos it still sounds very sensible to me and is a quick, low impact change. Currently the SQLSTATE is ERRCODE_ADMIN_SHUTDOWN or ERRCODE_QUERY_CANCELED if not idle. That makes it even harder to determine the retryability, since it can come back in one of two states. Proposed SQLSTATE for HS cancellation is

    case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
    case PROCSIG_RECOVERY_CONFLICT_LOCK:
    case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
    case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
        recovery_conflict_errcode = ERRCODE_T_R_SERIALIZATION_FAILURE;
        break;
    case PROCSIG_RECOVERY_CONFLICT_DATABASE:
    case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
        recovery_conflict_errcode = ERRCODE_ADMIN_SHUTDOWN;
        break;

We don't have a retryable SQLSTATE, so perhaps we should document that serialization_failure and deadlock_detected are both retryable error conditions. As the above implies HS can throw some errors that are non-retryable, so we use ERRCODE_ADMIN_SHUTDOWN. Comments? -- Simon Riggs www.2ndQuadrant.com
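If cancellations do come back as SQLSTATE 40001 (serialization_failure), the retry Yeb describes is only a few lines of client code. A minimal libpq sketch, assuming the report query is safe to simply rerun and picking an arbitrary one-second back-off:

    /*
     * Rerun a read-only report when the standby cancels it with a recovery
     * conflict reported as serialization_failure (SQLSTATE 40001).
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    static PGresult *
    exec_with_retry(PGconn *conn, const char *query, int max_retries)
    {
        for (int attempt = 0; attempt <= max_retries; attempt++)
        {
            PGresult   *res = PQexec(conn, query);
            const char *sqlstate;

            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                return res;                     /* success: caller PQclear()s it */

            sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

            /* retry only when cancelled by a recovery conflict (40001) */
            if (sqlstate == NULL || strcmp(sqlstate, "40001") != 0)
                return res;                     /* some other error: give up */

            PQclear(res);
            sleep(1);                           /* back off, then rerun the report */
        }
        return NULL;                            /* retries exhausted */
    }

Anything that already retries serialization failures automatically, as Kevin describes later in the thread, would get this behaviour for free.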
Simon Riggs <simon@2ndQuadrant.com> writes: > It would be easier to implement a conflict resolution plugin that is > called when a conflict occurs, allowing users to have a customisable > mechanism. Again, I have no objection to that proposal. To implement, if you say so, no doubt. To use, that means you need to install a contrib module after validating that the trade-offs there are the ones you're interested in, or you have to code it yourself. In C. I don't see that as an improvement over what we have now. Our main problem seems to be the documentation of max_standby_delay, where we give the impression it's doing things the code cannot do. IIUC. Regards, -- dim
On Thu, 2010-05-06 at 16:09 +0200, Dimitri Fontaine wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > It would be easier to implement a conflict resolution plugin that is > > called when a conflict occurs, allowing users to have a customisable > > mechanism. Again, I have no objection to that proposal. > > To implement, if you say so, no doubt. To use, that means you need to > install a contrib module after validation that the trade offs there are > the one you're interested into, or you have to code it yourself. In C. > > I don't see that as an improvement over what we have now. Our main > problem seems to be the documentation of the max_standby_delay, where we > give the impression it's doing things the code can not do. IIUC. I meant "easier to implement than what Florian suggested". The plugin would also allow you to have the pause/resume capability. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Thu, 2010-05-06 at 16:09 +0200, Dimitri Fontaine wrote: >> Simon Riggs <simon@2ndQuadrant.com> writes: >>> It would be easier to implement a conflict resolution plugin that is >>> called when a conflict occurs, allowing users to have a customisable >>> mechanism. Again, I have no objection to that proposal. >> To implement, if you say so, no doubt. To use, that means you need to >> install a contrib module after validation that the trade offs there are >> the one you're interested into, or you have to code it yourself. In C. >> >> I don't see that as an improvement over what we have now. Our main >> problem seems to be the documentation of the max_standby_delay, where we >> give the impression it's doing things the code can not do. IIUC. > > I meant "easier to implement than what Florian suggested". > > The plugin would also allow you to have the pause/resume capability. Not the same plugin. A hook for stop/resume would need to be called before and/or after each record, the one for conflict resolution would need to be called at each conflict. Designing a good interface for a plugin is hard, you need at least a couple of sample ideas for plugins that would use the hook, before you know the interface is flexible enough. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2010-05-06 at 17:39 +0300, Heikki Linnakangas wrote: > Not the same plugin. A hook for stop/resume would need to be called > before and/or after each record, the one for conflict resolution would > need to be called at each conflict. Designing a good interface for a > plugin is hard, you need at least a couple of sample ideas for plugins > that would use the hook, before you know the interface is flexible enough.

* current behaviour of max_standby_delay
* pause for X time if conflict, then cancel - which is the suggested behaviour upthread
* pause-if-conflict, explicit resume needed

Info passed to plugin
* conflict type
* relfilenode
* latestRemovedXid

-- Simon Riggs www.2ndQuadrant.com
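To make the shape of such a hook concrete -- every name below is hypothetical, nothing like this exists in the tree today -- a plugin would receive roughly the information Simon lists and hand back a decision for the startup process:

    /* Hypothetical conflict-resolution hook sketch; RelFileNode, TransactionId
     * and PROCSIG_RECOVERY_CONFLICT_* are real, the hook itself is not. */
    #include "postgres.h"
    #include "storage/relfilenode.h"

    typedef enum RecoveryConflictAction
    {
        CONFLICT_CANCEL_QUERIES,    /* current max_standby_delay-style behaviour */
        CONFLICT_WAIT,              /* pause recovery, ask the plugin again later */
        CONFLICT_WAIT_FOR_RESUME    /* pause until an explicit resume */
    } RecoveryConflictAction;

    typedef RecoveryConflictAction (*recovery_conflict_hook_type)(
        int reason,                     /* PROCSIG_RECOVERY_CONFLICT_* value */
        RelFileNode relfilenode,        /* relation involved, if any */
        TransactionId latestRemovedXid);

    extern PGDLLIMPORT recovery_conflict_hook_type recovery_conflict_hook;

A contrib module would then install its policy by setting recovery_conflict_hook from its _PG_init(), with the current max_standby_delay behaviour as one possible implementation.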
On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm inclined to think that we should throw away all this logic and just > have the slave cancel competing queries if the replay process waits > more than max_standby_delay seconds to acquire a lock. This is simple, > understandable, and behaves the same whether we're reading live data or > not. Now that I've realized what the real problem is with max_standby_delay (namely, that inactivity on the master can use up the delay), I think we should do what Tom originally suggested here. It's not as good as a really working max_standby_delay, but we're not going to have that for 9.0, and it's clearly better than a boolean. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Simon Riggs <simon@2ndQuadrant.com> wrote: > We don't have a retryable SQLSTATE, so perhaps we should document > that serialization_failure and deadlock_detected are both > retryable error conditions. As the above implies HS can throw some > errors that are non-retyable, so we use ERRCODE_ADMIN_SHUTDOWN. There is one SQLSTATE (40001 - serialization_failure) to indicate that the transaction failed due to conflicts with concurrent transactions. This is supposed to be used when there was no problem with the transaction itself, were it to be run when nothing else is running and the environment isn't failing. (For example, it isn't the right code to use if you fail because your network connection is broken or you're running out of disk space.) In my view it is a mistake *not* to use it whenever concurrent transactions cause failure; we can always use different error message text, details, and hints to differentiate between the particular ways in which concurrent transactions caused the failure. The advantage of consistently using the standard value for this is that there is software which recognizes this and automatically retries the transaction from the beginning in a manner which is transparent to the users and the application code. Our users never see anything but a delay when hitting this; it's effectively the same as blocking, with no application programmer effort needed. Having a separate deadlock_detected SQLSTATE is arguably a bug. (In fact, I have argued that; but haven't persuaded the community of it.) Coming up with yet another SQLSTATE to indicate that there's nothing wrong with the transaction except its timing in relation to other transactions would, in my view, compound the error. That said, our shop is resourceful enough (in more than one sense) to overcome this if need be. I just don't see why the community would choose to create barriers which cause failures for software which would otherwise Just Work, and the corresponding programming burden on shops using PostgreSQL to keep a list of "nothing really wrong with the transaction, you can probably just reschedule it" SQLSTATE values for the product. -Kevin
> Now that I've realized what the real problem is with max_standby_delay > (namely, that inactivity on the master can use up the delay), I think > we should do what Tom originally suggested here. It's not as good as > a really working max_standby_delay, but we're not going to have that > for 9.0, and it's clearly better than a boolean. I guess I'm not clear on how what Tom proposed is fundamentally different from max_standby_delay = -1. If there's enough concurrent queries, recovery would never catch up. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, May 6, 2010 at 2:47 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> Now that I've realized what the real problem is with max_standby_delay >> (namely, that inactivity on the master can use up the delay), I think >> we should do what Tom originally suggested here. It's not as good as >> a really working max_standby_delay, but we're not going to have that >> for 9.0, and it's clearly better than a boolean. > > I guess I'm not clear on how what Tom proposed is fundamentally > different from max_standby_delay = -1. If there's enough concurrent > queries, recovery would never catch up. If your workload is that the standby server is getting pounded with queries like crazy, then it's probably not that different: it will fall progressively further behind. But I suspect many people will set up standby servers where most of the activity happens on the primary, but they run some reporting queries on the standby. If you expect your reporting queries to finish in <10s, you could set the max delay to say 60s. In the event that something gets wedged, recovery will eventually kill it and move on rather than just getting stuck forever. If the volume of queries is known not to be too high, it's reasonable to expect that a few good whacks will be enough to get things back on track. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
All, We are in Beta1, now, and it's May. Original goal for 9.0 was June-July. We cannot be introducing major design revisions to HS/SR at this date, or we won't ship until December. There are at least 10 other major features in 9.0, some of which are more important to some of our users than HS/SR. More importantly, I think the discussion on this thread makes it very clear that no matter how much discussion we have on standby delay, we are NOT going to get it right the first time. That is, *even if* we replace Simon's code with something else, that something else will have as many issues for real users as the current delay does, especially since we won't even have started debugging or testing the new code yet. So changing to a lock-based mechanism or designing a plugin interface are really not at all realistic at this date. Realistically, we have two options at this point:

1) reduce max_standby_delay to a boolean.

2) have a delay option (based either on WAL glob start time or on system time) like the current max_standby_delay, preferably with some bugs fixed.

If we do (1), we'll be having this discussion all over again in September, and will be no better off because we won't have any production feedback on Simon's approach. If we do (2) we can hedge it in the documentation with requirements and cautions, and hopefully only dedicated DBAs will touch it, and we'll get good feedback from them on how we should redesign it for 9.1. And if it works as badly as Tom expects, then we won't have an issue with maintaining backwards compatibility, because people will be *happy* to change. One way to communicate this would be to have 2 GUCs instead of one:

allow_query_cancel = on|off # defaults to on
max_standby_timeout = 0 # SEE DOCS BEFORE CHANGING

We named this release 9.0 because, among other things, we expected it to be less stable than the prior 3 releases. And we can continue to tell users that. I know I won't be moving any of my clients to 9.0.0. I said it before and I'll say it again: "release early, release often". -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus wrote: > All, > > We are in Beta1, now, and it's May. Original goal for 9.0 was > June-July. We cannot be introducing major design revisions to HS/SR at > this date, or we won't ship until December. > > There are at least 10 other major features in 9.0, some of which are > more important to some of our users than HS/SR. More importantly, I > think the discussion on this thread makes it very clear that no matter > how much discussion we have on standby delay, we are NOT going to get it > right the first time. That is, *even if* we replace Simon's code with > something else, that something else will have as many issues for real > users as the current delay does, especially since we won't even have > started debugging or testing the new code yet. > > So changing to a lock-based mechanism or designing a plugin interface > are really not at all realistic at this date. > > Realistically, we have two options at this point: > > 1) reduce max_standby_delay to a boolean. I suggest calling it 'delay_wal_application' or 'wal_query_cancel' or something like that. > 2) have a delay option (based either on WAL glob start time or on system > time) like the current max_standby_delay, preferably with some bugs fixed. I don't think releasing something that many of us can barely understand is going to help. I think Heikki is right that we might get feedback from 9.0 that this setting isn't even useful. If we can't get this right, and it seems we can't, we should just push this to 9.1. Remember, delaying wal application just delays making the standby a master and makes the slave data appear staler. We can just tell people that the larger their queries are, the larger this delay will be. If they want to control this, they can set 'statement_timeout' already. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Bruce Momjian wrote: > Remember, delaying wal application just delays making the standby a > master and makes the slave data appear staler. We can just tell people > that the larger their queries are, the larger this delay will be. If > they want to control this, they can set 'statement_timeout' already. > While a useful defensive component, statement_timeout is a user setting, so it can't provide guaranteed protection against a WAL application denial of service from a long running query. A user that overrides the system setting and kicks off a long query puts you right back into needing a timeout to ensure forward progress of standby replay. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith wrote: > Bruce Momjian wrote: > > Remember, delaying wal application just delays making the standby a > > master and makes the slave data appear staler. We can just tell people > > that the larger their queries are, the larger this delay will be. If > > they want to control this, they can set 'statement_timeout' already. > > > > While a useful defensive component, statement_timeout is a user setting, > so it can't provide guaranteed protection against a WAL application > denial of service from a long running query. A user that overrides the > system setting and kicks off a long query puts you right back into > needing a timeout to ensure forward progress of standby replay. The nice thing about query cancel is that it gives predictable behavior. We could make statement_timeout unchangeable if it is set in postgresql.conf. Again, let's think of that for 9.1. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Thu, 2010-05-06 at 15:00 +0100, Simon Riggs wrote: > On Thu, 2010-05-06 at 12:08 +0200, Yeb Havinga wrote: > > > That's funny because when I was reading this thread, I was thinking the > > exact opposite: having max_standby_delay always set to 0 so I know the > > standby server is as up-to-date as possible. The application that > > accesses the hot standby has to be 'special' anyway because it might > > deliver not-up-to-date data. If that information about specialties > > regarding querying the standby server includes the warning that queries > > might get cancelled, they can opt for a retry themselves (is there a > > special return code to catch that case? like PGRES_RETRY_LATER) or a > > message to the user that their report is currently unavailable and they > > should retry in a few minutes. > > Very sensible. Kevin Grittner already asked for that and I alread > agreed, yet it has not been implemented that way > http://archives.postgresql.org/pgsql-hackers/2008-12/msg01691.php > > Can anyone remember a specific objection to that 'cos it still sounds > very sensible to me and is a quick, low impact change. > > Currently the SQLSTATE is ERRCODE_ADMIN_SHUTDOWN or > ERRCODE_QUERY_CANCELED if not idle. That makes it even harder to > determine the retryability, since it can come back in one of two states. > > Proposed SQLSTATE for HS cancellation is > case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN: > case PROCSIG_RECOVERY_CONFLICT_LOCK: > case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT: > case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK: > recovery_conflict_errcode = ERRCODE_T_R_SERIALIZATION_FAILURE; > break; > case PROCSIG_RECOVERY_CONFLICT_DATABASE: > case PROCSIG_RECOVERY_CONFLICT_TABLESPACE: > recovery_conflict_errcode = ERRCODE_ADMIN_SHUTDOWN; > break; > > We don't have a retryable SQLSTATE, so perhaps we should document that > serialization_failure and deadlock_detected are both retryable error > conditions. As the above implies HS can throw some errors that are > non-retyable, so we use ERRCODE_ADMIN_SHUTDOWN. Implemented as ERRCODE_ADMIN_SHUTDOWN only in the case of a dropped database. ERRCODE_T_R_SERIALIZATION_FAILURE in other cases. -- Simon Riggs www.2ndQuadrant.com
On Thu, 2010-05-06 at 12:03 -0700, Josh Berkus wrote: > So changing to a lock-based mechanism or designing a plugin interface > are really not at all realistic at this date. I agree that changing to a lock-based mechanism is too much at this stage of development. However, putting in a plugin is trivial. We could do it if we choose, without instability or risk. It is as easy a change as option (1). It's not complex to design because it would use the exact same API as the internal conflict resolution module already does; we can just move the current conflict code straight into a contrib module. This can be done bug-free in about 3 hours work. There is no performance issue associated with that either. Plugins would allow all of the various mechanisms requested on list over 18 months, nor would they prevent including some of those options within the core at a later date. Without meaning to cause further contention, it is very clear that putting in contrib modules isn't bad after all, so there is no material argument against the plugin approach. I recognise that plugins for some reason ignite unstated fears, by observation that there is always an argument every time I mention them. I invite an explanation of that off-list. > Realistically, we have two options at this point: > > 1) reduce max_standby_delay to a boolean. > > 2) have a delay option (based either on WAL glob start time or on system > time) like the current max_standby_delay, preferably with some bugs fixed. With a plugin option, I would not object to option 1. If option 1 was taken, without plugins, it's clear that would be against consensus. Having said that, I'll confirm now there will not be an extreme reaction from me if option (1) was forced, nor do I counsel that from others. > I said it before and I'll say it again: "release early, release often". None of this needs to delay release. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > With a plugin option, I would not object to option 1. > > If option 1 was taken, without plugins, it's clear that would be against > consensus. > > Having said that, I'll confirm now there will not be an extreme reaction > from me if option (1) was forced, nor do I counsel that from others. I found this email amusing. You phrase it like the community is supposed to be worried by an objection from you or an "extreme reaction"; I certainly am not. You have been in the community long enough to not use such phrasing. This is not the first time I have complained about this. I have no idea why an objection from you should mean more than an objection from anyone else in the community, and I have no idea what an "extreme reaction" means, or why anyone should care. Do you think the community is negotiating with you? I think the consensus is to change this setting to a boolean. If you don't want to do it, I am sure we can find someone who will. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian <bruce@momjian.us> wrote: > I think the concensus is to change this setting to a boolean. If you > don't want to do it, I am sure we can find someone who will. I still think we should revert to Tom's original proposal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian <bruce@momjian.us> wrote: > > I think the consensus is to change this setting to a boolean. If you > > don't want to do it, I am sure we can find someone who will. > > I still think we should revert to Tom's original proposal. And Tom's proposal was to do it on WAL slave arrival time? If we could get agreement from everyone that that is the proper direction, fine, but I am hearing things like plugins, and other complexity that makes it seem we are not getting closer to an agreed solution, and without agreement, the simplest approach seems to be just to remove the part we can't agree upon. I think the big question is whether this issue is significant enough that we should ignore our policy of no feature design during beta. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > I think the consensus is to change this setting to a boolean. If you >> > don't want to do it, I am sure we can find someone who will. >> >> I still think we should revert to Tom's original proposal. > > And Tom's proposal was to do it on WAL slave arrival time? If we could > get agreement from everyone that that is the proper direction, fine, but > I am hearing things like plugins, and other complexity that makes it > seem we are not getting closer to an agreed solution, and without > agreement, the simplest approach seems to be just to remove the part we > can't agree upon. > > I think the big question is whether this issue is significant enough > that we should ignore our policy of no feature design during beta. Tom's proposal was basically to define recovery_process_lock_timeout. The recovery process would wait X seconds for a lock, then kill whoever held it. It's not the greatest knob in the world for the reasons already pointed out, but I think it's still better than a boolean and will be useful to some users. And it's pretty simple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Bruce Momjian <bruce@momjian.us> writes: > I have no idea why an objection from you should mean more than an > objection from anyone else in the community, and I have no idea what an > "extreme reaction" means, or why anyone should care. Maybe I shouldn't say anything here. But clearly while you're spot on that Simon's objection is worth just as much as any other contributor's, I disagree that we shouldn't care about the way those people feel about being a member of our community. I appreciate your efforts to avoid having anyone here use such a wording but I can't help disliking your argument for it. I hope that's simply a localisation issue (l10n is so much harder than i18n). Anyway, I so much hate reading such exchanges here that I couldn't help ranting about it. Back to suitable -hackers content. > I think the consensus is to change this setting to a boolean. If you > don't want to do it, I am sure we can find someone who will. I don't think so. I understand the current state to be:

a. this problem is not blocking beta, but a must-fix before release
b. we either have to change the API or the behavior
c. only one behavior change has been proposed, by Tom
d. proposed behavior would favor queries rather than availability
e. API change 1 is boolean + explicit pause/resume command
f. API change 2 is boolean + plugin facility, with a contrib for current behavior
g. API change 3 is boolean only

I don't remember reading any mail on this thread bearing consensus on the choices above, but rather either one of us pushing for his vision or people defending the current situation, complaining about it or asking that a reasonable choice is made soon. If we have to choose between reasonable and soon, soon won't be my vote. Beta is meant to last more or less 3 months after all. Each party's standing is clear. Decision remains to be made, and I guess that the one writing the code will have a much louder voice. Regards, -- dim
Robert Haas wrote: > On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Robert Haas wrote: > >> On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> > I think the consensus is to change this setting to a boolean. If you > >> > don't want to do it, I am sure we can find someone who will. > >> > >> I still think we should revert to Tom's original proposal. > > > > And Tom's proposal was to do it on WAL slave arrival time? If we could > > get agreement from everyone that that is the proper direction, fine, but > > I am hearing things like plugins, and other complexity that makes it > > seem we are not getting closer to an agreed solution, and without > > agreement, the simplest approach seems to be just to remove the part we > > can't agree upon. > > > > I think the big question is whether this issue is significant enough > > that we should ignore our policy of no feature design during beta. > > Tom's proposal was basically to define recovery_process_lock_timeout. > The recovery process would wait X seconds for a lock, then kill > whoever held it. It's not the greatest knob in the world for the > reasons already pointed out, but I think it's still better than a > boolean and will be useful to some users. And it's pretty simple. I thought there was concern about lock stacking causing unpredictable/unbounded delays. I am not sure boolean has a majority vote, but I am suggesting that because it is the _minimal_ feature set, and when we can't agree during beta, the minimal feature set seems like the best choice. Clearly, anything is more feature-full than boolean --- the big question is whether Tom's proposal is enough better than a boolean that we should spend the time designing and implementing it, with the possibility it will all be changed in 9.1. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Bruce Momjian wrote: > I think the big question is whether this issue is significant enough > that we should ignore our policy of no feature design during beta The idea that you're considering removal of a feature that we already have people using in beta and making plans around is a policy violation too, you know. A freeze should include not cutting things just because their UI or implementation is not ideal yet. And you've been using the word "consensus" here when there is no such thing. At best there's barely a majority here among people who have stated an opinion, and consensus means something much stronger even than that; that means something closer to unanimity. I thought the summary of where the project is at Josh wrote at http://archives.postgresql.org/message-id/4BE31279.7040002@agliodbs.com was excellent, both from a technical and a process commentary standpoint. I'd be completely happy to follow that plan, and then we'd be at a consensus--with no one left arguing. It was very clear back in February that if SR didn't hit the feature set to make HS less troublesome out of the box, there would be some limitations here, and that set of concerns hasn't changed much since then. I thought the backup plan if we didn't get things like xid feedback was to keep the capability as written anyway, knowing that it's still much better than no control over cancellation timing available at all. Keep improving documentation around its issues, and continue to hack away at them in user space and in the field. Then we do better for 9.1. You seem bent on removing the feedback part of that cycle. The full statement of the ESR bit Josh was quoting is "Release early. Release often. And listen to your customers."[1] My customers include some who believed the PostgreSQL community process enough to contribute toward the HS development that's been completed and donated to the project. They have a pretty clear view on this, which I'm relaying when I talk about what I'd like to see happen. They are saying they cannot completely ignore their requirements for HA failover, but would be willing to loosen them just a bit (increasing failover time slightly) if it reduces the odds of query cancellation, and therefore improves how much load they can expect to push toward the standby. max_standby_delay is a currently available mechanism that does that. I'm not going to be their nanny and say "no, that's not perfectly predictable, you might get a query canceled sometimes when you don't expect it anyway". Instead, I was hoping to let them deploy 9.0 with this option available (but certainly not the default), informed of the potential risks, see how that goes. We can confirm whether the userland workarounds we believe will be effective here really are. If so, then we can soldier forward, directly incorporating them into the server code, knowing that works. If not, switch to one of the safer modes, see if there's something better to use altogether in 9.1, and perhaps this whole approach gets removed. That's healthy development progress either way. Upthread Bruce expressed some concern that this was going to live forever once deployed. There is no way I'm going to let this behavior continue to be available in 9.1 if field tests say the workarounds aren't good enough. That's going to torture all of us who do customer deployments of this technology every day if that turns out to be the case, and nobody is going to feel the heat from that worse than 2ndQuadrant. 
I did a round once of removing GUCs that didn't do what they were expected to in the field before, based on real-world tests showing regular misuse, and I'll do it again if this falls into that same category. We've already exposed this release to a whole stack of risk from work during its development cycle, risk that doesn't really drop much just from cutting this one bit. I'd at least like to get all the reward possible from that risk, which I expected to include feedback in this area. Circumventing the planned development process by dropping this now will ruin how I expected the project to feel out the right thing on the user side, and we'll all be left with little more insight for what to do in 9.1 than we have now. And I'm not looking forward to explaining to people why a feature they've been seeing and planning to deploy for months has now been cut only after what was supposed to be a freeze for beta. [1] http://catb.org/esr/writings/homesteading/cathedral-bazaar/ar01s04.html , and this particular bit is quite relevant here: "Linus was keeping his hacker/users constantly stimulated and rewarded—stimulated by the prospect of having an ego-satisfying piece of the action, rewarded by the sight of constant (even daily) improvement in their work. Linus was directly aiming to maximize the number of person-hours thrown at debugging and development, even at the possible cost of instability in the code and user-base burnout if any serious bug proved intractable." I continue to be disappointed at how contributing code to PostgreSQL is far more likely to come with a dose of argument and frustration rather than reward, and this discussion is a perfect example of such. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith wrote: > Bruce Momjian wrote: > > I think the big question is whether this issue is significant enough > > that we should ignore our policy of no feature design during beta > > The idea that you're considering removal of a feature that we already > have people using in beta and making plans around is a policy violation > too you know. A freeze should include not cutting things just because > their UI or implementation is not ideal yet. And you've been using the > word "consensus" here when there is no such thing. At best there's > barely a majority here among people who have stated an opinion, and > consensus means something much stronger even than that; that means > something closer to unanimity. I thought the summary of where the > project is at Josh wrote at > http://archives.postgresql.org/message-id/4BE31279.7040002@agliodbs.com > was excellent, both from a technical and a process commentary > standpoint. I'd be completely happy to follow that plan, and then we'd > be at a consensus--with no one left arguing. I can't argue with anything you have said in your email. The big question is whether designing during beta is worth it in this case, and whether we can get something that is useful and gives us useful feedback for 9.1, and is it worth spending the time to figure this out during beta? If we can, great, let's do it, but I have not seen that yet, and I am unclear how long we should keep trying to find it. I think everyone agrees the current code is unusable, per Heikki's comment about a WAL file arriving after a period of no WAL activity, and look how long it took our group to even understand why that fails so badly. I thought Tom's idea had problems, and there were ideas of how to improve it. It just seems like we are drifting around on something that has no easy solution, and not something that we are likely to hit during beta where we should be focusing on the release. Saying we have three months to fix this during beta seems like a recipe for delaying the final release, and this feature is not worth that. What we could do is to convert max_standby_delay to a boolean, 'ifdef' out the code that was handling non-boolean cases, and then if someone wants to work on a patch in a corner and propose something in a month that improves this, we can judge the patch on its own merits, and apply it if it is a great benefit, because basically that is what we are doing now if we fix this --- adding a new patch/feature during beta. (Frankly, because we are not requiring an initdb during beta, I am unclear how we are going to rename max_standby_delay to behave as a boolean.) It is great if we can get a working max_standby_delay, but I fear drifting/distraction at this stage. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote: > I think everyone agrees the current code is unusable, per Heikki's > comment about a WAL file arriving after a period of no WAL activity, and > look how long it took our group to even understand why that fails so > badly. To be honest it's not *that* hard to simply make sure WAL is generated regularly to combat that. While it surely ain't a nice workaround, it's not much of a problem either. Andres
Andres Freund <andres@anarazel.de> writes: > On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote: >> I think everyone agrees the current code is unusable, per Heikki's >> comment about a WAL file arriving after a period of no WAL activity, and >> look how long it took our group to even understand why that fails so >> badly. > To be honest its not *that* hard to simply make sure generating wal regularly > to combat that. While it surely aint a nice workaround its not much of a > problem either. Well, that's dumping a kluge onto users; but really that isn't the point. What we have here is a badly designed and badly implemented feature, and we need to not ship it like this so as to not institutionalize a bad design. I like the proposal of a boolean because it provides only the minimal feature set of two cases that are both clearly needed and easily implementable. Whatever we do later is certain to provide a superset of those two cases. If we do something else (and that includes my own proposal of a straight lock timeout), we'll be implementing something we might wish to take back later. Taking out features after they've been in a release is very hard, even if we realize they're badly designed. regards, tom lane
Tom Lane wrote: > Taking out features after they've been in a release is very hard, even if we realize they're badly > designed. > It doesn't have to be; that's the problem the "release often" part takes care of. If a release has only been out a year, and a new one comes out saying "oh, that thing we released for the first time in the last version, it didn't work as well as we'd hoped in the field; you should try to avoid that and use this new implementation that works better instead once you can upgrade", that's not only not hard, it's exactly what people using a X.0 release expect to happen. I've read the message from you that started off this thread several times now. Your low-level code implementation details shared later obviously need to be addressed. But all of the "fundamental" and "fatal" issues you mentioned at the start continue to strike me as either situations where you don't agree with the use case this was designed for, or spots where you feel the userland workarounds required to make it work right are too onerous. Bruce's objections seem to fall mainly into the latter category. I've been wandering around talking to people about that exact subject--what do people want and expect from Hot Standby, and what would they do to gain its benefits--for over six months now, independently of Simon's work which did a lot of that before me too. The use cases are covered as best they can be without better support from expected future SR features like heartbeats and XID loopback. As for the workarounds required to make things work, the responses I get match what we just saw from Andres. When the required details are explained, people say "that's annoying but I can do that", and off we go. There are significant documentation issues I know need to be cleaned up here, and I've already said I'll take care of that as soon as freeze is really here and I have a stable target. (That this discussion is still going on says that's not yet) What I fail to see are problems significant enough to not ship the parts of this feature that are done, so that it can be used by those it is appropriate for, allow feedback, and make it easy to test individual improvements upon what's already there. I can't make you prioritize based on what people are telling me. All I can do is suggest you reconsider handing control over the decision to use this feature or not to the users of the software, so they can make their own choice. I'm tired of arguing about this instead of doing productive work, and I've done all I can here to try and work within the development process of the community. If talk of removing the max_standby_delay feature clears up, I'll happily provide my promised round of documentation updates, to make its limitations and associated workarounds as clear as they can be, within a week of being told go on that. If instead this capability goes away, making those moot, I'll maintain my own release for the 2ndQuadrant customers who have insisted they need this capability if I have to. That would be really unfortunate, because the only bucket I can pull time out of for that is the one I currently allocate to answering questions on the mailing lists here most days. I'd rather spend that helping out the PostgreSQL community, but we do need to deliver what our customers want too. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith wrote: > Tom Lane wrote: > > > Taking out features after they've been in a release is very hard, even if we realize they're badly > > designed. > > > > It doesn't have to be; that's the problem the "release often" part takes > care of. If a release has only been out a year, and a new one comes out > saying "oh, that thing we released for the first time in the last > version, it didn't work as well as we'd hoped in the field; you should > try to avoid that and use this new implementation that works better > instead once you can upgrade", that's not only not hard, it's exactly > what people using a X.0 release expect to happen. I think this is the crux of the issue. Tom and I are saying that historically we have shipped only complete features, or as complete as reasonable, and have removed items during beta that we found didn't meet this criterion, in an attempt to reduce the amount of feature-set churn in Postgres. A database is complex, so modifying the API between major releases is something we only do when we find a significant benefit. In this case, if we keep max_standby_delay as non-boolean, we know it will have to be redesigned in 9.1, and it is unclear to me what additional knowledge we will gain by shipping it in 9.0, except to have to tell people that it doesn't work well or requires complex work-arounds, and that doesn't thrill any of us. (I already suggested that statement_timeout might supply a reasonable and predictable workaround for non-boolean usage of max_standby_delay.) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
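For concreteness, a minimal sketch of the statement_timeout workaround Bruce mentions, assuming the standby is mainly used for reporting; the timeout value and table name are illustrative only:

    -- bound each reporting query on the standby so a single long statement
    -- cannot make recovery wait arbitrarily long on its account
    SET statement_timeout = '60s';
    SELECT count(*) FROM sales_history;   -- hypothetical reporting query
    RESET statement_timeout;

This only caps how long any one query can run (and therefore how long it can hold back replay); it does not change which side wins a conflict.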
On Sat, May 8, 2010 at 6:51 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > Robert Haas wrote: >> >> On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian <bruce@momjian.us> wrote: >> >> > I think the concensus is to change this setting to a boolean. If you >> >> > don't want to do it, I am sure we can find someone who will. >> >> >> >> I still think we should revert to Tom's original proposal. >> > >> > And Tom's proposal was to do it on WAL slave arrival time? If we could >> > get agreement from everyone that that is the proper direction, fine, but >> > I am hearing things like plugins, and other complexity that makes it >> > seem we are not getting closer to an agreed solution, and without >> > agreement, the simplest approach seems to be just to remove the part we >> > can't agree upon. >> > >> > I think the big question is whether this issue is significant enough >> > that we should ignore our policy of no feature design during beta. >> >> Tom's proposal was basically to define recovery_process_lock_timeout. >> The recovery process would wait X seconds for a lock, then kill >> whoever held it. It's not the greatest knob in the world for the >> reasons already pointed out, but I think it's still better than a >> boolean and will be useful to some users. And it's pretty simple. > > I thought there was concern about lock stacking causing > unpredictable/unbounded delays. I am not sure boolean has a majority > vote, but I am suggesting that because it is the _minimal_ feature set, > and when we can't agree during beta, the minimal feature set seems like > the best choice. > > Clearly, anything is more feature-full than boolean --- the big question > is whether Tom's proposal is significantly better than boolean that we > should spend the time designing and implementing it, with the > possibility it will all be changed in 9.1. I doubt it's likely to be thrown out completely. We might decide to fine-tune it in some way. My fear is that if we ship this with only a boolean, we're shipping crippleware. If that fear turns out to be unfounded, I will of course be happy, but that's my concern, and I don't believe that it's entirely unfounded. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > > Clearly, anything is more feature-full than boolean --- the big question > > is whether Tom's proposal is significantly better than boolean that we > > should spend the time designing and implementing it, with the > > possibility it will all be changed in 9.1. > > I doubt it's likely to be thrown out completely. We might decide to > fine-tune it in some way. My fear is that if we ship this with only a > boolean, we're shipping crippleware. If that fear turns out to be > unfounded, I will of course be happy, but that's my concern, and I > don't believe that it's entirely unfounded. Well, historically, we have been willing to not ship features if we can't get it right. No one has ever accused us of crippleware, but our hesitancy has caused slower user adoption, though long-term, it has helped us grow a dedicated user base that trusts us. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Sun, May 9, 2010 at 12:08 AM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> > Clearly, anything is more feature-full than boolean --- the big question >> > is whether Tom's proposal is significantly better than boolean that we >> > should spend the time designing and implementing it, with the >> > possibility it will all be changed in 9.1. >> >> I doubt it's likely to be thrown out completely. We might decide to >> fine-tune it in some way. My fear is that if we ship this with only a >> boolean, we're shipping crippleware. If that fear turns out to be >> unfounded, I will of course be happy, but that's my concern, and I >> don't believe that it's entirely unfounded. > > Well, historically, we have been willing to not ship features if we > can't get it right. No one has ever accused us of crippleware, but our > hesitancy has caused slower user adoption, though long-term, it has > helped us grow a dedicated user base that trusts us. We can make the decision to not ship the feature if the feature is "max_standby_delay". But I think the feature is "Hot Standby", which I think we've pretty much committed to shipping. And I am concerned that if the only mechanism for controlling query cancellation vs. recovery lag is a boolean, people feel that we didn't get Hot Standby right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sat, 2010-05-08 at 14:48 -0400, Bruce Momjian wrote: > I think the consensus is to change this setting to a boolean. If you > don't want to do it, I am sure we can find someone who will. You expect others to act on consensus and follow rules, yet ignore them yourself when it suits your purpose. Your other points seem designed to distract people from seeing that. There is clear agreement that a problem exists. The action to take as a result of that problem is very clearly in doubt and yet you repeatedly ignore other people's comments and viable technical resolutions. If you can find a cat's paw to break consensus for you, more fool them. You might find someone with a good resolution, if you ask that instead. -- Simon Riggs www.2ndQuadrant.com
Bruce Momjian wrote: > I think everyone agrees the current code is unusable, per Heikki's > comment about a WAL file arriving after a period of no WAL > activity I don't. I am curious to hear how many complaints we've had from alpha and beta testers of HS regarding this issue. I know that if we used it with our software, the issue would probably go unnoticed because of our usage patterns and automatic query retry. A positive setting would work as intended for us. I can think of pessimal usage patterns, different software approaches, and/or goals for HS usage which would conflict badly with a positive setting. Hopefully we can document this area better than we've historically done with, for example, fsync -- which has similar trade-offs, only with more dire consequences for bad user choices. -Kevin
Tom Lane <tgl@sss.pgh.pa.us> writes: > I like the proposal of a boolean because it provides only the minimal > feature set of two cases that are both clearly needed and easily > implementable. Whatever we do later is certain to provide a superset > of those two cases. If we do something else (and that includes my own > proposal of a straight lock timeout), we'll be implementing something > we might wish to take back later. Taking out features after they've > been in a release is very hard, even if we realize they're badly > designed. That's where I thought my proposal fitted in. I fail to see us wanting to take back explicit pause/resume admin functions in any future release. Now, after having read Greg's arguments, my vote would be the following:
- hot_standby_conflict_winner = queries|replay, defaults to replay
- add pause/resume so that people can switch temporarily to queries
- label max_standby_delay *experimental*, keep current code
By clearly stating the feature is *experimental* it should be easy to both get feedback on it so that we know what to implement in 9.1, and should that be completely different, take back the feature. It should even be possible to continue tweaking its behavior during beta, or do something better. Of course it will piss off some users, but they knew they were depending on some *experimental* feature after all. Regards, -- dim
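As a rough illustration of the interface Dimitri proposes (none of these names exist in 9.0 as released; the GUC is the one suggested above, and the two admin functions are invented here purely to make the proposal concrete):

    -- hypothetical GUC: pick which side wins a replay conflict
    SHOW hot_standby_conflict_winner;        -- 'replay' (default) or 'queries'
    -- hypothetical admin functions: hold replay off for a reporting window,
    -- then let the standby catch up again
    SELECT pg_recovery_pause();
    -- ... run long reporting queries without conflict cancellations ...
    SELECT pg_recovery_resume();

The attraction is that the choice is explicit and time-boxed by the administrator, rather than inferred from timestamps in the WAL stream.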
On May 9, 2010, at 13:59 , Dimitri Fontaine wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> I like the proposal of a boolean because it provides only the minimal >> feature set of two cases that are both clearly needed and easily >> implementable. Whatever we do later is certain to provide a superset >> of those two cases. If we do something else (and that includes my own >> proposal of a straight lock timeout), we'll be implementing something >> we might wish to take back later. Taking out features after they've >> been in a release is very hard, even if we realize they're badly >> designed. > > That's where I though my proposal fitted in. I fail to see us wanting to > take back explicit pause/resume admin functions in any future release. > > Now, after having read Greg's arguments, my vote would be the following: > - hot_standby_conflict_winner = queries|replay, defaults to replay > - add pause/resume so that people can switch temporarily to queries > - label max_standby_delay *experimental*, keep current code Adding pause/resume seems to introduce some non-trivial locking problems, though. How would you handle a pause request if the recovery process currently held a lock? Dropping the lock is not an option for correctness reasons. Otherwise you wouldn't have needed to take the lock in the first place, no? Pausing with the lock held leads to priority-inversion-like problems. Queries now might block until recovery is resumed - quite the opposite of what pause() is supposed to achieve. The only remaining option is to continue applying WAL until you reach a point where no locks are held, then pause. But from a user's POV that is nearly indistinguishable from simply setting hot_standby_conflict_winner to "queries" in the first place, I think. best regards, Florian Pflug
On Sun, May 9, 2010 at 4:00 AM, Greg Smith <greg@2ndquadrant.com> wrote: > The use cases are covered as best they can be without better support from > expected future SR features like heartbeats and XID loopback. For what it's worth I think deferring these extra complications is a very useful exercise. I would like to see a system that doesn't depend on them for basic functionality. In particular I would like to see a system that can be useful using purely WAL log shipping without streaming replication at all. I'm a bit unclear how the boolean proposal would solve things though. Surely if you set the boolean to recovery-wins then when using streaming replication with any non-idle master virtually every query would be cancelled immediately as every HOT cleanup would cause a snapshot conflict with even short-lived queries in the slave. -- greg
On Sun, May 9, 2010 at 12:47 PM, Greg Stark <gsstark@mit.edu> wrote: > On Sun, May 9, 2010 at 4:00 AM, Greg Smith <greg@2ndquadrant.com> wrote: >> The use cases are covered as best they can be without better support from >> expected future SR features like heartbeats and XID loopback. > > For what it's worth I think deferring these extra complications is a > very useful exercise. I would like to see a system that doesn't depend > on them for basic functionality. In particular I would like to see a > system that can be useful using purely WAL log shipping without > streaming replication at all. > > I'm a bit unclear how the boolean proposal would solve things though. > Surely if you set the boolean to recovery-wins then when using > streaming replication with any non-idle master virtually every query > would be cancelled immediately as every HOT cleanup would cause a > snapshot conflict with even short-lived queires in the slave. It sounds to me like what we need here is some testing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
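One crude way to run the test Robert asks for and provoke the HOT-cleanup conflict Greg describes, assuming a 9.0 master/standby pair; the table, row count, and sleep time are made up for illustration:

    -- on the master: generate cleanup records by churning a table
    CREATE TABLE conflict_test (id int PRIMARY KEY, payload text);
    INSERT INTO conflict_test SELECT g, 'x' FROM generate_series(1, 100000) g;
    UPDATE conflict_test SET payload = 'y';   -- repeat a few times
    VACUUM conflict_test;

    -- on the standby: hold an old snapshot open while that WAL is replayed
    -- (in 9.0, SERIALIZABLE simply means one snapshot for the whole transaction)
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM conflict_test;
    SELECT pg_sleep(120);
    SELECT count(*) FROM conflict_test;   -- see whether this gets cancelled
    COMMIT;

Whether the second count survives depends on which conflict-resolution setting is in effect, which is exactly the behavior under discussion.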
On Sat, 2010-05-08 at 20:57 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote: > >> I think everyone agrees the current code is unusable, per Heikki's > >> comment about a WAL file arriving after a period of no WAL activity, and > >> look how long it took our group to even understand why that fails so > >> badly. > > > To be honest its not *that* hard to simply make sure generating wal regularly > > to combat that. While it surely aint a nice workaround its not much of a > > problem either. > > Well, that's dumping a kluge onto users; but really that isn't the > point. What we have here is a badly designed and badly implemented > feature, and we need to not ship it like this so as to not > institutionalize a bad design. No, you have it backwards. HS was designed to work with SR. SR unfortunately did not deliver any form of monitoring, and in doing so the keepalive that it was known HS needed was left out, although it had been on the todo list for some time. Luckily Greg and I argued to have some monitoring added and my code was used to provide barest minimum monitoring for SR, yet not enough to help HS. Of course, if one team doesn't deliver for whatever reason then others must take up the slack, if they can: no complaints. Since I personally didn't know this was going to be the case until after freeze, it is very late to resolve this situation sensibly and time has been against us. It's much harder for me to reach into the depths of another person's work and see how to add necessary mechanisms, especially when I'm working elsewhere. Even if I had done, it's likely that I would have been blocked with the "great idea, next release" response as already used on this thread. Without doubt the current mechanism suffers from the issues you mention, though the current state is not the result of bad design, merely inaction and lack of integration. We could resolve the current state in many ways, if we chose. Bruce has used the word crippleware for the current state. Raising a problem and then blocking solutions is the best way I know to cripple a release. It should be clear that I've done my best to avoid this situation and have been active on both SR and HS. Had I not acted as I have done to date, SR would at this point slurp CPU like a bandit and be unmonitorable, both fatal flaws in production. I point this out not to argue, but to set the record straight. IMHO your assignment of blame is misplaced and your comments about poor design do not reflect how we arrived at the current state. -- Simon Riggs www.2ndQuadrant.com
On Sun, 2010-05-09 at 16:10 +0200, Florian Pflug wrote: > Adding pause/resume seems to introduce some non-trivial locking > problems, though. How would you handle a pause request if the recovery > process currently held a lock? (We are only talking about AccessExclusiveLocks here. No LWlocks are held across WAL records during replay) Just pause. There is no technical problem there. Perhaps a danger of unforeseen consequences, though doing that might also be desirable, who can say? -- Simon Riggs www.2ndQuadrant.com
Florian Pflug <fgp@phlo.org> writes: > The only remaining option is to continue applying WAL until you reach > a point where no locks are held, then pause. But from a user's POV > that is nearly indistinguishable from simply setting > hot_standby_conflict_winner to in the first place I think. Not really, the use case would be using the slave as a reporting server, you know you have say 4 hours of reporting queries during which you will pause the recovery. So it's ok for the pause command to take time. What I understand the boolean option would do is to force the user into choosing either high-availability or using the slave for other purposes too. The problem is in wanting both, and that's what HS was meant to solve. Having pause/resume allows for a mixed case usage which is simple to drive and understand, yet fails to provide adaptive behavior where queries are allowed to pause recovery implicitly for a while. In my mind, that would be a compromise we could reach for 9.0, but it seems introducing those admin functions now is too far a stretch. I've been failing to understand exactly why, only getting a generic answer I find unsatisfying here, because all the alternative paths being proposed, apart from "improve documentation", are more involved code-wise. Regards, -- dim
On May 9, 2010, at 21:04 , Simon Riggs wrote: > On Sun, 2010-05-09 at 16:10 +0200, Florian Pflug wrote: > >> Adding pause/resume seems to introduce some non-trivial locking >> problems, though. How would you handle a pause request if the recovery >> process currently held a lock? > > (We are only talking about AccessExclusiveLocks here. No LWlocks are > held across WAL records during replay) > > Just pause. There are no technical problem there. > > Perhaps a danger of unforeseen consequences, though doing that might > also be desirable, who can say? No technical problems perhaps, but some usability ones, no? I assume people would pause recovery to prevent it from interfering with long-running reporting queries. Now, those queries might block indefinitely if the pause request by chance was issued while the recovery process held an AccessExclusiveLock - then the pause has *caused* exactly what it was supposed to prevent. Setting hot_standby_conflict_winner to "queries" would at least have allowed the reporting queries to finish eventually. If AccessExclusiveLocks are taken out of the picture (they're supposed to be pretty rare on a production system anyway), setting hot_standby_conflict_winner to "queries" seems to act like a conditional pause request - recovery is paused as soon as it gets in the way. In this setting, the real advantage of pause would be to prevent recovery from using up all available IO bandwidth. This seems like a valid concern, but calls more for something like recovery_delay (similar to vacuum_delay) instead of pause(). best regards, Florian Pflug
On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote: > Florian Pflug <fgp@phlo.org> writes: >> The only remaining option is to continue applying WAL until you reach >> a point where no locks are held, then pause. But from a user's POV >> that is nearly indistinguishable from simply setting >> hot_standby_conflict_winner to in the first place I think. > > Not really, the use case would be using the slave as a reporting server, > you know you have say 4 hours of reporting queries during which you will > pause the recovery. So it's ok for the pause command to take time. Seems like it could take FOREVER on a busy system. Surely that's not OK. The fact that Hot Standby has to take exclusive locks that can't be released until WAL replay has progressed to a certain point seems like a fairly serious wart. We had a discussion on another thread of how this can make the database fail to shut down properly, a problem we're not addressing because we're too busy arguing about max_standby_delay. In fact, if we knew how to pause replay without leaving random locks lying around, we could rearrange the whole smart shutdown sequence so that we paused replay FIRST and then waited for all backends to exit, but the consensus on the thread where we discussed this was that we did not know how to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sun, 2010-05-09 at 16:01 -0400, Robert Haas wrote: > The fact that Hot Standby has to take exclusive locks that can't > be released until WAL replay has progressed to a certain point seems > like a fairly serious wart. LOL And people lecture me about design. -- Simon Riggs www.2ndQuadrant.com
On May 9, 2010, at 22:01 , Robert Haas wrote: > On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote: >> Florian Pflug <fgp@phlo.org> writes: >>> The only remaining option is to continue applying WAL until you reach >>> a point where no locks are held, then pause. But from a user's POV >>> that is nearly indistinguishable from simply setting >>> hot_standby_conflict_winner to in the first place I think. >> >> Not really, the use case would be using the slave as a reporting server, >> you know you have say 4 hours of reporting queries during which you will >> pause the recovery. So it's ok for the pause command to take time. > > Seems like it could take FOREVER on a busy system. Surely that's not > OK. The fact that Hot Standby has to take exclusive locks that can't > be released until WAL replay has progressed to a certain point seems > like a fairly serious wart. If this is a serious wart then it's not one of hot standby, but one of postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as opposed to UPDATE/DELETE-blocking locks) are never necessary from a correctness POV, they're only there for implementation reasons. Getting rid of them doesn't seem completely insurmountable either - just as multiple row versions remove the need to block SELECTs due to concurrent UPDATEs, multiple datafile versions could remove the need to block SELECTs due to concurrent ALTERs. But people seem to live with them quite well, judged from the amount of work put into getting rid of them (zero). I therefore fail to see why they should pose a significant problem in HS setups. > We had a discussion on another thread of > how this can make the database fail to shut down properly, a problem > we're not addressing because we're too busy arguing about > max_standby_delay. In fact, if we knew how to pause replay without > leaving random locks lying around, we could rearrange the whole smart > shutdown sequence so that we paused replay FIRST and then waited for > all backends to exit, but the consensus on the thread where we > discussed this was that we did not know how to do that. Yeah, this was exactly my line of thought too. best regards, Florian Pflug
On Monday 10 May 2010 00:25:44 Florian Pflug wrote: > On May 9, 2010, at 22:01 , Robert Haas wrote: > > On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> wrote: > >> Florian Pflug <fgp@phlo.org> writes: > >>> The only remaining option is to continue applying WAL until you reach > >>> a point where no locks are held, then pause. But from a user's POV > >>> that is nearly indistinguishable from simply setting > >>> hot_standby_conflict_winner to in the first place I think. > >> > >> Not really, the use case would be using the slave as a reporting server, > >> you know you have say 4 hours of reporting queries during which you will > >> pause the recovery. So it's ok for the pause command to take time. > > > > Seems like it could take FOREVER on a busy system. Surely that's not > > OK. The fact that Hot Standby has to take exclusive locks that can't > > be released until WAL replay has progressed to a certain point seems > > like a fairly serious wart. > > If this is a serious wart then it's not one of hot standby, but one of > postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as > opposed to UPDATE/DELETE-blocking locks) are never necessary from a > correctness POV, they're only there for implementation reasons. > > Getting rid of them doesn't seem completely insurmountable either - just as > multiple row versions remove the need to block SELECTs dues to concurrent > UPDATEs, multiple datafile versions could remove the need to block SELECTs > due to concurrent ALTERs. But people seem to live with them quite well, > judged from the amount of work put into getting rid of them (zero). I > therefore fail to see why they should pose a significant problem in HS > setups. The difference is that in HS you have to wait for a moment where *no exclusive lock at all* exists, possibly without contending for any of them, while on the master you might not even be blocked by the existence of any of those locks. If you have two sessions which in overlapping transactions lock different tables exclusively you have no problem shutting the master down, but you will never reach a point where no exclusive lock is taken on the slave. Andres
On Sun, May 9, 2010 at 6:58 PM, Andres Freund <andres@anarazel.de> wrote: > On Monday 10 May 2010 00:25:44 Florian Pflug wrote: >> On May 9, 2010, at 22:01 , Robert Haas wrote: >> > On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> > wrote: >> >> Florian Pflug <fgp@phlo.org> writes: >> >>> The only remaining option is to continue applying WAL until you reach >> >>> a point where no locks are held, then pause. But from a user's POV >> >>> that is nearly indistinguishable from simply setting >> >>> hot_standby_conflict_winner to in the first place I think. >> >> >> >> Not really, the use case would be using the slave as a reporting server, >> >> you know you have say 4 hours of reporting queries during which you will >> >> pause the recovery. So it's ok for the pause command to take time. >> > >> > Seems like it could take FOREVER on a busy system. Surely that's not >> > OK. The fact that Hot Standby has to take exclusive locks that can't >> > be released until WAL replay has progressed to a certain point seems >> > like a fairly serious wart. >> >> If this is a serious wart then it's not one of hot standby, but one of >> postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as >> opposed to UPDATE/DELETE-blocking locks) are never necessary from a >> correctness POV, they're only there for implementation reasons. >> >> Getting rid of them doesn't seem completely insurmountable either - just as >> multiple row versions remove the need to block SELECTs dues to concurrent >> UPDATEs, multiple datafile versions could remove the need to block SELECTs >> due to concurrent ALTERs. But people seem to live with them quite well, >> judged from the amount of work put into getting rid of them (zero). I >> therefore fail to see why they should pose a significant problem in HS >> setups. > The difference is that in HS you have to wait for a moment where *no exclusive > lock at all* exist, possibly without contending for any of them, while on the > master you might not even blocked by the existence of any of those locks. > > If you have two sessions which in overlapping transactions lock different > tables exlusively you have no problem shutting the master down, but you will > never reach a point where no exclusive lock is taken on the slave. A possible solution to this in the shutdown case is to kill anyone waiting on a lock held by the startup process at the same time we kill the startup process, and to kill anyone who subsequently waits for such a lock as soon as they attempt to take it. I'm not sure if this would also make sense in the pause case. Another possible solution would be to try to figure out if there's a way to delay application of WAL that requires the taking of AELs to the point where we could apply it all at once. That might not be feasible, though, or only in some cases, and it's certainly 9.1 material (at least) in any case. Anyway, this is all a little off-topic. We need to get back to arguing about how best to cut the legs out from under a feature that's been in the tree for six months but Tom didn't get around to looking at until last week. I'll restate my position: now that I understand what the issues are (I think), the feature as currently implemented seems pretty wonky, but cutting it down to a boolean seems like an exercise in excessive pessimism about our ability to predict future development directions, as well as possibly quite inconvenient for people attempting to use Hot Standby. 
Therefore I think we should adopt Tom's original proposal (with +1 also from Stephen Frost), but that doesn't seem likely to fly because, on the one hand, we have Tom himself arguing (along with Bruce and possibly Heikki) that we should whack it down all the way to a boolean; and on the other hand Simon and Greg Smith and I think also Andres Freund and Kevin Grittner arguing that the original feature is OK as-is. Other people who weighed in include Stefan Kaltenbrunner (who opined that Tom had a legitimate complaint about the current design but didn't vote for a specific resolution), Greg Sabino Mullane (who pointed out that SOME of the issues that Tom raised could be solved with proper time synchronization), Josh Drake (who thought requiring NTP to be working was a bad idea, and therefore presumably favors changing something), Josh Berkus (who changed his vote at least once and whose priority seems to have more to do with releasing before the turn of the century than with the actual technical option we select, apologies if I'm misreading his emails), Greg Stark (who seems to think that a boolean will be bad news but didn't specifically vote for another option), Dimitri Fontaine (who wants a boolean plus pause/resume functions, or maybe a plugin facility of some kind), Rob Wultsch (who doesn't ever want to kill queries and therefore would be happy with a boolean), Yeb Havinga (who never wants to stall recovery and therefore would also be happy with a boolean), and Florian Pflug (who points out that pause/resume is actually a nontrivial feature). Apologies if I've left anyone out or misrepresented their position.
Overall I would say opinion is about evenly split between:
- leave it as-is
- make it a Boolean
- change it in some way but to something more expressive than a Boolean
I can't presume to extract a consensus from that; I don't think there is one. You could say "the majority of people want to change something" and that would be true; you could also say "the majority of people don't want a Boolean" and that would also be true.
IF we adopt "leave it as-is", then we need to document that you will need to both run ntp and run some sort of heartbeat process on the master to make sure that at least a small amount of WAL keeps getting generated; or else you'll have massive query cancellations.
IF we decide to make it a Boolean, then we need to document that you have to choose between the possibility of recovery falling arbitrarily behind as a result of even one query holding an exclusive lock, or alternatively instantaneously canceling queries that conflict, however briefly, with replay.
IF we adopt Tom's original proposal, then we'll need to document that the timeout given is per-lock-wait, and therefore if the lock timeout is not zero and there are many lock waits the standby may fall far behind and have difficulty catching up.
IF we decide to do something else, then I don't know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sun, 2010-05-09 at 20:56 -0400, Robert Haas wrote: > >> > Seems like it could take FOREVER on a busy system. Surely that's not > >> > OK. The fact that Hot Standby has to take exclusive locks that can't > >> > be released until WAL replay has progressed to a certain point seems > >> > like a fairly serious wart. > >> > >> If this is a serious wart then it's not one of hot standby, but one of > >> postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as > >> opposed to UPDATE/DELETE-blocking locks) are never necessary from a > >> correctness POV, they're only there for implementation reasons. > >> > >> Getting rid of them doesn't seem completely insurmountable either - just as > >> multiple row versions remove the need to block SELECTs dues to concurrent > >> UPDATEs, multiple datafile versions could remove the need to block SELECTs > >> due to concurrent ALTERs. But people seem to live with them quite well, > >> judged from the amount of work put into getting rid of them (zero). I > >> therefore fail to see why they should pose a significant problem in HS > >> setups. > > The difference is that in HS you have to wait for a moment where *no exclusive > > lock at all* exist, possibly without contending for any of them, while on the > > master you might not even blocked by the existence of any of those locks. > > > > If you have two sessions which in overlapping transactions lock different > > tables exlusively you have no problem shutting the master down, but you will > > never reach a point where no exclusive lock is taken on the slave. > > A possible solution to this in the shutdown case is to kill anyone > waiting on a lock held by the startup process at the same time we kill > the startup process, and to kill anyone who subsequently waits for > such a lock as soon as they attempt to take it. I already explained, when shutdown was discussed, that killing the startup process first is a bad idea for many reasons. Can't remember who added the new standby shutdown code recently, but it sounds like their design was pretty poor if it didn't include shutting down properly with HS. I hope they fix the bug they have introduced. HS was never designed to work that way, so there is no flaw there; it certainly worked when committed. > I'm not sure if this > would also make sense in the pause case. Not sure why pausing replay would make any difference at all. Being between one WAL record and the next is a valid and normal state that exists many thousands of times per second. If making that state longer would cause problems we would already have seen issues. There are none, it will work fine. > Another possible solution would be to try to figure out if there's a > way to delay application of WAL that requires the taking of AELs to > the point where we could apply it all at once. That might not be > feasible, though, or only in some cases, and it's certainly 9.1 > material (at least) in any case. Locks usually protect users from accessing a table while it's being clustered or dropped or something like that. Locks are not bad. They are also used by some developers to specifically serialize access to an object. AccessExclusiveLocks are rare in normal running and not to be avoided when they do exist. HS correctly supports locking, as and when such locks are made on the master. -- Simon Riggs www.2ndQuadrant.com
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, May 9, 2010 at 6:58 PM, Andres Freund <andres@anarazel.de> wrote: >> The difference is that in HS you have to wait for a moment where *no exclusive >> lock at all* exist, possibly without contending for any of them, while on the >> master you might not even blocked by the existence of any of those locks. >> >> If you have two sessions which in overlapping transactions lock different >> tables exlusively you have no problem shutting the master down, but you will >> never reach a point where no exclusive lock is taken on the slave. > > A possible solution to this in the shutdown case is to kill anyone > waiting on a lock held by the startup process at the same time we kill > the startup process, and to kill anyone who subsequently waits for > such a lock as soon as they attempt to take it. I'm not sure if this > would also make sense in the pause case. Well, wait, I'm getting lost here. It seems to me that no query on the slave is ever allowed to take an AEL, no matter what. The only case is a query waiting for the replay to release its locks. The only consequence of pause not waiting for any lock to get released from the replay is that those backends will be, well, paused. But that applies the same to any backend started after we pause. Waiting for replay to release all its locks before pausing would mean that there's a possibility that the activity on the master is such that you never reach a pause in the WAL stream. Let's assume we want any new code we throw in at this stage to be a magic wand making every user happy at once. So we'd need a pause function taking either 1 or 2 arguments, first is to say we pause now even if we know the replay is holding some locks that might pause the reporting queries too, the other is to wait until the locks are not held anymore, with a timeout (default 1min?). Ok, that's designing the API we're missing, and we should not be in the process of doing any design at this stage. But we are. > [good summary of current positions] > I can't presume to extract a consensus from that; I don't think there > is one. All we know for sure is that Tom does not want to release as-is, and he rightfully insists on several objectives as far as the editing is concerned:
- no addition of code we might want to throw away later
- avoid having to deprecate released behavior, it's too hard
- minimal change set, possibly with no new features.
One more, pausing the replay is *already* in the code base, it's exactly what happens under the hood if you favor queries rather than replay, to the point I don't understand why the pause design needs to happen now. We're only talking about having an *explicit* version of it. Regards, -- dim
I too am growing tired of insisting this much. I only continue because I really can't understand why-o-why considering a new API over an existing feature is not possible at this stage. I'm hitting my head on the wal, so to say…
Robert Haas wrote: > On Sun, May 9, 2010 at 6:58 PM, Andres Freund <andres@anarazel.de> wrote: >> On Monday 10 May 2010 00:25:44 Florian Pflug wrote: >>> On May 9, 2010, at 22:01 , Robert Haas wrote: >>>> On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine <dfontaine@hi-media.com> >> wrote: >>>> Seems like it could take FOREVER on a busy system. Surely that's not >>>> OK. The fact that Hot Standby has to take exclusive locks that can't >>>> be released until WAL replay has progressed to a certain point seems >>>> like a fairly serious wart. >>> If this is a serious wart then it's not one of hot standby, but one of >>> postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as >>> opposed to UPDATE/DELETE-blocking locks) are never necessary from a >>> correctness POV, they're only there for implementation reasons. >>> >>> Getting rid of them doesn't seem completely insurmountable either - just as >>> multiple row versions remove the need to block SELECTs dues to concurrent >>> UPDATEs, multiple datafile versions could remove the need to block SELECTs >>> due to concurrent ALTERs. But people seem to live with them quite well, >>> judged from the amount of work put into getting rid of them (zero). I >>> therefore fail to see why they should pose a significant problem in HS >>> setups. >> The difference is that in HS you have to wait for a moment where *no exclusive >> lock at all* exist, possibly without contending for any of them, while on the >> master you might not even blocked by the existence of any of those locks. >> >> If you have two sessions which in overlapping transactions lock different >> tables exlusively you have no problem shutting the master down, but you will >> never reach a point where no exclusive lock is taken on the slave. > > A possible solution to this in the shutdown case is to kill anyone > waiting on a lock held by the startup process at the same time we kill > the startup process, and to kill anyone who subsequently waits for > such a lock as soon as they attempt to take it. If you're not going to apply any more WAL records before shutdown, you could also just release all the AccessExclusiveLocks held by the startup process. Whatever the transaction was doing with the locked relation, if we're not going to replay any more WAL records before shutdown, we will not see the transaction committing or doing anything else with the relation, so we should be safe. Whatever state the data on disk is in, it must be valid, or we would have a problem with crash recovery recovering up to this WAL record and then starting up too. I'm not 100% clear if that reasoning applies to AccessExclusiveLocks taken explicitly with LOCK TABLE. It's not clear what the application would use the lock for. Nevertheless, maybe killing the transactions that wait for the locks would be more intuitive anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Robert Haas wrote: > On Thu, May 6, 2010 at 2:47 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> Now that I've realized what the real problem is with max_standby_delay >>> (namely, that inactivity on the master can use up the delay), I think >>> we should do what Tom originally suggested here. It's not as good as >>> a really working max_standby_delay, but we're not going to have that >>> for 9.0, and it's clearly better than a boolean. >> I guess I'm not clear on how what Tom proposed is fundamentally >> different from max_standby_delay = -1. If there's enough concurrent >> queries, recovery would never catch up. > > If your workload is that the standby server is getting pounded with > queries like crazy, then it's probably not that different: it will > fall progressively further behind. But I suspect many people will set > up standby servers where most of the activity happens on the primary, > but they run some reporting queries on the standby. If you expect > your reporting queries to finish in <10s, you could set the max delay > to say 60s. In the event that something gets wedged, recovery will > eventually kill it and move on rather than just getting stuck forever. > If the volume of queries is known not to be too high, it's reasonable > to expect that a few good whacks will be enough to get things back on > track. Yeah, I could live with that. A problem with using the name "max_standby_delay" for Tom's suggestion is that it sounds like a hard limit, which it isn't. But if we name it something like:

# -1 = no timeout
# 0 = kill conflicting queries immediately
# > 0 wait for N seconds, then kill query
standby_conflict_timeout = -1

it's more clear that the setting is a timeout for each *conflict*, and it's less surprising that the standby can fall indefinitely behind in the worst case. If we name the setting along those lines, I could live with that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
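To make the worst case concrete under the semantics Heikki describes: with standby_conflict_timeout = 10s, six back-to-back conflicts could each be allowed their full wait, so replay could stall for roughly 6 x 10 = 60 seconds before the standby starts catching up again. The numbers are made up, but this is why the per-conflict wording matters.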
On May 10, 2010, at 11:43 , Heikki Linnakangas wrote: > If you're not going to apply any more WAL records before shutdown, you > could also just release all the AccessExclusiveLocks held by the startup > process. Whatever the transaction was doing with the locked relation, if > we're not going to replay any more WAL records before shutdown, we will > not see the transaction committing or doing anything else with the > relation, so we should be safe. Whatever state the data on disk is in, > it must be valid, or we would have a problem with crash recovery > recovering up to this WAL record and then starting up too. Sounds plausible. But wouldn't this imply that HS could *always* postpone the acquisition of an AccessExclusiveLock until right before the corresponding commit record is replayed? I fail to see a case where this would fail, yet recovery in case of an intermediate crash would be correct. best regards, Florian Pflug
* Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [100510 06:03]: > A problem with using the name "max_standby_delay" for Tom's suggestion > is that it sounds like a hard limit, which it isn't. But if we name it > something like: I'd still rather have an "if you're killing something, make sure you kill enough to get all the way current" behaviour, but that's just me.... I want to run my standbys in an always-current mode... But if I decide to play with a lagged HS, I really want to make sure there is some mechanism to cap the lag, and the "cap" is something I can understand and use to make a reasonable estimate as to when data I know is live on the primary will be seen on the standby... bonus points if it works similarly for archive recovery ;-) a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Mon, May 10, 2010 at 2:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > I already explained that killing the startup process first is a bad idea > for many reasons when shutdown was discussed. Can't remember who added > the new standby shutdown code recently, but it sounds like their design > was pretty poor if it didn't include shutting down properly with HS. I > hope they fix the bug they have introduced. HS was never designed to > work that way, so there is no flaw there; it certainly worked when > committed. The patch was written by Fujii Masao and committed, after review, by me. Prior to that patch, smart shutdown never worked; now it works, or so I believe, unless recovery is stalled holding a lock upon which a regular back-end is blocking. Clearly that is both better and not all that good. If you have any ideas to improve the situation further, I'm all ears. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Mon, May 10, 2010 at 6:03 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Yeah, I could live with that. > > A problem with using the name "max_standby_delay" for Tom's suggestion > is that it sounds like a hard limit, which it isn't. But if we name it > something like: > > # -1 = no timeout > # 0 = kill conflicting queries immediately > # > 0 wait for N seconds, then kill query > standby_conflict_timeout = -1 > > it's more clear that the setting is a timeout for each *conflict*, and > it's less surprising that the standby can fall indefinitely behind in > the worst case. If we name the setting along those lines, I could live > with that. Yeah, if we do it that way, +1 for changing the name, and your suggestion seems good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Mon, May 10, 2010 at 6:13 AM, Florian Pflug <fgp@phlo.org> wrote: > On May 10, 2010, at 11:43 , Heikki Linnakangas wrote: >> If you're not going to apply any more WAL records before shutdown, you >> could also just release all the AccessExclusiveLocks held by the startup >> process. Whatever the transaction was doing with the locked relation, if >> we're not going to replay any more WAL records before shutdown, we will >> not see the transaction committing or doing anything else with the >> relation, so we should be safe. Whatever state the data on disk is in, >> it must be valid, or we would have a problem with crash recovery >> recovering up to this WAL record and then starting up too. > > Sounds plausible. But wouldn't this imply that HS could *always* postpone the acquisition of an AccessExclusiveLock until right before the corresponding commit record is replayed? I fail to see a case where this would fail, yet recovery in case of an intermediate crash would be correct. Yeah, I'd like to understand this, too. I don't have a clear understanding of when HS needs to take locks here in the first place. [removing Josh Berkus's persistently bouncing email from the CC line] -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Florian Pflug wrote: > On May 10, 2010, at 11:43 , Heikki Linnakangas wrote: >> If you're not going to apply any more WAL records before shutdown, you >> could also just release all the AccessExclusiveLocks held by the startup >> process. Whatever the transaction was doing with the locked relation, if >> we're not going to replay any more WAL records before shutdown, we will >> not see the transaction committing or doing anything else with the >> relation, so we should be safe. Whatever state the data on disk is in, >> it must be valid, or we would have a problem with crash recovery >> recovering up to this WAL record and then starting up too. > > Sounds plausible. But wouldn't this imply that HS could *always* postpone the acquisition of an AccessExclusiveLocks untilright before the corresponding commit record is replayed? If fail to see a case where this would fail, yet recoveryin case of an intermediate crash would be correct. I guess it could in some situations, but for example the AccessExclusiveLock taken at the end of lazy vacuum to truncate the relation must be held during the truncation, or concurrent readers will get upset. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Monday 10 May 2010 14:00:45 Heikki Linnakangas wrote: > Florian Pflug wrote: > > On May 10, 2010, at 11:43 , Heikki Linnakangas wrote: > >> If you're not going to apply any more WAL records before shutdown, you > >> could also just release all the AccessExclusiveLocks held by the startup > >> process. Whatever the transaction was doing with the locked relation, if > >> we're not going to replay any more WAL records before shutdown, we will > >> not see the transaction committing or doing anything else with the > >> relation, so we should be safe. Whatever state the data on disk is in, > >> it must be valid, or we would have a problem with crash recovery > >> recovering up to this WAL record and then starting up too. > > > > Sounds plausible. But wouldn't this imply that HS could *always* postpone > > the acquisition of an AccessExclusiveLocks until right before the > > corresponding commit record is replayed? If fail to see a case where > > this would fail, yet recovery in case of an intermediate crash would be > > correct. > > I guess it could in some situations, but for example the > AccessExclusiveLock taken at the end of lazy vacuum to truncate the > relation must be held during the truncation, or concurrent readers will > get upset. Actually all the locks that do not need to be taken on the slave would not need to be an ACCESS EXCLUSIVE but a EXCLUSIVE on the master, right? That should be "fixed" on the master, not hacked up on the slave and is by far out of scope of 9.0. Thats an area where I definitely would like to improve pg in the future... Andres
Simon Riggs wrote: > Bruce has used the word crippleware for the current state. Raising a > problem and then blocking solutions is the best way I know to cripple a > release. It should be clear that I've done my best to avoid this FYI, it was Robert Haas who used the term "crippleware" to describe a boolean value for max_standby_delay, and I was just repeating his term, and disputing it would be crippleware. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Robert Haas wrote: > Wultsch (who doesn't ever want to kill queries and therefore would be > happy with a boolean), Yeb Havinga (who never wants to stall recovery > and therefore would also be happy with a boolean), and Florian Pflug > (who points out that pause/resume is actually a nontrivial feature). > Apologies if I've left anyone out or misrepresented their position. > > Overall I would say opinion is about evenly split between: > > - leave it as-is > - make it a Boolean > - change it in some way but to something more expressive than a Boolean > > I can't presume to extract a consensus from that; I don't think there > is one. You could say "the majority of people want to change > something" and that would be true; you could also say "the majority of > people don't want a Boolean" and that would also be true. Yep, this is where we are. Discussion had stopped, so it seemed like time for a decision, and with no one agreeing on what to do, feature removal seemed like the best approach. Suggesting we will fix it later in beta is not a solution. Now, if everyone agrees we should do X, and X is simple, let's do X, but I am still not seeing that. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, May 10, 2010 at 6:03 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Robert Haas wrote: >> On Thu, May 6, 2010 at 2:47 PM, Josh Berkus <josh@agliodbs.com> wrote: >>>> Now that I've realized what the real problem is with max_standby_delay >>>> (namely, that inactivity on the master can use up the delay), I think >>>> we should do what Tom originally suggested here. It's not as good as >>>> a really working max_standby_delay, but we're not going to have that >>>> for 9.0, and it's clearly better than a boolean. >>> I guess I'm not clear on how what Tom proposed is fundamentally >>> different from max_standby_delay = -1. If there's enough concurrent >>> queries, recovery would never catch up. >> >> If your workload is that the standby server is getting pounded with >> queries like crazy, then it's probably not that different: it will >> fall progressively further behind. But I suspect many people will set >> up standby servers where most of the activity happens on the primary, >> but they run some reporting queries on the standby. If you expect >> your reporting queries to finish in <10s, you could set the max delay >> to say 60s. In the event that something gets wedged, recovery will >> eventually kill it and move on rather than just getting stuck forever. >> If the volume of queries is known not to be too high, it's reasonable >> to expect that a few good whacks will be enough to get things back on >> track. > > Yeah, I could live with that. > > A problem with using the name "max_standby_delay" for Tom's suggestion > is that it sounds like a hard limit, which it isn't. But if we name it > something like: > > # -1 = no timeout > # 0 = kill conflicting queries immediately > # > 0 wait for N seconds, then kill query > standby_conflict_timeout = -1 > > it's more clear that the setting is a timeout for each *conflict*, and > it's less surprising that the standby can fall indefinitely behind in > the worst case. If we name the setting along those lines, I could live > with that. +1 from the peanut gallery. -- Mike Rylander| VP, Research and Design| Equinox Software, Inc. / The Evergreen Experts| phone: 1-877-OPEN-ILS (673-6457)|email: miker@esilibrary.com| web: http://www.esilibrary.com
Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> Overall I would say opinion is about evenly split between: >> >> - leave it as-is >> - make it a Boolean >> - change it in some way but to something more expressive than a >> Boolean I think a boolean would limit the environments in which HS would be useful. Personally, I think how far the replica is behind the source is a more useful metric, even with anomalies on the transition from idle to active; but a blocking duration would be much better than no finer control than the boolean. So my "instant runoff second choice" would be for the block duration knob. > time for a decision, and with no one agreeing on what to do, > feature removal seemed like the best approach. I keep wondering at the assertion that once a GUC is present (especially a tuning GUC like this) that we're stuck with it. I know that's true of SQL code constructs, but postgresql.conf files? How about redirect_stderr, max_fsm_*, sort_mem, etc.? This argument seems tenuous. > Suggesting we will fix it later in beta is not a solution. I'm with you there, 100% -Kevin
* Aidan Van Dyk (aidan@highrise.ca) wrote: > * Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [100510 06:03]: > > A problem with using the name "max_standby_delay" for Tom's suggestion > > is that it sounds like a hard limit, which it isn't. But if we name it > > something like: > > I'ld still rather an "if your killing something, make sure you kill > enough to get all the way current" behaviour, but that's just me.... I agree with that comment, and it's more like what max_standby_delay was. That's what I had thought Tom was proposing initially, since it makes a heck of a lot more sense to me than "just keep waiting, just keep waiting..". Now, if it's possible to have things queue up behind the recovery process, such that the recovery process will only wait up to timeout * # of locks held when recovery started, that might be alright, but that's not the impression I've gotten about how this will work. Of course, I also want to be able to have a Nagios hook that checks how far behind the slave has gotten, and a way to tell the slave "oook, you're too far behind, just forcibly catch up right *now*". If I could use reload to change max_standby_delay (or whatever) and I can figure out how long the delay is (even if I have to update a table on the master and then see what it says on the slave..), I'd be happy. That being said, I do think it makes more sense to wait until we've got a conflict to start the timer, and I rather like avoiding the uncertainty of time sync between master and slave by using WAL arrival time on the slave. Thanks, Stephen
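A rough sketch of the kind of Nagios-style lag check Stephen describes, assuming a small heartbeat table the master touches once a minute (the table and column names are invented for illustration); the WAL-position functions are the ones 9.0 provides:

    -- on the master, from cron or similar:
    UPDATE heartbeat SET updated_at = now();

    -- on the standby, estimate how far behind replay is:
    SELECT now() - updated_at AS apparent_lag FROM heartbeat;

    -- a monitoring script can also compare raw WAL positions on the standby:
    SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();

As a side effect, a heartbeat like this also keeps a trickle of WAL flowing, which works around the idle-master problem discussed earlier in the thread.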
Kevin Grittner wrote: > Bruce Momjian <bruce@momjian.us> wrote: > > Robert Haas wrote: > > >> Overall I would say opinion is about evenly split between: > >> > >> - leave it as-is > >> - make it a Boolean > >> - change it in some way but to something more expressive than a > >> Boolean > > I think a boolean would limit the environments in which HS would be > useful. Personally, I think how far the replica is behind the > source is a more useful metric, even with anomalies on the > transition from idle to active; but a blocking duration would be > much better than no finer control than the boolean. So my "instant > runoff second choice" would be for the block duration knob. > > > time for a decision, and with no one agreeing on what to do, > > feature removal seemed like the best approach. > > I keep wondering at the assertion that once a GUC is present > (especially a tuning GUC like this) that we're stuck with it. I > know that's true of SQL code constructs, but postgresql.conf files? > How about redirect_stderr, max_fsm_*, sort_mem, etc.? This argument > seems tenuous. You are right that we are much more flexible about changing administrative configuration parameters (like this one) than SQL. In the past, we even renamed logging parameters to be more consistent, and I think that proves the bar is quite low for GUC administrative parameter change. :-) The concern about 'max_standby_delay' is that it controls a lot of new code and affects the behavior of HS/SR in ways that might cause a poor user experience, especially for non-expert users. I admit that expert users can use the setting, but we are coding for a general user base, and we might have to field many questions about 'max_standby_delay' from general users that will make us look bad. "The setting is totally useless" is something we have heard about other partial solutions we have released in the past. We try to avoid that. ;-) Labeling something "experimental" also makes our code look sloppy. And if we decide the problem is unsolvable using this approach, we should remove it now rather than later. We don't like to carry around a wart for a small segment of our userbase. I realize many of you have not been around to see some of our less-than-perfect solutions and to see the pain they cause. Once something gets in, we have to fight to remove it. In fact, there is no way we would add 'max_standby_delay' to our codebase now, knowing its limitations, yet people are having to fight hard for its removal. Now that discussion has restarted, let's keep going and see if we can reach some kind of simple solution. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, May 10, 2010 at 5:20 PM, Bruce Momjian <bruce@momjian.us> wrote: > You are right that we are much more flexible about changing > administrative configuration parameters (like this one) than SQL. In the > past, we even renamed logging parameters to be more consistent, and I > think that proves the bar is quite low for GUC administrative parameter > change. :-) > > The concern about 'max_standby_delay' is that it controls a lot of new > code and affects the behavior of HS/SR in ways that might cause a poor > user experience, expecially for non-expert users. I would like to propose that we do the following: 1) Replace max_standby_delay with a boolean as per heikki's suggestion 2) Add an explicitly experimental option like max_standby_delay or recovery_conflict_timeout which is only effective if you've chosen recovery_conflict="pause recovery" option and is explicitly documented as being scheduled to be replaced with a more complete system in future versions. My thinking is that when we do replace max_standby_delay we would keep the recovery_conflict parameter with the same semantics. It's just the additional experimental option which would change. -- greg
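A sketch of what that proposal could look like in postgresql.conf. Both parameter names and both values are purely illustrative; neither setting exists under these names or semantics in 9.0:

    # illustrative only -- not actual 9.0 parameters
    recovery_conflict = 'pause recovery'     # or 'cancel query' for the boolean-style behaviour
    recovery_conflict_timeout = 60s          # experimental: how long to pause before cancelling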
> 1) Replace max_standby_delay with a boolean as per heikki's suggestion > > 2) Add an explicitly experimental option like max_standby_delay or > recovery_conflict_timeout which is only effective if you've chosen > recovery_conflict="pause recovery" > option and is explicitly documented as being scheduled to be replaced > with a more complete system in future versions. +1 As far as I can tell, the current delay *works*. It just doesn't necessarily work the way most people expect it to work. Kind of like, hmmm, shared_buffers? Or effective_cache_size? Or effective_io_concurrency? And I still think that having this kind of a delay option will give us invaluable user feedback on how the option *should* work in 9.1, which we won't get if we don't have an option. I think we will be overhauling it for 9.1, but I don't think that overhaul will benefit from a lack of data. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On May 10, 2010, at 17:39, Kevin Grittner wrote: > Bruce Momjian <bruce@momjian.us> wrote: >> Robert Haas wrote: > >>> Overall I would say opinion is about evenly split between: >>> >>> - leave it as-is >>> - make it a Boolean >>> - change it in some way but to something more expressive than a >>> Boolean > > I think a boolean would limit the environments in which HS would be > useful. Personally, I think how far the replica is behind the > source is a more useful metric, even with anomalies on the > transition from idle to active; but a blocking duration would be > much better than no finer control than the boolean. So my "instant > runoff second choice" would be for the block duration knob. You could always toggle that boolean automatically, based on some measurement of the replication lag (assuming the boolean would be settable at runtime). That'd give you much more flexibility than any built-in knob could provide, and even more so than a built-in knob with known deficiencies. My preference is hence to make it a boolean, but in a way that allows more advanced behavior to be implemented on top of it. In the simplest case, by allowing the boolean to be flipped at runtime and ensuring that the system reacts in a sane way. best regards, Florian Pflug
On Mon, May 10, 2010 at 3:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I already explained that killing the startup process first is a bad idea > for many reasons when shutdown was discussed. Can't remember who added > the new standby shutdown code recently, but it sounds like their design > was pretty poor if it didn't include shutting down properly with HS. I > hope they fix the bug they have introduced. HS was never designed to > work that way, so there is no flaw there; it certainly worked when > committed. New smart shutdown during recovery doesn't kill the startup process until all of the read only backends have gone away. So it works fine with HS. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, 2010-05-11 at 14:01 +0900, Fujii Masao wrote: > On Mon, May 10, 2010 at 3:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > I already explained that killing the startup process first is a bad idea > > for many reasons when shutdown was discussed. Can't remember who added > > the new standby shutdown code recently, but it sounds like their design > > was pretty poor if it didn't include shutting down properly with HS. I > > hope they fix the bug they have introduced. HS was never designed to > > work that way, so there is no flaw there; it certainly worked when > > committed. > > New smart shutdown during recovery doesn't kill the startup process until > all of the read only backends have gone away. So it works fine with HS. Yes, I thought some more about what Robert said. HS works identically to normal running in this regard, so there's no hint of a bug or design flaw on that for either of us to worry about. -- Simon Riggs www.2ndQuadrant.com
On Wed, May 12, 2010 at 2:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Tue, 2010-05-11 at 14:01 +0900, Fujii Masao wrote: >> On Mon, May 10, 2010 at 3:27 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > I already explained that killing the startup process first is a bad idea >> > for many reasons when shutdown was discussed. Can't remember who added >> > the new standby shutdown code recently, but it sounds like their design >> > was pretty poor if it didn't include shutting down properly with HS. I >> > hope they fix the bug they have introduced. HS was never designed to >> > work that way, so there is no flaw there; it certainly worked when >> > committed. >> >> New smart shutdown during recovery doesn't kill the startup process until >> all of the read only backends have gone away. So it works fine with HS. > > Yes, I thought some more about what Robert said. HS works identically to > normal running in this regard, so there's no hint of a bug or design > flaw on that for either of us to worry about. I'm not sure what to make of this. Sometimes not shutting down doesn't sound like a feature to me. http://archives.postgresql.org/pgsql-hackers/2010-05/msg00098.php http://archives.postgresql.org/pgsql-hackers/2010-05/msg00103.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: > I'm not sure what to make of this. Sometimes not shutting down > doesn't sound like a feature to me. It acts exactly the same in recovery as in normal running. It is not a special feature of recovery at all, bug or otherwise. You may think its a strange feature generally and I would agree. I would welcome you changing that in 9.1+, as long as your change works in both recovery and normal running. -- Simon Riggs www.2ndQuadrant.com
On Wed, May 12, 2010 at 12:26 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: > >> I'm not sure what to make of this. Sometimes not shutting down >> doesn't sound like a feature to me. > > It acts exactly the same in recovery as in normal running. It is not a > special feature of recovery at all, bug or otherwise. I admit I've sometimes been surprised that smart shutdown was waiting when I didn't expect it to. It would be good to give the shutdown more feedback. If it explicitly shows "Waiting for n sessions with active transactions to commit" or "Waiting for n sessions to disconnect" then the user would at least understand why it was waiting and what would be necessary to get it to continue. -- greg
On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: > >> I'm not sure what to make of this. Sometimes not shutting down >> doesn't sound like a feature to me. > > It acts exactly the same in recovery as in normal running. It is not a > special feature of recovery at all, bug or otherwise. Simon, that doesn't make any sense. We are talking about a backend getting stuck forever on an exclusive lock that is held by the startup process and which will never be released (for example, because the master has shut down and no more WAL can be obtained for replay). The startup process does not hold locks in normal operation. There are other things we might want to change about the shutdown behavior (for example, switching from smart to fast automatically after N seconds) which could apply to both the primary and the standby and which might also be workarounds for this problem, but this particular issue is specific to Hot Standby mode and pretending otherwise is just sticking your head in the sand. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote: > On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: > > > >> I'm not sure what to make of this. Sometimes not shutting down > >> doesn't sound like a feature to me. > > > > It acts exactly the same in recovery as in normal running. It is not a > > special feature of recovery at all, bug or otherwise. > > Simon, that doesn't make any sense. We are talking about a backend > getting stuck forever on an exclusive lock that is held by the startup > process and which will never be released (for example, because the > master has shut down and no more WAL can be obtained for replay). The > startup process does not hold locks in normal operation. When I test it, startup process holding a lock does not prevent shutdown of a standby. I'd be happy to see your test case showing a bug exists and that the behaviour differs from normal running. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote: >> On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: >>> >>>> I'm not sure what to make of this. Sometimes not shutting down >>>> doesn't sound like a feature to me. >>> It acts exactly the same in recovery as in normal running. It is not a >>> special feature of recovery at all, bug or otherwise. >> Simon, that doesn't make any sense. We are talking about a backend >> getting stuck forever on an exclusive lock that is held by the startup >> process and which will never be released (for example, because the >> master has shut down and no more WAL can be obtained for replay). The >> startup process does not hold locks in normal operation. > > When I test it, startup process holding a lock does not prevent shutdown > of a standby. > > I'd be happy to see your test case showing a bug exists and that the > behaviour differs from normal running. In my testing the postmaster simply does not shut down even with no clients connected any more once in a while - most of the time it works just fine but in like 1 out of 10 cases it gets stuck - my testcase (as detailed in the related thread) is simply doing an interval load on the master (pgbench -T 120 && sleep 30 && pgbench -T 120 - rinse and repeat as needed) and pgbench -S && pg_ctl restart && pgbench -S in a loop on the standby. Once in a while the standby will simply not shut down (forever - not only by exceeding the default timeout of pg_ctl, which seems to get triggered much more often on the standby than on the master - have not looked into that yet in detail) Stefan
On Wed, 2010-05-12 at 16:03 +0200, Stefan Kaltenbrunner wrote: > Simon Riggs wrote: > > On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote: > >> On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >>> On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: > >>> > >>>> I'm not sure what to make of this. Sometimes not shutting down > >>>> doesn't sound like a feature to me. > >>> It acts exactly the same in recovery as in normal running. It is not a > >>> special feature of recovery at all, bug or otherwise. > >> Simon, that doesn't make any sense. We are talking about a backend > >> getting stuck forever on an exclusive lock that is held by the startup > >> process and which will never be released (for example, because the > >> master has shut down and no more WAL can be obtained for replay). The > >> startup process does not hold locks in normal operation. > > > > When I test it, startup process holding a lock does not prevent shutdown > > of a standby. > > > > I'd be happy to see your test case showing a bug exists and that the > > behaviour differs from normal running. > > In my testing the postmaster simply does not shut down even with no > clients connected any more once in a while - most of the time it works > just fine but in like 1 out of 10 cases it get's stuck - my testcase (as > detailed in the related thread) is simply doing an interval load on the > master (pgbench -T 120 && sleep 30 && pgbench -T 120 - rinse and repeat > as needed) and pgbench -S && pg_ctl restart && pgbench -S in a lop on > the standby. once in a while the standby will simply not shut down > (forever - not only by eceeding the default timeout of pgctl which seems > to get triggered much more often on the standby than on the master - > have not looked into that yet in detail) If you could recreate that on a server in debug mode we can see what's happening. If you can attach to the server and get a back trace that would help. I've not seen that behaviour at all during testing and if the issue is sporadic its not likely to help much trying to recreate myself. This could be an issue with SR, or an issue with the shutdown code itself. -- Simon Riggs www.2ndQuadrant.com
On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote: > On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote: > > On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: > > > > > >> I'm not sure what to make of this. Sometimes not shutting down > > >> doesn't sound like a feature to me. > > > > > > It acts exactly the same in recovery as in normal running. It is not a > > > special feature of recovery at all, bug or otherwise. > > > > Simon, that doesn't make any sense. We are talking about a backend > > getting stuck forever on an exclusive lock that is held by the startup > > process and which will never be released (for example, because the > > master has shut down and no more WAL can be obtained for replay). The > > startup process does not hold locks in normal operation. > > When I test it, startup process holding a lock does not prevent shutdown > of a standby. > > I'd be happy to see your test case showing a bug exists and that the > behaviour differs from normal running. Let me put this differently: I accept that Stefan has reported a problem. Neither Tom nor myself can reproduce the problem. I've re-run Stefan's test case and restarted the server more than 400 times now without any issue. I re-read your post where you gave what you yourself called "uninformed speculation". There's no real polite way to say it, but yes your speculation does appear to be uninformed, since it is incorrect. Reasons would be not least that Stefan's tests don't actually send any locks to the standby anyway (!), but even if they did your speculation as to the cause is still all wrong, as explained. There is no evidence to link this behaviour with HS, as yet, and you should be considering the possibility the problem lies elsewhere, especially since it could be code you committed that is at fault. -- Simon Riggs www.2ndQuadrant.com
On Wed, May 12, 2010 at 11:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote: >> On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote: >> > On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > > On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: >> > > >> > >> I'm not sure what to make of this. Sometimes not shutting down >> > >> doesn't sound like a feature to me. >> > > >> > > It acts exactly the same in recovery as in normal running. It is not a >> > > special feature of recovery at all, bug or otherwise. >> > >> > Simon, that doesn't make any sense. We are talking about a backend >> > getting stuck forever on an exclusive lock that is held by the startup >> > process and which will never be released (for example, because the >> > master has shut down and no more WAL can be obtained for replay). The >> > startup process does not hold locks in normal operation. >> >> When I test it, startup process holding a lock does not prevent shutdown >> of a standby. >> >> I'd be happy to see your test case showing a bug exists and that the >> behaviour differs from normal running. > > Let me put this differently: I accept that Stefan has reported a > problem. Neither Tom nor myself can reproduce the problem. I've re-run > Stefan's test case and restarted the server more than 400 times now > without any issue. OK, I'm glad to hear you've been testing this. I wasn't aware of that. > I re-read your post where you gave what you yourself called "uninformed > speculation". There's no real polite way to say it, but yes your > speculation does appear to be uninformed, since it is incorrect. Reasons > would be not least that Stefan's tests don't actually send any locks to > the standby anyway (!), Hmm. Well, assuming you're correct, that does seem to be a, uh, slight problem with my theory. > but even if they did your speculation as to the > cause is still all wrong, as explained. You lost me. I don't understand why the problem that I'm referring to couldn't happen, even if it's not what's happening here. > There is no evidence to link this behaviour with HS, as yet, and you > should be considering the possibility the problem lies elsewhere, > especially since it could be code you committed that is at fault. Huh?? The evidence that this bug is linked with HS is that it occurs on a server running in HS mode, and not otherwise. As for whether the bug is code I committed, that's certainly possible, but keep in mind it didn't work at all before IN HOT STANDBY MODE - and that will be code you committed. I'm going to go test this and see if I can figure out what's going on.I hope you will keep at it also - as you point out,your knowledge of this code far exceeds mine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote: > Huh?? The evidence that this bug is linked with HS is that it occurs > on a server running in HS mode, and not otherwise. As for whether the > bug is code I committed, that's certainly possible, but keep in mind > it didn't work at all before IN HOT STANDBY MODE - and that will be > code you committed. I'll say it now, so its plain. I'm not going to investigate every bug that occurs on Postgres, just because someone was in HS when they found it. Any more than all bugs on Postgres in normal running are MVCC bugs. There needs to be reasonable evidence or a conjecture by someone that knows something about the code. If HS were the only thing changed in recovery in this release, that might not seem reasonable, but since we have much new code and I am not the only developer, it is. Normal shutdown didn't work on a standby before HS was committed and it didn't work afterwards either. Use all the capitals you like but if you use poor arguments and combine that with no evidence then we'll not get very far, either in working together or in solving the actual bugs. Please don't continue to make wild speculations about things related to HS and recovery, so that issues do not become confused; there is no need to comment on every thread. -- Simon Riggs www.2ndQuadrant.com
On Wed, May 12, 2010 at 5:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote: > >> Huh?? The evidence that this bug is linked with HS is that it occurs >> on a server running in HS mode, and not otherwise. As for whether the >> bug is code I committed, that's certainly possible, but keep in mind >> it didn't work at all before IN HOT STANDBY MODE - and that will be >> code you committed. > > I'll say it now, so its plain. I'm not going to investigate every bug > that occurs on Postgres, just because someone was in HS when they found > it. Fair enough, though your help debugging is always appreciated regardless of whether a problem is HS related or not. Nobody's obligated to work on anything in Postgres after all. I'm not sure who to blame for the shouting match over whose commit introduced the bug -- it doesn't seem like a relevant or useful thing to argue about, please both stop. > there is no need > to comment on every thread. This is out of line. -- greg
On Wed, 2010-05-12 at 18:05 +0100, Greg Stark wrote: > I'm not sure who to blame for the shouting match over whose commit > introduced the bug -- it doesn't seem like a relevant or useful thing > to argue about, please both stop. I haven't blamed Robert's code, merely asked him to consider that it is something other HS, since we have no evidence either way at present because the issue is sporadic and has not been replicated as yet, with no specific detail leading to any section of code. > > there is no need > > to comment on every thread. > > This is out of line. Quoted out of context, it is. My full comment is "Please don't continue to make wild speculations about things related to HS and recovery, so that issues do not become confused; there is no need to comment on every thread." ... by which I mean threads related to HS and recovery. I respect everybody's right to free speech here, but I would say the same to anyone if they do it repeatedly. I'm not the first to make such a comment on hackers either. -- Simon Riggs www.2ndQuadrant.com
On Wed, 2010-05-12 at 17:49 +0100, Simon Riggs wrote: > On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote: > Normal shutdown didn't work on a standby before HS was committed and it > didn't work afterwards either. Use all the capitals you like but if you > use poor arguments and combine that with no evidence then we'll not get > very far, either in working together or in solving the actual bugs. > Please don't continue to make wild speculations about things related to > HS and recovery, so that issues do not become confused; there is no need > to comment on every thread. > Simon, People are very passionate about this feature. This feature has the ability to show us as moving forward in a fashion that will allow us to directly compete with the "big boys" in the "big installs", although we are still probably 2-3 releases from that. It also has the ability to make us look like a bunch of yahoos (no pun intended) who are better served beating up on that database that Oracle just bought, versus Oracle itself. Patience is a virtue for all when it comes to the this feature. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
On Wed, May 12, 2010 at 1:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-05-12 at 18:05 +0100, Greg Stark wrote: > >> I'm not sure who to blame for the shouting match over whose commit >> introduced the bug -- it doesn't seem like a relevant or useful thing >> to argue about, please both stop. > > I haven't blamed Robert's code, merely asked him to consider that it is > something other HS, since we have no evidence either way at present > because the issue is sporadic and has not been replicated as yet, with > no specific detail leading to any section of code. I'm not really sure what we're arguing about here. I feel like I'm being accused either of (a) introducing the bug (which is possible) or (b) saying that Simon introduced the bug (which presumably is also possible, although it's not really my point). I ventured an uninformed guess at what the problem might be; Simon thinks my guess is wrong, and it may well be: but either way there's a bug buried in here somewhere and it would be nice to fix it. I thought that it would be a good idea for Simon to look at it because, on the surface, it APPEARS to have something to do with Hot Standby, since that's what Stefan was testing when he found it. Sure, the investigation might lead somewhere else; I completely admit that. Now, Simon just said he HAS looked at it and can't reproduce the problem. So now I'm even less sure what we're arguing about. I'm glad he looked at it. It's interesting that he wasn't able to reproduce the problem. I hope that he or someone else will find something that helps us move forward. I am having difficulty reproducing Stefan's test environment and perhaps for that reason I can't reproduce it either, though I've encountered several other problems about which, I suppose, I will post separate emails. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 05/12/2010 05:28 PM, Simon Riggs wrote: > On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote: >> On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote: >>> On Wed, May 12, 2010 at 7:26 AM, Simon Riggs<simon@2ndquadrant.com> wrote: >>>> On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote: >>>> >>>>> I'm not sure what to make of this. Sometimes not shutting down >>>>> doesn't sound like a feature to me. >>>> >>>> It acts exactly the same in recovery as in normal running. It is not a >>>> special feature of recovery at all, bug or otherwise. >>> >>> Simon, that doesn't make any sense. We are talking about a backend >>> getting stuck forever on an exclusive lock that is held by the startup >>> process and which will never be released (for example, because the >>> master has shut down and no more WAL can be obtained for replay). The >>> startup process does not hold locks in normal operation. >> >> When I test it, startup process holding a lock does not prevent shutdown >> of a standby. >> >> I'd be happy to see your test case showing a bug exists and that the >> behaviour differs from normal running. > > Let me put this differently: I accept that Stefan has reported a > problem. Neither Tom nor myself can reproduce the problem. I've re-run > Stefan's test case and restarted the server more than 400 times now > without any issue. > > I re-read your post where you gave what you yourself called "uninformed > speculation". There's no real polite way to say it, but yes your > speculation does appear to be uninformed, since it is incorrect. Reasons > would be not least that Stefan's tests don't actually send any locks to > the standby anyway (!), but even if they did your speculation as to the > cause is still all wrong, as explained. > > There is no evidence to link this behaviour with HS, as yet, and you > should be considering the possibility the problem lies elsewhere, > especially since it could be code you committed that is at fault. Well I'm not sure why people seem to have that hard a time reproducing that issue - it seems that I can provoke it really trivially(in this case no loops, no pgbench, no tricks). A few minutes ago I logged into my test standby (which is idle except for the odd connect to template1 caused by nagios - the master is idle as well and has been for days): postgres@soldata005:~$ psql psql (9.0beta1) Type "help" for help. postgres=# select 1; ?column? ---------- 1 (1 row) postgres=# \q postgres@soldata005:~$ pg_ctl -D /var/lib/postgresql/9.0b1/main/ restart waiting for server to shut down.... done server stopped server starting postgres@soldata005:~$ pg_ctl -D /var/lib/postgresql/9.0b1/main/ restart waiting for server to shut down.... done server stopped server starting postgres@soldata005:~$ pg_ctl -D /var/lib/postgresql/9.0b1/main/ restart waiting for server to shut down............................................................... 
failed pg_ctl: server does not shut down the server log for that is as follows: <2010-05-12 20:36:18.166 CEST,,,> LOG: received smart shutdown request <2010-05-12 20:36:18.167 CEST,,,> FATAL: terminating walreceiver process due to administrator command <2010-05-12 20:36:18.174 CEST,,,> LOG: shutting down <2010-05-12 20:36:18.251 CEST,,,> LOG: database system is shut down <2010-05-12 20:36:19.706 CEST,,,> LOG: database system was interrupted while in recovery at log time 2010-05-06 17:36:05 CEST <2010-05-12 20:36:19.706 CEST,,,> HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. <2010-05-12 20:36:19.706 CEST,,,> LOG: entering standby mode <2010-05-12 20:36:19.721 CEST,,,> LOG: consistent recovery state reached at 1/12000078 <2010-05-12 20:36:19.721 CEST,,,> LOG: invalid record length at 1/12000078 <2010-05-12 20:36:19.723 CEST,,,> LOG: database system is ready to accept read only connections <2010-05-12 20:36:19.737 CEST,,,> LOG: streaming replication successfully connected to primary <2010-05-12 20:36:19.918 CEST,,,> LOG: received smart shutdown request <2010-05-12 20:36:19.919 CEST,,,> FATAL: terminating walreceiver process due to administrator command <2010-05-12 20:36:19.922 CEST,,,> LOG: shutting down <2010-05-12 20:36:19.937 CEST,,,> LOG: database system is shut down <2010-05-12 20:36:21.433 CEST,,,> LOG: database system was interrupted while in recovery at log time 2010-05-06 17:36:05 CEST <2010-05-12 20:36:21.433 CEST,,,> HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. <2010-05-12 20:36:21.433 CEST,,,> LOG: entering standby mode <2010-05-12 20:36:21.482 CEST,,,> LOG: received smart shutdown request <2010-05-12 20:36:21.504 CEST,,,> LOG: consistent recovery state reached at 1/12000078 <2010-05-12 20:36:21.504 CEST,,,> LOG: invalid record length at 1/12000078 <2010-05-12 20:36:21.505 CEST,,,> LOG: database system is ready to accept read only connections <2010-05-12 20:36:21.516 CEST,,,> LOG: streaming replication successfully connected to primary so it restarted two times successfully - however if one looks at the third time one can see that it received the smart shutdown request BEFORE it reached a consistent recovery state - yet it continued to enable HS and reenabled SR as well. The database is now sitting there doing nothing and it more or less broken because you cannot connect to it in the current state: ~$ psql psql: FATAL: the database system is shutting down the startup process has the following backtrace: (gdb) bt #0 0x00007fbe24cb2c83 in select () from /lib/libc.so.6 #1 0x00000000006e811a in pg_usleep () #2 0x000000000048c333 in XLogPageRead () #3 0x000000000048c967 in ReadRecord () #4 0x0000000000493ab6 in StartupXLOG () #5 0x0000000000495a88 in StartupProcessMain () #6 0x00000000004ab25e in AuxiliaryProcessMain () #7 0x00000000005d4a7d in StartChildProcess () #8 0x00000000005d70c2 in PostmasterMain () #9 0x000000000057d898 in main () Stefan
On Wed, 2010-05-12 at 21:10 +0200, Stefan Kaltenbrunner wrote: > > There is no evidence to link this behaviour with HS, as yet, and you > > should be considering the possibility the problem lies elsewhere, > > especially since it could be code you committed that is at fault. > > Well I'm not sure why people seem to have that hard a time reproducing > that issue - it seems that I can provoke it really trivially(in this > case no loops, no pgbench, no tricks). A few minutes ago I logged into > my test standby (which is idle except for the odd connect to template1 > caused by nagios - the master is idle as well and has been for days): Thanks, good report. > so it restarted two times successfully - however if one looks at the > third time one can see that it received the smart shutdown request > BEFORE it reached a consistent recovery state - yet it continued to > enable HS and reenabled SR as well. > > The database is now sitting there doing nothing and it more or less > broken because you cannot connect to it in the current state: > > ~$ psql > psql: FATAL: the database system is shutting down > > the startup process has the following backtrace: > > (gdb) bt > #0 0x00007fbe24cb2c83 in select () from /lib/libc.so.6 > #1 0x00000000006e811a in pg_usleep () > #2 0x000000000048c333 in XLogPageRead () > #3 0x000000000048c967 in ReadRecord () > #4 0x0000000000493ab6 in StartupXLOG () > #5 0x0000000000495a88 in StartupProcessMain () > #6 0x00000000004ab25e in AuxiliaryProcessMain () > #7 0x00000000005d4a7d in StartChildProcess () > #8 0x00000000005d70c2 in PostmasterMain () > #9 0x000000000057d898 in main () Well, its waiting for new info from primary. Nothing to do with locking, but that's not an indication that its an SR issue though either. ;-) I'll put some waits into that part of the code and see if I can induce the failure. Maybe its just a simple lack of a CHECK_FOR_INTERRUPTS(). -- Simon Riggs www.2ndQuadrant.com
Excerpts from Stefan Kaltenbrunner's message of mié may 12 15:10:28 -0400 2010: > the startup process has the following backtrace: > > (gdb) bt > #0 0x00007fbe24cb2c83 in select () from /lib/libc.so.6 > #1 0x00000000006e811a in pg_usleep () > #2 0x000000000048c333 in XLogPageRead () > #3 0x000000000048c967 in ReadRecord () > #4 0x0000000000493ab6 in StartupXLOG () > #5 0x0000000000495a88 in StartupProcessMain () > #6 0x00000000004ab25e in AuxiliaryProcessMain () > #7 0x00000000005d4a7d in StartChildProcess () > #8 0x00000000005d70c2 in PostmasterMain () > #9 0x000000000057d898 in main () I just noticed that we have some code assigning the return value of time() to a pg_time_t variable. Is this supposed to work reliably? (xlog.c lines 9267ff) --
On Wed, 2010-05-12 at 15:36 -0400, Alvaro Herrera wrote: > I just noticed that we have some code assigning the return value of > time() to a pg_time_t variable. Is this supposed to work reliably? > (xlog.c lines 9267ff) Code's used that for a while now. Checkpoints and everywhere. -- Simon Riggs www.2ndQuadrant.com
On Wed, May 12, 2010 at 3:36 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Excerpts from Stefan Kaltenbrunner's message of mié may 12 15:10:28 -0400 2010: > >> the startup process has the following backtrace: >> >> (gdb) bt >> #0 0x00007fbe24cb2c83 in select () from /lib/libc.so.6 >> #1 0x00000000006e811a in pg_usleep () >> #2 0x000000000048c333 in XLogPageRead () >> #3 0x000000000048c967 in ReadRecord () >> #4 0x0000000000493ab6 in StartupXLOG () >> #5 0x0000000000495a88 in StartupProcessMain () >> #6 0x00000000004ab25e in AuxiliaryProcessMain () >> #7 0x00000000005d4a7d in StartChildProcess () >> #8 0x00000000005d70c2 in PostmasterMain () >> #9 0x000000000057d898 in main () > > I just noticed that we have some code assigning the return value of > time() to a pg_time_t variable. Is this supposed to work reliably? > (xlog.c lines 9267ff) I' -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, May 12, 2010 at 3:51 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, May 12, 2010 at 3:36 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Excerpts from Stefan Kaltenbrunner's message of mié may 12 15:10:28 -0400 2010: >> >>> the startup process has the following backtrace: >>> >>> (gdb) bt >>> #0 0x00007fbe24cb2c83 in select () from /lib/libc.so.6 >>> #1 0x00000000006e811a in pg_usleep () >>> #2 0x000000000048c333 in XLogPageRead () >>> #3 0x000000000048c967 in ReadRecord () >>> #4 0x0000000000493ab6 in StartupXLOG () >>> #5 0x0000000000495a88 in StartupProcessMain () >>> #6 0x00000000004ab25e in AuxiliaryProcessMain () >>> #7 0x00000000005d4a7d in StartChildProcess () >>> #8 0x00000000005d70c2 in PostmasterMain () >>> #9 0x000000000057d898 in main () >> >> I just noticed that we have some code assigning the return value of >> time() to a pg_time_t variable. Is this supposed to work reliably? >> (xlog.c lines 9267ff) > > I' I have a love-hate relationship with GMail, sorry. I am wondering if we are not correctly handling the case where we get a shutdown request while we are still in the PM_STARTUP state. It looks like we might go ahead and switch to PM_RECOVERY and then PM_RECOVERY_CONSISTENT without noticing the shutdown. There is some logic to handle the shutdown when the startup process exits, but if the startup process never exits it looks like we might get stuck. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, 2010-05-12 at 14:43 -0400, Robert Haas wrote: > I thought that it > would be a good idea for Simon to look at it because, on the surface, > it APPEARS to have something to do with Hot Standby, since that's what > Stefan was testing when he found it. He was also testing SR, yet you haven't breathed a word about that for some strange reason. It didn't APPEAR like it was HS at all, not from basic logic or from technical knowledge. So you'll have to forgive me if I don't leap into action when you say something is an HS problem in the future. -- Simon Riggs www.2ndQuadrant.com
Simon, Robert, > He was also testing SR, yet you haven't breathed a word about that for > some strange reason. It didn't APPEAR like it was HS at all, not from > basic logic or from technical knowledge. So you'll have to forgive me if > I don't leap into action when you say something is an HS problem in the > future. Can we please chill out on this some? Especially since we now have an actual reproduceable bug? Simon, it's natural for people to come to you because you are knowledgeable and responsive. You should take it as a compliment. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Wed, 2010-05-12 at 22:34 +0100, Simon Riggs wrote: > On Wed, 2010-05-12 at 14:43 -0400, Robert Haas wrote: > > > I thought that it > > would be a good idea for Simon to look at it because, on the surface, > > it APPEARS to have something to do with Hot Standby, since that's what > > Stefan was testing when he found it. > > He was also testing SR, yet you haven't breathed a word about that for > some strange reason. It didn't APPEAR like it was HS at all, not from > basic logic or from technical knowledge. So you'll have to forgive me if > I don't leap into action when you say something is an HS problem in the > future. Simon, with respect -- knock it off. Robert gave a very reasonable response. He is just trying to help. Relax man. Joshua Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
On Wed, May 12, 2010 at 5:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-05-12 at 14:43 -0400, Robert Haas wrote: > >> I thought that it >> would be a good idea for Simon to look at it because, on the surface, >> it APPEARS to have something to do with Hot Standby, since that's what >> Stefan was testing when he found it. > > He was also testing SR, yet you haven't breathed a word about that for > some strange reason. It didn't APPEAR like it was HS at all, not from > basic logic or from technical knowledge. So you'll have to forgive me if > I don't leap into action when you say something is an HS problem in the > future. Well, the original subject line of the report had mentioned SR only, but I had a specific theory about what might be happening that was related to the operation of HS. You've said that you think my guess is incorrect, and that's very possible, but until we actually find and fix the bug we're all just guessing. I wasn't intending to cast aspersions on your code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, May 13, 2010 at 4:55 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I am wondering if we are not correctly handling the case where we get > a shutdown request while we are still in the PM_STARTUP state. It > looks like we might go ahead and switch to PM_RECOVERY and then > PM_RECOVERY_CONSISTENT without noticing the shutdown. There is some > logic to handle the shutdown when the startup process exits, but if > the startup process never exits it looks like we might get stuck. Right. I reported this problem and submitted the patch before. http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php Stefan, Could you check whether the patch fixes the problem you encountered? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, May 12, 2010 at 10:46 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, May 13, 2010 at 4:55 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I am wondering if we are not correctly handling the case where we get >> a shutdown request while we are still in the PM_STARTUP state. It >> looks like we might go ahead and switch to PM_RECOVERY and then >> PM_RECOVERY_CONSISTENT without noticing the shutdown. There is some >> logic to handle the shutdown when the startup process exits, but if >> the startup process never exits it looks like we might get stuck. > > Right. I reported this problem and submitted the patch before. > http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php Sorry we missed that. > Stefan, > Could you check whether the patch fixes the problem you encountered? I think that would be a good thing to check (it'll confirm whether this is the same bug), but I'm not convinced we should actually fix it that way. Prior to 8.4, we handled a smart shutdown during recovery at the conclusion of recovery, just prior to entering normal running. I'm wondering if we shouldn't revert to that behavior in both 8.4 and HEAD. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 5/12/10 8:07 PM, Robert Haas wrote: > I think that would be a good thing to check (it'll confirm whether > this is the same bug), but I'm not convinced we should actually fix it > that way. Prior to 8.4, we handled a smart shutdown during recovery > at the conclusion of recovery, just prior to entering normal running. > I'm wondering if we shouldn't revert to that behavior in both 8.4 and > HEAD. This would be OK as long as we document it well. We patched the shutdown the way we did specifically because Fujii thought it would be an easy fix; if it's complicated, we should revert it and document the issue for DBAs. Oh, and to confirm: the same issue exists, and has always existed, with Warm Standby. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, May 13, 2010 at 1:12 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 5/12/10 8:07 PM, Robert Haas wrote: >> I think that would be a good thing to check (it'll confirm whether >> this is the same bug), but I'm not convinced we should actually fix it >> that way. Prior to 8.4, we handled a smart shutdown during recovery >> at the conclusion of recovery, just prior to entering normal running. >> I'm wondering if we shouldn't revert to that behavior in both 8.4 and >> HEAD. > > This would be OK as long as we document it well. We patched the > shutdown the way we did specifically because Fujii thought it would be > an easy fix; if it's complicated, we should revert it and document the > issue for DBAs. I don't understand this comment. > Oh, and to confirm: the same issue exists, and has always existed, with > Warm Standby. That's what I was thinking, but I hadn't gotten around to testing it. Thanks for the confirmation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
>> This would be OK as long as we document it well. We patched the >> shutdown the way we did specifically because Fujii thought it would be >> an easy fix; if it's complicated, we should revert it and document the >> issue for DBAs. > > I don't understand this comment. In other words, I'm saying that it's not critical that we troubleshoot this for 9.0. Reverting Fujii's patch, if it's not working, is an option. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Fri, May 14, 2010 at 5:51 PM, Josh Berkus <josh@agliodbs.com> wrote: > >>> This would be OK as long as we document it well. We patched the >>> shutdown the way we did specifically because Fujii thought it would be >>> an easy fix; if it's complicated, we should revert it and document the >>> issue for DBAs. >> >> I don't understand this comment. > > In other words, I'm saying that it's not critical that we troubleshoot > this for 9.0. Reverting Fujii's patch, if it's not working, is an option. There is no patch which we could revert to fix this, by Fujii Masao or anyone else. The patch he proposed has not been committed. I am still studying the problem to try to figure out where to go with it. We could decide to punt the whole thing for 9.1, but I'd like to understand what the options are before we make that decision. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company