Thread: Hot Standby: Relation-specific deferred conflict resolution
Conflict resolution improvements are important to include in this release, as discussed many times. Proposal given here http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php presents a viable design to improve this. Following patch is a complete working implementation of that design. I'm still testing it, but its worth publishing as early as possible to allow discussion. Not for commit, just yet, but soon. standby.c changes are to decide whether to defer recovery based upon relfilenode of WAL record. If resolution deferred, re-check for conflict during LockAcquire() and fail with snapshot error, just as if resolution had never been deferred. Also, an optimisation of conflict processing to avoid continual re-evaluation of conflicts since some are now deferred. API changes in heapam and nbtxlog to pass thru RelFileNode API changes in indexcmds, no behaviour changes procarray changes to implement LatestRemovedXid cache backend/access/heap/heapam.c | 6 - backend/access/nbtree/nbtxlog.c | 2 backend/commands/indexcmds.c | 4 - backend/storage/ipc/procarray.c | 55 +++++++++++- backend/storage/ipc/standby.c | 112 +++++++++++++++++++++++------ backend/storage/lmgr/lock.c | 124 ++++++++++++++++++++++++++++++-- include/storage/lock.h | 8 ++ include/storage/proc.h | 5 + include/storage/standby.h | 9 ++ 9 files changed, 292 insertions(+), 33 deletions(-) -- Simon Riggs www.2ndQuadrant.com
Attachment
Simon Riggs wrote: > Conflict resolution improvements are important to include in this > release, as discussed many times. Proposal given here > http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php > presents a viable design to improve this. > > Following patch is a complete working implementation of that design. > I'm still testing it, but its worth publishing as early as possible to > allow discussion. Not for commit, just yet, but soon. Um, you're not considering this for 9.0, are you? I think it's time to concentrate on the must-fix issues and fix the rough edges in what we have. For example, the "can't start hot standby mode from a shutdown checkpoint" issue is a must-fix issue in my opinion, about 10x as important as this. When that was last discussed, many others agreed. I run into that all the time when testing streaming replication, and every time I go "Huh, why isn't the standby opening up for connections?", and then, "Ahh, it's this stupid shutdown checkpoint issue again". And the VACUUM FULL issue is still hanging too. And maybe you could help with some other things on the open items or commitfest list. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2010-01-29 at 08:26 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > Conflict resolution improvements are important to include in this > > release, as discussed many times. Proposal given here > > http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php > > presents a viable design to improve this. > > > > Following patch is a complete working implementation of that design. > > I'm still testing it, but its worth publishing as early as possible to > > allow discussion. Not for commit, just yet, but soon. > > Um, you're not considering this for 9.0, are you? I think it's time to > concentrate on the must-fix issues and fix the rough edges in what we have. Yes, it is important. > For example, the "can't start hot standby mode from a shutdown > checkpoint" issue is a must-fix issue in my opinion, about 10x as > important as this. When that was last discussed, many others agreed. I > run into that all the time when testing streaming replication, and every > time I go "Huh, why isn't the standby opening up for connections?", and > then, "Ahh, it's this stupid shutdown checkpoint issue again". That was not the feedback I have received. Nobody has commented on that to me, though many have commented on the need for the current patch. As mentioned, I went to the trouble of running a meeting to gain additional feedback and the result was very clear. Of course, if we ignore any feature, then someone will say "its that stupid issue again", but that won't imply we got our priority wrong. > And the VACUUM FULL issue is still hanging too. Yes, that is a must fix. -- Simon Riggs www.2ndQuadrant.com
On Fri, Jan 29, 2010 at 9:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > That was not the feedback I have received. Nobody has commented on that > to me, though many have commented on the need for the current patch. As > mentioned, I went to the trouble of running a meeting to gain additional > feedback and the result was very clear. I don't have a technical opinion about this problem yet as I haven't tested HS+SR yet but I'm not sure it's a good idea to base technical decisions and priorities on user polls (I'm pretty sure most of them don't use HS+SR as much as Heikki these days). If you ask people what they want in their future cars, they won't answer they want wheels or an engine: it's something obvious for them. AFAICS (but I might be wrong), you asked this question to people who are interested in HS+SR but don't have any idea of what it's like to use HS+SR daily with or without this limitation. There are perhaps better arguments for not doing it but this one seems a bit weird to me. -- Guillaume
On Fri, 2010-01-29 at 09:20 +0100, Guillaume Smet wrote: > On Fri, Jan 29, 2010 at 9:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > That was not the feedback I have received. Nobody has commented on that > > to me, though many have commented on the need for the current patch. As > > mentioned, I went to the trouble of running a meeting to gain additional > > feedback and the result was very clear. > > I don't have a technical opinion about this problem yet as I haven't > tested HS+SR yet but I'm not sure it's a good idea to base technical > decisions and priorities on user polls (I'm pretty sure most of them > don't use HS+SR as much as Heikki these days). > If you ask people what they want in their future cars, they won't > answer they want wheels or an engine: it's something obvious for them. > AFAICS (but I might be wrong), you asked this question to people who > are interested in HS+SR but don't have any idea of what it's like to > use HS+SR daily with or without this limitation. Well, you are correct that a larger group of users *could* have avoided an obvious and important issue. Though if you deploy that argument it can be applied both ways: Heikki may also be missing an obvious and important issue. Where does that leave us? I am not against putting both into this release. If I am forced to choose just one, I've at least given reasons why that should be so. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Fri, 2010-01-29 at 08:26 +0200, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> Conflict resolution improvements are important to include in this >>> release, as discussed many times. Proposal given here >>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php >>> presents a viable design to improve this. >>> >>> Following patch is a complete working implementation of that design. >>> I'm still testing it, but its worth publishing as early as possible to >>> allow discussion. Not for commit, just yet, but soon. >> Um, you're not considering this for 9.0, are you? I think it's time to >> concentrate on the must-fix issues and fix the rough edges in what we have. > > Yes, it is important. > >> For example, the "can't start hot standby mode from a shutdown >> checkpoint" issue is a must-fix issue in my opinion, about 10x as >> important as this. When that was last discussed, many others agreed. I >> run into that all the time when testing streaming replication, and every >> time I go "Huh, why isn't the standby opening up for connections?", and >> then, "Ahh, it's this stupid shutdown checkpoint issue again". > > That was not the feedback I have received. Nobody has commented on that > to me, Yes they have. I have on several occasions, as have other people on this mailing list: http://archives.postgresql.org/message-id/603c8f070912201611h4951087craa080ff6b48a97cd@mail.gmail.com http://archives.postgresql.org/message-id/4B30AE53.6020202@gmail.com http://archives.postgresql.org/message-id/407d949e0912220738je1e0141m87d7b688dd4ba27f@mail.gmail.com I even *fixed* that already, but you decided to take it out before committing. I then added it to the list of must-fix items in the TODO list, but you took that out too. I have no objection to doing things in smaller steps, but this *is* a must-fix item before release. I still don't understand why you took it out, nor why you're objecting to that. > though many have commented on the need for the current patch. Who? > As > mentioned, I went to the trouble of running a meeting to gain additional > feedback and the result was very clear. So what was the clear result? If you're looking for things to do, I agree with Greg Stark that the removal of max_standby_delay=-1 option is not good. That should be fixed too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote: > So what was the clear result? I have spoken clearly enough. You were welcome to attend the Hot Standby User Group. The fact that you did not expresses your own priorities quite well, ISTM. Your protestations to know more about the wishes of users than they do themselves isn't hugely impressive. There are many features we should add. I will add them in priority order until forced to stop. If you or anyone else believes features are essential, then either add them yourselves or have the courage to stand up and say the release should be delayed so that I can. To do otherwise is to admit you do not actually consider them essential. It cannot be both ways. -- Simon Riggs www.2ndQuadrant.com
Heikki Linnakangas wrote: > Simon Riggs wrote: >> Conflict resolution improvements are important to include in this >> release, as discussed many times. Proposal given here >> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php >> presents a viable design to improve this. >> >> Following patch is a complete working implementation of that design. >> I'm still testing it, but its worth publishing as early as possible to >> allow discussion. Not for commit, just yet, but soon. > > Um, you're not considering this for 9.0, are you? I think it's time to > concentrate on the must-fix issues and fix the rough edges in what we have. I agree - this looks like a completely new feature development to me that tries to move the goalpost quite far for 9.0. > > For example, the "can't start hot standby mode from a shutdown > checkpoint" issue is a must-fix issue in my opinion, about 10x as > important as this. When that was last discussed, many others agreed. I > run into that all the time when testing streaming replication, and every > time I go "Huh, why isn't the standby opening up for connections?", and > then, "Ahh, it's this stupid shutdown checkpoint issue again". Yeah I ran into that one during testing as well - and I consider it a serious issue > > And the VACUUM FULL issue is still hanging too. And maybe you could help > with some other things on the open items or commitfest list. yeah and we keep finding major bugs nearly daily so I don't think we should add features that way now. First is stability and reliability, optimization and new features are imho clearly 9.1+ material. Just calling the release 9.0 and saying "we do that because it is a radical change and we expect some instability" should NOT mean we are free to put in every feature at the last minute with "yeah thats fine because people expect instability anyways". Stefan
Simon Riggs wrote: > On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote: > >> So what was the clear result? > > I have spoken clearly enough. You were welcome to attend the Hot Standby > User Group. The fact that you did not expresses your own priorities > quite well, ISTM. Your protestations to know more about the wishes of > users than they do themselves isn't hugely impressive. huh? traditionally discussions of that kind had to happen on -hackers and not in some online place some unnamed people attended. > > There are many features we should add. I will add them in priority order > until forced to stop. we are past the point of adding new features for 9.0 imho > > If you or anyone else believes features are essential, then either add > them yourselves or have the courage to stand up and say the release > should be delayed so that I can. To do otherwise is to admit you do not > actually consider them essential. It cannot be both ways. bugfix and stabilization mode is what we are in now (except for the stuff that already made it into the commitfest). Stefan
On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote: > yeah and we keep finding major bugs nearly daily Facts, please? -- Simon Riggs www.2ndQuadrant.com
2010/1/29 Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>: > Simon Riggs wrote: >> >> On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote: >> >>> So what was the clear result? >> >> I have spoken clearly enough. You were welcome to attend the Hot Standby >> User Group. The fact that you did not expresses your own priorities >> quite well, ISTM. Your protestations to know more about the wishes of >> users than they do themselves isn't hugely impressive. > > huh? traditionally discussions of that kind had to happen on -hackers and not in some online place some unnamed peopleattended. +1. Haven't we had enough communications failures with off-hackers groups trying to come up with something only to have it not be agreed upon by hackers later on? (win32 would be the biggest thing so far, but it's not like we haven't done it before in more cases) >> There are many features we should add. I will add them in priority order >> until forced to stop. > > we are past the point of adding new features for 9.0 imho > >> >> If you or anyone else believes features are essential, then either add >> them yourselves or have the courage to stand up and say the release >> should be delayed so that I can. To do otherwise is to admit you do not >> actually consider them essential. It cannot be both ways. > > bugfix and stabilization mode is what we are in now (except for the stuff that already made it into the commitfest). Well, per some recent discussions, it seems small features are still ok. But I doubt this qualifies as such. -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
On Fri, 2010-01-29 at 11:12 +0100, Stefan Kaltenbrunner wrote: > > > > There are many features we should add. I will add them in priority order > > until forced to stop. > > we are past the point of adding new features for 9.0 imho So presumably we cannot add the new feature to start hot standby at shutdown checkpoints then either? > > If you or anyone else believes features are essential, then either add > > them yourselves or have the courage to stand up and say the release > > should be delayed so that I can. To do otherwise is to admit you do not > > actually consider them essential. It cannot be both ways. > > bugfix and stabilization mode is what we are in now (except for the > stuff that already made it into the commitfest). That's where we'd like to be, but these new features have not been in the tree long enough for what you say to be the actual position. We can pretend it is, but that doesn't make it so. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote: > >> yeah and we keep finding major bugs nearly daily > > Facts, please? > 5 seconds of time spent on archives.postgresql.org show at least the following SR/HS related bugs in the last 7 days or so: http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php some of those you might call "minor" but they are bugs and given the current rate we are seeing them is imho a clear sign of "code by far not stable enough to consider new features that late in the cycle" Stefan
Simon Riggs wrote: > On Fri, 2010-01-29 at 11:12 +0100, Stefan Kaltenbrunner wrote: > >>> There are many features we should add. I will add them in priority order >>> until forced to stop. >> we are past the point of adding new features for 9.0 imho > > So presumably we cannot add the new feature to start hot standby at > shutdown checkpoints then either? well as far as I recall that ones has been proposed much earlier than mid of december(like the patch in discussion here) and I agree with heikki that I'm not clear on what your actual objections to that patch were - care to provide a link to the archives please? > >>> If you or anyone else believes features are essential, then either add >>> them yourselves or have the courage to stand up and say the release >>> should be delayed so that I can. To do otherwise is to admit you do not >>> actually consider them essential. It cannot be both ways. >> bugfix and stabilization mode is what we are in now (except for the >> stuff that already made it into the commitfest). > > That's where we'd like to be, but these new features have not been in > the tree long enough for what you say to be the actual position. We can > pretend it is, but that doesn't make it so. Not sure I follow (maybe because I'm not a native speaker) but are you trying to say that we can simply add new features late in the release cycle to stuff commited because it is not long in the tree instead of focusing stabilizing what we have? If you are so sure that we NEED those features to be releaseworthy - maybe it was premature to commit HS and SR for this cycle? Stefan
On Fri, 2010-01-29 at 11:20 +0100, Stefan Kaltenbrunner wrote: > Simon Riggs wrote: > > On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote: > > > >> yeah and we keep finding major bugs nearly daily > > > > Facts, please? > > > > 5 seconds of time spent on archives.postgresql.org show at least the > following SR/HS related bugs in the last 7 days or so: > > http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php > http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php > http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php > http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php > > some of those you might call "minor" but they are bugs and given the > current rate we are seeing them is imho a clear sign of "code by far not > stable enough to consider new features that late in the cycle" I don't think two very minor bugs in Hot Standby, reported and fixed 7 days apart is any indication of instability. It isn't the "daily bugs reported" you suggested. Personally, I think it indicates quite the opposite - if those are the only bugs I can find now, I'm ecstatic. I think your argument does apply to Streaming Rep, at this point. We should consider releasing Alpha4 and then later going to Beta. My point of view expressed here is not built in a few seconds, it is built on discussion and feedback over 18 months. The conflict issue was discussed by me with hackers at the May 2008 dev meeting. It should be improved upon in this release and it has been the main issue concerning the full range of people I have discussed HS with. -- Simon Riggs www.2ndQuadrant.com
On Fri, Jan 29, 2010 at 5:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote: > >> So what was the clear result? > > I have spoken clearly enough. You were welcome to attend the Hot Standby > User Group. The fact that you did not expresses your own priorities > quite well, ISTM. Your protestations to know more about the wishes of > users than they do themselves isn't hugely impressive. This doesn't make any sense at all. It is not the case that people who attended the Hot Standby user group are the only ones who are entitled to provide any feedback, and that people who read pgsql-hackers are not (unless they also happen to have attended the Hot Standby user group). If anything, that's 100% backwards, but at any rate Heikki has probably spent more time over the last year on this feature than anyone in the world with the exception of yourself; I think that counts for a lot more than one two-hour meeting. ...Robert
On Fri, 2010-01-29 at 07:01 -0500, Robert Haas wrote: > On Fri, Jan 29, 2010 at 5:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote: > > > >> So what was the clear result? > > > > I have spoken clearly enough. You were welcome to attend the Hot Standby > > User Group. The fact that you did not expresses your own priorities > > quite well, ISTM. Your protestations to know more about the wishes of > > users than they do themselves isn't hugely impressive. > > This doesn't make any sense at all. I am busy working on the right features for the most number of people, as expressed to me. I accept there are people that disagree and I am sorry for that. Others are welcome to add code to do the things I will not be doing through lack of time, if they are not satisfied with my priority. I think we should extend the time available to make sure we have a sensible set of features for 9.0. The heat of this discussion tells me that we are going to be lacking features that are must-have to someone, whether or not they are in the majority. -- Simon Riggs www.2ndQuadrant.com
On Fri, 2010-01-29 at 11:33 +0200, Heikki Linnakangas wrote: > I even *fixed* that already, but you decided to take it out before > committing. I then added it to the list of must-fix items in the TODO > list, but you took that out too. I have no objection to doing things > in smaller steps, but this *is* a must-fix item before release. I > still don't understand why you took it out, nor why you're objecting > to that. This is a misrepresentation of what has happened. The item you mention was added to the TODO by me in response to your comments and has never been removed at any point; it is still there now. You added a duplicate and the duplicate was removed. I removed code that you mentioned was buggy because I don't have time to fix it and it is not high enough up the priority list. We have discussed all of these things before yet you raise them again as if those things have never been said. I am working on TODO in a priority order. The priority list has changed over time in response to comments received from both you and other people. I understand you don't like my current sequence of actions and I'm sorry for that. I am trying to be rational and balance what has been said to me for the benefit of the community from all stakeholders and have consulted widely to gather thoughts. I am more than happy for other people to work on items on the list, as Joachim and Andres have done. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > I removed code that you mentioned was > buggy because I don't have time to fix it and it is not high enough up > the priority list. We have discussed all of these things before yet you > raise them again as if those things have never been said. *sigh*. Yeah, we've been through this. As I've said before, I never said the code was buggy, that was just a misunderstanding at your end. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
The fundamental disagreement here is over what qualifies as a "wishlist" item, aka a feature or added functionality. And what qualifies as a must-fix bug. Priorities are context sensitive. If this were early in the cycle then working on bigger impact features like conflict resolution code might be more important because it's important to get them into the code base where other people can see if it solves their problems and the behaviour can be tweaked. Fixing rare but outright broken things might not be high priority because while they have to be done by release nobody is blocking on on the solution before then. On the other hand near release we stop trying to incorporate new features and focus on basic bug features. The new features don't become any less important to the users, it's just that they'll make it into the next release. There will always be more features that can help users and we'll always think of cool new things to knock off the rough edges off what we have now and get it out so we can go back to the playground for the next release. You said "I think we should extend the time available to make sure we have a sensible set of features for 9.0." Perhaps part of the problem is that I couldn't understand what your patch did from the description you posted and can't evaluate whether it's fixing a problem that makes the current feature set incoherent. Can you explain what it does in more detail so we can understand why it's necessary for a sensible set of features?
On Fri, 2010-01-29 at 16:44 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > I removed code that you mentioned was > > buggy because I don't have time to fix it and it is not high enough up > > the priority list. We have discussed all of these things before yet you > > raise them again as if those things have never been said. > > *sigh*. Yeah, we've been through this. As I've said before, I never said > the code was buggy, that was just a misunderstanding at your end. OK, if you say I misunderstood, then I accept that. The deed is done and the code removed. Putting it back takes time and given enough of that rare cloth, it will eventually be put back. -- Simon Riggs www.2ndQuadrant.com
On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote: > I think we should extend the time available to make sure we have a > sensible set of features for 9.0. The heat of this discussion tells me > that we are going to be lacking features that are must-have to someone, > whether or not they are in the majority. -1 Missing release dates because of some patch that isn't done is something the community has been trying to get away from, aggressively. The way this is supposed to work is: We have a release date Features that aren't going to make that date, don't. We release Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote: > You said "I think we should extend the time available to make sure we > have a sensible set of features for 9.0." Perhaps part of the problem > is that I couldn't understand what your patch did from the description > you posted and can't evaluate whether it's fixing a problem that makes > the current feature set incoherent. Can you explain what it does in > more detail so we can understand why it's necessary for a sensible set > of features? I'll break down the patch into two pieces to make it easier to review, and add more description, as you suggest. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Fri, 2010-01-29 at 11:20 +0100, Stefan Kaltenbrunner wrote: >> Simon Riggs wrote: >>> On Fri, 2010-01-29 at 11:10 +0100, Stefan Kaltenbrunner wrote: >>> >>>> yeah and we keep finding major bugs nearly daily >>> Facts, please? >>> >> 5 seconds of time spent on archives.postgresql.org show at least the >> following SR/HS related bugs in the last 7 days or so: >> >> http://archives.postgresql.org/pgsql-committers/2010-01/msg00400.php >> http://archives.postgresql.org/pgsql-committers/2010-01/msg00410.php >> http://archives.postgresql.org/pgsql-committers/2010-01/msg00396.php >> http://archives.postgresql.org/pgsql-committers/2010-01/msg00323.php >> >> some of those you might call "minor" but they are bugs and given the >> current rate we are seeing them is imho a clear sign of "code by far not >> stable enough to consider new features that late in the cycle" > > I don't think two very minor bugs in Hot Standby, reported and fixed 7 > days apart is any indication of instability. It isn't the "daily bugs > reported" you suggested. Personally, I think it indicates quite the > opposite - if those are the only bugs I can find now, I'm ecstatic. well we have not even made a realistic release (not even an alpha!) with the current HS/SR code, we still have "must fix" issues on the table(for both SR and HS) AND we find/fix more than a bug every two days in those two features that are the cause for moving to 9.0. If we want to release in anya realistic timeframe (and I recall you advocating for doing that in the past) we really need to wrap up what we have now, make it robust and see what we have left for all the further releases. > > I think your argument does apply to Streaming Rep, at this point. We > should consider releasing Alpha4 and then later going to Beta. so you basically say that the current codebase(as a whole) is in need of stabilisation and we need to make the cut off and release alpha4 now? Not sure how that fits into proposing new features for other parts of the system... > > My point of view expressed here is not built in a few seconds, it is > built on discussion and feedback over 18 months. The conflict issue was > discussed by me with hackers at the May 2008 dev meeting. It should be > improved upon in this release and it has been the main issue concerning > the full range of people I have discussed HS with. "in this release" refers to the current patch I guess - because there was no HS in older versions of pg :) Stefan
On Fri, Jan 29, 2010 at 11:32 AM, Joshua D. Drake <jd@commandprompt.com> wrote: > On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote: > >> I think we should extend the time available to make sure we have a >> sensible set of features for 9.0. The heat of this discussion tells me >> that we are going to be lacking features that are must-have to someone, >> whether or not they are in the majority. > > -1 > > Missing release dates because of some patch that isn't done is something > the community has been trying to get away from, aggressively. The way > this is supposed to work is: > > We have a release date > Features that aren't going to make that date, don't. > We release Exactly. It would be nice to see 9.0 come out in 2010, and we're not going to get there unless we start fixing the issues that are actually release-blockers, rather than adding new features. Hot Standby was committed with at least one known release blocker (VACUUM FULL) on the assumption that that release blocker would be fixed by the committer who introduced it (isn't that the rule?). Two months on, there is zero sign of any activity on that front, and instead we're now being bombarded with a series of other patches that fix issues that are not release-blockers under the theory that the feature isn't good enough to be used without them. If that's really true, it wasn't ready for commit in the first place. If this were any other patch, I'd propose reverting it. ...Robert
On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote: > Exactly. It would be nice to see 9.0 come out in 2010, and we're not > going to get there unless we start fixing the issues that are actually > release-blockers, rather than adding new features. Hot Standby was > committed with at least one known release blocker (VACUUM FULL) on the > assumption that that release blocker would be fixed by the committer > who introduced it (isn't that the rule?). Two months on, there is > zero sign of any activity on that front, and instead we're now being > bombarded with a series of other patches that fix issues that are not > release-blockers under the theory that the feature isn't good enough > to be used without them. If that's really true, it wasn't ready for > commit in the first place. > > If this were any other patch, I'd propose reverting it. > I would suggest that if we don't see activity on the release blockers, pretty much stat... we revert it. Joshua D. Drake > ...Robert > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote: > Two months on, there is > zero sign of any activity on that front I'm surprised that you call 14 commits in 28 days following a publicly available priority list: "zero sign of activity". Further discussion seems pointless. -- Simon Riggs www.2ndQuadrant.com
On Fri, 2010-01-29 at 18:08 +0000, Simon Riggs wrote: > On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote: > > Two months on, there is > > zero sign of any activity on that front > > I'm surprised that you call 14 commits in 28 days following a publicly > available priority list: "zero sign of activity". > > Further discussion seems pointless. Let's be clear. Robert is discussing release blockers. He is not suggesting that no work has been done. I believe the community considers release-blocks "the priority". Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, Jan 29, 2010 at 1:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote: >> Two months on, there is >> zero sign of any activity on that front > > I'm surprised that you call 14 commits in 28 days following a publicly > available priority list: "zero sign of activity". > > Further discussion seems pointless. Wow, that was an awesome way to quote what I said out of context and make it sound like I said something ridiculous. The problem I and others have is not with the quantity of your commits but with the issues you are choosing (not) to address. ...Robert
All, Is there a working list of HS must-fix items somewhere which people agree on? Or are we still lacking consensus? --Josh Berkus
On Fri, 2010-01-29 at 11:41 -0800, Josh Berkus wrote: > All, > > Is there a working list of HS must-fix items somewhere which people > agree on? Or are we still lacking consensus? VACUUM FULL, I believe is one. Joshua D. Drake > > --Josh Berkus > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, 2010-01-29 at 12:56 +0000, Simon Riggs wrote: > I think we should extend the time available to make sure we have a > sensible set of features for 9.0. The heat of this discussion tells me > that we are going to be lacking features that are must-have to someone, > whether or not they are in the majority. -1 Missing release dates because of some patch that isn't done is something the community has been trying to get away from, aggressively. The way this is supposed to work is: We have a release date Features that aren't going to make that date, don't. We release Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote: > Exactly. It would be nice to see 9.0 come out in 2010, and we're not > going to get there unless we start fixing the issues that are actually > release-blockers, rather than adding new features. Hot Standby was > committed with at least one known release blocker (VACUUM FULL) on the > assumption that that release blocker would be fixed by the committer > who introduced it (isn't that the rule?). Two months on, there is > zero sign of any activity on that front, and instead we're now being > bombarded with a series of other patches that fix issues that are not > release-blockers under the theory that the feature isn't good enough > to be used without them. If that's really true, it wasn't ready for > commit in the first place. > > If this were any other patch, I'd propose reverting it. > I would suggest that if we don't see activity on the release blockers, pretty much stat... we revert it. Joshua D. Drake > ...Robert > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, 2010-01-29 at 18:08 +0000, Simon Riggs wrote: > On Fri, 2010-01-29 at 12:23 -0500, Robert Haas wrote: > > Two months on, there is > > zero sign of any activity on that front > > I'm surprised that you call 14 commits in 28 days following a publicly > available priority list: "zero sign of activity". > > Further discussion seems pointless. Let's be clear. Robert is discussing release blockers. He is not suggesting that no work has been done. I believe the community considers release-blocks "the priority". Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, 2010-01-29 at 11:41 -0800, Josh Berkus wrote: > All, > > Is there a working list of HS must-fix items somewhere which people > agree on? Or are we still lacking consensus? VACUUM FULL, I believe is one. Joshua D. Drake > > --Josh Berkus > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.
On Fri, 2010-01-29 at 14:52 +0000, Greg Stark wrote: > Can you explain what it does in > more detail so we can understand why it's necessary for a sensible set > of features? I've slimmed down the patch to make it clearer what it does, having committed some refactoring. Problem: Currently when we perform conflict resolution we do not use the relid from the WAL record, we target all users regardless of which relations they have accessed or intend to access. So changes to table X can cause cancelation of someone accessing table Y because **they might later in the transaction access table X**. That is too heavy handed and is most often overkill. This is the same problem you and I have discussed many times, over the last 14 months, though the problem itself has been discussed on hackers many times over last 20 months and many potential solutions offered by me. An example of current behaviour, using tables A, B and C T0: An AccessExclusiveLock is applied to B T1: Q1 takes snapshot, takes lock on A and begins query T2: Q2 takes snapshot, queues for lock on B behind AccessExclusiveLock T3: Cleanup on table C is handled that will conflict with both snapshots T4: Q3 takes snapshot, takes lock on C and begins query (if possible) T5: Cleanup on table C is handled that will conflict with Q3 Current: At T3, current conflict resolution will wait for max_standby_delay and then cancel Q1 and Q2. Q3 can begin processing immediately because the snapshot it takes will always be same or later than the xmin that generated the cleanup at T3. At T5, Q3 will be quickly cancelled because all the standby delay was used up at T3 and there is none left to spend on delaying for Q3. Proposed Resolution: as presented to hackers in 12/2009 http://archives.postgresql.org/pgsql-hackers/2009-12/msg01175.php Let's look at the effect first, then return to the detail. In this proposal, the above sequence of actions will look like this: Conflict resolution will wait at T3 until we hit max_standby_delay, at which point we learn that Q1 and Q2 do not conflict and we let them continue on their way. At T5, Q3 will be cancelled without much delay, because we have now used up most of max_standby_delay. So in both approaches, Q3 that accessed table C will be canceled fairly quickly. The key to this is that in the new proposal, Q1 and Q2 will not be canceled: they will continue to completion. How it works: When we process a snapshot conflict we check which queries have snapshots that conflict. We then wait for max_standby_delay and the check lock conflicts. (We do it this way because of a timing issue described on the link above, pointed out by Greg). When we check for lock conflicts we also set a latestRemovedXid on "that relation", so that we capture all current lockers *and* allow all future lockers to check the latestRemovedXid against their snapshot. In either case, if a lock conflict occurs then we will cancel the query. I mention "that relation" because *where* we record the xid limit for each relation is an important aspect of the design. In the current patch we take a simple approach, others are possible. If there is already a lock in the shared lock table, then we add the latestRemovedXid to that. If not, we keep track of the latestRemovedXid for the whole lock partition. So we aren't tracking each relation separately in most cases, except for when a table is being frequently accessed, or access for a long period. There is also an optimization added here. When we defer cancelation of queries the same query keeps re-appearing in the conflict list for later WAL records. As a result there is a mechanism to avoid constant re-listing of a conflict. The attached patch is for review and discussion only at this stage. I'm working on other areas now while discussion takes place, or not. -- Simon Riggs www.2ndQuadrant.com
Attachment
On Fri, 2010-01-29 at 15:01 +0000, Simon Riggs wrote: > Putting it back takes time and > given enough of that rare cloth, it will eventually be put back. Looks like I'll have time to add the starts-at-shutdown-checkpoint item back in after all. I'd appreciate it if you could review the relation-specific conflict patch, 'cos it's still important. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Fri, 2010-01-29 at 15:01 +0000, Simon Riggs wrote: > >> Putting it back takes time and >> given enough of that rare cloth, it will eventually be put back. > > Looks like I'll have time to add the starts-at-shutdown-checkpoint item > back in after all. Great! Thank you, much appreciated. > I'd appreciate it if you could review the relation-specific conflict > patch, 'cos it's still important. One fundamental gripe I have about that approach is that it's hard to predict when you will be saved by the cache and when your query will be canceled. For example, the patch stores only one "latestRemovedXid" value per lock partition. So if you have two tables that hash to different lock partitions, and are never both accessed in a single transaction, the cache will save your query every time. So far so good, but then you do a dump/restore, and the tables happen to be assigned to the same lock partition. Oops, a system that used to work fine starts to get "snapshot too old" errors. It's often better to be consistent and predictable, even if it means cancelling more queries. I think wë́'d need to have a much more fine-grained system before it's worthwhile to do deferred resolution. There's just too much "false sharing" otherwise. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2010-02-02 at 20:27 +0200, Heikki Linnakangas wrote: > > I'd appreciate it if you could review the relation-specific conflict > > patch, 'cos it's still important. > > One fundamental gripe I have about that approach is that it's hard to > predict when you will be saved by the cache and when your query will be > canceled. For example, the patch stores only one "latestRemovedXid" > value per lock partition. So if you have two tables that hash to > different lock partitions, and are never both accessed in a single > transaction, the cache will save your query every time. So far so good, > but then you do a dump/restore, and the tables happen to be assigned to > the same lock partition. Oops, a system that used to work fine starts to > get "snapshot too old" errors. > > It's often better to be consistent and predictable, even if it means > cancelling more queries. I think wë́'d need to have a much more > fine-grained system before it's worthwhile to do deferred resolution. > There's just too much "false sharing" otherwise. ISTM that this is exactly backwards. There is already way too many false positives and this patch would reduce them very significantly. Plus the cancelation is hardly predictable since it relies on whether or not a btree delete takes place during execution and the arrival time and rate of those is sporadic. There is no safe, predicatable behaviour in the current code. The gripe about the cache cannot be a fundamental one, since we can easily change the size and mechanism by which the cache operates without changing the patch very much at all. I am being told this area is a must-fix issue for this release. Tom's reaction to this issue (on other thread) illustrates that beautifully: On Sun, 2010-01-31 at 15:41 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: (snip) > > 2. no matter if they haven't accessed the index being cleaned (they > > might later, is the thinking...) > > That seems seriously horrid. What is the rationale for #2 in > particular? I would hope that at worst this would affect sessions > that are actively competing for the index being cleaned. -- Simon Riggs www.2ndQuadrant.com