Thread: New trigger option of pg_standby
Hi, Current pg_standby is dangerous because the presence of the trigger file causes recovery to end whether or not the next WAL file is available. So, some *available* transactions may be lost at failover. Such danger will become high if the standby server has not caught up with the primary. Attached patch fixes the above problem by adding a new trigger option to pg_standby; the presence of this new trigger file causes recovery to end after replaying all the available WAL files. Specifically, pg_standby acts like 'cp' or 'ln' command while this new trigger file exists. I've not changed any existing features, so backward-compatibility is maintained. Thought? -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Wed, Mar 25, 2009 at 7:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Attached patch fixes the above problem by adding a new trigger option > to pg_standby; the presence of this new trigger file causes recovery to > end after replaying all the available WAL files. Shouldn't it be the default? It seems like the most expected behaviour to me. -- Guillaume
Hi, Thanks for the comment. On Wed, Mar 25, 2009 at 3:50 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Wed, Mar 25, 2009 at 7:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Attached patch fixes the above problem by adding a new trigger option >> to pg_standby; the presence of this new trigger file causes recovery to >> end after replaying all the available WAL files. > > Shouldn't it be the default? It seems like the most expected behaviour to me. Yeah, I agree... but there may be scripts for warm-standby based on the existing default behavior. So, I didn't make a new trigger the default. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 25, 2009 at 9:44 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Yeah, I agree... but there may be scripts for warm-standby based on > the existing default behavior. So, I didn't make a new trigger the default. I don't use pg_standby personnaly but I admit I'm quite surprised by the current behaviour. I'm pretty sure a lot of the current users would be surprised too. -- Guillaume
Hi, On Wed, Mar 25, 2009 at 5:55 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Wed, Mar 25, 2009 at 9:44 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Yeah, I agree... but there may be scripts for warm-standby based on >> the existing default behavior. So, I didn't make a new trigger the default. > > I don't use pg_standby personnaly but I admit I'm quite surprised by > the current behaviour. I'm pretty sure a lot of the current users > would be surprised too. The current behavior is documented as follows, so it may be taken for granted by some users. I think that we shouldn't ignore such users. --------------- Specify a trigger file whose presence should cause recovery to end whether or not the next WAL file is available. --------------- Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
>>> Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Wed, Mar 25, 2009 at 9:44 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Yeah, I agree... but there may be scripts for warm-standby based on >> the existing default behavior. So, I didn't make a new trigger the default. > > I don't use pg_standby personnaly but I admit I'm quite surprised by > the current behaviour. I'm pretty sure a lot of the current users > would be surprised too. I find it hard to imagine a use case for the existing default behavior. -Kevin
On Wed, Mar 25, 2009 at 2:59 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > I find it hard to imagine a use case for the existing default > behavior. I thought a bit about it and I think it can be useful when your priority is the availability of the service and you don't consider a data loss that important: even if you have a lot of WALs segments to replay, you may want to have your service up immediately in case of a major problem. Keeping it is a good idea IMHO but I don't think it should be the default. -- Guillaume
Hi, On Thu, Mar 26, 2009 at 12:48 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Wed, Mar 25, 2009 at 2:59 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> I find it hard to imagine a use case for the existing default >> behavior. > > I thought a bit about it and I think it can be useful when your > priority is the availability of the service and you don't consider a > data loss that important: even if you have a lot of WALs segments to > replay, you may want to have your service up immediately in case of a > major problem. Yes, I also think that this is likely use case. > Keeping it is a good idea IMHO but I don't think it should be the default. What does "the default" mean? You mean that new trigger should use the existing trigger option character (-t)? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Thu, Mar 26, 2009 at 12:48 AM, Guillaume Smet > <guillaume.smet@gmail.com> wrote: >> On Wed, Mar 25, 2009 at 2:59 PM, Kevin Grittner >> <Kevin.Grittner@wicourts.gov> wrote: >>> I find it hard to imagine a use case for the existing default >>> behavior. >> I thought a bit about it and I think it can be useful when your >> priority is the availability of the service and you don't consider a >> data loss that important: even if you have a lot of WALs segments to >> replay, you may want to have your service up immediately in case of a >> major problem. > > Yes, I also think that this is likely use case. > >> Keeping it is a good idea IMHO but I don't think it should be the default. > > What does "the default" mean? You mean that new trigger should use > the existing trigger option character (-t)? The existing behavior doesn't seem very useful to me either. Assuming there is a use case though, we probably need to support both at the same time, perhaps using different trigger files. If there's a use case for both, conceivably someone will want to sometimes trigger the failover immediately and sometimes after all WAL segments have been replayed. Whatever we do, the signaling method to trigger failover should behave the same. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Mar 26, 2009 at 2:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > What does "the default" mean? You mean that new trigger should use > the existing trigger option character (-t)? Yes, that's my point. I understand it seems weird to switch the options but I'm pretty sure a lot of persons currently using -t would be surprised by the current behaviour. Moreover playing all the remaining WALs before starting up should be the most natural option when people are looking in the help. That said, it would be nice to hear from people really using pg_standby to know if they understand how it works now and if it's what they intended when they set it up. -- Guillaume
Hi, Guillaume Smet wrote: > On Thu, Mar 26, 2009 at 2:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> What does "the default" mean? You mean that new trigger should use >> the existing trigger option character (-t)? > > Yes, that's my point. > > I understand it seems weird to switch the options but I'm pretty sure > a lot of persons currently using -t would be surprised by the current > behaviour. Moreover playing all the remaining WALs before starting up > should be the most natural option when people are looking in the help. > > That said, it would be nice to hear from people really using > pg_standby to know if they understand how it works now and if it's > what they intended when they set it up. My fault not RTFM well enough, but I was surprised finding out that -t is working like that. +1 for me to switch -t to the new behaviour. Cheers -- Matteo Beccati OpenX - http://www.openx.org
On Thu, 2009-03-26 at 08:32 +0100, Guillaume Smet wrote: > On Thu, Mar 26, 2009 at 2:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > > What does "the default" mean? You mean that new trigger should use > > the existing trigger option character (-t)? > > Yes, that's my point. > > I understand it seems weird to switch the options but I'm pretty sure > a lot of persons currently using -t would be surprised by the current > behaviour. Moreover playing all the remaining WALs before starting up > should be the most natural option when people are looking in the help. If the standby has fallen behind then waiting for it to catch up might take hours to failover if it waits for all files. If you haven't been monitoring it correctly, you have no clue. That is also a surprising thing, so let's not jump from one surprising thing into the arms of another. If we go with this, I would suggest we make *neither* the default by removing -t, and adopting two new options: something like -f == fast failover, -p == patient failover. This then forces people to read and understand the difference between the two behaviours so they can make an informed choice of how they would like to act at this critical point in time. It is justifiable because there is no single thing called a trigger file any longer and the concept will lead to pain. Earlier, we discussed having a single trigger file that contains an option rather than two distinct trigger files. That design is better because it allows the user to choose at failover time, rather than making a binding decision at config time. That solution would be the ideal one, IMHO, because it gives user more choice - and would allow us to keep the -t option meaningfully. In that case the default should be patience. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi Simon. On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Earlier, we discussed having a single trigger file that contains an > option rather than two distinct trigger files. That design is better > because it allows the user to choose at failover time, rather than > making a binding decision at config time. That solution would be the > ideal one, IMHO, because it gives user more choice - and would allow us > to keep the -t option meaningfully. In that case the default should be > patience. Or you can define both files in your command line to have the choice. I like the idea of removing -t and adding 2 new options so that people are warned about the intended behavior. Anyway, I don't have a strong opinion about how we should fix it as I don't use pg_standby personnally, just that we should. The two options you mention have their own merits. -- Guillaume
Hi, On Thu, Mar 26, 2009 at 8:54 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > Hi Simon. > > On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Earlier, we discussed having a single trigger file that contains an >> option rather than two distinct trigger files. That design is better >> because it allows the user to choose at failover time, rather than >> making a binding decision at config time. That solution would be the >> ideal one, IMHO, because it gives user more choice - and would allow us >> to keep the -t option meaningfully. In that case the default should be >> patience. > > Or you can define both files in your command line to have the choice. Personally I like this. > I like the idea of removing -t and adding 2 new options so that people > are warned about the intended behavior. OK, I'll change the patch as Simon suggested; removing -t and adding two new options: -f = fast failover (existing behavior), -p patient failover. Also I'll default the patient failover, so it's performed when the signal (SIGINT or SIGUSR1) is received. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Simon Riggs wrote: > If we go with this, I would suggest we make *neither* the default by > removing -t, and adopting two new options: something like -f == fast > failover, -p == patient failover. -m smart|fast|immediate :-)
On Fri, 2009-03-27 at 13:56 +0200, Peter Eisentraut wrote: > Simon Riggs wrote: > > If we go with this, I would suggest we make *neither* the default by > > removing -t, and adopting two new options: something like -f == fast > > failover, -p == patient failover. > > -m smart|fast|immediate :-) Yes, a better suggestion. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Fri, Mar 27, 2009 at 12:56 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > Simon Riggs wrote: >> >> If we go with this, I would suggest we make *neither* the default by >> removing -t, and adopting two new options: something like -f == fast >> failover, -p == patient failover. > > -m smart|fast|immediate :-) The advantage of having 2 options (or the ability to put a string value in the trigger file) is that you can choose the behaviour when you need to trigger it (you just have to use the 2 options with 2 different filenames). I don't think it's the case with your proposal. -- Guillaume
On Fri, Mar 27, 2009 at 3:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > OK, I'll change the patch as Simon suggested; removing -t and adding > two new options: -f = fast failover (existing behavior), -p patient failover. > Also I'll default the patient failover, so it's performed when the signal > (SIGINT or SIGUSR1) is received. I'm wondering if we should consider backpatching this one. Even if the feature works as advertised in the documentation. It's a very surprising behaviour and I'm pretty sure someone will shoot himself in the foot with it, if not already done. Considering backpatching might change the way we want to fix it. -- Guillaume
Fujii Masao wrote: > On Thu, Mar 26, 2009 at 8:54 PM, Guillaume Smet > <guillaume.smet@gmail.com> wrote: >> On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I like the idea of removing -t and adding 2 new options so that people >> are warned about the intended behavior. > > OK, I'll change the patch as Simon suggested; removing -t and adding > two new options: -f = fast failover (existing behavior), -p patient failover. > Also I'll default the patient failover, so it's performed when the signal > (SIGINT or SIGUSR1) is received. Uh oh, that's going to be quite tricky with signals. Remember that pg_standby is called for each file. A trigger file persists until it's deleted, but a signal will only be received by the pg_standby instance that happens to be running at the time. Makes me wonder if the trigger pg_standby with signals is reliable to begin with. What if the backend is just processing a file when the signal is fired, and there's no pg_standby process running at the moment to receive it? Seems like the signaler needs to loop until it has successfully delivered the signal to a pg_standby process, which seems pretty ugly. Given all the recent trouble with signals, and the fact that it's undocumented, perhaps we should just rip out the signaling support from pg_standby. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2009-03-27 at 13:19 +0100, Guillaume Smet wrote: > On Fri, Mar 27, 2009 at 12:56 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > > Simon Riggs wrote: > >> > >> If we go with this, I would suggest we make *neither* the default by > >> removing -t, and adopting two new options: something like -f == fast > >> failover, -p == patient failover. > > > > -m smart|fast|immediate :-) > > The advantage of having 2 options (or the ability to put a string > value in the trigger file) is that you can choose the behaviour when > you need to trigger it (you just have to use the 2 options with 2 > different filenames). I don't think it's the case with your proposal. Yes, sorry. I meant we should use the naming Peter suggests. So we would have two triggers, but call them fast and smart, rather than fast and patient. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Peter Eisentraut <peter_e@gmx.net> writes: > Simon Riggs wrote: >> If we go with this, I would suggest we make *neither* the default by >> removing -t, and adopting two new options: something like -f == fast >> failover, -p == patient failover. > -m smart|fast|immediate :-) +1 for using a "-m something" type of syntax instead of having to try to pick single-letter switches that are mnemonic for the different cases. But -1 to those particular mode names --- I think it will invite confusion with pg_ctl's behavior. regards, tom lane
On Fri, 2009-03-27 at 10:25 -0400, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Simon Riggs wrote: > >> If we go with this, I would suggest we make *neither* the default by > >> removing -t, and adopting two new options: something like -f == fast > >> failover, -p == patient failover. > > > -m smart|fast|immediate :-) > > +1 for using a "-m something" type of syntax instead of having to try to > pick single-letter switches that are mnemonic for the different cases. > But -1 to those particular mode names --- I think it will invite > confusion with pg_ctl's behavior. The choice is between * one parameter with the option being given as text within trigger file * two parameters naming different types of trigger file I don't mind which, as long as it is one of those two, unless there is a third way to specify things so that user has control at failover time. A single -m option would hardcode that decision ahead of time, which is undesirable behaviour, hence the additional complexity being discussed. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi, On Fri, Mar 27, 2009 at 9:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Uh oh, that's going to be quite tricky with signals. Remember that > pg_standby is called for each file. A trigger file persists until it's > deleted, but a signal will only be received by the pg_standby instance that > happens to be running at the time. You are right! > Makes me wonder if the trigger pg_standby with signals is reliable to begin > with. What if the backend is just processing a file when the signal is > fired, and there's no pg_standby process running at the moment to receive > it? Seems like the signaler needs to loop until it has successfully > delivered the signal to a pg_standby process, which seems pretty ugly. > > Given all the recent trouble with signals, and the fact that it's > undocumented, perhaps we should just rip out the signaling support from > pg_standby. So far, to be frank, I was not sure why the trigger by the signal is necessary for pg_standby. But, now, I think that it's useful when the user has forgotten to specify the trigger file. In this case, without the signaling support, there is no way to do failover in a short time; probably, the user has to do shutdown and restart a recovery from the last restart point, which would take time. So, I'd like to leave the signaling support as a safeguard. As you pointed out, the "smart" failover by signal would be very tricky. So, maybe we should not change the existing behavior of pg_standby when the signal is received. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
<div class="gmail_quote">On Thu, Mar 26, 2009 at 4:54 AM, Guillaume Smet <span dir="ltr"><<a href="mailto:guillaume.smet@gmail.com">guillaume.smet@gmail.com</a>></span>wrote:<br /><blockquote class="gmail_quote"style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Hi Simon.<br/><div class="im"><br /> On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <<a href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br /> > Earlier, we discussed having a singletrigger file that contains an<br /> > option rather than two distinct trigger files. That design is better<br />> because it allows the user to choose at failover time, rather than<br /> > making a binding decision at configtime. That solution would be the<br /> > ideal one, IMHO, because it gives user more choice - and would allow us<br/> > to keep the -t option meaningfully. In that case the default should be<br /> > patience.<br /><br /></div>Oryou can define both files in your command line to have the choice.<br /><br /> I like the idea of removing -t andadding 2 new options so that people<br /> are warned about the intended behavior.<br /><br /> Anyway, I don't have a strongopinion about how we should fix it as I<br /> don't use pg_standby personnally, just that we should. The two options<br/> you mention have their own merits.<br /><br /></blockquote></div><br />For the record, I have been a user ofpg_standby in the past, and never realized this behavioural detail; probably because we never had a huge lag at the slave.<br/><br />From implementation perspective, I'd vote against any new options. That'd just make it harder for backportingto older branches. I like the idea that contents of the trigger file determines the mode of operation.<br /><br/>So, we can keep the existing behaviour where an empty trigger file waits for all WAL files to be applied, hence nosurprises for existing users. With some strong wordings in the docs, we emit a log message like<br /><br /> 'Trigger filefound; Performing patient recovery'<br /><br />that makes the user read the docs and understand the difference, if hehas not already done so. And if there are some contents in the trigger file, then the behaviour depends on that.<br /><br/>If we do not want to go to lengths of reading the file contents, the behaviour can depend on the suffix of the file.<br/><br />pg_standby -t /tmp/mytrigger ... -- lest assume this is the restore_command<br /><br />if( exists /tmp/mytrigger) do default patient recovery with appropriate messages in log<br /> if( exists /tmp/mytrigger.fast ) do stopthe recovery immediately with a message in log<br />if( exists /tmp/mytrigger.normal ) do the patient recovery with messagesin log<br /><br />Best regards,<br />-- <br />gurjeet[.singh]@EnterpriseDB.com<br /> singh.gurjeet@{ gmail | hotmail| indiatimes | yahoo }.com<br /><br />EnterpriseDB <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a><br/><br />Mail sent from my BlackLaptop device<br />
Hi, On Fri, Mar 27, 2009 at 11:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Fri, 2009-03-27 at 10:25 -0400, Tom Lane wrote: >> Peter Eisentraut <peter_e@gmx.net> writes: >> > Simon Riggs wrote: >> >> If we go with this, I would suggest we make *neither* the default by >> >> removing -t, and adopting two new options: something like -f == fast >> >> failover, -p == patient failover. >> >> > -m smart|fast|immediate :-) >> >> +1 for using a "-m something" type of syntax instead of having to try to >> pick single-letter switches that are mnemonic for the different cases. >> But -1 to those particular mode names --- I think it will invite >> confusion with pg_ctl's behavior. > > The choice is between > > * one parameter with the option being given as text within trigger file > > * two parameters naming different types of trigger file > > I don't mind which, as long as it is one of those two, unless there is a > third way to specify things so that user has control at failover time. A > single -m option would hardcode that decision ahead of time, which is > undesirable behaviour, hence the additional complexity being discussed. Thanks for the clarification. I'd like to choose the former because it's more flexible when new trigger action is added to pg_standby in the future. And, as Gurjeet says, it's more friendly to do smart failover (end recovery after all the available WAL are applied) when an empty trigger file exists. I'll change the patch as above. Comments? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Wed, Apr 1, 2009 at 11:01 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Hi, > > On Fri, Mar 27, 2009 at 11:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> On Fri, 2009-03-27 at 10:25 -0400, Tom Lane wrote: >>> Peter Eisentraut <peter_e@gmx.net> writes: >>> > Simon Riggs wrote: >>> >> If we go with this, I would suggest we make *neither* the default by >>> >> removing -t, and adopting two new options: something like -f == fast >>> >> failover, -p == patient failover. >>> >>> > -m smart|fast|immediate :-) >>> >>> +1 for using a "-m something" type of syntax instead of having to try to >>> pick single-letter switches that are mnemonic for the different cases. >>> But -1 to those particular mode names --- I think it will invite >>> confusion with pg_ctl's behavior. >> >> The choice is between >> >> * one parameter with the option being given as text within trigger file >> >> * two parameters naming different types of trigger file >> >> I don't mind which, as long as it is one of those two, unless there is a >> third way to specify things so that user has control at failover time. A >> single -m option would hardcode that decision ahead of time, which is >> undesirable behaviour, hence the additional complexity being discussed. > > Thanks for the clarification. > > I'd like to choose the former because it's more flexible when new > trigger action is added to pg_standby in the future. And, as Gurjeet > says, it's more friendly to do smart failover (end recovery after all > the available WAL are applied) when an empty trigger file exists. > I'll change the patch as above. Comments? Here is the patch; - Smart failover is chosen if the trigger file labeled "smart" or an empty one exists. - Fast failover is chosen if the trigger file labeled "fast" exists, the signal (SIGUSR1 or SIGINT) is received or the wait timeout happens. If you notice anything, please feel free to comment. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Fri, Apr 3, 2009 at 5:42 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > Here is the patch; > - Smart failover is chosen if the trigger file labeled "smart" or > an empty one exists. > - Fast failover is chosen if the trigger file labeled "fast" exists, > the signal (SIGUSR1 or SIGINT) is received or the wait timeout > happens. After some further thoughts, +1 for this approach too. I think you imply 'containing "smart"' not 'labeled "smart"'. "Labeled" is confusing IMHO. +1 to change the default behaviour too. All the people discovering the current behaviour are totally surprised. -- Guillaume
Hi, On Wed, Apr 8, 2009 at 6:56 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Fri, Apr 3, 2009 at 5:42 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> Here is the patch; >> - Smart failover is chosen if the trigger file labeled "smart" or >> an empty one exists. >> - Fast failover is chosen if the trigger file labeled "fast" exists, >> the signal (SIGUSR1 or SIGINT) is received or the wait timeout >> happens. > > After some further thoughts, +1 for this approach too. > > I think you imply 'containing "smart"' not 'labeled "smart"'. > "Labeled" is confusing IMHO. Thanks for the comment! I corrected such confusing expression. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Fujii Masao wrote: > Hi, > > On Wed, Apr 8, 2009 at 6:56 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote: >> On Fri, Apr 3, 2009 at 5:42 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> Here is the patch; >>> - Smart failover is chosen if the trigger file labeled "smart" or >>> an empty one exists. >>> - Fast failover is chosen if the trigger file labeled "fast" exists, >>> the signal (SIGUSR1 or SIGINT) is received or the wait timeout >>> happens. >> After some further thoughts, +1 for this approach too. >> >> I think you imply 'containing "smart"' not 'labeled "smart"'. >> "Labeled" is confusing IMHO. > > Thanks for the comment! > I corrected such confusing expression. > + if (strspn(buf, "smart") == 5 && strncmp(buf, "smart", 5) == 0) > + { The strspn() call seems pointless here. One problem with this patch is that in smart mode, the trigger file is not deleted. That's different from current pg_standby behavior, and makes accidental failovers after one failover more likely. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Thu, Apr 9, 2009 at 9:47 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> + if (strspn(buf, "smart") == 5 && strncmp(buf, "smart", 5) == 0) >> + { > > The strspn() call seems pointless here. OK, I'll get rid of it. > > One problem with this patch is that in smart mode, the trigger file is not > deleted. That's different from current pg_standby behavior, and makes > accidental failovers after one failover more likely. Yes, it's because pg_standby cannot be sure when the trigger file can be removed in smart mode. If the trigger file is deleted as soon as it's found, just like in fast mode, pg_standby may keep waiting for WAL file again. One idea to solve this problem is to tell pg_standby as a command-line argument about whether the trigger file can be removed. That parameter value can be set to 'true' when the last applied record is re-fetched. Though pg_standby is called to restore timeline history files also after that point, the trigger file is already unnecessary (pg_standby doesn't wait for history file). Specifically, if restore_command contains new % option (%e?), it's replaced by the boolean value which indicates whether the trigger file can be deleted. This value is set to 'true' when the startup process re-fetches the last valid record, 'false' otherwise. In smart mode, pg_standby determines whether to delete the trigger file according to that value. Comments? Or, do you have any better idea? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Apr 10, 2009 at 5:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > One idea to solve this problem is to tell pg_standby as a > command-line argument about whether the trigger file can be > removed. That parameter value can be set to 'true' when the last > applied record is re-fetched. Though pg_standby is called to > restore timeline history files also after that point, the trigger file > is already unnecessary (pg_standby doesn't wait for history file). > > Specifically, if restore_command contains new % option (%e?), > it's replaced by the boolean value which indicates whether the > trigger file can be deleted. This value is set to 'true' when the > startup process re-fetches the last valid record, 'false' otherwise. > In smart mode, pg_standby determines whether to delete the > trigger file according to that value. > > Comments? Hmmm, it seems overly complicated but I don't know the code of pg_standby. > Or, do you have any better idea? Wouldn't it be possible to have a global switch (let's name it startCluster, default to false) which is set to true when the trigger file is found for the first time? You would then be able to remove the trigger file and let the cluster start by checking this variable. One more time, I don't know the code of pg_standby so it may be a stupid idea. -- Guillaume
Fujii-san, I like the new patch using the content of the file to determine the mode. Much easier to use at failover time. On Fri, 2009-04-10 at 12:47 +0900, Fujii Masao wrote: > > One problem with this patch is that in smart mode, the trigger file is not > > deleted. That's different from current pg_standby behavior, and makes > > accidental failovers after one failover more likely. > > Yes, it's because pg_standby cannot be sure when the trigger file > can be removed in smart mode. If the trigger file is deleted as soon > as it's found, just like in fast mode, pg_standby may keep waiting > for WAL file again. My understanding of smart mode is fairly simple: if (triggered) {if (smartMode && nextWALfile+1 exists) exit(0);else{ delete trigger file exit(1);} } If you perform a file lookahead (the +1) as shown above then you avoid the problem Heikki observes. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi, On Fri, Apr 10, 2009 at 6:31 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Fri, Apr 10, 2009 at 5:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> One idea to solve this problem is to tell pg_standby as a >> command-line argument about whether the trigger file can be >> removed. That parameter value can be set to 'true' when the last >> applied record is re-fetched. Though pg_standby is called to >> restore timeline history files also after that point, the trigger file >> is already unnecessary (pg_standby doesn't wait for history file). >> >> Specifically, if restore_command contains new % option (%e?), >> it's replaced by the boolean value which indicates whether the >> trigger file can be deleted. This value is set to 'true' when the >> startup process re-fetches the last valid record, 'false' otherwise. >> In smart mode, pg_standby determines whether to delete the >> trigger file according to that value. >> >> Comments? > > Hmmm, it seems overly complicated but I don't know the code of pg_standby. Yes, I'd also like to simplify it more. >> Or, do you have any better idea? > > Wouldn't it be possible to have a global switch (let's name it > startCluster, default to false) which is set to true when the trigger > file is found for the first time? You would then be able to remove the > trigger file and let the cluster start by checking this variable. > > One more time, I don't know the code of pg_standby so it may be a stupid idea. Thanks for the suggestion! Since pg_standby is executed for each file, such variable cannot be taken over to the next execution, i.e. it's reset each time. So, the current patch have left the trigger file until the end. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Sat, Apr 11, 2009 at 1:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Fujii-san, > > I like the new patch using the content of the file to determine the > mode. Much easier to use at failover time. > > On Fri, 2009-04-10 at 12:47 +0900, Fujii Masao wrote: > >> > One problem with this patch is that in smart mode, the trigger file is not >> > deleted. That's different from current pg_standby behavior, and makes >> > accidental failovers after one failover more likely. >> >> Yes, it's because pg_standby cannot be sure when the trigger file >> can be removed in smart mode. If the trigger file is deleted as soon >> as it's found, just like in fast mode, pg_standby may keep waiting >> for WAL file again. > > My understanding of smart mode is fairly simple: > > if (triggered) > { > if (smartMode && nextWALfile+1 exists) > exit(0); > else > { > delete trigger file > exit(1); > } > } > > If you perform a file lookahead (the +1) as shown above then you avoid > the problem Heikki observes. Thanks for the suggestion! A lookahead (the +1) may have pg_standby get stuck as follows. Am I missing something? 1. the trigger file containing "smart" is created. 2. pg_standby is executed. 2-1. nextWALfile is restored. 2-2. the trigger file is deleted because nextWALfile+1 doesn'texist. 3. the restored nextWALfile is applied. 4. pg_standby is executed again to restore nextWALfile+1. 5. pg_standby gets stuck because the trigger file and nextWALfile+1 don't exist. But, a lookahead nextWALfile seems to work fine. if (triggered) { if (smartMode && nextWALfile exists) exit(0) else { delete trigger file exit(1) } } 1. the trigger file containing "smart" is created. 2. pg_standby is executed. 2-1. nextWALfile is restored. 3. the restored nextWALfile is applied. 4. pg_standby is executed again to restore nextWALfile+1. 4-1. the trigger file is deleted because nextWALfile+1 doesn'texist. 5. the startup process fails to read nextWALfile+1. 6. pg_standby is executed again to re-fetch nextWALfile. 6-1. nextWALfile is restored. 6-2. pg_standby doesn't get stuckbecause nextWALfile exists. Furthermore, pg_standby may have to check if nextWALfile exists not only in archiveLocation but also in pg_xlog. Because, when pg_xlog of the primary server can be read at failover, WAL files in it may be copied to pg_xlog of the standby server to be applied. (but, not sure if it's better to copy such files to pg_xlog instead of archiveLocation in this case). Comments? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Apr 13, 2009 at 7:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > 1. the trigger file containing "smart" is created. > 2. pg_standby is executed. > 2-1. nextWALfile is restored. > 2-2. the trigger file is deleted because nextWALfile+1 doesn't exist. > 3. the restored nextWALfile is applied. > 4. pg_standby is executed again to restore nextWALfile+1. I don't think it should happen. IMHO, it's an acceptable compromise to replay all the WAL files present when I created the trigger file. So if I have the smart shutdown trigger file and I don't have any nextWALfile+1, I can remove the trigger file and stop the recovery: pg_standby won't be executed again after that, even if a nextWALfile+1 appeared while replaying the previous WAL file. That said, stupid question: do we have a way to know the nextWALfile+1 name to test if it exists? nextWALfile is transmitted through the restore_command API and I'm wondering if we can have nextWALfile+1 name without changing the restore_command API. -- Guillaume
Hi, On Mon, Apr 13, 2009 at 7:21 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Mon, Apr 13, 2009 at 7:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> 1. the trigger file containing "smart" is created. >> 2. pg_standby is executed. >> 2-1. nextWALfile is restored. >> 2-2. the trigger file is deleted because nextWALfile+1 doesn't exist. >> 3. the restored nextWALfile is applied. >> 4. pg_standby is executed again to restore nextWALfile+1. > > I don't think it should happen. IMHO, it's an acceptable compromise to > replay all the WAL files present when I created the trigger file. So > if I have the smart shutdown trigger file and I don't have any > nextWALfile+1, I can remove the trigger file and stop the recovery: > pg_standby won't be executed again after that, even if a nextWALfile+1 > appeared while replaying the previous WAL file. The scenario which I described is not related to whether the nextWALfile+1 exists or not. To clarify the detail of it; If pg_standby restores nextWALfile, deletes the trigger file and exits with 1 (i.e. tell the end of recovery to the startup process), the startup process considers that pg_standby failed, and tries to read the nextWALfile in pg_xlog instead of the restored file named "RECOVERYXLOG". This is undesirable behavior because some transactions would be lost if nextWALfile in pg_xlog doesn't exist. So, exit(0) should be called when nextWALfile exists. On the other hand, if pg_standby restores the nextWALfile, deletes the trigger file and calls exit(0), the startup process replays the restored file and tries to read the nextWALfile+1 because it doesn't know if the nextWALfile is the last valid WAL file. So, pg_standby may be executed again even after the trigger file is deleted. Am I missing something? > That said, stupid question: do we have a way to know the nextWALfile+1 > name to test if it exists? nextWALfile is transmitted through the > restore_command API and I'm wondering if we can have nextWALfile+1 > name without changing the restore_command API. Probably Yes; the following three steps are required, I think. - Get the timeline, logid and segid from the name of the nextWALfile. - Increment the logid and segid pair using NextLogSeg macro. - Calculate the name of the nextWALfile+1 using XLogFileName macro. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote: > if (triggered) > { > if (smartMode && nextWALfile exists) > exit(0) > else > { > delete trigger file > exit(1) > } > } This looks to be the correct one. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi, On Mon, Apr 13, 2009 at 2:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > But, a lookahead nextWALfile seems to work fine. > > if (triggered) > { > if (smartMode && nextWALfile exists) > exit(0) > else > { > delete trigger file > exit(1) > } > } Umm... in this algorithm, the trigger file remains after failover if the nextWALfile has the invalid record which means the end of WAL files. I'd like to propose another simple idea; pg_standby deletes the trigger file *whenever* the nextWALfile is a timeline history file. A timeline history file is restored at the end of recovery, so it's guaranteed that the trigger file is deleted whether nextWALfile exists or not. A timeline history file is restored also at the beginning of recovery, so the accidentally remaining trigger file is deleted in early warm-standby as a side-effect of this idea. How does that sound? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote: > A lookahead (the +1) may have pg_standby get stuck as follows. > Am I missing something? > > 1. the trigger file containing "smart" is created. > 2. pg_standby is executed. > 2-1. nextWALfile is restored. > 2-2. the trigger file is deleted because nextWALfile+1 doesn't exist. > 3. the restored nextWALfile is applied. > 4. pg_standby is executed again to restore nextWALfile+1. This can't happen. (4) will never occur when (2-2) has occurred. A non-zero error code means file not available which will cause recovery to end and hence no requests for further WAL files are made. It does *seem* as if there is a race condition there in that another WAL file may arrive after we have taken the decision there are no more WAL files, but it's not a problem. That could happen if we issue the trigger while the master is still up, which is a mistake - why would we do that? If we only issue the trigger once we are happy the master is down then we don't get a problem. So lets do it the next+1 way, when triggered. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi, On Tue, Apr 14, 2009 at 6:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote: > >> A lookahead (the +1) may have pg_standby get stuck as follows. >> Am I missing something? >> >> 1. the trigger file containing "smart" is created. >> 2. pg_standby is executed. >> 2-1. nextWALfile is restored. >> 2-2. the trigger file is deleted because nextWALfile+1 doesn't exist. >> 3. the restored nextWALfile is applied. >> 4. pg_standby is executed again to restore nextWALfile+1. > > This can't happen. (4) will never occur when (2-2) has occurred. A > non-zero error code means file not available which will cause recovery > to end and hence no requests for further WAL files are made. When pg_standby exits with non-zero code, (3) and (4) will never occur, and the transactions in nextWALfile will be lost. So, in (2-2), pg_standby has to call exit(0), I think. On the other hand, if exit(0) is called in (2-2), the above scenario happens. > It does *seem* as if there is a race condition there in that another WAL > file may arrive after we have taken the decision there are no more WAL > files, but it's not a problem. That could happen if we issue the trigger > while the master is still up, which is a mistake - why would we do that? > If we only issue the trigger once we are happy the master is down then > we don't get a problem. Yeah, I agree that such race condition is not a problem. The trigger file has to be created after all the WAL files arrive at the standby server. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Warm Standby restore_command documentation (was: New trigger option of pg_standby)
From
Andreas Pflug
Date:
I've been following the thread with growing lack of understanding why this is so hardly discussed, and I went back to the documentation of what the restore_command should do ( http://www.postgresql.org/docs/8.3/static/warm-standby.html ) While the algorithm presented in the pseudocode isn't dealing too good with a situation where the trigger is set while the restore_command is sleeping (this should be handled better in a real implementation), the code says "Restore all wal files. If no more wal files are present, stop restoring if the trigger is set; otherwise wait for a new wal file". Since pg_standby is meant as implementation of restore_command, it has to follow the directive stated above; *anything else is a bug*. pg_standby currently does *not* obey this directive, and has that documented, but a documented bug still is a bug. Conclusion: There's no "new trigger option" needed, instead pg_standby has to be fixed so it does what the warm standby option of postgres needs. The trigger is only to be examined if no more files are restorable, and only once. Regards, Andreas
Re: Warm Standby restore_command documentation (was: New trigger option of pg_standby)
From
Fujii Masao
Date:
Hi, On Wed, Apr 15, 2009 at 3:30 AM, Andreas Pflug <pgadmin@pse-consulting.de> wrote: > I've been following the thread with growing lack of understanding why > this is so hardly discussed, and I went back to the documentation of > what the restore_command should do ( > http://www.postgresql.org/docs/8.3/static/warm-standby.html ) > > While the algorithm presented in the pseudocode isn't dealing too good > with a situation where the trigger is set while the restore_command is > sleeping (this should be handled better in a real implementation), the > code says > > "Restore all wal files. If no more wal files are present, stop restoring > if the trigger is set; otherwise wait for a new wal file". > > Since pg_standby is meant as implementation of restore_command, it has > to follow the directive stated above; *anything else is a bug*. > pg_standby currently does *not* obey this directive, and has that > documented, but a documented bug still is a bug. > > Conclusion: There's no "new trigger option" needed, instead pg_standby > has to be fixed so it does what the warm standby option of postgres > needs. The trigger is only to be examined if no more files are > restorable, and only once. Yeah, as a result of the discussion on that thread, I'll change the default behavior instead of adding new trigger option. But, I'm not going to get rid of the current behavior; it's chosen if the trigger file containing "fast" exists. On the other hand, new behavior is chosen when the trigger file containing "smart" or an empty one exists (default). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > I'd like to propose another simple idea; pg_standby deletes the > trigger file *whenever* the nextWALfile is a timeline history file. > A timeline history file is restored at the end of recovery, so it's > guaranteed that the trigger file is deleted whether nextWALfile > exists or not. > > A timeline history file is restored also at the beginning of > recovery, so the accidentally remaining trigger file is deleted > in early warm-standby as a side-effect of this idea. Here is the revised patch as above. If you notice something, please feel free to comment. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Wed, 2009-04-15 at 17:02 +0900, Fujii Masao wrote: > On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > > I'd like to propose another simple idea; pg_standby deletes the > > trigger file *whenever* the nextWALfile is a timeline history file. > > A timeline history file is restored at the end of recovery, so it's > > guaranteed that the trigger file is deleted whether nextWALfile > > exists or not. > > > > A timeline history file is restored also at the beginning of > > recovery, so the accidentally remaining trigger file is deleted > > in early warm-standby as a side-effect of this idea. > > Here is the revised patch as above. > > If you notice something, please feel free to comment. Deleting the trigger file when we request a history file works in most cases, but not in all. We also request a history file when we switch timelines, so code comments need slight modification. If take a base backup, switchover and then try to regen the primary from the base backup we would need to switch timelines, which could be problematic. That is unlikely, so we should at least very clearly document the actual behaviour, as we do in the code comments. I think your wording that smart mode guarantees no data will be lost is a little strong. I'd say "on successful completion all WAL records will be replayed resulting in zero data loss". -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi Simon, Thanks for the comments! On Thu, Apr 16, 2009 at 2:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, 2009-04-15 at 17:02 +0900, Fujii Masao wrote: > >> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> > I'd like to propose another simple idea; pg_standby deletes the >> > trigger file *whenever* the nextWALfile is a timeline history file. >> > A timeline history file is restored at the end of recovery, so it's >> > guaranteed that the trigger file is deleted whether nextWALfile >> > exists or not. >> > >> > A timeline history file is restored also at the beginning of >> > recovery, so the accidentally remaining trigger file is deleted >> > in early warm-standby as a side-effect of this idea. >> >> Here is the revised patch as above. >> >> If you notice something, please feel free to comment. > > Deleting the trigger file when we request a history file works in most > cases, but not in all. We also request a history file when we switch > timelines, so code comments need slight modification. > > If take a base backup, switchover and then try to regen the primary from > the base backup we would need to switch timelines, which could be > problematic. That is unlikely, so we should at least very clearly > document the actual behaviour, as we do in the code comments. "switch timelines" means that a new timeline ID is assigned at the end of archive recovery? If so, even in this case, there is no problem with deleting the trigger file, I think. Or, am I misunderstanding? > I think your wording that smart mode guarantees no data will be lost is > a little strong. I'd say "on successful completion all WAL records will > be replayed resulting in zero data loss". Sounds good. I'll change the wording. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> I'd like to propose another simple idea; pg_standby deletes the >> trigger file *whenever* the nextWALfile is a timeline history file. >> A timeline history file is restored at the end of recovery, so it's >> guaranteed that the trigger file is deleted whether nextWALfile >> exists or not. >> >> A timeline history file is restored also at the beginning of >> recovery, so the accidentally remaining trigger file is deleted >> in early warm-standby as a side-effect of this idea. > > Here is the revised patch as above. I think we have gone off to an overly complicated solution. pg_standby shouldn't need to special-case history files, or know what order the server will ask for them. What's wrong with just this: (ignoring the missing fast option) --- a/contrib/pg_standby/pg_standby.c +++ b/contrib/pg_standby/pg_standby.c @@ -552,8 +552,6 @@ main(int argc, char **argv) break; case 't': /* Trigger file */ triggerPath = optarg; - if (CheckForExternalTrigger()) - exit(1); /* Normal exit, with non-zero */ break; case 'w': /*Max wait time */ maxwaittime = atoi(optarg); ie. only check and delete the trigger file once the server requests a file that doesn't exist. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Mon, Apr 20, 2009 at 6:06 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> >> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> >> wrote: >>> >>> I'd like to propose another simple idea; pg_standby deletes the >>> trigger file *whenever* the nextWALfile is a timeline history file. >>> A timeline history file is restored at the end of recovery, so it's >>> guaranteed that the trigger file is deleted whether nextWALfile >>> exists or not. >>> >>> A timeline history file is restored also at the beginning of >>> recovery, so the accidentally remaining trigger file is deleted >>> in early warm-standby as a side-effect of this idea. >> >> Here is the revised patch as above. > > I think we have gone off to an overly complicated solution. pg_standby > shouldn't need to special-case history files, or know what order the server > will ask for them. > > What's wrong with just this: (ignoring the missing fast option) > > --- a/contrib/pg_standby/pg_standby.c > +++ b/contrib/pg_standby/pg_standby.c > @@ -552,8 +552,6 @@ main(int argc, char **argv) > break; > case 't': /* Trigger file */ > triggerPath = optarg; > - if (CheckForExternalTrigger()) > - exit(1); /* Normal exit, with non-zero */ > break; > case 'w': /* Max wait time */ > maxwaittime = atoi(optarg); > > ie. only check and delete the trigger file once the server requests a file > that doesn't exist. Thanks for the suggestion! I have three comments. 1) Though some users want to save the fast failover mode, this solution doesn't provide it. You think that that mode is unnecessary? 2) This solution would go wrong when there are the WAL files in pg_xlog; If pg_xlog of the primary server can be read at failover (it's not node failure but process failure case), the WAL files in it may be copied to pg_xlog of the standby server in order to prevent the transaction loss. On the other hand, we can copy such files to the archival storage instead of pg_xlog. In this case, your simple solution goes well, but recovery would unexpectedly end without the trigger file because those WAL files have the invalid record which means the end of WAL. I'd like to control when the standby will come up as a normal server. So, I think that pg_standby should cope with also the above situation which I described. What is your opinion? 3) This solution has a race condition; some transactions would be lost if WAL file is shipped from the primary server and the trigger file is created, while pg_standby is sleeping, because pg_standby checks previously whether a trigger file exists. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Andreas Pflug wrote: > I've been following the thread with growing lack of understanding why > this is so hardly discussed, and I went back to the documentation of > what the restore_command should do ( > http://www.postgresql.org/docs/8.3/static/warm-standby.html ) > > While the algorithm presented in the pseudocode isn't dealing too good > with a situation where the trigger is set while the restore_command is > sleeping (this should be handled better in a real implementation), the > code says > > "Restore all wal files. If no more wal files are present, stop restoring > if the trigger is set; otherwise wait for a new wal file". > > Since pg_standby is meant as implementation of restore_command, it has > to follow the directive stated above; *anything else is a bug*. > pg_standby currently does *not* obey this directive, and has that > documented, but a documented bug still is a bug. I think you're interpreting the chapter too strongly. The provided pseudo-code is just an example of a suitable restore_command, it doesn't say that pg_standby behaves exactly like that. I agree we should change the default behavior, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Andreas Pflug wrote: >> I've been following the thread with growing lack of understanding why >> this is so hardly discussed, and I went back to the documentation of >> what the restore_command should do ( >> http://www.postgresql.org/docs/8.3/static/warm-standby.html ) >> >> While the algorithm presented in the pseudocode isn't dealing too good >> with a situation where the trigger is set while the restore_command is >> sleeping (this should be handled better in a real implementation), the >> code says >> >> "Restore all wal files. If no more wal files are present, stop restoring >> if the trigger is set; otherwise wait for a new wal file". >> >> Since pg_standby is meant as implementation of restore_command, it has >> to follow the directive stated above; *anything else is a bug*. >> pg_standby currently does *not* obey this directive, and has that >> documented, but a documented bug still is a bug. > > I think you're interpreting the chapter too strongly. The provided > pseudo-code is just an example of a suitable restore_command, it > doesn't say that pg_standby behaves exactly like that. After reading that chapter, I assumed that pg_standby actually does work like this, and skipped reading the pg_standby specific doc.... The pgsql doc tries hard to give best advice for common situations, especially for integrity and safety issues. IMHO it's best to have the warm-standby chapter as reference how things should work for typical use-cases. Regards, Andreas
Fujii Masao wrote: > On Mon, Apr 20, 2009 at 6:06 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> What's wrong with just this: (ignoring the missing fast option) >> >> --- a/contrib/pg_standby/pg_standby.c >> +++ b/contrib/pg_standby/pg_standby.c >> @@ -552,8 +552,6 @@ main(int argc, char **argv) >> break; >> case 't': /* Trigger file */ >> triggerPath = optarg; >> - if (CheckForExternalTrigger()) >> - exit(1); /* Normal exit, with non-zero */ >> break; >> case 'w': /* Max wait time */ >> maxwaittime = atoi(optarg); >> >> ie. only check and delete the trigger file once the server requests a file >> that doesn't exist. > > Thanks for the suggestion! I have three comments. > > 1) > Though some users want to save the fast failover mode, this > solution doesn't provide it. You think that that mode is > unnecessary? No I just left it out to keep it simple. The patch was only meant to make the point, I didn't intend it to be applied as is. > 2) > This solution would go wrong when there are the WAL files in > pg_xlog; If pg_xlog of the primary server can be read at failover > (it's not node failure but process failure case), the WAL files in > it may be copied to pg_xlog of the standby server in order to > prevent the transaction loss. > > On the other hand, we can copy such files to the archival storage > instead of pg_xlog. In this case, your simple solution goes well, > but recovery would unexpectedly end without the trigger file > because those WAL files have the invalid record which means > the end of WAL. > > I'd like to control when the standby will come up as a normal > server. So, I think that pg_standby should cope with also the > above situation which I described. What is your opinion? Hmm, the user manual instructs to copy any unarchived files directly into pg_xlog, but I guess you could copy them to the archive instead. At the end of archive recovery, the server always probes for the timeline by requesting history files until it fails to find one. That probing should remove the trigger file if it hasn't been removed by then. It's a bit coincidental to rely on that, but at least it's simple. The assumption we're making is that the server won't exit recovery before asking restore_command for a file that doesn't exist. > 3) > This solution has a race condition; some transactions would > be lost if WAL file is shipped from the primary server and the > trigger file is created, while pg_standby is sleeping, because > pg_standby checks previously whether a trigger file exists. Yeah, it should sleep only after checking for the trigger file. That doesn't completely eliminate the race condition though: I think it should check again that the WAL file doesn't exist after reading the trigger file but before deleting it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, 2009-04-20 at 17:47 +0300, Heikki Linnakangas wrote: > At the end of archive recovery, the server always probes for the > timeline by requesting history files until it fails to find one. That > probing should remove the trigger file if it hasn't been removed by > then. It's a bit coincidental to rely on that, but at least it's simple. > The assumption we're making is that the server won't exit recovery > before asking restore_command for a file that doesn't exist. If you really want to simplify this, then we should have a final_command parameter for a command to be executed at the end of recovery. That would make the change to pg_standby very simple and allow for a very simple final_command also. That would make the logic similar to what we do for aggregates: transition function and final function. We could call it restore_cleanup_command or something similar. I suspect this option will make you consider Fujii-san's patch in a better light. :-) -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Mon, 2009-04-20 at 17:47 +0300, Heikki Linnakangas wrote: > >> At the end of archive recovery, the server always probes for the >> timeline by requesting history files until it fails to find one. That >> probing should remove the trigger file if it hasn't been removed by >> then. It's a bit coincidental to rely on that, but at least it's simple. >> The assumption we're making is that the server won't exit recovery >> before asking restore_command for a file that doesn't exist. > > If you really want to simplify this, then we should have a final_command > parameter for a command to be executed at the end of recovery. That > would make the change to pg_standby very simple and allow for a very > simple final_command also. That would make the logic similar to what we > do for aggregates: transition function and final function. > > We could call it restore_cleanup_command or something similar. Hmm, that might indeed be a cleaner interface. However, that throws the idea of backpatching out of the window, and will make it impossible to run a PG 8.4 pg_standby against a PG 8.3 server. Do we want to add the new parameter for 8.4 anyway? > I suspect this option will make you consider Fujii-san's patch in a > better light. :-) No, removing trigger file as soon as a non-existant file is requested still seems simpler than deleting it whenever a timeline history file is requested. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-04-21 at 14:17 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Mon, 2009-04-20 at 17:47 +0300, Heikki Linnakangas wrote: > > > >> At the end of archive recovery, the server always probes for the > >> timeline by requesting history files until it fails to find one. That > >> probing should remove the trigger file if it hasn't been removed by > >> then. It's a bit coincidental to rely on that, but at least it's simple. > >> The assumption we're making is that the server won't exit recovery > >> before asking restore_command for a file that doesn't exist. > > > > If you really want to simplify this, then we should have a final_command > > parameter for a command to be executed at the end of recovery. That > > would make the change to pg_standby very simple and allow for a very > > simple final_command also. That would make the logic similar to what we > > do for aggregates: transition function and final function. > > > > We could call it restore_cleanup_command or something similar. > > Hmm, that might indeed be a cleaner interface. However, that throws the > idea of backpatching out of the window, and will make it impossible to > run a PG 8.4 pg_standby against a PG 8.3 server. Do we want to add the > new parameter for 8.4 anyway? Perhaps, let's see how we resolve the perceived 8.2 and 8.3 issues. > > I suspect this option will make you consider Fujii-san's patch in a > > better light. :-) > > No, removing trigger file as soon as a non-existant file is requested > still seems simpler than deleting it whenever a timeline history file is > requested. If you do this, then you would have to change the procedure written into the 8.3 docs also. Docs aren't backpatchable. What you propose is *better* than raw pg_standby is now, but still not enough in all cases, as I think you know. Simple isn't the requirement here, is it? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > If you do this, then you would have to change the procedure written into > the 8.3 docs also. Docs aren't backpatchable. > > What you propose is *better* than raw pg_standby is now, but still not > enough in all cases, as I think you know. No, I don't. What is the case where it doesn't work? > Simple isn't the requirement here, is it? Simplicity is always a virtue, because it leads to maintainability. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-04-21 at 14:28 +0300, Heikki Linnakangas wrote: > > Simple isn't the requirement here, is it? > > Simplicity is always a virtue, because it leads to maintainability. "Simple enough" is a virtue. Less than that is not... -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi, On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Simon Riggs wrote: >> >> If you do this, then you would have to change the procedure written into >> the 8.3 docs also. Docs aren't backpatchable. >> >> What you propose is *better* than raw pg_standby is now, but still not >> enough in all cases, as I think you know. > > No, I don't. What is the case where it doesn't work? It's the case which I described as the 2nd comment to your proposal. 1. pg_standby tries to restore a non-existent file 1-1. remove the trigger file 1-2. pg_standby exits with non-zero code 2. the startup process tries to read it from pg_xlog 2-1. it is applied 3. the startup process tries to restore the next file using pg_standby 3-1. pg_standby gets *stuck* since the requested file and trigger file don't exist. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Simon Riggs wrote: >>> What you propose is *better* than raw pg_standby is now, but still not >>> enough in all cases, as I think you know. >> No, I don't. What is the case where it doesn't work? > > It's the case which I described as the 2nd comment to your > proposal. > > 1. pg_standby tries to restore a non-existent file > 1-1. remove the trigger file > 1-2. pg_standby exits with non-zero code > 2. the startup process tries to read it from pg_xlog > 2-1. it is applied > 3. the startup process tries to restore the next file using pg_standby > 3-1. pg_standby gets *stuck* since the requested file and trigger file > don't exist. Ahh, ok, I didn't understand the issue correctly before. But wait a minute, we already have exactly the same problem with the current 8.2/8.3 pg_standby, don't we? [tests]. Yes, we do. Simon's suggestion of a separate restore_completion_command is very attractive as it would provide an explicit place to hook up the deletion of the trigger file. It seems useful anyway, you might want to put a command there to e.g update a log file or launch some custom daemon software when the recovery ends. The question then is what to do with 8.2 and 8.3? Even if we decided to keep the behavior that the failover is triggered immediately (fast mode), pg_standby getting stuck if you copy any WAL files directly into pg_xlog seems like a bug that needs to be fixed. Fujii's idea of deleting the trigger file when history file is requested is the only proposal this far that works and doesn't require changes to people's config files, so I guess that's what we'll have to do at least for back-branches. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Fujii Masao wrote: > Hi, > > On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > >> Simon Riggs wrote: >> >>> If you do this, then you would have to change the procedure written into >>> the 8.3 docs also. Docs aren't backpatchable. >>> >>> What you propose is *better* than raw pg_standby is now, but still not >>> enough in all cases, as I think you know. >>> >> No, I don't. What is the case where it doesn't work? >> > > It's the case which I described as the 2nd comment to your > proposal. > > 1. pg_standby tries to restore a non-existent file > 1-1. remove the trigger file > 1-2. pg_standby exits with non-zero code > 2. the startup process tries to read it from pg_xlog > 2-1. it is applied > 3. the startup process tries to restore the next file using pg_standby > I'm a little confused. After pg_standby returned non-zero as indication for end-of-recovery, the startup process shouldn't request another file from pg_standby, right? Which means 3. should never happen (unless the startup process stalls and restarts, in which case I find it normal that another trigger required). Regards, Andreas
On Tue, 2009-04-21 at 15:55 +0300, Heikki Linnakangas wrote: > Fujii Masao wrote: > > On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas > > <heikki.linnakangas@enterprisedb.com> wrote: > >> Simon Riggs wrote: > >>> What you propose is *better* than raw pg_standby is now, but still not > >>> enough in all cases, as I think you know. > >> No, I don't. What is the case where it doesn't work? > > > > It's the case which I described as the 2nd comment to your > > proposal. > > > > 1. pg_standby tries to restore a non-existent file > > 1-1. remove the trigger file > > 1-2. pg_standby exits with non-zero code > > 2. the startup process tries to read it from pg_xlog > > 2-1. it is applied > > 3. the startup process tries to restore the next file using pg_standby > > 3-1. pg_standby gets *stuck* since the requested file and trigger file > > don't exist. > > Ahh, ok, I didn't understand the issue correctly before. > > But wait a minute, we already have exactly the same problem with the > current 8.2/8.3 pg_standby, don't we? [tests]. Yes, we do. > > Simon's suggestion of a separate restore_completion_command is very > attractive as it would provide an explicit place to hook up the deletion > of the trigger file. It seems useful anyway, you might want to put a > command there to e.g update a log file or launch some custom daemon > software when the recovery ends. The question then is what to do with > 8.2 and 8.3? Even if we decided to keep the behavior that the failover > is triggered immediately (fast mode), pg_standby getting stuck if you > copy any WAL files directly into pg_xlog seems like a bug that needs to > be fixed. > > Fujii's idea of deleting the trigger file when history file is requested > is the only proposal this far that works and doesn't require changes to > people's config files, so I guess that's what we'll have to do at least > for back-branches. Agreed. Fujii-san's proposal is the only one that covers all the important things. The assumptions need careful documentation, as you say. The idea of a restore_completion_command does still sound attractive, easy to implement and non-intrusive enough to do so right now. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Andreas Pflug wrote: > I'm a little confused. After pg_standby returned non-zero as indication > for end-of-recovery, the startup process shouldn't request another file > from pg_standby, right? Non-zero return value from restore_command doesn't mean end-of-recovery, it means file-not-found. The server will try to open the WAL file from pg_xlog if it's not found in archive (= restore_command returned non-zero). If it's found in pg_xlog, it will then ask for the next WAL file from the archive again. And on top of that, the server will ask for history files until it finds one that doesn't exist to figure out the new timeline. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Apr 21, 2009 at 12:25:50PM +0100, Simon Riggs wrote: > > No, removing trigger file as soon as a non-existant file is > > requested still seems simpler than deleting it whenever a timeline > > history file is requested. > > If you do this, then you would have to change the procedure written > into the 8.3 docs also. Docs aren't backpatchable. Are you sure? We've found errors in them before and fixed them. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Fujii Masao wrote: > Hi, > > On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> I'd like to propose another simple idea; pg_standby deletes the >> trigger file *whenever* the nextWALfile is a timeline history file. >> A timeline history file is restored at the end of recovery, so it's >> guaranteed that the trigger file is deleted whether nextWALfile >> exists or not. >> >> A timeline history file is restored also at the beginning of >> recovery, so the accidentally remaining trigger file is deleted >> in early warm-standby as a side-effect of this idea. > > Here is the revised patch as above. > > If you notice something, please feel free to comment. Ok, looking at this in more detail now. A couple of small things: We mustn't remove the trigger file immediately even in fast mode. As noted elsewhere in this thread, we have the same bug in fast mode where pg_standby gets stuck if you copy WAL files directly into pg_xlog. pg_standby should exit with code 0 only if the file is restore successfully. As the patch stands it also returns 0 when smart failover is done. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, Thanks for reviewing the patch! On Wed, Apr 22, 2009 at 4:27 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> >> Hi, >> >> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> >> wrote: >>> >>> I'd like to propose another simple idea; pg_standby deletes the >>> trigger file *whenever* the nextWALfile is a timeline history file. >>> A timeline history file is restored at the end of recovery, so it's >>> guaranteed that the trigger file is deleted whether nextWALfile >>> exists or not. >>> >>> A timeline history file is restored also at the beginning of >>> recovery, so the accidentally remaining trigger file is deleted >>> in early warm-standby as a side-effect of this idea. >> >> Here is the revised patch as above. >> >> If you notice something, please feel free to comment. > > Ok, looking at this in more detail now. A couple of small things: > > We mustn't remove the trigger file immediately even in fast mode. As noted > elsewhere in this thread, we have the same bug in fast mode where pg_standby > gets stuck if you copy WAL files directly into pg_xlog. Yes, there is the same problem also in fast mode. But, in fast mode, the trigger file has to be deleted immediately if it's found. Otherwise, recovery may fail as follows. 1. pg_standby finds the trigger file for fast mode, and returns non-zero without deleting the trigger file. 2. the startup process tries to read the WAL file from pg_xlog, but it's not found. 3. the startup process tries to restore the last applied WAL file using pg_standby. 4. (Again) pg_standby finds the trigger file for fast mode, and returns non-zero without deleting the trigger file. 5. the startup process tries to read the last applied WAL file, but it's not found. (though the last applied file wasof course restored before, the restored one cannot be read here) 6. recovery fails because the last applied WAL file cannot be read. On the other hand, if pg_standby returns 0 also in fast mode when the requested file and trigger file exist, ISTM that there is not much difference between fast and smart mode; also in fast mode, all the available WAL files would be applied. So, I left fast mode as it was (as much as possible) though it has the above problem, which is for backward compatibility. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, 2009-04-21 at 09:59 -0700, David Fetter wrote: > On Tue, Apr 21, 2009 at 12:25:50PM +0100, Simon Riggs wrote: > > > No, removing trigger file as soon as a non-existant file is > > > requested still seems simpler than deleting it whenever a timeline > > > history file is requested. > > > > If you do this, then you would have to change the procedure written > > into the 8.3 docs also. Docs aren't backpatchable. > > Are you sure? We've found errors in them before and fixed them. The proposed change is not a bug fix -- hmmm, or is it? I think we have a way that does not require this... -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Fujii Masao wrote: > On Wed, Apr 22, 2009 at 4:27 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Fujii Masao wrote: >>> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> >>> wrote: >>>> I'd like to propose another simple idea; pg_standby deletes the >>>> trigger file *whenever* the nextWALfile is a timeline history file. >>>> A timeline history file is restored at the end of recovery, so it's >>>> guaranteed that the trigger file is deleted whether nextWALfile >>>> exists or not. >>>> >>>> A timeline history file is restored also at the beginning of >>>> recovery, so the accidentally remaining trigger file is deleted >>>> in early warm-standby as a side-effect of this idea. >>> Here is the revised patch as above. >>> >>> If you notice something, please feel free to comment. >> Ok, looking at this in more detail now. A couple of small things: >> >> We mustn't remove the trigger file immediately even in fast mode. As noted >> elsewhere in this thread, we have the same bug in fast mode where pg_standby >> gets stuck if you copy WAL files directly into pg_xlog. > > Yes, there is the same problem also in fast mode. But, in fast > mode, the trigger file has to be deleted immediately if it's found. > Otherwise, recovery may fail as follows. > > 1. pg_standby finds the trigger file for fast mode, and returns > non-zero without deleting the trigger file. > 2. the startup process tries to read the WAL file from pg_xlog, > but it's not found. > 3. the startup process tries to restore the last applied WAL file > using pg_standby. > 4. (Again) pg_standby finds the trigger file for fast mode, and > returns non-zero without deleting the trigger file. > 5. the startup process tries to read the last applied WAL file, > but it's not found. > (though the last applied file was of course restored before, > the restored one cannot be read here) > 6. recovery fails because the last applied WAL file cannot be > read. > > On the other hand, if pg_standby returns 0 also in fast mode > when the requested file and trigger file exist, ISTM that there > is not much difference between fast and smart mode; also in > fast mode, all the available WAL files would be applied. Hmm, pg_standby could truncate the trigger file, so that it acts like a smart trigger in the subsequent pg_standby invocations. Assuming the postgres user has write access to it; it probably does because it can delete it, but conceivably it has only read access on the file but write access on the directory it's in. This is getting complicated, though. I guess it would be enough to document that you mustn't copy any extra files into pg_xlog if you use a fast trigger. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Thu, Apr 23, 2009 at 4:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > This is getting complicated, though. I guess it would be enough to document > that you mustn't copy any extra files into pg_xlog if you use a fast > trigger. Agreed. I added this note into document. Attached is the updated patch. I also fixed my bug which pg_standby returns 0 even if the requested file fails to be restored in smart mode. This patch is ready to commit, I think. Please review this. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Fujii Masao wrote: > On Thu, Apr 23, 2009 at 4:49 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> This is getting complicated, though. I guess it would be enough to document >> that you mustn't copy any extra files into pg_xlog if you use a fast >> trigger. > > Agreed. I added this note into document. > > Attached is the updated patch. I also fixed my bug which > pg_standby returns 0 even if the requested file fails to be > restored in smart mode. > > This patch is ready to commit, I think. Please review this. Looking at this again.. Deleting the trigger file when a history file is requested: > /* > * Get rid of the trigger file at the end of archive recovery. > * Otherwise, it would unexpectedly cause the subsequent warm-standby to > * end. > * > * Here is the right place to remove the trigger file since a timeline > * history file is requested only at the beginning and end of archive > * recovery. > */ changes the behavior in a subtle way: if you create trigger file before starting recovery, it will be deleted when the recovery is started and no failover is done. Currently, it will end the recovery immediately. That makes me uncomfortable to back-patch this. That change in behavior might be hard to work-around: the process that creates the trigger file would have to make sure that the server has started recovery before creating the file. Here's another idea: Let's modify xlog.c so that when the server asks for WAL file X, and restore_command returns "not found", the server will not ask for any WAL files >= X again (in that recovery session). Presumably if X doesn't exist, no later files will exist either. That would be pretty simple change, and it would allow us to go with the simpler implementation in pg_standby and just remove the trigger file immediately when it returns "not found" (instead of removing it when history file is requested). That would make it safe to copy extra WAL files into pg_xlog, even in fast failover mode. Does anyone see a hole in that idea? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-05-12 at 14:15 +0300, Heikki Linnakangas wrote: > Here's another idea: Let's modify xlog.c so that when the server asks > for WAL file X, and restore_command returns "not found", the server > will not ask for any WAL files >= X again (in that recovery session). > Presumably if X doesn't exist, no later files will exist either. That was proposed and rejected earlier, though not in this context. Sounds fine to me. I agree that we should simplify/clarify the file request pattern, which is the source of many difficulties over last few years. I would further propose that I take pg_standby out as a module, so we can more easily support earlier releases. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Tue, 2009-05-12 at 14:15 +0300, Heikki Linnakangas wrote: > >> Here's another idea: Let's modify xlog.c so that when the server asks >> for WAL file X, and restore_command returns "not found", the server >> will not ask for any WAL files >= X again (in that recovery session). >> Presumably if X doesn't exist, no later files will exist either. > > That was proposed and rejected earlier, though not in this context. Got a link? I'd like to check the past discussion. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-05-12 at 14:38 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Tue, 2009-05-12 at 14:15 +0300, Heikki Linnakangas wrote: > > > >> Here's another idea: Let's modify xlog.c so that when the server asks > >> for WAL file X, and restore_command returns "not found", the server > >> will not ask for any WAL files >= X again (in that recovery session). > >> Presumably if X doesn't exist, no later files will exist either. > > > > That was proposed and rejected earlier, though not in this context. > > Got a link? I'd like to check the past discussion. I'm sorry, I don't. One of the recovery related discussions started by me. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi, On Tue, May 12, 2009 at 8:15 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> >> On Thu, Apr 23, 2009 at 4:49 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> This is getting complicated, though. I guess it would be enough to >>> document >>> that you mustn't copy any extra files into pg_xlog if you use a fast >>> trigger. >> >> Agreed. I added this note into document. >> >> Attached is the updated patch. I also fixed my bug which >> pg_standby returns 0 even if the requested file fails to be >> restored in smart mode. >> >> This patch is ready to commit, I think. Please review this. > > Looking at this again.. > > Deleting the trigger file when a history file is requested: > >> /* >> * Get rid of the trigger file at the end of archive >> recovery. >> * Otherwise, it would unexpectedly cause the subsequent >> warm-standby to >> * end. >> * >> * Here is the right place to remove the trigger file since >> a timeline >> * history file is requested only at the beginning and end >> of archive >> * recovery. >> */ > > changes the behavior in a subtle way: if you create trigger file before > starting recovery, it will be deleted when the recovery is started and no > failover is done. Currently, it will end the recovery immediately. > > That makes me uncomfortable to back-patch this. That change in behavior > might be hard to work-around: the process that creates the trigger file > would have to make sure that the server has started recovery before creating > the file. > > Here's another idea: Let's modify xlog.c so that when the server asks for > WAL file X, and restore_command returns "not found", the server will not ask > for any WAL files >= X again (in that recovery session). Presumably if X > doesn't exist, no later files will exist either. That would be pretty simple > change, and it would allow us to go with the simpler implementation in > pg_standby and just remove the trigger file immediately when it returns "not > found" (instead of removing it when history file is requested). That would > make it safe to copy extra WAL files into pg_xlog, even in fast failover > mode. > > Does anyone see a hole in that idea? Probably yes. The trigger file would remain after failover if the restored WAL file has the invalid record which means the end of WAL. In this case, "not found" is not returned. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Tue, May 12, 2009 at 8:15 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Here's another idea: Let's modify xlog.c so that when the server asks for >> WAL file X, and restore_command returns "not found", the server will not ask >> for any WAL files >= X again (in that recovery session). Presumably if X >> doesn't exist, no later files will exist either. That would be pretty simple >> change, and it would allow us to go with the simpler implementation in >> pg_standby and just remove the trigger file immediately when it returns "not >> found" (instead of removing it when history file is requested). That would >> make it safe to copy extra WAL files into pg_xlog, even in fast failover >> mode. >> >> Does anyone see a hole in that idea? > > Probably yes. The trigger file would remain after failover if the > restored WAL file has the invalid record which means the end > of WAL. In this case, "not found" is not returned. Yep. That's not pleasant either :-(. I don't think we're going to get this to work reliably without extending the interface between the backend and restore_command. We've discussed many methods and there's always some nasty corner-case like that. I think we should leave back-branches as is, and go with Simon's suggestion to add new "recovery_end_command" that's run when the recovery is finished. That's simpler and more reliable than any of the other approaches we've discussed, and might become handy for other purposes as well. Does someone want to take a stab at writing a patch for that? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I don't think we're going to get this to work reliably without extending > the interface between the backend and restore_command. We've discussed > many methods and there's always some nasty corner-case like that. > I think we should leave back-branches as is, and go with Simon's > suggestion to add new "recovery_end_command" that's run when the > recovery is finished. That's simpler and more reliable than any of the > other approaches we've discussed, and might become handy for other > purposes as well. > Does someone want to take a stab at writing a patch for that? Does this conclusion mean that changing pg_standby is no longer on the table for 8.4? It certainly smells more like a new feature than a bug fix. regards, tom lane
On Wed, 2009-05-13 at 13:01 -0400, Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > I don't think we're going to get this to work reliably without extending > > the interface between the backend and restore_command. We've discussed > > many methods and there's always some nasty corner-case like that. Agreed. > > I think we should leave back-branches as is, and go with Simon's > > suggestion to add new "recovery_end_command" that's run when the > > recovery is finished. That's simpler and more reliable than any of the > > other approaches we've discussed, and might become handy for other > > purposes as well. That is the cleanest way, though we cannot really avoid acting for backbranches also. > > Does someone want to take a stab at writing a patch for that? No, not if there is a likelihood the work would be wasted. > Does this conclusion mean that changing pg_standby is no longer > on the table for 8.4? It certainly smells more like a new feature > than a bug fix. I don't really understand this comment. Why would fixing a memory leak be worthwhile when fixing a potential for data loss be a deferrable activity? I will set-up pg_standby as an external module and we can change it from there. No more discussions-for-8.4 and I can update as required to support each release. So let's just remove it from contrib and be done. Counterthoughts? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes: > I will set-up pg_standby as an external module and we can change it from > there. No more discussions-for-8.4 and I can update as required to > support each release. So let's just remove it from contrib and be done. Huh? The proposed fix involves a backend change, so I don't see how removing pg_standby from the distribution frees you from the constraints of our versioning. regards, tom lane
Simon Riggs wrote: > I will set-up pg_standby as an external module and we can change it from > there. No more discussions-for-8.4 and I can update as required to > support each release. So let's just remove it from contrib and be done. > Counterthoughts? > > We're in Beta. You can't just go yanking stuff like that. Beta testers will be justifiably very annoyed. Please calm down. pg_standby is useful and needs to be correct. And its existence as a standard module is one of the things that has made me feel confident about recommending people to use the PITR stuff. I'll be very annoyed if it were to get pulled. cheers andrew
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> I don't think we're going to get this to work reliably without extending >> the interface between the backend and restore_command. We've discussed >> many methods and there's always some nasty corner-case like that. > >> I think we should leave back-branches as is, and go with Simon's >> suggestion to add new "recovery_end_command" that's run when the >> recovery is finished. That's simpler and more reliable than any of the >> other approaches we've discussed, and might become handy for other >> purposes as well. > >> Does someone want to take a stab at writing a patch for that? > > Does this conclusion mean that changing pg_standby is no longer > on the table for 8.4? It certainly smells more like a new feature > than a bug fix. This whole thing can be considered to be a new feature. It's working as designed. But people seem to be surprised about the current behavior (me included), and we don't currently provide the behavior that most people actually want. I think we should fix it for 8.4. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-05-13 at 14:14 -0400, Andrew Dunstan wrote: > pg_standby is useful and needs to be correct. And its existence as a > standard module is one of the things that has made me feel confident > about recommending people to use the PITR stuff. I'll be very annoyed if > it were to get pulled. Although I am not advocating one position or another there are benefits to removing it. It would be nice to continue to enhance pg_standby without the limitations of the core release schedule. Sincerely, Joshua D. Drake -- PostgreSQL - XMPP: jdrake@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Simon Riggs wrote: > On Wed, 2009-05-13 at 13:01 -0400, Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> Does someone want to take a stab at writing a patch for that? > > No, not if there is a likelihood the work would be wasted. There always is. (I would've wrote the patch myself right away, but I'm extremely busy at the moment. :-( Might take one more day before I get the time to finish it, and we don't have much time) >> Does this conclusion mean that changing pg_standby is no longer >> on the table for 8.4? It certainly smells more like a new feature >> than a bug fix. > > I don't really understand this comment. Why would fixing a memory leak > be worthwhile when fixing a potential for data loss be a deferrable > activity? Because the data loss is working as designed and documented, even though the design is not what most people want and the documentation could say that more prominently. That said, I'm in favor of changing this for 8.4. > I will set-up pg_standby as an external module and we can change it from > there. No more discussions-for-8.4 and I can update as required to > support each release. So let's just remove it from contrib and be done. > Counterthoughts? That's a lot more drastic change to make in beta. Besides, the proposed fix required backend changes. I think we should keep it in contrib. (At least for this release: If we get more integrated replication options in 8.5, that would be a good time to move pg_standby out of contrib if that's what we want.) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote: > I think we should fix it for 8.4. Agreed. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Andrew Dunstan wrote: >> > We're in Beta. You can't just go yanking stuff like that. Beta testers > will be justifiably very annoyed. > > Please calm down. > > pg_standby is useful and needs to be correct. And its existence as a > standard module is one of the things that has made me feel confident > about recommending people to use the PITR stuff. I'll be very annoyed > if it were to get pulled. Since mentioned in the docs, I consider it at least the semi-official tool for pgsql PITR handling. But as this discussion reveals, the api is flawed, and will not allow guaranteed consistency (whatever pg_standby tries) until fixed. While this may not be a bug of the restore_script call, the pitr procedure in total is partially broken (in the sense that it doesn't provide what most users expect in a secure way) and thus needs to be fixed. It seems a fix can't be provided without extending the api. Regards, Andreas
On Wed, 2009-05-13 at 14:14 -0400, Andrew Dunstan wrote: > pg_standby is useful and needs to be correct. My suggestion was designed to provide this. A misunderstanding. > And its existence as a > standard module is one of the things that has made me feel confident > about recommending people to use the PITR stuff. I'll be very annoyed if > it were to get pulled. If we cannot make it correct within core, then I will make it correct somewhere else, beta or not. Other than that, I have no wish to remove it from contrib. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> Does this conclusion mean that changing pg_standby is no longer >> on the table for 8.4? It certainly smells more like a new feature >> than a bug fix. > This whole thing can be considered to be a new feature. It's working as > designed. But people seem to be surprised about the current behavior (me > included), and we don't currently provide the behavior that most people > actually want. I think we should fix it for 8.4. Well, that's okay by me if it can be done in a timely fashion. Bear in mind that we are planning to wrap beta2 not much more than 24 hours from now. regards, tom lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > That's a lot more drastic change to make in beta. Besides, the proposed > fix required backend changes. I think we should keep it in contrib. (At > least for this release: If we get more integrated replication options in > 8.5, that would be a good time to move pg_standby out of contrib if > that's what we want.) The proposed fix requires coordinated changes in the core and pg_standby. That would be a lot *harder* if pg_standby were external. Since we've evidently not gotten this API quite right yet, I think we should be keeping pg_standby in contrib until we do, ie the API has been stable for awhile ... regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > >> That's a lot more drastic change to make in beta. Besides, the proposed >> fix required backend changes. I think we should keep it in contrib. (At >> least for this release: If we get more integrated replication options in >> 8.5, that would be a good time to move pg_standby out of contrib if >> that's what we want.) >> > > The proposed fix requires coordinated changes in the core and > pg_standby. That would be a lot *harder* if pg_standby were external. > Since we've evidently not gotten this API quite right yet, I think we > should be keeping pg_standby in contrib until we do, ie the API has been > stable for awhile ... > > > Agreed. Frankly, if anything it should move from contrib to the core proper. I regard it as an essential utility, not an optional extra. cheers andrew
On Wed, 2009-05-13 at 14:53 -0400, Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > Tom Lane wrote: > >> Does this conclusion mean that changing pg_standby is no longer > >> on the table for 8.4? It certainly smells more like a new feature > >> than a bug fix. > > > This whole thing can be considered to be a new feature. It's working as > > designed. But people seem to be surprised about the current behavior (me > > included), and we don't currently provide the behavior that most people > > actually want. I think we should fix it for 8.4. > > Well, that's okay by me if it can be done in a timely fashion. Bear in > mind that we are planning to wrap beta2 not much more than 24 hours from > now. I'll write it now then, so it can be reviewed tomorrow. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Wed, 2009-05-13 at 14:53 -0400, Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> Tom Lane wrote: >>>> Does this conclusion mean that changing pg_standby is no longer >>>> on the table for 8.4? It certainly smells more like a new feature >>>> than a bug fix. >>> This whole thing can be considered to be a new feature. It's working as >>> designed. But people seem to be surprised about the current behavior (me >>> included), and we don't currently provide the behavior that most people >>> actually want. I think we should fix it for 8.4. >> Well, that's okay by me if it can be done in a timely fashion. Bear in >> mind that we are planning to wrap beta2 not much more than 24 hours from >> now. > > I'll write it now then, so it can be reviewed tomorrow. Thanks, Simon! -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-05-13 at 15:05 -0400, Andrew Dunstan wrote: > Frankly, if anything it should move from contrib to the core proper. I > regard it as an essential utility, not an optional extra. I like that idea. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote: > This whole thing can be considered to be a new feature. recovery.conf will contain a new optional parameter: recovery_end_command (string) This parameter specifies a shell command that will be executed once only at the end of recovery. This parameter is optional. The purpose of the recovery_end_command is to provide a mechanism for cleanup following replication or recovery. Any %r is replaced by the name of the file containing the last valid restart point. That is the earliest file that must be kept to allow a restore to be restartable, so this information can be used to truncate the archive to just the minimum required to support restart of the current restore. %r would only be used in a warm-standby configuration (see Section 24.4). Write %% to embed an actual % character in the command. recovery_end_command is performed *after* the UpdateControlFile() once the we are DB_IN_PRODUCTION. This behaviour ensures that a crash prior to the final checkpoint will continue to see the trigger file. Once we are safe, we can remove the trigger file safely. We also can now ignore any complexity surrounding whether WAL files are full or not, and whether WAL files were restored from the archive or from the local directory. Comments? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes: > recovery_end_command is performed *after* the UpdateControlFile() once > the we are DB_IN_PRODUCTION. Hmm, shouldn't it be after the last checkpoint but before we go to DB_IN_PRODUCTION? I have to admit I've not been following this closely though, so I may be missing something. regards, tom lane
On Wed, 2009-05-13 at 16:47 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > recovery_end_command is performed *after* the UpdateControlFile() once > > the we are DB_IN_PRODUCTION. > > Hmm, shouldn't it be after the last checkpoint Definitely. > but before we go to DB_IN_PRODUCTION? I think it can be either, so I'll go with your proposal. (I'm aware Fujii-san is asleep right now, so we should expect another viewpoint before tomorrow). -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Hi, Sorry for the delay. On Thu, May 14, 2009 at 6:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> but before we go to DB_IN_PRODUCTION? > > I think it can be either, so I'll go with your proposal. I also think so. > (I'm aware Fujii-san is asleep right now, so we should expect another > viewpoint before tomorrow). I'd like to avoid adding new parameter for warm-standby if possible because currently the setup of it is already complicated. But, I don't have another good idea yet other than the already proposed. Sorry. Personally, I'd rather make pg_standby delete a trigger file when the timeline history file is requested even if this would break the current behavior, than the setup of warm-standby becomes more complicated. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote: > On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote: > > > This whole thing can be considered to be a new feature. > > recovery.conf will contain a new optional parameter: > > recovery_end_command (string) Implemented. Some possibility of re-factoring in calc of %r, though that has not been done to ensure code clarity and avoid need for retesting other aspects of recovery at this stage of beta. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Attachment
Hi, On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote: >> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote: >> >> > This whole thing can be considered to be a new feature. >> >> recovery.conf will contain a new optional parameter: >> >> recovery_end_command (string) > > Implemented. > + ereport(signaled ? FATAL : WARNING, > + (errmsg("recovery_end_command \"%s\": return code %d", > + xlogRecoveryEndCmd, rc))); In fast failover case, pg_standby has to delete the trigger file immediately if it's found. Otherwise, recovery may go wrong as I already described. http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php So, in fast mode, recovery_end_command would always fail to delete the trigger file, and cause warning. This is odd behavior, I think. We should change WARNING to DEBUG2 like RestoreArchivedFile() in the above code? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote: >>> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote: >>> >>>> This whole thing can be considered to be a new feature. >>> recovery.conf will contain a new optional parameter: >>> >>> recovery_end_command (string) >> Implemented. > >> + ereport(signaled ? FATAL : WARNING, >> + (errmsg("recovery_end_command \"%s\": return code %d", >> + xlogRecoveryEndCmd, rc))); > > In fast failover case, pg_standby has to delete the trigger file immediately > if it's found. Otherwise, recovery may go wrong as I already described. > http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php And if you delete it immediately, you risk getting stuck if there's extra WAL files in pg_xlog. So still need the change in the backend to not call restore_command again for a WAL segment equal to or later than one that it already failed to restore. Or, we can truncate the trigger file to make it behave like a smart failover for the subsequent pg_standby calls. > So, in fast mode, recovery_end_command would always fail to delete the > trigger file, and cause warning. This is odd behavior, I think. We should > change WARNING to DEBUG2 like RestoreArchivedFile() in the above code? I think we should just change the pg_standby example to use "rm -f" rather than "rm". It seems useful to give a warning if the command fails. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, 2009-05-15 at 03:49 +0900, Fujii Masao wrote: > Hi, > > On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > > On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote: > >> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote: > >> > >> > This whole thing can be considered to be a new feature. > >> > >> recovery.conf will contain a new optional parameter: > >> > >> recovery_end_command (string) > > > > Implemented. > > > + ereport(signaled ? FATAL : WARNING, > > + (errmsg("recovery_end_command \"%s\": return code %d", > > + xlogRecoveryEndCmd, rc))); > > In fast failover case, pg_standby has to delete the trigger file immediately > if it's found. Otherwise, recovery may go wrong as I already described. > http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php > > So, in fast mode, recovery_end_command would always fail to delete the > trigger file, and cause warning. This is odd behavior, I think. We should > change WARNING to DEBUG2 like RestoreArchivedFile() in the above code? Using rm -f would avoid the WARNING. I'd rather keep it at WARNING, since not sure what command I'll be running and what a non-zero rc means. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
I've finally committed Simon's recovery_end_command patch, as well as the changes to pg_standby. There's now smart and fast failover modes, chosen by the content of the trigger file, smart mode is the default. A "fast" trigger file is truncated, turning it into a "smart" trigger for subsequent pg_standby invocations. I believe this is now safe in all the combinations discussed, in both fast and smartmode, with or without extra WAL files copied to pg_xlog, and also if the last archived WAL file is incomplete. You now need to set up recovery_end_command to clean up the trigger file; pg_standby no longer does that automatically. Simon Riggs wrote: > On Fri, 2009-05-15 at 03:49 +0900, Fujii Masao wrote: >> Hi, >> >> On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote: >>>> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote: >>>> >>>>> This whole thing can be considered to be a new feature. >>>> recovery.conf will contain a new optional parameter: >>>> >>>> recovery_end_command (string) >>> Implemented. >>> + ereport(signaled ? FATAL : WARNING, >>> + (errmsg("recovery_end_command \"%s\": return code %d", >>> + xlogRecoveryEndCmd, rc))); >> In fast failover case, pg_standby has to delete the trigger file immediately >> if it's found. Otherwise, recovery may go wrong as I already described. >> http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php >> >> So, in fast mode, recovery_end_command would always fail to delete the >> trigger file, and cause warning. This is odd behavior, I think. We should >> change WARNING to DEBUG2 like RestoreArchivedFile() in the above code? > > Using rm -f would avoid the WARNING. > > I'd rather keep it at WARNING, since not sure what command I'll be > running and what a non-zero rc means. > -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Fri, May 15, 2009 at 5:31 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I've finally committed Simon's recovery_end_command patch, as well as the > changes to pg_standby. There's now smart and fast failover modes, chosen by > the content of the trigger file, smart mode is the default. A "fast" trigger > file is truncated, turning it into a "smart" trigger for subsequent > pg_standby invocations. I believe this is now safe in all the combinations > discussed, in both fast and smart mode, with or without extra WAL files > copied to pg_xlog, and also if the last archived WAL file is incomplete. Thanks for revising my patch and committing it! This seems to work fine in all the case which I described. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, 2009-05-14 at 23:31 +0300, Heikki Linnakangas wrote: > I've finally committed ....changes to pg_standby. That was a good team effort. Thanks for committing. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > I don't think we're going to get this to work reliably without extending > > the interface between the backend and restore_command. We've discussed > > many methods and there's always some nasty corner-case like that. > > > I think we should leave back-branches as is, and go with Simon's > > suggestion to add new "recovery_end_command" that's run when the > > recovery is finished. That's simpler and more reliable than any of the > > other approaches we've discussed, and might become handy for other > > purposes as well. > > > Does someone want to take a stab at writing a patch for that? > > Does this conclusion mean that changing pg_standby is no longer > on the table for 8.4? It certainly smells more like a new feature > than a bug fix. I think the big frustration is that this issue was first brought up March 25 and it took two months to resolve it, at which point we were in beta. I think many hoped a better idea would emerge but often that just doesn't happen and we have to do the best fix we can and move on. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote: > Tom Lane wrote: > > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > > I don't think we're going to get this to work reliably without extending > > > the interface between the backend and restore_command. We've discussed > > > many methods and there's always some nasty corner-case like that. > > > > > I think we should leave back-branches as is, and go with Simon's > > > suggestion to add new "recovery_end_command" that's run when the > > > recovery is finished. That's simpler and more reliable than any of the > > > other approaches we've discussed, and might become handy for other > > > purposes as well. > > > > > Does someone want to take a stab at writing a patch for that? > > > > Does this conclusion mean that changing pg_standby is no longer > > on the table for 8.4? It certainly smells more like a new feature > > than a bug fix. > > I think the big frustration is that this issue was first brought up > March 25 and it took two months to resolve it, at which point we were in > beta. I think many hoped a better idea would emerge but often that just > doesn't happen and we have to do the best fix we can and move on. I agree, very frustrating. Why do you bring it up again now? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > > On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote: > > Tom Lane wrote: > > > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > > > I don't think we're going to get this to work reliably without extending > > > > the interface between the backend and restore_command. We've discussed > > > > many methods and there's always some nasty corner-case like that. > > > > > > > I think we should leave back-branches as is, and go with Simon's > > > > suggestion to add new "recovery_end_command" that's run when the > > > > recovery is finished. That's simpler and more reliable than any of the > > > > other approaches we've discussed, and might become handy for other > > > > purposes as well. > > > > > > > Does someone want to take a stab at writing a patch for that? > > > > > > Does this conclusion mean that changing pg_standby is no longer > > > on the table for 8.4? It certainly smells more like a new feature > > > than a bug fix. > > > > I think the big frustration is that this issue was first brought up > > March 25 and it took two months to resolve it, at which point we were in > > beta. I think many hoped a better idea would emerge but often that just > > doesn't happen and we have to do the best fix we can and move on. > > I agree, very frustrating. Why do you bring it up again now? Wny not? We are not going to improve unless we face our faults. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Simon Riggs wrote: > > > > On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote: > > > Tom Lane wrote: > > > > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > > > > I don't think we're going to get this to work reliably without extending > > > > > the interface between the backend and restore_command. We've discussed > > > > > many methods and there's always some nasty corner-case like that. > > > > > > > > > I think we should leave back-branches as is, and go with Simon's > > > > > suggestion to add new "recovery_end_command" that's run when the > > > > > recovery is finished. That's simpler and more reliable than any of the > > > > > other approaches we've discussed, and might become handy for other > > > > > purposes as well. > > > > > > > > > Does someone want to take a stab at writing a patch for that? > > > > > > > > Does this conclusion mean that changing pg_standby is no longer > > > > on the table for 8.4? It certainly smells more like a new feature > > > > than a bug fix. > > > > > > I think the big frustration is that this issue was first brought up > > > March 25 and it took two months to resolve it, at which point we were in > > > beta. I think many hoped a better idea would emerge but often that just > > > doesn't happen and we have to do the best fix we can and move on. > > > > I agree, very frustrating. Why do you bring it up again now? > > Wny not? We are not going to improve unless we face our faults. Oh, and I am backlogged on email, which is why I didn't mention it a week ago when this thread was active (which I think was your point). :-) -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, 2009-05-27 at 09:48 -0400, Bruce Momjian wrote: > We are not going to improve unless we face our faults. True. Who or what is at fault, in your opinion? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Bruce Momjian wrote: > Bruce Momjian wrote: >> Simon Riggs wrote: >>> On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote: >>>> I think the big frustration is that this issue was first brought up >>>> March 25 and it took two months to resolve it, at which point we were in >>>> beta. I think many hoped a better idea would emerge but often that just >>>> doesn't happen and we have to do the best fix we can and move on. >>> I agree, very frustrating. Why do you bring it up again now? >> Wny not? We are not going to improve unless we face our faults. > > Oh, and I am backlogged on email, which is why I didn't mention it a > week ago when this thread was active (which I think was your point). :-) Did you catch up the backlog far enough to see that Simon coded and I committed the patch to add "recovery_end_command" just before beta2? So it's in 8.4 now (http://archives.postgresql.org/message-id/20090514203109.8EBA2754067@cvs.postgresql.org). I agree we could've should've handled this more promptly, and I'll take my part of the blame for that. I let the various proposed patches sit for long times before reviewing them thoroughly, partly because I was busy and partly because I didn't feel good about the proposed approaches. I think what went in in the end is pretty simple and robust, but it's a shame we had to rush to get it into beta2, after sitting on our thumbs for months. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2009-05-27 at 09:51 -0400, Bruce Momjian wrote: > Bruce Momjian wrote: > > Simon Riggs wrote: > > > > > > On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote: > > > > > > > > I think the big frustration is that this issue was first brought up > > > > March 25 and it took two months to resolve it, at which point we were in > > > > beta. I think many hoped a better idea would emerge but often that just > > > > doesn't happen and we have to do the best fix we can and move on. > > > > > > I agree, very frustrating. Why do you bring it up again now? > > > > Wny not? We are not going to improve unless we face our faults. > > Oh, and I am backlogged on email, which is why I didn't mention it a > week ago when this thread was active (which I think was your point). :-) If there is criticism of someone or something, please let's hear it. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > > On Wed, 2009-05-27 at 09:48 -0400, Bruce Momjian wrote: > > > We are not going to improve unless we face our faults. > > True. Who or what is at fault, in your opinion? Well, we knew there was an issue but we didn't finalize our conclusions and address it as best we could before beta; instead it languished and was only addressed when we had our back up against the wall for beta2. Obviously this is not the best aproach. Ideally someone would have taken ownership of the issue, summarized the email conclusions, gotten a patch together, and submitted it for application. I know patches were posted but no one got group concensus on its usefulness until recently. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, 2009-05-27 at 12:08 -0400, Bruce Momjian wrote: > Simon Riggs wrote: > > > > On Wed, 2009-05-27 at 09:48 -0400, Bruce Momjian wrote: > > > > > We are not going to improve unless we face our faults. > > > > True. Who or what is at fault, in your opinion? > > Well, we knew there was an issue but we didn't finalize our conclusions > and address it as best we could before beta; instead it languished and > was only addressed when we had our back up against the wall for beta2. > Obviously this is not the best aproach. Ideally someone would have > taken ownership of the issue, summarized the email conclusions, gotten a > patch together, and submitted it for application. I know patches were > posted but no one got group consensus on its usefulness until recently. My experience is that consensus/votes will be overruled by final committer, if they disagree, and so only a committer has the real authority to take the role you suggest. http://en.wiktionary.org/wiki/responsibility "The obligation to carry forward an assigned task to a successful conclusion. With responsibility goes authority to direct and take the necessary action to ensure success" >From my side, Fujii-san's patch was adequate and backpatchable, though needed docs changes; I proposed it should be committed and said so clearly on open items wiki. My own suggestion was also an option, but it was also clearly not backpatchable and seemed unlikely to be acceptable at any time, let alone during beta. I am surprised at the final outcome, but at least there is one, which is good. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > My experience is that consensus/votes will be overruled by final > committer, if they disagree, > That's a fairly strong statement. I can't think of an occasion when this has happened on any matter of significance, and I can remember many times when Tom, Bruce and others have bowed to the consensus despite their own preferences. Bruce said the other day that we are trustees of the code, and I don't think I could put it better. I know I am conscious of that. cheers andrew
On Wed, May 27, 2009 at 1:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Simon Riggs wrote: >> >> My experience is that consensus/votes will be overruled by final >> committer, if they disagree, > > That's a fairly strong statement. I can't think of an occasion when this has > happened on any matter of significance, and I can remember many times when > Tom, Bruce and others have bowed to the consensus despite their own > preferences. Tom and Bruce do give way before a clear consensus, but on the other hand I think Simon is right that there was never much chance of getting anything committed here without Heikki's endorsement, which was slow in coming by his own admission. (I'm not in any way saying he was wrong to withhold his endorsement, just that he did.) I think it's undeniable that the voices of the committers carry significantly more weight than those of others on this mailing list, and especially that of Tom because of the sheer volume of what he commits compared to anyone else. Having one of the committers say that they don't like your patch doesn't completely kill its chances of getting accepted, but it definitely turns it into an uphill battle. On the other hand, if one of the committers takes a fancy to your patch it will occasionally jump ahead of the queue and get reviewed or committed before patches submitted much earlier. And more than one committer got features into 8.4 that were not really done in time for the 11/08 CommitFest, and likely would have been rejected if they'd come from a non-committer. As a thought experiment, consider two patches, one of which has a +1 from Tom Lane and a -1 from some other respected community member who is not a committer, and the other of which has a +1 from the community member and a -1 from Tom Lane. Which do you think is more likely to get committed? I know what I'd pick. Now, in many cases, the fact that the committers speak with the loudest voices is a good thing, because they are mostly very good coders with lots of PostgreSQL experience and a proven track record of not breaking the tree too often. But that doesn't make it any less true. ...Robert
On Wed, 2009-05-27 at 13:14 -0400, Andrew Dunstan wrote: > > Simon Riggs wrote: > > My experience is that consensus/votes will be overruled by final > > committer, if they disagree, > > That's a fairly strong statement. I was attempting to be open and honest to allow us to face our faults, as proposed, because I care about the health and efficiency of the project. I expected your response and know you think my comments to be blunt at times. We should worry about the people that don't speak their mind, not those that do. > I can't think of an occasion when this > has happened on any matter of significance, and I can remember many > times when Tom, Bruce and others have bowed to the consensus despite > their own preferences. It depends on what you regard as matters of significance and which parts of the code you pay attention to. Committers can act as they choose; I am merely pointing out that it isn't possible for non-committers to follow through on any responsibility they may attempt to assert. Specifically, trying to propose solutions, achieve consensus and develop patches only works if the committer will agree with conclusions at the end and you won't know until you get there. You can do a ton of work and it can all be for nothing if the consensus opposes the committer. After a while you reach the conclusion: Why do the work, why not just wait for the opinion and then act? Less work and same speed. > Bruce said the other day that we are trustees of the code, and I don't > think I could put it better. I know I am conscious of that. I doubt anyone reading this list feels any differently. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, 2009-05-27 at 17:39 +0300, Heikki Linnakangas wrote: > I agree we could've should've handled this more promptly, and I'll take > my part of the blame for that. I let the various proposed patches sit > for long times before reviewing them thoroughly, partly because I was > busy and partly because I didn't feel good about the proposed > approaches. I think what went in in the end is pretty simple and robust, > but it's a shame we had to rush to get it into beta2, after sitting on > our thumbs for months. Through everything I just said, I have no criticism of you Heikki. I had the same emotions about the proposals. I was only explaining why I felt unable to speed up the process myself. Final patch took me an hour to code and test, so I wasn't put out or rushed. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Robert Haas wrote: > Tom and Bruce do give way before a clear consensus, but on the other > hand I think Simon is right that there was never much chance of > getting anything committed here without Heikki's endorsement, which > was slow in coming by his own admission. (I'm not in any way saying > he was wrong to withhold his endorsement, just that he did.) > > I think it's undeniable that the voices of the committers carry > significantly more weight than those of others on this mailing list, > and especially that of Tom because of the sheer volume of what he > commits compared to anyone else. Having one of the committers say > that they don't like your patch doesn't completely kill its chances of > getting accepted, but it definitely turns it into an uphill battle. > On the other hand, if one of the committers takes a fancy to your > patch it will occasionally jump ahead of the queue and get reviewed or > committed before patches submitted much earlier. And more than one > committer got features into 8.4 that were not really done in time for > the 11/08 CommitFest, and likely would have been rejected if they'd > come from a non-committer. > > As a thought experiment, consider two patches, one of which has a +1 > from Tom Lane and a -1 from some other respected community member who > is not a committer, and the other of which has a +1 from the community > member and a -1 from Tom Lane. Which do you think is more likely to > get committed? I know what I'd pick. > > Now, in many cases, the fact that the committers speak with the > loudest voices is a good thing, because they are mostly very good > coders with lots of PostgreSQL experience and a proven track record of > not breaking the tree too often. But that doesn't make it any less > true. The above comments by Robert are very perceptive, and there is certainly truth that committer-endorsed patches are applied quicker than others. My only additional comment is that over time, reliable patch submitters become committers, so hopefully things balance out over time. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, 2009-05-27 at 12:08 -0400, Bruce Momjian wrote: > Ideally someone would have > taken ownership of the issue, summarized the email conclusions, gotten > a patch together, and submitted it for application. Just a further comment on this, based upon the patch Heikki recently committed. I raised various issues with recovery *after* feature freeze in 8.3, doing everything you mentioned above: patch (Jun '07), 4 months before beta1. 3.5 months later the patch was still un-reviewed and you deferred the patch until 8.4, without comment (Sep '07). Changes were eventually committed more than a year after original discussion (Apr '08). HACKERS "Minor changes to recovery related code" My other comments relate to that experience, and others. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support