Thread: New trigger option of pg_standby

New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

Current pg_standby is dangerous because the presence of the trigger
file causes recovery to end whether or not the next WAL file is available.
So, some *available* transactions may be lost at failover. Such danger
will become high if the standby server has not caught up with the primary.

Attached patch fixes the above problem by adding a new trigger option
to pg_standby; the presence of this new trigger file causes recovery to
end after replaying all the available WAL files. Specifically, pg_standby
acts like 'cp' or 'ln' command while this new trigger file exists.

I've not changed any existing features, so backward-compatibility is
maintained.

Thought?

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Wed, Mar 25, 2009 at 7:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Attached patch fixes the above problem by adding a new trigger option
> to pg_standby; the presence of this new trigger file causes recovery to
> end after replaying all the available WAL files.

Shouldn't it be the default? It seems like the most expected behaviour to me.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

Thanks for the comment.

On Wed, Mar 25, 2009 at 3:50 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Wed, Mar 25, 2009 at 7:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Attached patch fixes the above problem by adding a new trigger option
>> to pg_standby; the presence of this new trigger file causes recovery to
>> end after replaying all the available WAL files.
>
> Shouldn't it be the default? It seems like the most expected behaviour to me.

Yeah, I agree... but there may be scripts for warm-standby based on
the existing default behavior. So, I didn't make a new trigger the default.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Wed, Mar 25, 2009 at 9:44 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Yeah, I agree... but there may be scripts for warm-standby based on
> the existing default behavior. So, I didn't make a new trigger the default.

I don't use pg_standby personnaly but I admit I'm quite surprised by
the current behaviour. I'm pretty sure a lot of the current users
would be surprised too.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Wed, Mar 25, 2009 at 5:55 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Wed, Mar 25, 2009 at 9:44 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Yeah, I agree... but there may be scripts for warm-standby based on
>> the existing default behavior. So, I didn't make a new trigger the default.
>
> I don't use pg_standby personnaly but I admit I'm quite surprised by
> the current behaviour. I'm pretty sure a lot of the current users
> would be surprised too.

The current behavior is documented as follows, so it may be
taken for granted by some users. I think that we shouldn't
ignore such users.

---------------
Specify a trigger file whose presence should cause recovery to
end whether or not the next WAL file is available.
---------------

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
"Kevin Grittner"
Date:
>>> Guillaume Smet <guillaume.smet@gmail.com> wrote: 
> On Wed, Mar 25, 2009 at 9:44 AM, Fujii Masao <masao.fujii@gmail.com>
wrote:
>> Yeah, I agree... but there may be scripts for warm-standby based on
>> the existing default behavior. So, I didn't make a new trigger the
default.
> 
> I don't use pg_standby personnaly but I admit I'm quite surprised by
> the current behaviour. I'm pretty sure a lot of the current users
> would be surprised too.
I find it hard to imagine a use case for the existing default
behavior.
-Kevin


Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Wed, Mar 25, 2009 at 2:59 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> I find it hard to imagine a use case for the existing default
> behavior.

I thought a bit about it and I think it can be useful when your
priority is the availability of the service and you don't consider a
data loss that important: even if you have a lot of WALs segments to
replay, you may want to have your service up immediately in case of a
major problem.

Keeping it is a good idea IMHO but I don't think it should be the default.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Thu, Mar 26, 2009 at 12:48 AM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Wed, Mar 25, 2009 at 2:59 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> I find it hard to imagine a use case for the existing default
>> behavior.
>
> I thought a bit about it and I think it can be useful when your
> priority is the availability of the service and you don't consider a
> data loss that important: even if you have a lot of WALs segments to
> replay, you may want to have your service up immediately in case of a
> major problem.

Yes, I also think that this is likely use case.

> Keeping it is a good idea IMHO but I don't think it should be the default.

What does "the default" mean? You mean that new trigger should use
the existing trigger option character (-t)?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Mar 26, 2009 at 12:48 AM, Guillaume Smet
> <guillaume.smet@gmail.com> wrote:
>> On Wed, Mar 25, 2009 at 2:59 PM, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov> wrote:
>>> I find it hard to imagine a use case for the existing default
>>> behavior.
>> I thought a bit about it and I think it can be useful when your
>> priority is the availability of the service and you don't consider a
>> data loss that important: even if you have a lot of WALs segments to
>> replay, you may want to have your service up immediately in case of a
>> major problem.
> 
> Yes, I also think that this is likely use case.
> 
>> Keeping it is a good idea IMHO but I don't think it should be the default.
> 
> What does "the default" mean? You mean that new trigger should use
> the existing trigger option character (-t)?

The existing behavior doesn't seem very useful to me either. Assuming 
there is a use case though, we probably need to support both at the same 
time, perhaps using different trigger files. If there's a use case for 
both, conceivably someone will want to sometimes trigger the failover 
immediately and sometimes after all WAL segments have been replayed.

Whatever we do, the signaling method to trigger failover should behave 
the same.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Thu, Mar 26, 2009 at 2:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> What does "the default" mean? You mean that new trigger should use
> the existing trigger option character (-t)?

Yes, that's my point.

I understand it seems weird to switch the options but I'm pretty sure
a lot of persons currently using -t would be surprised by the current
behaviour. Moreover playing all the remaining WALs before starting up
should be the most natural option when people are looking in the help.

That said, it would be nice to hear from people really using
pg_standby to know if they understand how it works now and if it's
what they intended when they set it up.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Matteo Beccati
Date:
Hi,

Guillaume Smet wrote:
> On Thu, Mar 26, 2009 at 2:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> What does "the default" mean? You mean that new trigger should use
>> the existing trigger option character (-t)?
> 
> Yes, that's my point.
> 
> I understand it seems weird to switch the options but I'm pretty sure
> a lot of persons currently using -t would be surprised by the current
> behaviour. Moreover playing all the remaining WALs before starting up
> should be the most natural option when people are looking in the help.
> 
> That said, it would be nice to hear from people really using
> pg_standby to know if they understand how it works now and if it's
> what they intended when they set it up.

My fault not RTFM well enough, but I was surprised finding out that -t 
is working like that.

+1 for me to switch -t to the new behaviour.


Cheers

-- 
Matteo Beccati

OpenX - http://www.openx.org


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Thu, 2009-03-26 at 08:32 +0100, Guillaume Smet wrote:
> On Thu, Mar 26, 2009 at 2:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > What does "the default" mean? You mean that new trigger should use
> > the existing trigger option character (-t)?
> 
> Yes, that's my point.
> 
> I understand it seems weird to switch the options but I'm pretty sure
> a lot of persons currently using -t would be surprised by the current
> behaviour. Moreover playing all the remaining WALs before starting up
> should be the most natural option when people are looking in the help.

If the standby has fallen behind then waiting for it to catch up might
take hours to failover if it waits for all files. If you haven't been
monitoring it correctly, you have no clue. That is also a surprising
thing, so let's not jump from one surprising thing into the arms of
another.

If we go with this, I would suggest we make *neither* the default by
removing -t, and adopting two new options: something like -f == fast
failover, -p == patient failover. This then forces people to read and
understand the difference between the two behaviours so they can make an
informed choice of how they would like to act at this critical point in
time. It is justifiable because there is no single thing called a
trigger file any longer and the concept will lead to pain.

Earlier, we discussed having a single trigger file that contains an
option rather than two distinct trigger files. That design is better
because it allows the user to choose at failover time, rather than
making a binding decision at config time. That solution would be the
ideal one, IMHO, because it gives user more choice - and would allow us
to keep the -t option meaningfully. In that case the default should be
patience.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
Hi Simon.

On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Earlier, we discussed having a single trigger file that contains an
> option rather than two distinct trigger files. That design is better
> because it allows the user to choose at failover time, rather than
> making a binding decision at config time. That solution would be the
> ideal one, IMHO, because it gives user more choice - and would allow us
> to keep the -t option meaningfully. In that case the default should be
> patience.

Or you can define both files in your command line to have the choice.

I like the idea of removing -t and adding 2 new options so that people
are warned about the intended behavior.

Anyway, I don't have a strong opinion about how we should fix it as I
don't use pg_standby personnally, just that we should. The two options
you mention have their own merits.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Thu, Mar 26, 2009 at 8:54 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> Hi Simon.
>
> On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Earlier, we discussed having a single trigger file that contains an
>> option rather than two distinct trigger files. That design is better
>> because it allows the user to choose at failover time, rather than
>> making a binding decision at config time. That solution would be the
>> ideal one, IMHO, because it gives user more choice - and would allow us
>> to keep the -t option meaningfully. In that case the default should be
>> patience.
>
> Or you can define both files in your command line to have the choice.

Personally I like this.

> I like the idea of removing -t and adding 2 new options so that people
> are warned about the intended behavior.

OK, I'll change the patch as Simon suggested; removing -t and adding
two new options: -f = fast failover (existing behavior), -p patient failover.
Also I'll default the patient failover, so it's performed when the signal
(SIGINT or SIGUSR1) is received.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Peter Eisentraut
Date:
Simon Riggs wrote:
> If we go with this, I would suggest we make *neither* the default by
> removing -t, and adopting two new options: something like -f == fast
> failover, -p == patient failover.

-m smart|fast|immediate :-)



Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Fri, 2009-03-27 at 13:56 +0200, Peter Eisentraut wrote:
> Simon Riggs wrote:
> > If we go with this, I would suggest we make *neither* the default by
> > removing -t, and adopting two new options: something like -f == fast
> > failover, -p == patient failover.
> 
> -m smart|fast|immediate :-)

Yes, a better suggestion.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Fri, Mar 27, 2009 at 12:56 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Simon Riggs wrote:
>>
>> If we go with this, I would suggest we make *neither* the default by
>> removing -t, and adopting two new options: something like -f == fast
>> failover, -p == patient failover.
>
> -m smart|fast|immediate :-)

The advantage of having 2 options (or the ability to put a string
value in the trigger file) is that you can choose the behaviour when
you need to trigger it (you just have to use the 2 options with 2
different filenames). I don't think it's the case with your proposal.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Fri, Mar 27, 2009 at 3:38 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> OK, I'll change the patch as Simon suggested; removing -t and adding
> two new options: -f = fast failover (existing behavior), -p patient failover.
> Also I'll default the patient failover, so it's performed when the signal
> (SIGINT or SIGUSR1) is received.

I'm wondering if we should consider backpatching this one. Even if the
feature works as advertised in the documentation.

It's a very surprising behaviour and I'm pretty sure someone will
shoot himself in the foot with it, if not already done.

Considering backpatching might change the way we want to fix it.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Mar 26, 2009 at 8:54 PM, Guillaume Smet
> <guillaume.smet@gmail.com> wrote:
>> On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I like the idea of removing -t and adding 2 new options so that people
>> are warned about the intended behavior.
> 
> OK, I'll change the patch as Simon suggested; removing -t and adding
> two new options: -f = fast failover (existing behavior), -p patient failover.
> Also I'll default the patient failover, so it's performed when the signal
> (SIGINT or SIGUSR1) is received.

Uh oh, that's going to be quite tricky with signals. Remember that 
pg_standby is called for each file. A trigger file persists until it's 
deleted, but a signal will only be received by the pg_standby instance 
that happens to be running at the time.

Makes me wonder if the trigger pg_standby with signals is reliable to 
begin with. What if the backend is just processing a file when the 
signal is fired, and there's no pg_standby process running at the moment 
to receive it? Seems like the signaler needs to loop until it has 
successfully delivered the signal to a pg_standby process, which seems 
pretty ugly.

Given all the recent trouble with signals, and the fact that it's 
undocumented, perhaps we should just rip out the signaling support from 
pg_standby.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Fri, 2009-03-27 at 13:19 +0100, Guillaume Smet wrote:
> On Fri, Mar 27, 2009 at 12:56 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> > Simon Riggs wrote:
> >>
> >> If we go with this, I would suggest we make *neither* the default by
> >> removing -t, and adopting two new options: something like -f == fast
> >> failover, -p == patient failover.
> >
> > -m smart|fast|immediate :-)
> 
> The advantage of having 2 options (or the ability to put a string
> value in the trigger file) is that you can choose the behaviour when
> you need to trigger it (you just have to use the 2 options with 2
> different filenames). I don't think it's the case with your proposal.

Yes, sorry. I meant we should use the naming Peter suggests.

So we would have two triggers, but call them fast and smart, rather than
fast and patient.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Simon Riggs wrote:
>> If we go with this, I would suggest we make *neither* the default by
>> removing -t, and adopting two new options: something like -f == fast
>> failover, -p == patient failover.

> -m smart|fast|immediate :-)

+1 for using a "-m something" type of syntax instead of having to try to
pick single-letter switches that are mnemonic for the different cases.
But -1 to those particular mode names --- I think it will invite
confusion with pg_ctl's behavior.
        regards, tom lane


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Fri, 2009-03-27 at 10:25 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Simon Riggs wrote:
> >> If we go with this, I would suggest we make *neither* the default by
> >> removing -t, and adopting two new options: something like -f == fast
> >> failover, -p == patient failover.
> 
> > -m smart|fast|immediate :-)
> 
> +1 for using a "-m something" type of syntax instead of having to try to
> pick single-letter switches that are mnemonic for the different cases.
> But -1 to those particular mode names --- I think it will invite
> confusion with pg_ctl's behavior.

The choice is between

* one parameter with the option being given as text within trigger file

* two parameters naming different types of trigger file

I don't mind which, as long as it is one of those two, unless there is a
third way to specify things so that user has control at failover time. A
single -m option would hardcode that decision ahead of time, which is
undesirable behaviour, hence the additional complexity being discussed.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Fri, Mar 27, 2009 at 9:49 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Uh oh, that's going to be quite tricky with signals. Remember that
> pg_standby is called for each file. A trigger file persists until it's
> deleted, but a signal will only be received by the pg_standby instance that
> happens to be running at the time.

You are right!

> Makes me wonder if the trigger pg_standby with signals is reliable to begin
> with. What if the backend is just processing a file when the signal is
> fired, and there's no pg_standby process running at the moment to receive
> it? Seems like the signaler needs to loop until it has successfully
> delivered the signal to a pg_standby process, which seems pretty ugly.
>
> Given all the recent trouble with signals, and the fact that it's
> undocumented, perhaps we should just rip out the signaling support from
> pg_standby.

So far, to be frank, I was not sure why the trigger by the signal is necessary
for pg_standby. But, now, I think that it's useful when the user has forgotten
to specify the trigger file. In this case, without the signaling
support, there is
no way to do failover in a short time; probably, the user has to do shutdown
and restart a recovery from the last restart point, which would take time.
So, I'd like to leave the signaling support as a safeguard.

As you pointed out, the "smart" failover by signal would be very tricky. So,
maybe we should not change the existing behavior of pg_standby when
the signal is received.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Gurjeet Singh
Date:
<div class="gmail_quote">On Thu, Mar 26, 2009 at 4:54 AM, Guillaume Smet <span dir="ltr"><<a
href="mailto:guillaume.smet@gmail.com">guillaume.smet@gmail.com</a>></span>wrote:<br /><blockquote
class="gmail_quote"style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Hi
Simon.<br/><div class="im"><br /> On Thu, Mar 26, 2009 at 11:50 AM, Simon Riggs <<a
href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br /> > Earlier, we discussed having a
singletrigger file that contains an<br /> > option rather than two distinct trigger files. That design is better<br
/>> because it allows the user to choose at failover time, rather than<br /> > making a binding decision at
configtime. That solution would be the<br /> > ideal one, IMHO, because it gives user more choice - and would allow
us<br/> > to keep the -t option meaningfully. In that case the default should be<br /> > patience.<br /><br
/></div>Oryou can define both files in your command line to have the choice.<br /><br /> I like the idea of removing -t
andadding 2 new options so that people<br /> are warned about the intended behavior.<br /><br /> Anyway, I don't have a
strongopinion about how we should fix it as I<br /> don't use pg_standby personnally, just that we should. The two
options<br/> you mention have their own merits.<br /><br /></blockquote></div><br />For the record, I have been a user
ofpg_standby in the past, and never realized this behavioural detail; probably because we never had a huge lag at the
slave.<br/><br />From implementation perspective, I'd vote against any new options. That'd just make it harder for
backportingto older branches. I like the idea that contents of the trigger file determines the mode of operation.<br
/><br/>So, we can keep the existing behaviour where an empty trigger file waits for all WAL files to be applied, hence
nosurprises for existing users. With some strong wordings in the docs, we emit a log message like<br /><br /> 'Trigger
filefound; Performing patient recovery'<br /><br />that makes the user read the docs and understand the difference, if
hehas not already done so. And if there are some contents in the trigger file, then the behaviour depends on that.<br
/><br/>If we do not want to go to lengths of reading the file contents, the behaviour can depend on the suffix of the
file.<br/><br />pg_standby -t /tmp/mytrigger ...  -- lest assume this is the restore_command<br /><br />if( exists
/tmp/mytrigger) do default patient recovery with appropriate messages in log<br /> if( exists /tmp/mytrigger.fast ) do
stopthe recovery immediately with a message in log<br />if( exists /tmp/mytrigger.normal ) do the patient recovery with
messagesin log<br /><br />Best regards,<br />-- <br />gurjeet[.singh]@EnterpriseDB.com<br /> singh.gurjeet@{ gmail |
hotmail| indiatimes | yahoo }.com<br /><br />EnterpriseDB      <a
href="http://www.enterprisedb.com">http://www.enterprisedb.com</a><br/><br />Mail sent from my BlackLaptop device<br /> 

Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Fri, Mar 27, 2009 at 11:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Fri, 2009-03-27 at 10:25 -0400, Tom Lane wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>> > Simon Riggs wrote:
>> >> If we go with this, I would suggest we make *neither* the default by
>> >> removing -t, and adopting two new options: something like -f == fast
>> >> failover, -p == patient failover.
>>
>> > -m smart|fast|immediate :-)
>>
>> +1 for using a "-m something" type of syntax instead of having to try to
>> pick single-letter switches that are mnemonic for the different cases.
>> But -1 to those particular mode names --- I think it will invite
>> confusion with pg_ctl's behavior.
>
> The choice is between
>
> * one parameter with the option being given as text within trigger file
>
> * two parameters naming different types of trigger file
>
> I don't mind which, as long as it is one of those two, unless there is a
> third way to specify things so that user has control at failover time. A
> single -m option would hardcode that decision ahead of time, which is
> undesirable behaviour, hence the additional complexity being discussed.

Thanks for the clarification.

I'd like to choose the former because it's more flexible when new
trigger action is added to pg_standby in the future. And, as Gurjeet
says, it's more friendly to do smart failover (end recovery after all
the available WAL are applied) when an empty trigger file exists.
I'll change the patch as above. Comments?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Wed, Apr 1, 2009 at 11:01 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi,
>
> On Fri, Mar 27, 2009 at 11:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> On Fri, 2009-03-27 at 10:25 -0400, Tom Lane wrote:
>>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> > Simon Riggs wrote:
>>> >> If we go with this, I would suggest we make *neither* the default by
>>> >> removing -t, and adopting two new options: something like -f == fast
>>> >> failover, -p == patient failover.
>>>
>>> > -m smart|fast|immediate :-)
>>>
>>> +1 for using a "-m something" type of syntax instead of having to try to
>>> pick single-letter switches that are mnemonic for the different cases.
>>> But -1 to those particular mode names --- I think it will invite
>>> confusion with pg_ctl's behavior.
>>
>> The choice is between
>>
>> * one parameter with the option being given as text within trigger file
>>
>> * two parameters naming different types of trigger file
>>
>> I don't mind which, as long as it is one of those two, unless there is a
>> third way to specify things so that user has control at failover time. A
>> single -m option would hardcode that decision ahead of time, which is
>> undesirable behaviour, hence the additional complexity being discussed.
>
> Thanks for the clarification.
>
> I'd like to choose the former because it's more flexible when new
> trigger action is added to pg_standby in the future. And, as Gurjeet
> says, it's more friendly to do smart failover (end recovery after all
> the available WAL are applied) when an empty trigger file exists.
> I'll change the patch as above. Comments?

Here is the patch;
- Smart failover is chosen if the trigger file labeled "smart" or
  an empty one exists.
- Fast failover is chosen if the trigger file labeled "fast" exists,
  the signal (SIGUSR1 or SIGINT) is received or the wait timeout
  happens.

If you notice anything, please feel free to comment.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Fri, Apr 3, 2009 at 5:42 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Here is the patch;
> - Smart failover is chosen if the trigger file labeled "smart" or
>  an empty one exists.
> - Fast failover is chosen if the trigger file labeled "fast" exists,
>  the signal (SIGUSR1 or SIGINT) is received or the wait timeout
>  happens.

After some further thoughts, +1 for this approach too.

I think you imply 'containing "smart"' not 'labeled "smart"'.
"Labeled" is confusing IMHO.

+1 to change the default behaviour too. All the people discovering the
current behaviour are totally surprised.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Wed, Apr 8, 2009 at 6:56 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> On Fri, Apr 3, 2009 at 5:42 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Here is the patch;
>> - Smart failover is chosen if the trigger file labeled "smart" or
>>  an empty one exists.
>> - Fast failover is chosen if the trigger file labeled "fast" exists,
>>  the signal (SIGUSR1 or SIGINT) is received or the wait timeout
>>  happens.
>
> After some further thoughts, +1 for this approach too.
>
> I think you imply 'containing "smart"' not 'labeled "smart"'.
> "Labeled" is confusing IMHO.

Thanks for the comment!
I corrected such confusing expression.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> Hi,
> 
> On Wed, Apr 8, 2009 at 6:56 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
>> On Fri, Apr 3, 2009 at 5:42 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Here is the patch;
>>> - Smart failover is chosen if the trigger file labeled "smart" or
>>>  an empty one exists.
>>> - Fast failover is chosen if the trigger file labeled "fast" exists,
>>>  the signal (SIGUSR1 or SIGINT) is received or the wait timeout
>>>  happens.
>> After some further thoughts, +1 for this approach too.
>>
>> I think you imply 'containing "smart"' not 'labeled "smart"'.
>> "Labeled" is confusing IMHO.
> 
> Thanks for the comment!
> I corrected such confusing expression.

> +     if (strspn(buf, "smart") == 5 && strncmp(buf, "smart", 5) == 0)
> +     {

The strspn() call seems pointless here.

One problem with this patch is that in smart mode, the trigger file is 
not deleted. That's different from current pg_standby behavior, and 
makes accidental failovers after one failover more likely.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Thu, Apr 9, 2009 at 9:47 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> +       if (strspn(buf, "smart") == 5 && strncmp(buf, "smart", 5) == 0)
>> +       {
>
> The strspn() call seems pointless here.

OK, I'll get rid of it.

>
> One problem with this patch is that in smart mode, the trigger file is not
> deleted. That's different from current pg_standby behavior, and makes
> accidental failovers after one failover more likely.

Yes, it's because pg_standby cannot be sure when the trigger file
can be removed in smart mode. If the trigger file is deleted as soon
as it's found, just like in fast mode, pg_standby may keep waiting
for WAL file again.

One idea to solve this problem is to tell pg_standby as a
command-line argument about whether the trigger file can be
removed. That parameter value can be set to 'true' when the last
applied record is re-fetched. Though pg_standby is called to
restore timeline history files also after that point, the trigger file
is already unnecessary (pg_standby doesn't wait for history file).

Specifically, if restore_command contains new % option (%e?),
it's replaced by the boolean value which indicates whether the
trigger file can be deleted. This value is set to 'true' when the
startup process re-fetches the last valid record, 'false' otherwise.
In smart mode, pg_standby determines whether to delete the
trigger file according to that value.

Comments?

Or, do you have any better idea?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Fri, Apr 10, 2009 at 5:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> One idea to solve this problem is to tell pg_standby as a
> command-line argument about whether the trigger file can be
> removed. That parameter value can be set to 'true' when the last
> applied record is re-fetched. Though pg_standby is called to
> restore timeline history files also after that point, the trigger file
> is already unnecessary (pg_standby doesn't wait for history file).
>
> Specifically, if restore_command contains new % option (%e?),
> it's replaced by the boolean value which indicates whether the
> trigger file can be deleted. This value is set to 'true' when the
> startup process re-fetches the last valid record, 'false' otherwise.
> In smart mode, pg_standby determines whether to delete the
> trigger file according to that value.
>
> Comments?

Hmmm, it seems overly complicated but I don't know the code of pg_standby.

> Or, do you have any better idea?

Wouldn't it be possible to have a global switch (let's name it
startCluster, default to false) which is set to true when the trigger
file is found for the first time? You would then be able to remove the
trigger file and let the cluster start by checking this variable.

One more time, I don't know the code of pg_standby so it may be a stupid idea.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
Fujii-san,

I like the new patch using the content of the file to determine the
mode. Much easier to use at failover time.

On Fri, 2009-04-10 at 12:47 +0900, Fujii Masao wrote:

> > One problem with this patch is that in smart mode, the trigger file is not
> > deleted. That's different from current pg_standby behavior, and makes
> > accidental failovers after one failover more likely.
> 
> Yes, it's because pg_standby cannot be sure when the trigger file
> can be removed in smart mode. If the trigger file is deleted as soon
> as it's found, just like in fast mode, pg_standby may keep waiting
> for WAL file again.

My understanding of smart mode is fairly simple: 
  if (triggered)  {if (smartMode && nextWALfile+1 exists)    exit(0);else{    delete trigger file    exit(1);}  }

If you perform a file lookahead (the +1) as shown above then you avoid
the problem Heikki observes.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Fri, Apr 10, 2009 at 6:31 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Fri, Apr 10, 2009 at 5:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> One idea to solve this problem is to tell pg_standby as a
>> command-line argument about whether the trigger file can be
>> removed. That parameter value can be set to 'true' when the last
>> applied record is re-fetched. Though pg_standby is called to
>> restore timeline history files also after that point, the trigger file
>> is already unnecessary (pg_standby doesn't wait for history file).
>>
>> Specifically, if restore_command contains new % option (%e?),
>> it's replaced by the boolean value which indicates whether the
>> trigger file can be deleted. This value is set to 'true' when the
>> startup process re-fetches the last valid record, 'false' otherwise.
>> In smart mode, pg_standby determines whether to delete the
>> trigger file according to that value.
>>
>> Comments?
>
> Hmmm, it seems overly complicated but I don't know the code of pg_standby.

Yes, I'd also like to simplify it more.

>> Or, do you have any better idea?
>
> Wouldn't it be possible to have a global switch (let's name it
> startCluster, default to false) which is set to true when the trigger
> file is found for the first time? You would then be able to remove the
> trigger file and let the cluster start by checking this variable.
>
> One more time, I don't know the code of pg_standby so it may be a stupid idea.

Thanks for the suggestion!

Since pg_standby is executed for each file, such variable cannot
be taken over to the next execution, i.e. it's reset each time. So,
the current patch have left the trigger file until the end.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Sat, Apr 11, 2009 at 1:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> Fujii-san,
>
> I like the new patch using the content of the file to determine the
> mode. Much easier to use at failover time.
>
> On Fri, 2009-04-10 at 12:47 +0900, Fujii Masao wrote:
>
>> > One problem with this patch is that in smart mode, the trigger file is not
>> > deleted. That's different from current pg_standby behavior, and makes
>> > accidental failovers after one failover more likely.
>>
>> Yes, it's because pg_standby cannot be sure when the trigger file
>> can be removed in smart mode. If the trigger file is deleted as soon
>> as it's found, just like in fast mode, pg_standby may keep waiting
>> for WAL file again.
>
> My understanding of smart mode is fairly simple:
>
>   if (triggered)
>   {
>        if (smartMode && nextWALfile+1 exists)
>                exit(0);
>        else
>        {
>                delete trigger file
>                exit(1);
>        }
>   }
>
> If you perform a file lookahead (the +1) as shown above then you avoid
> the problem Heikki observes.

Thanks for the suggestion!

A lookahead (the +1) may have pg_standby get stuck as follows.
Am I missing something?

1. the trigger file containing "smart" is created.
2. pg_standby is executed.   2-1. nextWALfile is restored.   2-2. the trigger file is deleted because nextWALfile+1
doesn'texist. 
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.
5. pg_standby gets stuck because the trigger file and nextWALfile+1   don't exist.

But, a lookahead nextWALfile seems to work fine.

if (triggered)
{   if (smartMode && nextWALfile exists)       exit(0)   else   {       delete trigger file       exit(1)   }
}

1. the trigger file containing "smart" is created.
2. pg_standby is executed.   2-1. nextWALfile is restored.
3. the restored nextWALfile is applied.
4. pg_standby is executed again to restore nextWALfile+1.   4-1. the trigger file is deleted because nextWALfile+1
doesn'texist. 
5. the startup process fails to read nextWALfile+1.
6. pg_standby is executed again to re-fetch nextWALfile.   6-1. nextWALfile is restored.   6-2. pg_standby doesn't get
stuckbecause nextWALfile exists. 

Furthermore, pg_standby may have to check if nextWALfile exists
not only in archiveLocation but also in pg_xlog. Because, when
pg_xlog of the primary server can be read at failover, WAL files
in it may be copied to pg_xlog of the standby server to be applied.
(but, not sure if it's better to copy such files to pg_xlog instead of
archiveLocation in this case).

Comments?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Guillaume Smet
Date:
On Mon, Apr 13, 2009 at 7:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> 1. the trigger file containing "smart" is created.
> 2. pg_standby is executed.
>    2-1. nextWALfile is restored.
>    2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
> 3. the restored nextWALfile is applied.
> 4. pg_standby is executed again to restore nextWALfile+1.

I don't think it should happen. IMHO, it's an acceptable compromise to
replay all the WAL files present when I created the trigger file. So
if I have the smart shutdown trigger file and I don't have any
nextWALfile+1, I can remove the trigger file and stop the recovery:
pg_standby won't be executed again after that, even if a nextWALfile+1
appeared while replaying the previous WAL file.

That said, stupid question: do we have a way to know the nextWALfile+1
name to test if it exists? nextWALfile is transmitted through the
restore_command API and I'm wondering if we can have nextWALfile+1
name without changing the restore_command API.

-- 
Guillaume


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Mon, Apr 13, 2009 at 7:21 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Mon, Apr 13, 2009 at 7:52 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> 1. the trigger file containing "smart" is created.
>> 2. pg_standby is executed.
>>    2-1. nextWALfile is restored.
>>    2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
>> 3. the restored nextWALfile is applied.
>> 4. pg_standby is executed again to restore nextWALfile+1.
>
> I don't think it should happen. IMHO, it's an acceptable compromise to
> replay all the WAL files present when I created the trigger file. So
> if I have the smart shutdown trigger file and I don't have any
> nextWALfile+1, I can remove the trigger file and stop the recovery:
> pg_standby won't be executed again after that, even if a nextWALfile+1
> appeared while replaying the previous WAL file.

The scenario which I described is not related to whether the
nextWALfile+1 exists or not. To clarify the detail of it;

If pg_standby restores nextWALfile, deletes the trigger file and
exits with 1 (i.e. tell the end of recovery to the startup process),
the startup process considers that pg_standby failed,
and tries to read the nextWALfile in pg_xlog instead of the
restored file named "RECOVERYXLOG". This is undesirable
behavior because some transactions would be lost if nextWALfile
in pg_xlog doesn't exist. So, exit(0) should be called when
nextWALfile exists.

On the other hand, if pg_standby restores the nextWALfile,
deletes the trigger file and calls exit(0), the startup process
replays the restored file and tries to read the nextWALfile+1
because it doesn't know if the nextWALfile is the last valid WAL
file. So, pg_standby may be executed again even after the trigger
file is deleted.

Am I missing something?

> That said, stupid question: do we have a way to know the nextWALfile+1
> name to test if it exists? nextWALfile is transmitted through the
> restore_command API and I'm wondering if we can have nextWALfile+1
> name without changing the restore_command API.

Probably Yes; the following three steps are required, I think.
- Get the timeline, logid and segid from the name of the nextWALfile.
- Increment the logid and segid pair using NextLogSeg macro.
- Calculate the name of the nextWALfile+1 using XLogFileName macro.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:

> if (triggered)
> {
>     if (smartMode && nextWALfile exists)
>         exit(0)
>     else
>     {
>         delete trigger file
>         exit(1)
>     }
> }

This looks to be the correct one.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Mon, Apr 13, 2009 at 2:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> But, a lookahead nextWALfile seems to work fine.
>
> if (triggered)
> {
>    if (smartMode && nextWALfile exists)
>        exit(0)
>    else
>    {
>        delete trigger file
>        exit(1)
>    }
> }

Umm... in this algorithm, the trigger file remains after failover
if the nextWALfile has the invalid record which means the end
of WAL files.

I'd like to propose another simple idea; pg_standby deletes the
trigger file *whenever* the nextWALfile is a timeline history file.
A timeline history file is restored at the end of recovery, so it's
guaranteed that the trigger file is deleted whether nextWALfile
exists or not.

A timeline history file is restored also at the beginning of
recovery, so the accidentally remaining trigger file is deleted
in early warm-standby as a side-effect of this idea.

How does that sound?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:

> A lookahead (the +1) may have pg_standby get stuck as follows.
> Am I missing something?
> 
> 1. the trigger file containing "smart" is created.
> 2. pg_standby is executed.
>     2-1. nextWALfile is restored.
>     2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
> 3. the restored nextWALfile is applied.
> 4. pg_standby is executed again to restore nextWALfile+1.

This can't happen. (4) will never occur when (2-2) has occurred. A
non-zero error code means file not available which will cause recovery
to end and hence no requests for further WAL files are made.

It does *seem* as if there is a race condition there in that another WAL
file may arrive after we have taken the decision there are no more WAL
files, but it's not a problem. That could happen if we issue the trigger
while the master is still up, which is a mistake - why would we do that?
If we only issue the trigger once we are happy the master is down then
we don't get a problem.

So lets do it the next+1 way, when triggered.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Tue, Apr 14, 2009 at 6:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:
>
>> A lookahead (the +1) may have pg_standby get stuck as follows.
>> Am I missing something?
>>
>> 1. the trigger file containing "smart" is created.
>> 2. pg_standby is executed.
>>     2-1. nextWALfile is restored.
>>     2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
>> 3. the restored nextWALfile is applied.
>> 4. pg_standby is executed again to restore nextWALfile+1.
>
> This can't happen. (4) will never occur when (2-2) has occurred. A
> non-zero error code means file not available which will cause recovery
> to end and hence no requests for further WAL files are made.

When pg_standby exits with non-zero code, (3) and (4) will never
occur, and the transactions in nextWALfile will be lost. So, in (2-2),
pg_standby has to call exit(0), I think.

On the other hand, if exit(0) is called in (2-2), the above scenario
happens.

> It does *seem* as if there is a race condition there in that another WAL
> file may arrive after we have taken the decision there are no more WAL
> files, but it's not a problem. That could happen if we issue the trigger
> while the master is still up, which is a mistake - why would we do that?
> If we only issue the trigger once we are happy the master is down then
> we don't get a problem.

Yeah, I agree that such race condition is not a problem. The
trigger file has to be created after all the WAL files arrive at
the standby server.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


I've been following the thread with growing lack of understanding why
this is so hardly discussed, and I went back to the documentation of
what the restore_command should do (
http://www.postgresql.org/docs/8.3/static/warm-standby.html )

While the algorithm presented in the pseudocode isn't dealing too good
with a situation where the trigger is set while the restore_command is
sleeping (this should be handled better in a real implementation), the
code says

"Restore all wal files. If no more wal files are present, stop restoring
if the trigger is set; otherwise wait for a new wal file".

Since pg_standby is meant as implementation of restore_command, it has
to follow the directive stated above; *anything else is a bug*.
pg_standby currently does *not* obey this directive, and has that
documented, but a documented bug still is a bug.

Conclusion: There's no "new trigger option" needed, instead pg_standby
has to be fixed so it does what the warm standby option of postgres
needs. The trigger is only to be examined if no more files are
restorable, and only once.

Regards,
Andreas


Hi,

On Wed, Apr 15, 2009 at 3:30 AM, Andreas Pflug
<pgadmin@pse-consulting.de> wrote:
> I've been following the thread with growing lack of understanding why
> this is so hardly discussed, and I went back to the documentation of
> what the restore_command should do (
> http://www.postgresql.org/docs/8.3/static/warm-standby.html )
>
> While the algorithm presented in the pseudocode isn't dealing too good
> with a situation where the trigger is set while the restore_command is
> sleeping (this should be handled better in a real implementation), the
> code says
>
> "Restore all wal files. If no more wal files are present, stop restoring
> if the trigger is set; otherwise wait for a new wal file".
>
> Since pg_standby is meant as implementation of restore_command, it has
> to follow the directive stated above; *anything else is a bug*.
> pg_standby currently does *not* obey this directive, and has that
> documented, but a documented bug still is a bug.
>
> Conclusion: There's no "new trigger option" needed, instead pg_standby
> has to be fixed so it does what the warm standby option of postgres
> needs. The trigger is only to be examined if no more files are
> restorable, and only once.

Yeah, as a result of the discussion on that thread, I'll change
the default behavior instead of adding new trigger option.
But, I'm not going to get rid of the current behavior; it's chosen
if the trigger file containing "fast" exists. On the other hand,
new behavior is chosen when the trigger file containing "smart"
or an empty one exists (default).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I'd like to propose another simple idea; pg_standby deletes the
> trigger file *whenever* the nextWALfile is a timeline history file.
> A timeline history file is restored at the end of recovery, so it's
> guaranteed that the trigger file is deleted whether nextWALfile
> exists or not.
>
> A timeline history file is restored also at the beginning of
> recovery, so the accidentally remaining trigger file is deleted
> in early warm-standby as a side-effect of this idea.

Here is the revised patch as above.

If you notice something, please feel free to comment.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-04-15 at 17:02 +0900, Fujii Masao wrote:

> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > I'd like to propose another simple idea; pg_standby deletes the
> > trigger file *whenever* the nextWALfile is a timeline history file.
> > A timeline history file is restored at the end of recovery, so it's
> > guaranteed that the trigger file is deleted whether nextWALfile
> > exists or not.
> >
> > A timeline history file is restored also at the beginning of
> > recovery, so the accidentally remaining trigger file is deleted
> > in early warm-standby as a side-effect of this idea.
> 
> Here is the revised patch as above.
> 
> If you notice something, please feel free to comment.

Deleting the trigger file when we request a history file works in most
cases, but not in all. We also request a history file when we switch
timelines, so code comments need slight modification.

If take a base backup, switchover and then try to regen the primary from
the base backup we would need to switch timelines, which could be
problematic. That is unlikely, so we should at least very clearly
document the actual behaviour, as we do in the code comments.

I think your wording that smart mode guarantees no data will be lost is
a little strong. I'd say "on successful completion all WAL records will
be replayed resulting in zero data loss". 

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi Simon,

Thanks for the comments!

On Thu, Apr 16, 2009 at 2:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Wed, 2009-04-15 at 17:02 +0900, Fujii Masao wrote:
>
>> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > I'd like to propose another simple idea; pg_standby deletes the
>> > trigger file *whenever* the nextWALfile is a timeline history file.
>> > A timeline history file is restored at the end of recovery, so it's
>> > guaranteed that the trigger file is deleted whether nextWALfile
>> > exists or not.
>> >
>> > A timeline history file is restored also at the beginning of
>> > recovery, so the accidentally remaining trigger file is deleted
>> > in early warm-standby as a side-effect of this idea.
>>
>> Here is the revised patch as above.
>>
>> If you notice something, please feel free to comment.
>
> Deleting the trigger file when we request a history file works in most
> cases, but not in all. We also request a history file when we switch
> timelines, so code comments need slight modification.
>
> If take a base backup, switchover and then try to regen the primary from
> the base backup we would need to switch timelines, which could be
> problematic. That is unlikely, so we should at least very clearly
> document the actual behaviour, as we do in the code comments.

"switch timelines" means that a new timeline ID is assigned at
the end of archive recovery? If so, even in this case, there is
no problem with deleting the trigger file, I think.
Or, am I misunderstanding?

> I think your wording that smart mode guarantees no data will be lost is
> a little strong. I'd say "on successful completion all WAL records will
> be replayed resulting in zero data loss".

Sounds good. I'll change the wording.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> I'd like to propose another simple idea; pg_standby deletes the
>> trigger file *whenever* the nextWALfile is a timeline history file.
>> A timeline history file is restored at the end of recovery, so it's
>> guaranteed that the trigger file is deleted whether nextWALfile
>> exists or not.
>>
>> A timeline history file is restored also at the beginning of
>> recovery, so the accidentally remaining trigger file is deleted
>> in early warm-standby as a side-effect of this idea.
> 
> Here is the revised patch as above.

I think we have gone off to an overly complicated solution. pg_standby 
shouldn't need to special-case history files, or know what order the 
server will ask for them.

What's wrong with just this: (ignoring the missing fast option)

--- a/contrib/pg_standby/pg_standby.c
+++ b/contrib/pg_standby/pg_standby.c
@@ -552,8 +552,6 @@ main(int argc, char **argv)                break;            case 't':           /* Trigger file */
              triggerPath = optarg;
 
-               if (CheckForExternalTrigger())
-                   exit(1);    /* Normal exit, with non-zero */                break;            case 'w':
/*Max wait time */                maxwaittime = atoi(optarg);
 

ie. only check and delete the trigger file once the server requests a 
file that doesn't exist.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Mon, Apr 20, 2009 at 6:06 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com>
>> wrote:
>>>
>>> I'd like to propose another simple idea; pg_standby deletes the
>>> trigger file *whenever* the nextWALfile is a timeline history file.
>>> A timeline history file is restored at the end of recovery, so it's
>>> guaranteed that the trigger file is deleted whether nextWALfile
>>> exists or not.
>>>
>>> A timeline history file is restored also at the beginning of
>>> recovery, so the accidentally remaining trigger file is deleted
>>> in early warm-standby as a side-effect of this idea.
>>
>> Here is the revised patch as above.
>
> I think we have gone off to an overly complicated solution. pg_standby
> shouldn't need to special-case history files, or know what order the server
> will ask for them.
>
> What's wrong with just this: (ignoring the missing fast option)
>
> --- a/contrib/pg_standby/pg_standby.c
> +++ b/contrib/pg_standby/pg_standby.c
> @@ -552,8 +552,6 @@ main(int argc, char **argv)
>                break;
>            case 't':           /* Trigger file */
>                triggerPath = optarg;
> -               if (CheckForExternalTrigger())
> -                   exit(1);    /* Normal exit, with non-zero */
>                break;
>            case 'w':           /* Max wait time */
>                maxwaittime = atoi(optarg);
>
> ie. only check and delete the trigger file once the server requests a file
> that doesn't exist.

Thanks for the suggestion! I have three comments.

1)
Though some users want to save the fast failover mode, this
solution doesn't provide it. You think that that mode is
unnecessary?

2)
This solution would go wrong when there are the WAL files in
pg_xlog; If pg_xlog of the primary server can be read at failover
(it's not node failure but process failure case), the WAL files in
it may be copied to pg_xlog of the standby server in order to
prevent the transaction loss.

On the other hand, we can copy such files to the archival storage
instead of pg_xlog. In this case, your simple solution goes well,
but recovery would unexpectedly end without the trigger file
because those WAL files have the invalid record which means
the end of WAL.

I'd like to control when the standby will come up as a normal
server. So, I think that pg_standby should cope with also the
above situation which I described. What is your opinion?

3)
This solution has a race condition; some transactions would
be lost if WAL file is shipped from the primary server and the
trigger file is created, while pg_standby is sleeping, because
pg_standby checks previously whether a trigger file exists.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Warm Standby restore_command documentation

From
Heikki Linnakangas
Date:
Andreas Pflug wrote:
> I've been following the thread with growing lack of understanding why
> this is so hardly discussed, and I went back to the documentation of
> what the restore_command should do (
> http://www.postgresql.org/docs/8.3/static/warm-standby.html )
> 
> While the algorithm presented in the pseudocode isn't dealing too good
> with a situation where the trigger is set while the restore_command is
> sleeping (this should be handled better in a real implementation), the
> code says
> 
> "Restore all wal files. If no more wal files are present, stop restoring
> if the trigger is set; otherwise wait for a new wal file".
> 
> Since pg_standby is meant as implementation of restore_command, it has
> to follow the directive stated above; *anything else is a bug*.
> pg_standby currently does *not* obey this directive, and has that
> documented, but a documented bug still is a bug.

I think you're interpreting the chapter too strongly. The provided 
pseudo-code is just an example of a suitable restore_command, it doesn't 
say that pg_standby behaves exactly like that.

I agree we should change the default behavior, though.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Warm Standby restore_command documentation

From
Andreas Pflug
Date:
Heikki Linnakangas wrote:
> Andreas Pflug wrote:
>> I've been following the thread with growing lack of understanding why
>> this is so hardly discussed, and I went back to the documentation of
>> what the restore_command should do (
>> http://www.postgresql.org/docs/8.3/static/warm-standby.html )
>>
>> While the algorithm presented in the pseudocode isn't dealing too good
>> with a situation where the trigger is set while the restore_command is
>> sleeping (this should be handled better in a real implementation), the
>> code says
>>
>> "Restore all wal files. If no more wal files are present, stop restoring
>> if the trigger is set; otherwise wait for a new wal file".
>>
>> Since pg_standby is meant as implementation of restore_command, it has
>> to follow the directive stated above; *anything else is a bug*.
>> pg_standby currently does *not* obey this directive, and has that
>> documented, but a documented bug still is a bug.
>
> I think you're interpreting the chapter too strongly. The provided
> pseudo-code is just an example of a suitable restore_command, it
> doesn't say that pg_standby behaves exactly like that. 
After reading that chapter, I assumed that pg_standby actually does work
like this, and skipped reading the pg_standby specific doc....
The pgsql doc tries hard to give best advice for common situations,
especially for integrity and safety issues. IMHO it's best to have the
warm-standby chapter as reference how things should work for typical
use-cases.

Regards,
Andreas





Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Mon, Apr 20, 2009 at 6:06 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> What's wrong with just this: (ignoring the missing fast option)
>>
>> --- a/contrib/pg_standby/pg_standby.c
>> +++ b/contrib/pg_standby/pg_standby.c
>> @@ -552,8 +552,6 @@ main(int argc, char **argv)
>>                break;
>>            case 't':           /* Trigger file */
>>                triggerPath = optarg;
>> -               if (CheckForExternalTrigger())
>> -                   exit(1);    /* Normal exit, with non-zero */
>>                break;
>>            case 'w':           /* Max wait time */
>>                maxwaittime = atoi(optarg);
>>
>> ie. only check and delete the trigger file once the server requests a file
>> that doesn't exist.
> 
> Thanks for the suggestion! I have three comments.
> 
> 1)
> Though some users want to save the fast failover mode, this
> solution doesn't provide it. You think that that mode is
> unnecessary?

No I just left it out to keep it simple. The patch was only meant to 
make the point, I didn't intend it to be applied as is.

> 2)
> This solution would go wrong when there are the WAL files in
> pg_xlog; If pg_xlog of the primary server can be read at failover
> (it's not node failure but process failure case), the WAL files in
> it may be copied to pg_xlog of the standby server in order to
> prevent the transaction loss.
> 
> On the other hand, we can copy such files to the archival storage
> instead of pg_xlog. In this case, your simple solution goes well,
> but recovery would unexpectedly end without the trigger file
> because those WAL files have the invalid record which means
> the end of WAL.
> 
> I'd like to control when the standby will come up as a normal
> server. So, I think that pg_standby should cope with also the
> above situation which I described. What is your opinion?

Hmm, the user manual instructs to copy any unarchived files directly 
into pg_xlog, but I guess you could copy them to the archive instead.

At the end of archive recovery, the server always probes for the 
timeline by requesting history files until it fails to find one. That 
probing should remove the trigger file if it hasn't been removed by 
then. It's a bit coincidental to rely on that, but at least it's simple. 
The assumption we're making is that the server won't exit recovery 
before asking restore_command for a file that doesn't exist.

> 3)
> This solution has a race condition; some transactions would
> be lost if WAL file is shipped from the primary server and the
> trigger file is created, while pg_standby is sleeping, because
> pg_standby checks previously whether a trigger file exists.

Yeah, it should sleep only after checking for the trigger file. That 
doesn't completely eliminate the race condition though: I think it 
should check again that the WAL file doesn't exist after reading the 
trigger file but before deleting it.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Mon, 2009-04-20 at 17:47 +0300, Heikki Linnakangas wrote:

> At the end of archive recovery, the server always probes for the 
> timeline by requesting history files until it fails to find one. That 
> probing should remove the trigger file if it hasn't been removed by 
> then. It's a bit coincidental to rely on that, but at least it's simple. 
> The assumption we're making is that the server won't exit recovery 
> before asking restore_command for a file that doesn't exist.

If you really want to simplify this, then we should have a final_command
parameter for a command to be executed at the end of recovery. That
would make the change to pg_standby very simple and allow for a very
simple final_command also. That would make the logic similar to what we
do for aggregates: transition function and final function. 

We could call it restore_cleanup_command or something similar.

I suspect this option will make you consider Fujii-san's patch in a
better light. :-)

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Mon, 2009-04-20 at 17:47 +0300, Heikki Linnakangas wrote:
> 
>> At the end of archive recovery, the server always probes for the 
>> timeline by requesting history files until it fails to find one. That 
>> probing should remove the trigger file if it hasn't been removed by 
>> then. It's a bit coincidental to rely on that, but at least it's simple. 
>> The assumption we're making is that the server won't exit recovery 
>> before asking restore_command for a file that doesn't exist.
> 
> If you really want to simplify this, then we should have a final_command
> parameter for a command to be executed at the end of recovery. That
> would make the change to pg_standby very simple and allow for a very
> simple final_command also. That would make the logic similar to what we
> do for aggregates: transition function and final function. 
> 
> We could call it restore_cleanup_command or something similar.

Hmm, that might indeed be a cleaner interface. However, that throws the 
idea of backpatching out of the window, and will make it impossible to 
run a PG 8.4 pg_standby against a PG 8.3 server. Do we want to add the 
new parameter for 8.4 anyway?

> I suspect this option will make you consider Fujii-san's patch in a
> better light. :-)

No, removing trigger file as soon as a non-existant file is requested 
still seems simpler than deleting it whenever a timeline history file is 
requested.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Tue, 2009-04-21 at 14:17 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2009-04-20 at 17:47 +0300, Heikki Linnakangas wrote:
> > 
> >> At the end of archive recovery, the server always probes for the 
> >> timeline by requesting history files until it fails to find one. That 
> >> probing should remove the trigger file if it hasn't been removed by 
> >> then. It's a bit coincidental to rely on that, but at least it's simple. 
> >> The assumption we're making is that the server won't exit recovery 
> >> before asking restore_command for a file that doesn't exist.
> > 
> > If you really want to simplify this, then we should have a final_command
> > parameter for a command to be executed at the end of recovery. That
> > would make the change to pg_standby very simple and allow for a very
> > simple final_command also. That would make the logic similar to what we
> > do for aggregates: transition function and final function. 
> > 
> > We could call it restore_cleanup_command or something similar.
> 
> Hmm, that might indeed be a cleaner interface. However, that throws the 
> idea of backpatching out of the window, and will make it impossible to 
> run a PG 8.4 pg_standby against a PG 8.3 server. Do we want to add the 
> new parameter for 8.4 anyway?

Perhaps, let's see how we resolve the perceived 8.2 and 8.3 issues.

> > I suspect this option will make you consider Fujii-san's patch in a
> > better light. :-)
> 
> No, removing trigger file as soon as a non-existant file is requested 
> still seems simpler than deleting it whenever a timeline history file is 
> requested.

If you do this, then you would have to change the procedure written into
the 8.3 docs also. Docs aren't backpatchable.

What you propose is *better* than raw pg_standby is now, but still not
enough in all cases, as I think you know. Simple isn't the requirement
here, is it?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> If you do this, then you would have to change the procedure written into
> the 8.3 docs also. Docs aren't backpatchable.
> 
> What you propose is *better* than raw pg_standby is now, but still not
> enough in all cases, as I think you know.

No, I don't. What is the case where it doesn't work?
> Simple isn't the requirement here, is it?

Simplicity is always a virtue, because it leads to maintainability.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Tue, 2009-04-21 at 14:28 +0300, Heikki Linnakangas wrote:

>  > Simple isn't the requirement here, is it?
> 
> Simplicity is always a virtue, because it leads to maintainability.

"Simple enough" is a virtue. Less than that is not...

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Simon Riggs wrote:
>>
>> If you do this, then you would have to change the procedure written into
>> the 8.3 docs also. Docs aren't backpatchable.
>>
>> What you propose is *better* than raw pg_standby is now, but still not
>> enough in all cases, as I think you know.
>
> No, I don't. What is the case where it doesn't work?

It's the case which I described as the 2nd comment to your
proposal.

1. pg_standby tries to restore a non-existent file
1-1. remove the trigger file
1-2. pg_standby exits with non-zero code
2. the startup process tries to read it from pg_xlog
2-1. it is applied
3. the startup process tries to restore the next file using pg_standby
3-1. pg_standby gets *stuck* since the requested file and trigger file      don't exist.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Simon Riggs wrote:
>>> What you propose is *better* than raw pg_standby is now, but still not
>>> enough in all cases, as I think you know.
>> No, I don't. What is the case where it doesn't work?
> 
> It's the case which I described as the 2nd comment to your
> proposal.
> 
> 1. pg_standby tries to restore a non-existent file
> 1-1. remove the trigger file
> 1-2. pg_standby exits with non-zero code
> 2. the startup process tries to read it from pg_xlog
> 2-1. it is applied
> 3. the startup process tries to restore the next file using pg_standby
> 3-1. pg_standby gets *stuck* since the requested file and trigger file
>        don't exist.

Ahh, ok, I didn't understand the issue correctly before.

But wait a minute, we already have exactly the same problem with the 
current 8.2/8.3 pg_standby, don't we? [tests]. Yes, we do.

Simon's suggestion of a separate restore_completion_command is very 
attractive as it would provide an explicit place to hook up the deletion 
of the trigger file. It seems useful anyway, you might want to put a 
command there to e.g update a log file or launch some custom daemon 
software when the recovery ends. The question then is what to do with 
8.2 and 8.3? Even if we decided to keep the behavior that the failover 
is triggered immediately (fast mode), pg_standby getting stuck if you 
copy any WAL files directly into pg_xlog seems like a bug that needs to 
be fixed.

Fujii's idea of deleting the trigger file when history file is requested 
is the only proposal this far that works and doesn't require changes to 
people's config files, so I guess that's what we'll have to do at least 
for back-branches.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Andreas Pflug
Date:
Fujii Masao wrote:
> Hi,
>
> On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>   
>> Simon Riggs wrote:
>>     
>>> If you do this, then you would have to change the procedure written into
>>> the 8.3 docs also. Docs aren't backpatchable.
>>>
>>> What you propose is *better* than raw pg_standby is now, but still not
>>> enough in all cases, as I think you know.
>>>       
>> No, I don't. What is the case where it doesn't work?
>>     
>
> It's the case which I described as the 2nd comment to your
> proposal.
>
> 1. pg_standby tries to restore a non-existent file
> 1-1. remove the trigger file
> 1-2. pg_standby exits with non-zero code
> 2. the startup process tries to read it from pg_xlog
> 2-1. it is applied
> 3. the startup process tries to restore the next file using pg_standby
>   
I'm a little confused. After pg_standby returned non-zero as indication
for end-of-recovery, the startup process shouldn't request another file
from pg_standby, right? Which means 3. should never happen (unless the
startup process stalls and restarts, in which case I find it normal that
another trigger required).

Regards,
Andreas


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Tue, 2009-04-21 at 15:55 +0300, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > On Tue, Apr 21, 2009 at 8:28 PM, Heikki Linnakangas
> > <heikki.linnakangas@enterprisedb.com> wrote:
> >> Simon Riggs wrote:
> >>> What you propose is *better* than raw pg_standby is now, but still not
> >>> enough in all cases, as I think you know.
> >> No, I don't. What is the case where it doesn't work?
> > 
> > It's the case which I described as the 2nd comment to your
> > proposal.
> > 
> > 1. pg_standby tries to restore a non-existent file
> > 1-1. remove the trigger file
> > 1-2. pg_standby exits with non-zero code
> > 2. the startup process tries to read it from pg_xlog
> > 2-1. it is applied
> > 3. the startup process tries to restore the next file using pg_standby
> > 3-1. pg_standby gets *stuck* since the requested file and trigger file
> >        don't exist.
> 
> Ahh, ok, I didn't understand the issue correctly before.
> 
> But wait a minute, we already have exactly the same problem with the 
> current 8.2/8.3 pg_standby, don't we? [tests]. Yes, we do.
> 
> Simon's suggestion of a separate restore_completion_command is very 
> attractive as it would provide an explicit place to hook up the deletion 
> of the trigger file. It seems useful anyway, you might want to put a 
> command there to e.g update a log file or launch some custom daemon 
> software when the recovery ends. The question then is what to do with 
> 8.2 and 8.3? Even if we decided to keep the behavior that the failover 
> is triggered immediately (fast mode), pg_standby getting stuck if you 
> copy any WAL files directly into pg_xlog seems like a bug that needs to 
> be fixed.
> 
> Fujii's idea of deleting the trigger file when history file is requested 
> is the only proposal this far that works and doesn't require changes to 
> people's config files, so I guess that's what we'll have to do at least 
> for back-branches.

Agreed. Fujii-san's proposal is the only one that covers all the
important things. The assumptions need careful documentation, as you
say.

The idea of a restore_completion_command does still sound attractive,
easy to implement and non-intrusive enough to do so right now.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Andreas Pflug wrote:
> I'm a little confused. After pg_standby returned non-zero as indication
> for end-of-recovery, the startup process shouldn't request another file
> from pg_standby, right?

Non-zero return value from restore_command doesn't mean end-of-recovery, 
it means file-not-found. The server will try to open the WAL file from 
pg_xlog if it's not found in archive (= restore_command returned 
non-zero). If it's found in pg_xlog, it will then ask for the next WAL 
file from the archive again. And on top of that, the server will ask for 
history files until it finds one that doesn't exist to figure out the 
new timeline.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
David Fetter
Date:
On Tue, Apr 21, 2009 at 12:25:50PM +0100, Simon Riggs wrote:
> > No, removing trigger file as soon as a non-existant file is
> > requested still seems simpler than deleting it whenever a timeline
> > history file is requested.
> 
> If you do this, then you would have to change the procedure written
> into the 8.3 docs also.  Docs aren't backpatchable.

Are you sure?  We've found errors in them before and fixed them.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> Hi,
> 
> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> I'd like to propose another simple idea; pg_standby deletes the
>> trigger file *whenever* the nextWALfile is a timeline history file.
>> A timeline history file is restored at the end of recovery, so it's
>> guaranteed that the trigger file is deleted whether nextWALfile
>> exists or not.
>>
>> A timeline history file is restored also at the beginning of
>> recovery, so the accidentally remaining trigger file is deleted
>> in early warm-standby as a side-effect of this idea.
> 
> Here is the revised patch as above.
> 
> If you notice something, please feel free to comment.

Ok, looking at this in more detail now. A couple of small things:

We mustn't remove the trigger file immediately even in fast mode. As 
noted elsewhere in this thread, we have the same bug in fast mode where 
pg_standby gets stuck if you copy WAL files directly into pg_xlog.

pg_standby should exit with code 0 only if the file is restore 
successfully. As the patch stands it also returns 0 when smart failover 
is done.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

Thanks for reviewing the patch!

On Wed, Apr 22, 2009 at 4:27 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> Hi,
>>
>> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com>
>> wrote:
>>>
>>> I'd like to propose another simple idea; pg_standby deletes the
>>> trigger file *whenever* the nextWALfile is a timeline history file.
>>> A timeline history file is restored at the end of recovery, so it's
>>> guaranteed that the trigger file is deleted whether nextWALfile
>>> exists or not.
>>>
>>> A timeline history file is restored also at the beginning of
>>> recovery, so the accidentally remaining trigger file is deleted
>>> in early warm-standby as a side-effect of this idea.
>>
>> Here is the revised patch as above.
>>
>> If you notice something, please feel free to comment.
>
> Ok, looking at this in more detail now. A couple of small things:
>
> We mustn't remove the trigger file immediately even in fast mode. As noted
> elsewhere in this thread, we have the same bug in fast mode where pg_standby
> gets stuck if you copy WAL files directly into pg_xlog.

Yes, there is the same problem also in fast mode. But, in fast
mode, the trigger file has to be deleted immediately if it's found.
Otherwise, recovery may fail as follows.

1. pg_standby finds the trigger file for fast mode, and returns   non-zero without deleting the trigger file.
2. the startup process tries to read the WAL file from pg_xlog,   but it's not found.
3. the startup process tries to restore the last applied WAL file   using pg_standby.
4. (Again) pg_standby finds the trigger file for fast mode, and   returns non-zero without deleting the trigger file.
5. the startup process tries to read the last applied WAL file,   but it's not found.   (though the last applied file
wasof course restored before,    the restored one cannot be read here)
 
6. recovery fails because the last applied WAL file cannot be   read.

On the other hand, if pg_standby returns 0 also in fast mode
when the requested file and trigger file exist, ISTM that there
is not much difference between fast and smart mode; also in
fast mode, all the available WAL files would be applied.

So, I left fast mode as it was (as much as possible) though it has
the above problem, which is for backward compatibility.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Tue, 2009-04-21 at 09:59 -0700, David Fetter wrote:
> On Tue, Apr 21, 2009 at 12:25:50PM +0100, Simon Riggs wrote:
> > > No, removing trigger file as soon as a non-existant file is
> > > requested still seems simpler than deleting it whenever a timeline
> > > history file is requested.
> > 
> > If you do this, then you would have to change the procedure written
> > into the 8.3 docs also.  Docs aren't backpatchable.
> 
> Are you sure?  We've found errors in them before and fixed them.

The proposed change is not a bug fix -- hmmm, or is it?

I think we have a way that does not require this...

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Wed, Apr 22, 2009 at 4:27 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao <masao.fujii@gmail.com>
>>> wrote:
>>>> I'd like to propose another simple idea; pg_standby deletes the
>>>> trigger file *whenever* the nextWALfile is a timeline history file.
>>>> A timeline history file is restored at the end of recovery, so it's
>>>> guaranteed that the trigger file is deleted whether nextWALfile
>>>> exists or not.
>>>>
>>>> A timeline history file is restored also at the beginning of
>>>> recovery, so the accidentally remaining trigger file is deleted
>>>> in early warm-standby as a side-effect of this idea.
>>> Here is the revised patch as above.
>>>
>>> If you notice something, please feel free to comment.
>> Ok, looking at this in more detail now. A couple of small things:
>>
>> We mustn't remove the trigger file immediately even in fast mode. As noted
>> elsewhere in this thread, we have the same bug in fast mode where pg_standby
>> gets stuck if you copy WAL files directly into pg_xlog.
> 
> Yes, there is the same problem also in fast mode. But, in fast
> mode, the trigger file has to be deleted immediately if it's found.
> Otherwise, recovery may fail as follows.
> 
> 1. pg_standby finds the trigger file for fast mode, and returns
>     non-zero without deleting the trigger file.
> 2. the startup process tries to read the WAL file from pg_xlog,
>     but it's not found.
> 3. the startup process tries to restore the last applied WAL file
>     using pg_standby.
> 4. (Again) pg_standby finds the trigger file for fast mode, and
>     returns non-zero without deleting the trigger file.
> 5. the startup process tries to read the last applied WAL file,
>     but it's not found.
>     (though the last applied file was of course restored before,
>      the restored one cannot be read here)
> 6. recovery fails because the last applied WAL file cannot be
>     read.
> 
> On the other hand, if pg_standby returns 0 also in fast mode
> when the requested file and trigger file exist, ISTM that there
> is not much difference between fast and smart mode; also in
> fast mode, all the available WAL files would be applied.

Hmm, pg_standby could truncate the trigger file, so that it acts like a 
smart trigger in the subsequent pg_standby invocations. Assuming the 
postgres user has write access to it; it probably does because it can 
delete it, but conceivably it has only read access on the file but write 
access on the directory it's in.

This is getting complicated, though. I guess it would be enough to 
document that you mustn't copy any extra files into pg_xlog if you use a 
fast trigger.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Thu, Apr 23, 2009 at 4:49 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> This is getting complicated, though. I guess it would be enough to document
> that you mustn't copy any extra files into pg_xlog if you use a fast
> trigger.

Agreed. I added this note into document.

Attached is the updated patch. I also fixed my bug which
pg_standby returns 0 even if the requested file fails to be
restored in smart mode.

This patch is ready to commit, I think. Please review this.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Apr 23, 2009 at 4:49 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> This is getting complicated, though. I guess it would be enough to document
>> that you mustn't copy any extra files into pg_xlog if you use a fast
>> trigger.
> 
> Agreed. I added this note into document.
> 
> Attached is the updated patch. I also fixed my bug which
> pg_standby returns 0 even if the requested file fails to be
> restored in smart mode.
> 
> This patch is ready to commit, I think. Please review this.

Looking at this again..

Deleting the trigger file when a history file is requested:

>         /*
>          * Get rid of the trigger file at the end of archive recovery.
>          * Otherwise, it would unexpectedly cause the subsequent warm-standby to
>          * end.
>          *
>          * Here is the right place to remove the trigger file since a timeline
>          * history file is requested only at the beginning and end of archive
>          * recovery.
>          */

changes the behavior in a subtle way: if you create trigger file before 
starting recovery, it will be deleted when the recovery is started and 
no failover is done. Currently, it will end the recovery immediately.

That makes me uncomfortable to back-patch this. That change in behavior 
might be hard to work-around: the process that creates the trigger file 
would have to make sure that the server has started recovery before 
creating the file.

Here's another idea: Let's modify xlog.c so that when the server asks 
for WAL file X, and restore_command returns "not found", the server will 
not ask for any WAL files >= X again (in that recovery session). 
Presumably if X doesn't exist, no later files will exist either. That 
would be pretty simple change, and it would allow us to go with the 
simpler implementation in pg_standby and just remove the trigger file 
immediately when it returns "not found" (instead of removing it when 
history file is requested). That would make it safe to copy extra WAL 
files into pg_xlog, even in fast failover mode.

Does anyone see a hole in that idea?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Tue, 2009-05-12 at 14:15 +0300, Heikki Linnakangas wrote:

> Here's another idea: Let's modify xlog.c so that when the server asks 
> for WAL file X, and restore_command returns "not found", the server
> will not ask for any WAL files >= X again (in that recovery session). 
> Presumably if X doesn't exist, no later files will exist either.

That was proposed and rejected earlier, though not in this context.
Sounds fine to me.

I agree that we should simplify/clarify the file request pattern, which
is the source of many difficulties over last few years.

I would further propose that I take pg_standby out as a module, so we
can more easily support earlier releases.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Tue, 2009-05-12 at 14:15 +0300, Heikki Linnakangas wrote:
> 
>> Here's another idea: Let's modify xlog.c so that when the server asks 
>> for WAL file X, and restore_command returns "not found", the server
>> will not ask for any WAL files >= X again (in that recovery session). 
>> Presumably if X doesn't exist, no later files will exist either.
> 
> That was proposed and rejected earlier, though not in this context.

Got a link? I'd like to check the past discussion.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Tue, 2009-05-12 at 14:38 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Tue, 2009-05-12 at 14:15 +0300, Heikki Linnakangas wrote:
> > 
> >> Here's another idea: Let's modify xlog.c so that when the server asks 
> >> for WAL file X, and restore_command returns "not found", the server
> >> will not ask for any WAL files >= X again (in that recovery session). 
> >> Presumably if X doesn't exist, no later files will exist either.
> > 
> > That was proposed and rejected earlier, though not in this context.
> 
> Got a link? I'd like to check the past discussion.

I'm sorry, I don't. One of the recovery related discussions started by
me.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Tue, May 12, 2009 at 8:15 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> On Thu, Apr 23, 2009 at 4:49 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>
>>> This is getting complicated, though. I guess it would be enough to
>>> document
>>> that you mustn't copy any extra files into pg_xlog if you use a fast
>>> trigger.
>>
>> Agreed. I added this note into document.
>>
>> Attached is the updated patch. I also fixed my bug which
>> pg_standby returns 0 even if the requested file fails to be
>> restored in smart mode.
>>
>> This patch is ready to commit, I think. Please review this.
>
> Looking at this again..
>
> Deleting the trigger file when a history file is requested:
>
>>                /*
>>                 * Get rid of the trigger file at the end of archive
>> recovery.
>>                 * Otherwise, it would unexpectedly cause the subsequent
>> warm-standby to
>>                 * end.
>>                 *
>>                 * Here is the right place to remove the trigger file since
>> a timeline
>>                 * history file is requested only at the beginning and end
>> of archive
>>                 * recovery.
>>                 */
>
> changes the behavior in a subtle way: if you create trigger file before
> starting recovery, it will be deleted when the recovery is started and no
> failover is done. Currently, it will end the recovery immediately.
>
> That makes me uncomfortable to back-patch this. That change in behavior
> might be hard to work-around: the process that creates the trigger file
> would have to make sure that the server has started recovery before creating
> the file.
>
> Here's another idea: Let's modify xlog.c so that when the server asks for
> WAL file X, and restore_command returns "not found", the server will not ask
> for any WAL files >= X again (in that recovery session). Presumably if X
> doesn't exist, no later files will exist either. That would be pretty simple
> change, and it would allow us to go with the simpler implementation in
> pg_standby and just remove the trigger file immediately when it returns "not
> found" (instead of removing it when history file is requested). That would
> make it safe to copy extra WAL files into pg_xlog, even in fast failover
> mode.
>
> Does anyone see a hole in that idea?

Probably yes. The trigger file would remain after failover if the
restored WAL file has the invalid record which means the end
of WAL. In this case, "not found" is not returned.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Tue, May 12, 2009 at 8:15 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Here's another idea: Let's modify xlog.c so that when the server asks for
>> WAL file X, and restore_command returns "not found", the server will not ask
>> for any WAL files >= X again (in that recovery session). Presumably if X
>> doesn't exist, no later files will exist either. That would be pretty simple
>> change, and it would allow us to go with the simpler implementation in
>> pg_standby and just remove the trigger file immediately when it returns "not
>> found" (instead of removing it when history file is requested). That would
>> make it safe to copy extra WAL files into pg_xlog, even in fast failover
>> mode.
>>
>> Does anyone see a hole in that idea?
> 
> Probably yes. The trigger file would remain after failover if the
> restored WAL file has the invalid record which means the end
> of WAL. In this case, "not found" is not returned.

Yep. That's not pleasant either :-(.

I don't think we're going to get this to work reliably without extending 
the interface between the backend and restore_command. We've discussed 
many methods and there's always some nasty corner-case like that.

I think we should leave back-branches as is, and go with Simon's 
suggestion to add new "recovery_end_command" that's run when the 
recovery is finished. That's simpler and more reliable than any of the 
other approaches we've discussed, and might become handy for other 
purposes as well.

Does someone want to take a stab at writing a patch for that?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I don't think we're going to get this to work reliably without extending 
> the interface between the backend and restore_command. We've discussed 
> many methods and there's always some nasty corner-case like that.

> I think we should leave back-branches as is, and go with Simon's 
> suggestion to add new "recovery_end_command" that's run when the 
> recovery is finished. That's simpler and more reliable than any of the 
> other approaches we've discussed, and might become handy for other 
> purposes as well.

> Does someone want to take a stab at writing a patch for that?

Does this conclusion mean that changing pg_standby is no longer
on the table for 8.4?  It certainly smells more like a new feature
than a bug fix.
        regards, tom lane


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 13:01 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > I don't think we're going to get this to work reliably without extending 
> > the interface between the backend and restore_command. We've discussed 
> > many methods and there's always some nasty corner-case like that.

Agreed.

> > I think we should leave back-branches as is, and go with Simon's 
> > suggestion to add new "recovery_end_command" that's run when the 
> > recovery is finished. That's simpler and more reliable than any of the 
> > other approaches we've discussed, and might become handy for other 
> > purposes as well.

That is the cleanest way, though we cannot really avoid acting for
backbranches also.

> > Does someone want to take a stab at writing a patch for that?

No, not if there is a likelihood the work would be wasted.

> Does this conclusion mean that changing pg_standby is no longer
> on the table for 8.4?  It certainly smells more like a new feature
> than a bug fix.

I don't really understand this comment. Why would fixing a memory leak
be worthwhile when fixing a potential for data loss be a deferrable
activity?


I will set-up pg_standby as an external module and we can change it from
there. No more discussions-for-8.4 and I can update as required to
support each release. So let's just remove it from contrib and be done.
Counterthoughts?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> I will set-up pg_standby as an external module and we can change it from
> there. No more discussions-for-8.4 and I can update as required to
> support each release. So let's just remove it from contrib and be done.

Huh?  The proposed fix involves a backend change, so I don't see how
removing pg_standby from the distribution frees you from the constraints
of our versioning.
        regards, tom lane


Re: New trigger option of pg_standby

From
Andrew Dunstan
Date:

Simon Riggs wrote:
> I will set-up pg_standby as an external module and we can change it from
> there. No more discussions-for-8.4 and I can update as required to
> support each release. So let's just remove it from contrib and be done.
> Counterthoughts?
>
>   

We're in Beta. You can't just go yanking stuff like that. Beta testers 
will be justifiably very annoyed.

Please calm down.

pg_standby is useful and needs to be correct. And its existence as a 
standard module is one of the things that has made me feel confident 
about recommending people to use the PITR stuff. I'll be very annoyed if 
it were to get pulled.

cheers

andrew


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I don't think we're going to get this to work reliably without extending 
>> the interface between the backend and restore_command. We've discussed 
>> many methods and there's always some nasty corner-case like that.
> 
>> I think we should leave back-branches as is, and go with Simon's 
>> suggestion to add new "recovery_end_command" that's run when the 
>> recovery is finished. That's simpler and more reliable than any of the 
>> other approaches we've discussed, and might become handy for other 
>> purposes as well.
> 
>> Does someone want to take a stab at writing a patch for that?
> 
> Does this conclusion mean that changing pg_standby is no longer
> on the table for 8.4?  It certainly smells more like a new feature
> than a bug fix.

This whole thing can be considered to be a new feature. It's working as 
designed. But people seem to be surprised about the current behavior (me 
included), and we don't currently provide the behavior that most people 
actually want. I think we should fix it for 8.4.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
"Joshua D. Drake"
Date:
On Wed, 2009-05-13 at 14:14 -0400, Andrew Dunstan wrote:

> pg_standby is useful and needs to be correct. And its existence as a 
> standard module is one of the things that has made me feel confident 
> about recommending people to use the PITR stuff. I'll be very annoyed if 
> it were to get pulled.

Although I am not advocating one position or another there are benefits
to removing it. It would be nice to continue to enhance pg_standby
without the limitations of the core release schedule.

Sincerely,

Joshua D. Drake


-- 
PostgreSQL - XMPP: jdrake@jabber.postgresql.org  Consulting, Development, Support, Training  503-667-4564 -
http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
 



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2009-05-13 at 13:01 -0400, Tom Lane wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> Does someone want to take a stab at writing a patch for that?
> 
> No, not if there is a likelihood the work would be wasted.

There always is.

(I would've wrote the patch myself right away, but I'm extremely busy at 
the moment. :-( Might take one more day before I get the time to finish 
it, and we don't have much time)

>> Does this conclusion mean that changing pg_standby is no longer
>> on the table for 8.4?  It certainly smells more like a new feature
>> than a bug fix.
> 
> I don't really understand this comment. Why would fixing a memory leak
> be worthwhile when fixing a potential for data loss be a deferrable
> activity?

Because the data loss is working as designed and documented, even though 
the design is not what most people want and the documentation could say 
that more prominently. That said, I'm in favor of changing this for 8.4.

> I will set-up pg_standby as an external module and we can change it from
> there. No more discussions-for-8.4 and I can update as required to
> support each release. So let's just remove it from contrib and be done.
> Counterthoughts?

That's a lot more drastic change to make in beta. Besides, the proposed 
fix required backend changes. I think we should keep it in contrib. (At 
least for this release: If we get more integrated replication options in 
8.5, that would be a good time to move pg_standby out of contrib if 
that's what we want.)

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote:

> I think we should fix it for 8.4.

Agreed.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Andreas Pflug
Date:
Andrew Dunstan wrote:
>>  
> We're in Beta. You can't just go yanking stuff like that. Beta testers 
> will be justifiably very annoyed.
>
> Please calm down.
>
> pg_standby is useful and needs to be correct. And its existence as a 
> standard module is one of the things that has made me feel confident 
> about recommending people to use the PITR stuff. I'll be very annoyed 
> if it were to get pulled.

Since mentioned in the docs, I consider it at least the semi-official 
tool for pgsql PITR handling. But as this discussion reveals, the api is 
flawed, and will not allow guaranteed consistency (whatever pg_standby 
tries) until fixed. While this may not be a bug of the restore_script 
call, the pitr procedure in total is partially broken (in the sense that 
it doesn't provide what most users expect in a secure way) and thus 
needs to be fixed. It seems a fix can't be provided without extending 
the api.

Regards,
Andreas


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 14:14 -0400, Andrew Dunstan wrote:

> pg_standby is useful and needs to be correct. 

My suggestion was designed to provide this. A misunderstanding.

> And its existence as a 
> standard module is one of the things that has made me feel confident 
> about recommending people to use the PITR stuff. I'll be very annoyed if 
> it were to get pulled.

If we cannot make it correct within core, then I will make it correct
somewhere else, beta or not. Other than that, I have no wish to remove
it from contrib.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> Does this conclusion mean that changing pg_standby is no longer
>> on the table for 8.4?  It certainly smells more like a new feature
>> than a bug fix.

> This whole thing can be considered to be a new feature. It's working as 
> designed. But people seem to be surprised about the current behavior (me 
> included), and we don't currently provide the behavior that most people 
> actually want. I think we should fix it for 8.4.

Well, that's okay by me if it can be done in a timely fashion.  Bear in
mind that we are planning to wrap beta2 not much more than 24 hours from
now.
        regards, tom lane


Re: New trigger option of pg_standby

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> That's a lot more drastic change to make in beta. Besides, the proposed 
> fix required backend changes. I think we should keep it in contrib. (At 
> least for this release: If we get more integrated replication options in 
> 8.5, that would be a good time to move pg_standby out of contrib if 
> that's what we want.)

The proposed fix requires coordinated changes in the core and
pg_standby.  That would be a lot *harder* if pg_standby were external.
Since we've evidently not gotten this API quite right yet, I think we
should be keeping pg_standby in contrib until we do, ie the API has been
stable for awhile ...
        regards, tom lane


Re: New trigger option of pg_standby

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>   
>> That's a lot more drastic change to make in beta. Besides, the proposed 
>> fix required backend changes. I think we should keep it in contrib. (At 
>> least for this release: If we get more integrated replication options in 
>> 8.5, that would be a good time to move pg_standby out of contrib if 
>> that's what we want.)
>>     
>
> The proposed fix requires coordinated changes in the core and
> pg_standby.  That would be a lot *harder* if pg_standby were external.
> Since we've evidently not gotten this API quite right yet, I think we
> should be keeping pg_standby in contrib until we do, ie the API has been
> stable for awhile ...
>
>             
>   
Agreed.

Frankly, if anything it should move from contrib to the core proper. I 
regard it as an essential utility, not an optional extra.

cheers

andrew


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 14:53 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > Tom Lane wrote:
> >> Does this conclusion mean that changing pg_standby is no longer
> >> on the table for 8.4?  It certainly smells more like a new feature
> >> than a bug fix.
> 
> > This whole thing can be considered to be a new feature. It's working as 
> > designed. But people seem to be surprised about the current behavior (me 
> > included), and we don't currently provide the behavior that most people 
> > actually want. I think we should fix it for 8.4.
> 
> Well, that's okay by me if it can be done in a timely fashion.  Bear in
> mind that we are planning to wrap beta2 not much more than 24 hours from
> now.

I'll write it now then, so it can be reviewed tomorrow.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Wed, 2009-05-13 at 14:53 -0400, Tom Lane wrote:
>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>>> Tom Lane wrote:
>>>> Does this conclusion mean that changing pg_standby is no longer
>>>> on the table for 8.4?  It certainly smells more like a new feature
>>>> than a bug fix.
>>> This whole thing can be considered to be a new feature. It's working as 
>>> designed. But people seem to be surprised about the current behavior (me 
>>> included), and we don't currently provide the behavior that most people 
>>> actually want. I think we should fix it for 8.4.
>> Well, that's okay by me if it can be done in a timely fashion.  Bear in
>> mind that we are planning to wrap beta2 not much more than 24 hours from
>> now.
> 
> I'll write it now then, so it can be reviewed tomorrow.

Thanks, Simon!

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 15:05 -0400, Andrew Dunstan wrote:

> Frankly, if anything it should move from contrib to the core proper. I 
> regard it as an essential utility, not an optional extra.

I like that idea.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote:

> This whole thing can be considered to be a new feature. 

recovery.conf will contain a new optional parameter:

recovery_end_command (string)

This parameter specifies a shell command that will be executed once only
at the end of recovery. This parameter is optional. The purpose of the
recovery_end_command is to provide a mechanism for cleanup following
replication or recovery. Any %r is replaced by the name of the file
containing the last valid restart point. That is the earliest file that
must be kept to allow a restore to be restartable, so this information
can be used to truncate the archive to just the minimum required to
support restart of the current restore. %r would only be used in a
warm-standby configuration (see Section 24.4). Write %% to embed an
actual % character in the command.


recovery_end_command is performed *after* the UpdateControlFile() once
the we are DB_IN_PRODUCTION.

This behaviour ensures that a crash prior to the final checkpoint will
continue to see the trigger file. Once we are safe, we can remove the
trigger file safely. We also can now ignore any complexity surrounding
whether WAL files are full or not, and whether WAL files were restored
from the archive or from the local directory.

Comments? 

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> recovery_end_command is performed *after* the UpdateControlFile() once
> the we are DB_IN_PRODUCTION.

Hmm, shouldn't it be after the last checkpoint but before we go to
DB_IN_PRODUCTION?  I have to admit I've not been following this closely
though, so I may be missing something.
        regards, tom lane


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 16:47 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > recovery_end_command is performed *after* the UpdateControlFile() once
> > the we are DB_IN_PRODUCTION.
> 
> Hmm, shouldn't it be after the last checkpoint 

Definitely.

> but before we go to DB_IN_PRODUCTION?  

I think it can be either, so I'll go with your proposal.

(I'm aware Fujii-san is asleep right now, so we should expect another
viewpoint before tomorrow).

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

Sorry for the delay.

On Thu, May 14, 2009 at 6:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> but before we go to DB_IN_PRODUCTION?
>
> I think it can be either, so I'll go with your proposal.

I also think so.

> (I'm aware Fujii-san is asleep right now, so we should expect another
> viewpoint before tomorrow).

I'd like to avoid adding new parameter for warm-standby
if possible because currently the setup of it is already
complicated. But, I don't have another good idea yet other
than the already proposed. Sorry.

Personally, I'd rather make pg_standby delete a trigger file
when the timeline history file is requested even if this would
break the current behavior, than the setup of warm-standby
becomes more complicated.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote:
> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote:
>
> > This whole thing can be considered to be a new feature.
>
> recovery.conf will contain a new optional parameter:
>
> recovery_end_command (string)

Implemented.

Some possibility of re-factoring in calc of %r, though that has not been
done to ensure code clarity and avoid need for retesting other aspects
of recovery at this stage of beta.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Attachment

Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote:
>> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote:
>>
>> > This whole thing can be considered to be a new feature.
>>
>> recovery.conf will contain a new optional parameter:
>>
>> recovery_end_command (string)
>
> Implemented.

> +         ereport(signaled ? FATAL : WARNING,
> +                 (errmsg("recovery_end_command \"%s\": return code %d",
> +                                 xlogRecoveryEndCmd, rc)));

In fast failover case, pg_standby has to delete the trigger file immediately
if it's found. Otherwise, recovery may go wrong as I already described.
http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php

So, in fast mode, recovery_end_command would always fail to delete the
trigger file, and cause warning. This is odd behavior, I think. We should
change WARNING to DEBUG2 like RestoreArchivedFile() in the above code?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote:
>>> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote:
>>>
>>>> This whole thing can be considered to be a new feature.
>>> recovery.conf will contain a new optional parameter:
>>>
>>> recovery_end_command (string)
>> Implemented.
> 
>> +         ereport(signaled ? FATAL : WARNING,
>> +                 (errmsg("recovery_end_command \"%s\": return code %d",
>> +                                 xlogRecoveryEndCmd, rc)));
> 
> In fast failover case, pg_standby has to delete the trigger file immediately
> if it's found. Otherwise, recovery may go wrong as I already described.
> http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php

And if you delete it immediately, you risk getting stuck if there's 
extra WAL files in pg_xlog. So still need the change in the backend to 
not call restore_command again for a WAL segment equal to or later than 
one that it already failed to restore. Or, we can truncate the trigger 
file to make it behave like a smart failover for the subsequent 
pg_standby calls.

> So, in fast mode, recovery_end_command would always fail to delete the
> trigger file, and cause warning. This is odd behavior, I think. We should
> change WARNING to DEBUG2 like RestoreArchivedFile() in the above code?

I think we should just change the pg_standby example to use "rm -f" 
rather than "rm". It seems useful to give a warning if the command fails.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Fri, 2009-05-15 at 03:49 +0900, Fujii Masao wrote:
> Hi,
> 
> On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote:
> >> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote:
> >>
> >> > This whole thing can be considered to be a new feature.
> >>
> >> recovery.conf will contain a new optional parameter:
> >>
> >> recovery_end_command (string)
> >
> > Implemented.
> 
> > +         ereport(signaled ? FATAL : WARNING,
> > +                 (errmsg("recovery_end_command \"%s\": return code %d",
> > +                                 xlogRecoveryEndCmd, rc)));
> 
> In fast failover case, pg_standby has to delete the trigger file immediately
> if it's found. Otherwise, recovery may go wrong as I already described.
> http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php
> 
> So, in fast mode, recovery_end_command would always fail to delete the
> trigger file, and cause warning. This is odd behavior, I think. We should
> change WARNING to DEBUG2 like RestoreArchivedFile() in the above code?

Using rm -f would avoid the WARNING.

I'd rather keep it at WARNING, since not sure what command I'll be
running and what a non-zero rc means.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
I've finally committed Simon's recovery_end_command patch, as well as 
the changes to pg_standby. There's now smart and fast failover modes, 
chosen by the content of the trigger file, smart mode is the default. A 
"fast" trigger file is truncated, turning it into a "smart" trigger for 
subsequent pg_standby invocations. I believe this is now safe in all the  combinations discussed, in both fast and
smartmode, with or without 
 
extra WAL files copied to pg_xlog, and also if the last archived WAL 
file is incomplete.

You now need to set up recovery_end_command to clean up the trigger 
file; pg_standby no longer does that automatically.

Simon Riggs wrote:
> On Fri, 2009-05-15 at 03:49 +0900, Fujii Masao wrote:
>> Hi,
>>
>> On Fri, May 15, 2009 at 12:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On Wed, 2009-05-13 at 21:43 +0100, Simon Riggs wrote:
>>>> On Wed, 2009-05-13 at 21:26 +0300, Heikki Linnakangas wrote:
>>>>
>>>>> This whole thing can be considered to be a new feature.
>>>> recovery.conf will contain a new optional parameter:
>>>>
>>>> recovery_end_command (string)
>>> Implemented.
>>> +         ereport(signaled ? FATAL : WARNING,
>>> +                 (errmsg("recovery_end_command \"%s\": return code %d",
>>> +                                 xlogRecoveryEndCmd, rc)));
>> In fast failover case, pg_standby has to delete the trigger file immediately
>> if it's found. Otherwise, recovery may go wrong as I already described.
>> http://archives.postgresql.org/pgsql-hackers/2009-04/msg01139.php
>>
>> So, in fast mode, recovery_end_command would always fail to delete the
>> trigger file, and cause warning. This is odd behavior, I think. We should
>> change WARNING to DEBUG2 like RestoreArchivedFile() in the above code?
> 
> Using rm -f would avoid the WARNING.
> 
> I'd rather keep it at WARNING, since not sure what command I'll be
> running and what a non-zero rc means.
> 


--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Fujii Masao
Date:
Hi,

On Fri, May 15, 2009 at 5:31 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I've finally committed Simon's recovery_end_command patch, as well as the
> changes to pg_standby. There's now smart and fast failover modes, chosen by
> the content of the trigger file, smart mode is the default. A "fast" trigger
> file is truncated, turning it into a "smart" trigger for subsequent
> pg_standby invocations. I believe this is now safe in all the  combinations
> discussed, in both fast and smart mode, with or without extra WAL files
> copied to pg_xlog, and also if the last archived WAL file is incomplete.

Thanks for revising my patch and committing it! This seems to work fine
in all the case which I described.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Thu, 2009-05-14 at 23:31 +0300, Heikki Linnakangas wrote:

> I've finally committed ....changes to pg_standby. 

That was a good team effort. Thanks for committing.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Bruce Momjian
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > I don't think we're going to get this to work reliably without extending 
> > the interface between the backend and restore_command. We've discussed 
> > many methods and there's always some nasty corner-case like that.
> 
> > I think we should leave back-branches as is, and go with Simon's 
> > suggestion to add new "recovery_end_command" that's run when the 
> > recovery is finished. That's simpler and more reliable than any of the 
> > other approaches we've discussed, and might become handy for other 
> > purposes as well.
> 
> > Does someone want to take a stab at writing a patch for that?
> 
> Does this conclusion mean that changing pg_standby is no longer
> on the table for 8.4?  It certainly smells more like a new feature
> than a bug fix.

I think the big frustration is that this issue was first brought up
March 25 and it took two months to resolve it, at which point we were in
beta.  I think many hoped a better idea would emerge but often that just
doesn't happen and we have to do the best fix we can and move on.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > > I don't think we're going to get this to work reliably without extending 
> > > the interface between the backend and restore_command. We've discussed 
> > > many methods and there's always some nasty corner-case like that.
> > 
> > > I think we should leave back-branches as is, and go with Simon's 
> > > suggestion to add new "recovery_end_command" that's run when the 
> > > recovery is finished. That's simpler and more reliable than any of the 
> > > other approaches we've discussed, and might become handy for other 
> > > purposes as well.
> > 
> > > Does someone want to take a stab at writing a patch for that?
> > 
> > Does this conclusion mean that changing pg_standby is no longer
> > on the table for 8.4?  It certainly smells more like a new feature
> > than a bug fix.
> 
> I think the big frustration is that this issue was first brought up
> March 25 and it took two months to resolve it, at which point we were in
> beta.  I think many hoped a better idea would emerge but often that just
> doesn't happen and we have to do the best fix we can and move on.

I agree, very frustrating. Why do you bring it up again now? 

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Bruce Momjian
Date:
Simon Riggs wrote:
> 
> On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > > > I don't think we're going to get this to work reliably without extending 
> > > > the interface between the backend and restore_command. We've discussed 
> > > > many methods and there's always some nasty corner-case like that.
> > > 
> > > > I think we should leave back-branches as is, and go with Simon's 
> > > > suggestion to add new "recovery_end_command" that's run when the 
> > > > recovery is finished. That's simpler and more reliable than any of the 
> > > > other approaches we've discussed, and might become handy for other 
> > > > purposes as well.
> > > 
> > > > Does someone want to take a stab at writing a patch for that?
> > > 
> > > Does this conclusion mean that changing pg_standby is no longer
> > > on the table for 8.4?  It certainly smells more like a new feature
> > > than a bug fix.
> > 
> > I think the big frustration is that this issue was first brought up
> > March 25 and it took two months to resolve it, at which point we were in
> > beta.  I think many hoped a better idea would emerge but often that just
> > doesn't happen and we have to do the best fix we can and move on.
> 
> I agree, very frustrating. Why do you bring it up again now? 

Wny not?  We are not going to improve unless we face our faults.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: New trigger option of pg_standby

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Simon Riggs wrote:
> > 
> > On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote:
> > > Tom Lane wrote:
> > > > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > > > > I don't think we're going to get this to work reliably without extending 
> > > > > the interface between the backend and restore_command. We've discussed 
> > > > > many methods and there's always some nasty corner-case like that.
> > > > 
> > > > > I think we should leave back-branches as is, and go with Simon's 
> > > > > suggestion to add new "recovery_end_command" that's run when the 
> > > > > recovery is finished. That's simpler and more reliable than any of the 
> > > > > other approaches we've discussed, and might become handy for other 
> > > > > purposes as well.
> > > > 
> > > > > Does someone want to take a stab at writing a patch for that?
> > > > 
> > > > Does this conclusion mean that changing pg_standby is no longer
> > > > on the table for 8.4?  It certainly smells more like a new feature
> > > > than a bug fix.
> > > 
> > > I think the big frustration is that this issue was first brought up
> > > March 25 and it took two months to resolve it, at which point we were in
> > > beta.  I think many hoped a better idea would emerge but often that just
> > > doesn't happen and we have to do the best fix we can and move on.
> > 
> > I agree, very frustrating. Why do you bring it up again now? 
> 
> Wny not?  We are not going to improve unless we face our faults.

Oh, and I am backlogged on email, which is why I didn't mention it a
week ago when this thread was active (which I think was your point).  :-)

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 09:48 -0400, Bruce Momjian wrote:

> We are not going to improve unless we face our faults.

True. Who or what is at fault, in your opinion?

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Heikki Linnakangas
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
>> Simon Riggs wrote:
>>> On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote:
>>>> I think the big frustration is that this issue was first brought up
>>>> March 25 and it took two months to resolve it, at which point we were in
>>>> beta.  I think many hoped a better idea would emerge but often that just
>>>> doesn't happen and we have to do the best fix we can and move on.
>>> I agree, very frustrating. Why do you bring it up again now? 
>> Wny not?  We are not going to improve unless we face our faults.
> 
> Oh, and I am backlogged on email, which is why I didn't mention it a
> week ago when this thread was active (which I think was your point).  :-)

Did you catch up the backlog far enough to see that Simon coded and I 
committed the patch to add "recovery_end_command" just before beta2? So 
it's in 8.4 now 
(http://archives.postgresql.org/message-id/20090514203109.8EBA2754067@cvs.postgresql.org).

I agree we could've should've handled this more promptly, and I'll take 
my part of the blame for that. I let the various proposed patches sit 
for long times before reviewing them thoroughly, partly because I was 
busy and partly because I didn't feel good about the proposed 
approaches. I think what went in in the end is pretty simple and robust, 
but it's a shame we had to rush to get it into beta2, after sitting on 
our thumbs for months.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 09:51 -0400, Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > 
> > > On Wed, 2009-05-27 at 09:13 -0400, Bruce Momjian wrote:
> > > > 
> > > > I think the big frustration is that this issue was first brought up
> > > > March 25 and it took two months to resolve it, at which point we were in
> > > > beta.  I think many hoped a better idea would emerge but often that just
> > > > doesn't happen and we have to do the best fix we can and move on.
> > > 
> > > I agree, very frustrating. Why do you bring it up again now? 
> > 
> > Wny not?  We are not going to improve unless we face our faults.
> 
> Oh, and I am backlogged on email, which is why I didn't mention it a
> week ago when this thread was active (which I think was your point).  :-)

If there is criticism of someone or something, please let's hear it.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Bruce Momjian
Date:
Simon Riggs wrote:
> 
> On Wed, 2009-05-27 at 09:48 -0400, Bruce Momjian wrote:
> 
> > We are not going to improve unless we face our faults.
> 
> True. Who or what is at fault, in your opinion?

Well, we knew there was an issue but we didn't finalize our conclusions
and address it as best we could before beta;  instead it languished and
was only addressed when we had our back up against the wall for beta2. 
Obviously this is not the best aproach.  Ideally someone would have
taken ownership of the issue, summarized the email conclusions, gotten a
patch together, and submitted it for application.  I know patches were
posted but no one got group concensus on its usefulness until recently.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 12:08 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > 
> > On Wed, 2009-05-27 at 09:48 -0400, Bruce Momjian wrote:
> > 
> > > We are not going to improve unless we face our faults.
> > 
> > True. Who or what is at fault, in your opinion?
> 
> Well, we knew there was an issue but we didn't finalize our conclusions
> and address it as best we could before beta;  instead it languished and
> was only addressed when we had our back up against the wall for beta2. 
> Obviously this is not the best aproach.  Ideally someone would have
> taken ownership of the issue, summarized the email conclusions, gotten a
> patch together, and submitted it for application.  I know patches were
> posted but no one got group consensus on its usefulness until recently.

My experience is that consensus/votes will be overruled by final
committer, if they disagree, and so only a committer has the real
authority to take the role you suggest.

http://en.wiktionary.org/wiki/responsibility
"The obligation to carry forward an assigned task to a successful
conclusion. With responsibility goes authority to direct and take the
necessary action to ensure success"

>From my side, Fujii-san's patch was adequate and backpatchable, though
needed docs changes; I proposed it should be committed and said so
clearly on open items wiki. My own suggestion was also an option, but it
was also clearly not backpatchable and seemed unlikely to be acceptable
at any time, let alone during beta. I am surprised at the final outcome,
but at least there is one, which is good.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Andrew Dunstan
Date:

Simon Riggs wrote:
> My experience is that consensus/votes will be overruled by final
> committer, if they disagree, 
>   

That's a fairly strong statement. I can't think of an occasion when this 
has happened on any matter of significance, and I can remember many 
times when Tom, Bruce and others have bowed to the consensus despite 
their own preferences.

Bruce said the other day that we are trustees of the code, and I don't 
think I could put it better. I know I am conscious of that.

cheers

andrew


Re: New trigger option of pg_standby

From
Robert Haas
Date:
On Wed, May 27, 2009 at 1:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Simon Riggs wrote:
>>
>> My experience is that consensus/votes will be overruled by final
>> committer, if they disagree,
>
> That's a fairly strong statement. I can't think of an occasion when this has
> happened on any matter of significance, and I can remember many times when
> Tom, Bruce and others have bowed to the consensus despite their own
> preferences.

Tom and Bruce do give way before a clear consensus, but on the other
hand I think Simon is right that there was never much chance of
getting anything committed here without Heikki's endorsement, which
was slow in coming by his own admission.  (I'm not in any way saying
he was wrong to withhold his endorsement, just that he did.)

I think it's undeniable that the voices of the committers carry
significantly more weight than those of others on this mailing list,
and especially that of Tom because of the sheer volume of what he
commits compared to anyone else.  Having one of the committers say
that they don't like your patch doesn't completely kill its chances of
getting accepted, but it definitely turns it into an uphill battle.
On the other hand, if one of the committers takes a fancy to your
patch it will occasionally jump ahead of the queue and get reviewed or
committed before patches submitted much earlier.  And more than one
committer got features into 8.4 that were not really done in time for
the 11/08 CommitFest, and likely would have been rejected if they'd
come from a non-committer.

As a thought experiment, consider two patches, one of which has a +1
from Tom Lane and a -1 from some other respected community member who
is not a committer, and the other of which has a +1 from the community
member and a -1 from Tom Lane.  Which do you think is more likely to
get committed?  I know what I'd pick.

Now, in many cases, the fact that the committers speak with the
loudest voices is a good thing, because they are mostly very good
coders with lots of PostgreSQL experience and a proven track record of
not breaking the tree too often.  But that doesn't make it any less
true.

...Robert


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 13:14 -0400, Andrew Dunstan wrote:
> 
> Simon Riggs wrote:
> > My experience is that consensus/votes will be overruled by final
> > committer, if they disagree,    
> 
> That's a fairly strong statement. 

I was attempting to be open and honest to allow us to face our faults,
as proposed, because I care about the health and efficiency of the
project. I expected your response and know you think my comments to be
blunt at times. We should worry about the people that don't speak their
mind, not those that do.

> I can't think of an occasion when this 
> has happened on any matter of significance, and I can remember many 
> times when Tom, Bruce and others have bowed to the consensus despite 
> their own preferences.

It depends on what you regard as matters of significance and which parts
of the code you pay attention to. Committers can act as they choose; I
am merely pointing out that it isn't possible for non-committers to
follow through on any responsibility they may attempt to assert.
Specifically, trying to propose solutions, achieve consensus and develop
patches only works if the committer will agree with conclusions at the
end and you won't know until you get there. You can do a ton of work and
it can all be for nothing if the consensus opposes the committer. After
a while you reach the conclusion: Why do the work, why not just wait for
the opinion and then act? Less work and same speed. 

> Bruce said the other day that we are trustees of the code, and I don't
> think I could put it better. I know I am conscious of that.

I doubt anyone reading this list feels any differently.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 17:39 +0300, Heikki Linnakangas wrote:

> I agree we could've should've handled this more promptly, and I'll take 
> my part of the blame for that. I let the various proposed patches sit 
> for long times before reviewing them thoroughly, partly because I was 
> busy and partly because I didn't feel good about the proposed 
> approaches. I think what went in in the end is pretty simple and robust, 
> but it's a shame we had to rush to get it into beta2, after sitting on 
> our thumbs for months.

Through everything I just said, I have no criticism of you Heikki. I had
the same emotions about the proposals. I was only explaining why I felt
unable to speed up the process myself.

Final patch took me an hour to code and test, so I wasn't put out or
rushed.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: New trigger option of pg_standby

From
Bruce Momjian
Date:
Robert Haas wrote:
> Tom and Bruce do give way before a clear consensus, but on the other
> hand I think Simon is right that there was never much chance of
> getting anything committed here without Heikki's endorsement, which
> was slow in coming by his own admission.  (I'm not in any way saying
> he was wrong to withhold his endorsement, just that he did.)
> 
> I think it's undeniable that the voices of the committers carry
> significantly more weight than those of others on this mailing list,
> and especially that of Tom because of the sheer volume of what he
> commits compared to anyone else.  Having one of the committers say
> that they don't like your patch doesn't completely kill its chances of
> getting accepted, but it definitely turns it into an uphill battle.
> On the other hand, if one of the committers takes a fancy to your
> patch it will occasionally jump ahead of the queue and get reviewed or
> committed before patches submitted much earlier.  And more than one
> committer got features into 8.4 that were not really done in time for
> the 11/08 CommitFest, and likely would have been rejected if they'd
> come from a non-committer.
> 
> As a thought experiment, consider two patches, one of which has a +1
> from Tom Lane and a -1 from some other respected community member who
> is not a committer, and the other of which has a +1 from the community
> member and a -1 from Tom Lane.  Which do you think is more likely to
> get committed?  I know what I'd pick.
> 
> Now, in many cases, the fact that the committers speak with the
> loudest voices is a good thing, because they are mostly very good
> coders with lots of PostgreSQL experience and a proven track record of
> not breaking the tree too often.  But that doesn't make it any less
> true.

The above comments by Robert are very perceptive, and there is certainly
truth that committer-endorsed patches are applied quicker than others. 
My only additional comment is that over time, reliable patch submitters
become committers, so hopefully things balance out over time.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: New trigger option of pg_standby

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 12:08 -0400, Bruce Momjian wrote:
> Ideally someone would have
> taken ownership of the issue, summarized the email conclusions, gotten
> a patch together, and submitted it for application.

Just a further comment on this, based upon the patch Heikki recently
committed.

I raised various issues with recovery *after* feature freeze in 8.3,
doing everything you mentioned above: patch (Jun '07), 4 months before
beta1. 3.5 months later the patch was still un-reviewed and you deferred
the patch until 8.4, without comment (Sep '07). Changes were eventually
committed more than a year after original discussion (Apr '08).
HACKERS "Minor changes to recovery related code"

My other comments relate to that experience, and others.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support