Thread: How should the primary behave when the sync standby goes away? Re: Sync Rep v17

On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The WALSender deliberately does *not* wake waiting users if the standby
>> disconnects. Doing so would break the whole reason for having sync rep
>> in the first place. What we do is allow a potential standby to take over
>> the role of sync standby, if one is available. Or the failing standby
>> can reconnect and then release waiters.
>
> If there is a potential standby when the synchronous standby has gone, I
> agree that it's not a good idea to release the waiting backends right away.
> In this case, those backends should wait for the next synchronous standby.
>
> On the other hand, if there is no potential standby, I think that the waiting
> backends should not wait for the timeout and should wake up as soon as the
> synchronous standby has gone. Otherwise, those backends are suspended for
> a long time (i.e., until the timeout expires), which would reduce
> availability, I'm afraid.
>
> Keeping those backends waiting for the failed standby to reconnect is one
> option, but this looks like the behavior for "allow_standalone_primary = off".
> If allow_standalone_primary = on, it looks more natural to let the
> primary work alone without waiting for the timeout.

Also I think that the waiting backends should be released as soon as the
last synchronous standby switches to asynchronous mode. Since there is
no standby which is planning to reconnect, obviously they no longer need
to wait.
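
The rule proposed in the quoted text can be written down as a small decision
function. This is only a rough sketch for discussion, not code from the Sync
Rep patch: the enum, the function name, and its parameters are hypothetical,
and it assumes the "last sync standby switched to async" case can be treated
the same as "no potential standby".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for backends waiting on sync rep; not PostgreSQL API. */
typedef enum
{
    KEEP_WAITING,       /* keep waiting for a new or reconnecting standby */
    RELEASE_WAITERS     /* wake the backends now; the primary works alone */
} WaitDecision;

/*
 * Decide what to do with waiting backends once the sync standby has gone.
 *
 * num_potential_standbys: connected standbys that could take over the
 *     sync standby role (assumed to be known to the caller).
 * allow_standalone_primary: the setting discussed in this thread.
 */
WaitDecision
decide_on_sync_standby_loss(int num_potential_standbys,
                            bool allow_standalone_primary)
{
    assert(num_potential_standbys >= 0);

    /* A potential standby exists: wait for it to become the sync standby. */
    if (num_potential_standbys > 0)
        return KEEP_WAITING;

    /* Nobody can take over and running standalone is allowed: release now. */
    if (allow_standalone_primary)
        return RELEASE_WAITERS;

    /* Standalone operation is not allowed: wait for a standby to connect. */
    return KEEP_WAITING;
}
```

For example, decide_on_sync_standby_loss(0, true) yields RELEASE_WAITERS,
while any call with at least one potential standby yields KEEP_WAITING.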

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: 
> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> The WALSender deliberately does *not* wake waiting users if the standby
> >> disconnects. Doing so would break the whole reason for having sync rep
> >> in the first place. What we do is allow a potential standby to take over
> >> the role of sync standby, if one is available. Or the failing standby
> >> can reconnect and then release waiters.
> >
> > If there is a potential standby when the synchronous standby has gone, I
> > agree that it's not a good idea to release the waiting backends right away.
> > In this case, those backends should wait for the next synchronous standby.
> >
> > On the other hand, if there is no potential standby, I think that the waiting
> > backends should not wait for the timeout and should wake up as soon as the
> > synchronous standby has gone. Otherwise, those backends are suspended for
> > a long time (i.e., until the timeout expires), which would reduce
> > availability, I'm afraid.
> >
> > Keeping those backends waiting for the failed standby to reconnect is one
> > option, but this looks like the behavior for "allow_standalone_primary = off".
> > If allow_standalone_primary = on, it looks more natural to let the
> > primary work alone without waiting for the timeout.
> 
> Also I think that the waiting backends should be released as soon as the
> last synchronous standby switches to asynchronous mode. Since there is
> no standby which is planning to reconnect, obviously they no longer need
> to wait.

I've not done this, but we could.

It can't run in a WALSender, so this code would need to live in either
WALWriter or BgWriter.

-- 
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
>> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >> The WALSender deliberately does *not* wake waiting users if the standby
>> >> disconnects. Doing so would break the whole reason for having sync rep
>> >> in the first place. What we do is allow a potential standby to take over
>> >> the role of sync standby, if one is available. Or the failing standby
>> >> can reconnect and then release waiters.
>> >
>> > If there is a potential standby when the synchronous standby has gone, I
>> > agree that it's not a good idea to release the waiting backends right away.
>> > In this case, those backends should wait for the next synchronous standby.
>> >
>> > On the other hand, if there is no potential standby, I think that the waiting
>> > backends should not wait for the timeout and should wake up as soon as the
>> > synchronous standby has gone. Otherwise, those backends are suspended for
>> > a long time (i.e., until the timeout expires), which would reduce
>> > availability, I'm afraid.
>> >
>> > Keeping those backends waiting for the failed standby to reconnect is one
>> > option, but this looks like the behavior for "allow_standalone_primary = off".
>> > If allow_standalone_primary = on, it looks more natural to let the
>> > primary work alone without waiting for the timeout.
>>
>> Also I think that the waiting backends should be released as soon as the
>> last synchronous standby switches to asynchronous mode. Since there is
>> no standby which is planning to reconnect, obviously they no longer need
>> to wait.
>
> I've not done this, but we could.
>
> It can't run in a WALSender, so this code would need to live in either
> WALWriter or BgWriter.

I would have thought that the last WALSender to switch to async would
have been responsible for doing this at that time.  Why doesn't that
work?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Mon, 2011-03-07 at 13:15 -0500, Robert Haas wrote:
> On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
> >> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> >>
> >> Also I think that the waiting backends should be released as soon as the
> >> last synchronous standby switches to asynchronous mode. Since there is
> >> no standby which is planning to reconnect, obviously they no longer need
> >> to wait.
> >
> > I've not done this, but we could.
> >
> > It can't run in a WALSender, so this code would need to live in either
> > WALWriter or BgWriter.
> 
> I would have thought that the last WALSender to switch to async would
> have been responsible for doing this at that time.  Why doesn't that
> work?

The main time we get extended waits is when there are no WALsenders.
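
To illustrate why the release cannot be left to the last WALSender: once no
WALSender is running, nothing in that code path exists to wake the waiters,
so the check has to run in a process that is always present. A minimal
sketch of such a check follows. The struct, its fields, and the function
name are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of shared state; field names are illustrative only. */
typedef struct
{
    int num_walsenders;     /* WALSender processes currently running */
    int num_waiters;        /* backends blocked waiting for sync rep */
} SyncRepSnapshot;

/*
 * Check intended to be called periodically from a long-lived process
 * (the thread suggests WALWriter or BgWriter). Returns true when backends
 * are waiting but no WALSender is left that could release them.
 */
bool
waiters_are_orphaned(const SyncRepSnapshot *snap)
{
    assert(snap != NULL);
    return snap->num_walsenders == 0 && snap->num_waiters > 0;
}
```

With this shape, the long-lived process would release the waiters whenever
waiters_are_orphaned() returns true, subject to whatever policy is chosen
for the standalone-primary setting.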

-- 
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services