Thread: How should the primary behave when the sync standby goes away? Re: Sync Rep v17
How should the primary behave when the sync standby goes away? Re: Sync Rep v17
From: Fujii Masao
On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The WALSender deliberately does *not* wake waiting users if the standby
>> disconnects. Doing so would break the whole reason for having sync rep
>> in the first place. What we do is allow a potential standby to takeover
>> the role of sync standby, if one is available. Or the failing standby
>> can reconnect and then release waiters.
>
> If there is potential standby when synchronous standby has gone, I agree
> that it's not good idea to release the waiting backends soon. In this case,
> those backends should wait for next synchronous standby.
>
> On the other hand, if there is no potential standby, I think that the waiting
> backends should not wait for the timeout and should wake up as soon as
> synchronous standby has gone. Otherwise, those backends suspend for
> a long time (i.e., until the timeout expires), which would decrease the
> high-availability, I'm afraid.
>
> Keeping those backends waiting for the failed standby to reconnect is an
> idea. But this looks like the behavior for "allow_standalone_primary = off".
> If allow_standalone_primary = on, it looks more natural to make the
> primary work alone without waiting the timeout.

Also I think that the waiting backends should be released as soon as the
last synchronous standby switches to asynchronous mode. Since there is
no standby which is planning to reconnect, obviously they no longer need
to wait.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
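
The policy proposed in this message can be read as a small decision table. What follows is a minimal Python sketch of that reading, not PostgreSQL code: `PrimaryState`, `Action`, and both function names are hypothetical, and `allow_standalone_primary` is modeled only as the boolean setting named in the thread.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_WAITING = "keep waiting"
    RELEASE_NOW = "release now"


@dataclass
class PrimaryState:
    # Hypothetical fields for illustration; these are not PostgreSQL internals.
    potential_standbys: int         # connected standbys that could become the sync standby
    allow_standalone_primary: bool  # the setting discussed in the thread


def on_sync_standby_lost(state: PrimaryState) -> Action:
    """Decide what waiting backends do when the sync standby disconnects."""
    if state.potential_standbys > 0:
        # A potential standby can take over, so backends wait for it.
        return Action.KEEP_WAITING
    if state.allow_standalone_primary:
        # Nobody can take over and standalone operation is allowed:
        # wake the backends instead of letting them sit until the timeout.
        return Action.RELEASE_NOW
    # Standalone operation is off: wait for the failed standby to reconnect.
    return Action.KEEP_WAITING


def on_standby_switched_to_async(remaining_sync_standbys: int) -> Action:
    """Second proposal: release once the last sync standby has become async."""
    if remaining_sync_standbys == 0:
        return Action.RELEASE_NOW
    return Action.KEEP_WAITING
```

For example, `on_sync_standby_lost(PrimaryState(0, True))` yields `Action.RELEASE_NOW`, while any state with at least one potential standby keeps the backends waiting.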
Re: How should the primary behave when the sync standby goes away? Re: Sync Rep v17
From: Simon Riggs
On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> The WALSender deliberately does *not* wake waiting users if the standby
> >> disconnects. Doing so would break the whole reason for having sync rep
> >> in the first place. What we do is allow a potential standby to takeover
> >> the role of sync standby, if one is available. Or the failing standby
> >> can reconnect and then release waiters.
> >
> > If there is potential standby when synchronous standby has gone, I agree
> > that it's not good idea to release the waiting backends soon. In this case,
> > those backends should wait for next synchronous standby.
> >
> > On the other hand, if there is no potential standby, I think that the waiting
> > backends should not wait for the timeout and should wake up as soon as
> > synchronous standby has gone. Otherwise, those backends suspend for
> > a long time (i.e., until the timeout expires), which would decrease the
> > high-availability, I'm afraid.
> >
> > Keeping those backends waiting for the failed standby to reconnect is an
> > idea. But this looks like the behavior for "allow_standalone_primary = off".
> > If allow_standalone_primary = on, it looks more natural to make the
> > primary work alone without waiting the timeout.
>
> Also I think that the waiting backends should be released as soon as the
> last synchronous standby switches to asynchronous mode. Since there is
> no standby which is planning to reconnect, obviously they no longer need
> to wait.

I've not done this, but we could.

It can't run in a WALSender, so this code would need to live in either
WALWriter or BgWriter.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: How should the primary behave when the sync standby goes away? Re: Sync Rep v17
From: Robert Haas
On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
>> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >> The WALSender deliberately does *not* wake waiting users if the standby
>> >> disconnects. Doing so would break the whole reason for having sync rep
>> >> in the first place. What we do is allow a potential standby to takeover
>> >> the role of sync standby, if one is available. Or the failing standby
>> >> can reconnect and then release waiters.
>> >
>> > If there is potential standby when synchronous standby has gone, I agree
>> > that it's not good idea to release the waiting backends soon. In this case,
>> > those backends should wait for next synchronous standby.
>> >
>> > On the other hand, if there is no potential standby, I think that the waiting
>> > backends should not wait for the timeout and should wake up as soon as
>> > synchronous standby has gone. Otherwise, those backends suspend for
>> > a long time (i.e., until the timeout expires), which would decrease the
>> > high-availability, I'm afraid.
>> >
>> > Keeping those backends waiting for the failed standby to reconnect is an
>> > idea. But this looks like the behavior for "allow_standalone_primary = off".
>> > If allow_standalone_primary = on, it looks more natural to make the
>> > primary work alone without waiting the timeout.
>>
>> Also I think that the waiting backends should be released as soon as the
>> last synchronous standby switches to asynchronous mode. Since there is
>> no standby which is planning to reconnect, obviously they no longer need
>> to wait.
>
> I've not done this, but we could.
>
> It can't run in a WALSender, so this code would need to live in either
> WALWriter or BgWriter.

I would have thought that the last WALSender to switch to async would
have been responsible for doing this at that time. Why doesn't that
work?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: How should the primary behave when the sync standby goes away? Re: Sync Rep v17
From: Simon Riggs
On Mon, 2011-03-07 at 13:15 -0500, Robert Haas wrote:
> On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
> >> On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >> > On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>
> >> Also I think that the waiting backends should be released as soon as the
> >> last synchronous standby switches to asynchronous mode. Since there is
> >> no standby which is planning to reconnect, obviously they no longer need
> >> to wait.
> >
> > I've not done this, but we could.
> >
> > It can't run in a WALSender, so this code would need to live in either
> > WALWriter or BgWriter.
>
> I would have thought that the last WALSender to switch to async would
> have been responsible for doing this at that time. Why doesn't that
> work?

The main time we get extended waits is when there are no WALsenders.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
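
The point about there being no WALSenders can be illustrated with a toy model. The Python sketch below is an assumption-laden illustration, not PostgreSQL code: `ToyPrimary` and all of its methods are hypothetical, standalone operation is assumed to be allowed, and `background_check` merely stands in for logic that the thread suggests would live in WALWriter or BgWriter. A release hook that runs only when a WALSender exits never fires for a backend that starts waiting while no WALSender exists, whereas a periodic check in a long-lived process does.

```python
class ToyPrimary:
    """Toy model; every name here is hypothetical, not a PostgreSQL internal."""

    def __init__(self):
        self.walsenders = 0  # number of connected WALSender processes
        self.waiting = []    # backends blocked waiting for sync replication
        self.released = []   # backends that have been woken up

    def backend_commit(self, name):
        # A committing backend starts waiting for a sync standby.
        self.waiting.append(name)

    def _release_all(self):
        self.released.extend(self.waiting)
        self.waiting.clear()

    def walsender_connect(self):
        self.walsenders += 1

    def walsender_exit(self):
        # Hook in the spirit of "the last WALSender releases the waiters".
        # It can only run if a WALSender existed in the first place.
        self.walsenders -= 1
        if self.walsenders == 0:
            self._release_all()

    def background_check(self):
        # Periodic check from a process that is always running.
        if self.walsenders == 0 and self.waiting:
            self._release_all()
```

In this model, a backend that calls `backend_commit` while `walsenders` is 0 stays in `waiting` until `background_check` runs, because no `walsender_exit` event ever occurs for it.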