Thread: How should the waiting backends behave in sync rep?

How should the waiting backends behave in sync rep?

From
Fujii Masao
Date:
Hi,

How should the backends waiting for replication behave when
synchrnous_standby_names
is set to '' and the configuration file is reloaded? Now they keep
waiting for the ACK from the
standby. But I think that it's more natural for them to get out of the
wait state and complete
the transaction in that case. If we'll change them in that way, we
would no longer need
something like "pg_ctl standalone" which I mentioned in another thread. Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should the waiting backends behave in sync rep?

From
Robert Haas
Date:
On Tue, Mar 8, 2011 at 10:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> How should the backends waiting for replication behave when
> synchrnous_standby_names
> is set to '' and the configuration file is reloaded? Now they keep
> waiting for the ACK from the
> standby. But I think that it's more natural for them to get out of the
> wait state and complete
> the transaction in that case. If we'll change them in that way, we
> would no longer need
> something like "pg_ctl standalone" which I mentioned in another thread. Thought?

I think so.  Looking at assign_synchronous_standby_names, the
following code just looks wrong:
       if (doit && list_length(elemlist) > 0)               sync_standbys_defined = true;

Once sync_standbys_defined becomes true, there's no way for it to ever
become false again.  That can't be right.  That means that if you
disable sync rep by zeroing out synchronous_standby_names, backends
that already existed at the time you made the change will continue to
act as though sync rep is enabled until they exit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: How should the waiting backends behave in sync rep?

From
Fujii Masao
Date:
On Thu, Mar 10, 2011 at 2:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 8, 2011 at 10:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> How should the backends waiting for replication behave when
>> synchrnous_standby_names
>> is set to '' and the configuration file is reloaded? Now they keep
>> waiting for the ACK from the
>> standby. But I think that it's more natural for them to get out of the
>> wait state and complete
>> the transaction in that case. If we'll change them in that way, we
>> would no longer need
>> something like "pg_ctl standalone" which I mentioned in another thread. Thought?
>
> I think so.  Looking at assign_synchronous_standby_names, the
> following code just looks wrong:
>
>        if (doit && list_length(elemlist) > 0)
>                sync_standbys_defined = true;
>
> Once sync_standbys_defined becomes true, there's no way for it to ever
> become false again.  That can't be right.  That means that if you
> disable sync rep by zeroing out synchronous_standby_names, backends
> that already existed at the time you made the change will continue to
> act as though sync rep is enabled until they exit.

Yes, that's a bug. Yeah, sync rep currently seems to have many TODO items.
I added some of them in wiki.
http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should the waiting backends behave in sync rep?

From
Robert Haas
Date:
On Wed, Mar 9, 2011 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Mar 10, 2011 at 2:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Mar 8, 2011 at 10:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> How should the backends waiting for replication behave when
>>> synchrnous_standby_names
>>> is set to '' and the configuration file is reloaded? Now they keep
>>> waiting for the ACK from the
>>> standby. But I think that it's more natural for them to get out of the
>>> wait state and complete
>>> the transaction in that case. If we'll change them in that way, we
>>> would no longer need
>>> something like "pg_ctl standalone" which I mentioned in another thread. Thought?
>>
>> I think so.  Looking at assign_synchronous_standby_names, the
>> following code just looks wrong:
>>
>>        if (doit && list_length(elemlist) > 0)
>>                sync_standbys_defined = true;
>>
>> Once sync_standbys_defined becomes true, there's no way for it to ever
>> become false again.  That can't be right.  That means that if you
>> disable sync rep by zeroing out synchronous_standby_names, backends
>> that already existed at the time you made the change will continue to
>> act as though sync rep is enabled until they exit.
>
> Yes, that's a bug. Yeah, sync rep currently seems to have many TODO items.
> I added some of them in wiki.
> http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items

There's a comment that looks related to this issue in syncrep.c.  It reads:
               /*                * We don't receive SIGHUPs at this point, so resetting                *
synchronous_standby_nameshas no effect on waiters.                */ 

It's unclear to me what this actually means.  Is there some reason we
CAN'T receive SIGHUPs at that point, or have we just chosen not to
(for unexplained reasons)?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: How should the waiting backends behave in sync rep?

From
Fujii Masao
Date:
On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> There's a comment that looks related to this issue in syncrep.c.  It reads:
>
>                /*
>                 * We don't receive SIGHUPs at this point, so resetting
>                 * synchronous_standby_names has no effect on waiters.
>                 */
>
> It's unclear to me what this actually means.  Is there some reason we
> CAN'T receive SIGHUPs at that point, or have we just chosen not to
> (for unexplained reasons)?

Not sure. Simon?

It seems harmless to receive SIGHUP at that point.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should the waiting backends behave in sync rep?

From
Simon Riggs
Date:
On Wed, 2011-03-16 at 16:36 +0900, Fujii Masao wrote:
> On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > There's a comment that looks related to this issue in syncrep.c.  It reads:
> >
> >                /*
> >                 * We don't receive SIGHUPs at this point, so resetting
> >                 * synchronous_standby_names has no effect on waiters.
> >                 */
> >
> > It's unclear to me what this actually means.  Is there some reason we
> > CAN'T receive SIGHUPs at that point, or have we just chosen not to
> > (for unexplained reasons)?
> 
> Not sure. Simon?
> 
> It seems harmless to receive SIGHUP at that point.

You pointed out this out to me, so if you want I can explain back to you
again ;-)   Signals are blocked over that section of code.

We could write a scary bit of code to get around that, but it smells
badly of kludge.

What do you think we should do?

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: How should the waiting backends behave in sync rep?

From
Fujii Masao
Date:
On Wed, Mar 16, 2011 at 5:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2011-03-16 at 16:36 +0900, Fujii Masao wrote:
>> On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > There's a comment that looks related to this issue in syncrep.c.  It reads:
>> >
>> >                /*
>> >                 * We don't receive SIGHUPs at this point, so resetting
>> >                 * synchronous_standby_names has no effect on waiters.
>> >                 */
>> >
>> > It's unclear to me what this actually means.  Is there some reason we
>> > CAN'T receive SIGHUPs at that point, or have we just chosen not to
>> > (for unexplained reasons)?
>>
>> Not sure. Simon?
>>
>> It seems harmless to receive SIGHUP at that point.
>
> You pointed out this out to me, so if you want I can explain back to you
> again ;-)   Signals are blocked over that section of code.

Yeah, I pointed out that SIGINT and SIGTERM are blocked there.
But not SIGHUP ;)

> We could write a scary bit of code to get around that, but it smells
> badly of kludge.
>
> What do you think we should do?

What I'm thinking is to make the waiting backends get out of the wait
state if synchronous_standby_names is emptied and configuration file
is reloaded. IOW, I'd like to change SyncRepWaitForLSN() so that it
calls ProcessConfigFile() when the flag got_SIGHUP is true, and then
gets out of the wait loop if there is no name in synchronous_standby_names
(i.e., when the variable sync_standbys_defined is FALSE).

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: How should the waiting backends behave in sync rep?

From
Simon Riggs
Date:
On Wed, 2011-03-16 at 18:41 +0900, Fujii Masao wrote:
> On Wed, Mar 16, 2011 at 5:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Wed, 2011-03-16 at 16:36 +0900, Fujii Masao wrote:
> >> On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> > There's a comment that looks related to this issue in syncrep.c.  It reads:
> >> >
> >> >                /*
> >> >                 * We don't receive SIGHUPs at this point, so resetting
> >> >                 * synchronous_standby_names has no effect on waiters.
> >> >                 */
> >> >
> >> > It's unclear to me what this actually means.  Is there some reason we
> >> > CAN'T receive SIGHUPs at that point, or have we just chosen not to
> >> > (for unexplained reasons)?
> >>
> >> Not sure. Simon?
> >>
> >> It seems harmless to receive SIGHUP at that point.
> >
> > You pointed out this out to me, so if you want I can explain back to you
> > again ;-)   Signals are blocked over that section of code.
> 
> Yeah, I pointed out that SIGINT and SIGTERM are blocked there.
> But not SIGHUP ;)
> 
> > We could write a scary bit of code to get around that, but it smells
> > badly of kludge.
> >
> > What do you think we should do?
> 
> What I'm thinking is to make the waiting backends get out of the wait
> state if synchronous_standby_names is emptied and configuration file
> is reloaded. IOW, I'd like to change SyncRepWaitForLSN() so that it
> calls ProcessConfigFile() when the flag got_SIGHUP is true, and then
> gets out of the wait loop if there is no name in synchronous_standby_names
> (i.e., when the variable sync_standbys_defined is FALSE).

I did try that and it didn't work.

If you think it will, I'll try again.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: How should the waiting backends behave in sync rep?

From
Robert Haas
Date:
On Wed, Mar 16, 2011 at 6:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Yeah, I pointed out that SIGINT and SIGTERM are blocked there.
>> But not SIGHUP ;)
>>
>> > We could write a scary bit of code to get around that, but it smells
>> > badly of kludge.
>> >
>> > What do you think we should do?
>>
>> What I'm thinking is to make the waiting backends get out of the wait
>> state if synchronous_standby_names is emptied and configuration file
>> is reloaded. IOW, I'd like to change SyncRepWaitForLSN() so that it
>> calls ProcessConfigFile() when the flag got_SIGHUP is true, and then
>> gets out of the wait loop if there is no name in synchronous_standby_names
>> (i.e., when the variable sync_standbys_defined is FALSE).
>
> I did try that and it didn't work.
>
> If you think it will, I'll try again.

There are two potential problems here.  One is that we don't normally
reload the config file except in between toplevel commands, so doing
it here would be slightly inconsistent.  A bigger problem is that
doing complicated stuff at this particular point in the code is a
really bad idea.  It's too late for the commit to fail, and as I
learned the hard way yesterday while fooling around with it, anything
that throws an ERROR here causes a database-wide PANIC and abnormal
system shutdown.  So doing something as complicated as reloading the
config file doesn't seem like a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company