Thread: How should the waiting backends behave in sync rep?
Hi, How should the backends waiting for replication behave when synchrnous_standby_names is set to '' and the configuration file is reloaded? Now they keep waiting for the ACK from the standby. But I think that it's more natural for them to get out of the wait state and complete the transaction in that case. If we'll change them in that way, we would no longer need something like "pg_ctl standalone" which I mentioned in another thread. Thought? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Mar 8, 2011 at 10:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > How should the backends waiting for replication behave when > synchrnous_standby_names > is set to '' and the configuration file is reloaded? Now they keep > waiting for the ACK from the > standby. But I think that it's more natural for them to get out of the > wait state and complete > the transaction in that case. If we'll change them in that way, we > would no longer need > something like "pg_ctl standalone" which I mentioned in another thread. Thought? I think so. Looking at assign_synchronous_standby_names, the following code just looks wrong: if (doit && list_length(elemlist) > 0) sync_standbys_defined = true; Once sync_standbys_defined becomes true, there's no way for it to ever become false again. That can't be right. That means that if you disable sync rep by zeroing out synchronous_standby_names, backends that already existed at the time you made the change will continue to act as though sync rep is enabled until they exit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 10, 2011 at 2:06 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 8, 2011 at 10:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> How should the backends waiting for replication behave when >> synchrnous_standby_names >> is set to '' and the configuration file is reloaded? Now they keep >> waiting for the ACK from the >> standby. But I think that it's more natural for them to get out of the >> wait state and complete >> the transaction in that case. If we'll change them in that way, we >> would no longer need >> something like "pg_ctl standalone" which I mentioned in another thread. Thought? > > I think so. Looking at assign_synchronous_standby_names, the > following code just looks wrong: > > if (doit && list_length(elemlist) > 0) > sync_standbys_defined = true; > > Once sync_standbys_defined becomes true, there's no way for it to ever > become false again. That can't be right. That means that if you > disable sync rep by zeroing out synchronous_standby_names, backends > that already existed at the time you made the change will continue to > act as though sync rep is enabled until they exit. Yes, that's a bug. Yeah, sync rep currently seems to have many TODO items. I added some of them in wiki. http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 9, 2011 at 9:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Mar 10, 2011 at 2:06 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Mar 8, 2011 at 10:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> How should the backends waiting for replication behave when >>> synchrnous_standby_names >>> is set to '' and the configuration file is reloaded? Now they keep >>> waiting for the ACK from the >>> standby. But I think that it's more natural for them to get out of the >>> wait state and complete >>> the transaction in that case. If we'll change them in that way, we >>> would no longer need >>> something like "pg_ctl standalone" which I mentioned in another thread. Thought? >> >> I think so. Looking at assign_synchronous_standby_names, the >> following code just looks wrong: >> >> if (doit && list_length(elemlist) > 0) >> sync_standbys_defined = true; >> >> Once sync_standbys_defined becomes true, there's no way for it to ever >> become false again. That can't be right. That means that if you >> disable sync rep by zeroing out synchronous_standby_names, backends >> that already existed at the time you made the change will continue to >> act as though sync rep is enabled until they exit. > > Yes, that's a bug. Yeah, sync rep currently seems to have many TODO items. > I added some of them in wiki. > http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items There's a comment that looks related to this issue in syncrep.c. It reads: /* * We don't receive SIGHUPs at this point, so resetting * synchronous_standby_nameshas no effect on waiters. */ It's unclear to me what this actually means. Is there some reason we CAN'T receive SIGHUPs at that point, or have we just chosen not to (for unexplained reasons)? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: > There's a comment that looks related to this issue in syncrep.c. It reads: > > /* > * We don't receive SIGHUPs at this point, so resetting > * synchronous_standby_names has no effect on waiters. > */ > > It's unclear to me what this actually means. Is there some reason we > CAN'T receive SIGHUPs at that point, or have we just chosen not to > (for unexplained reasons)? Not sure. Simon? It seems harmless to receive SIGHUP at that point. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, 2011-03-16 at 16:36 +0900, Fujii Masao wrote: > On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > There's a comment that looks related to this issue in syncrep.c. It reads: > > > > /* > > * We don't receive SIGHUPs at this point, so resetting > > * synchronous_standby_names has no effect on waiters. > > */ > > > > It's unclear to me what this actually means. Is there some reason we > > CAN'T receive SIGHUPs at that point, or have we just chosen not to > > (for unexplained reasons)? > > Not sure. Simon? > > It seems harmless to receive SIGHUP at that point. You pointed out this out to me, so if you want I can explain back to you again ;-) Signals are blocked over that section of code. We could write a scary bit of code to get around that, but it smells badly of kludge. What do you think we should do? -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Mar 16, 2011 at 5:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2011-03-16 at 16:36 +0900, Fujii Masao wrote: >> On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> > There's a comment that looks related to this issue in syncrep.c. It reads: >> > >> > /* >> > * We don't receive SIGHUPs at this point, so resetting >> > * synchronous_standby_names has no effect on waiters. >> > */ >> > >> > It's unclear to me what this actually means. Is there some reason we >> > CAN'T receive SIGHUPs at that point, or have we just chosen not to >> > (for unexplained reasons)? >> >> Not sure. Simon? >> >> It seems harmless to receive SIGHUP at that point. > > You pointed out this out to me, so if you want I can explain back to you > again ;-) Signals are blocked over that section of code. Yeah, I pointed out that SIGINT and SIGTERM are blocked there. But not SIGHUP ;) > We could write a scary bit of code to get around that, but it smells > badly of kludge. > > What do you think we should do? What I'm thinking is to make the waiting backends get out of the wait state if synchronous_standby_names is emptied and configuration file is reloaded. IOW, I'd like to change SyncRepWaitForLSN() so that it calls ProcessConfigFile() when the flag got_SIGHUP is true, and then gets out of the wait loop if there is no name in synchronous_standby_names (i.e., when the variable sync_standbys_defined is FALSE). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, 2011-03-16 at 18:41 +0900, Fujii Masao wrote: > On Wed, Mar 16, 2011 at 5:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, 2011-03-16 at 16:36 +0900, Fujii Masao wrote: > >> On Sat, Mar 12, 2011 at 3:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: > >> > There's a comment that looks related to this issue in syncrep.c. It reads: > >> > > >> > /* > >> > * We don't receive SIGHUPs at this point, so resetting > >> > * synchronous_standby_names has no effect on waiters. > >> > */ > >> > > >> > It's unclear to me what this actually means. Is there some reason we > >> > CAN'T receive SIGHUPs at that point, or have we just chosen not to > >> > (for unexplained reasons)? > >> > >> Not sure. Simon? > >> > >> It seems harmless to receive SIGHUP at that point. > > > > You pointed out this out to me, so if you want I can explain back to you > > again ;-) Signals are blocked over that section of code. > > Yeah, I pointed out that SIGINT and SIGTERM are blocked there. > But not SIGHUP ;) > > > We could write a scary bit of code to get around that, but it smells > > badly of kludge. > > > > What do you think we should do? > > What I'm thinking is to make the waiting backends get out of the wait > state if synchronous_standby_names is emptied and configuration file > is reloaded. IOW, I'd like to change SyncRepWaitForLSN() so that it > calls ProcessConfigFile() when the flag got_SIGHUP is true, and then > gets out of the wait loop if there is no name in synchronous_standby_names > (i.e., when the variable sync_standbys_defined is FALSE). I did try that and it didn't work. If you think it will, I'll try again. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Mar 16, 2011 at 6:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Yeah, I pointed out that SIGINT and SIGTERM are blocked there. >> But not SIGHUP ;) >> >> > We could write a scary bit of code to get around that, but it smells >> > badly of kludge. >> > >> > What do you think we should do? >> >> What I'm thinking is to make the waiting backends get out of the wait >> state if synchronous_standby_names is emptied and configuration file >> is reloaded. IOW, I'd like to change SyncRepWaitForLSN() so that it >> calls ProcessConfigFile() when the flag got_SIGHUP is true, and then >> gets out of the wait loop if there is no name in synchronous_standby_names >> (i.e., when the variable sync_standbys_defined is FALSE). > > I did try that and it didn't work. > > If you think it will, I'll try again. There are two potential problems here. One is that we don't normally reload the config file except in between toplevel commands, so doing it here would be slightly inconsistent. A bigger problem is that doing complicated stuff at this particular point in the code is a really bad idea. It's too late for the commit to fail, and as I learned the hard way yesterday while fooling around with it, anything that throws an ERROR here causes a database-wide PANIC and abnormal system shutdown. So doing something as complicated as reloading the config file doesn't seem like a good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company