Thread: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
jhedden@apple.com
Date:
The following bug has been logged on the website: Bug reference: 9118 Logged by: Joel Hedden Email address: jhedden@apple.com PostgreSQL version: 9.3.2 Operating system: Mac OS X 10.9.1 Description: I connect a pg_receivexlog instance and have "hot_standby" archiving enabled, with "archive_command" defined correctly. When the WAL Sender process receives a SIGUSR2 from the postmaster (or me), it fails to shut down and pg_receivexlog remains connected. Upon inspection, it looks like the test for "sentPtr == MyWalSnd->flush" is always false at walsender.c:1058 (sentPtr is still non-zero) where the wal sender should be shutting down. Replication and archiving seem to be working otherwise. Killing pg_receivexlog allows for the WAL Sender to terminate. This didn't affect 9.2.4 for me. I've tested the EnterpriseDB 9.3.2 release as well as the HEAD branch for 9.3. To reproduce: 1. Download the EnterpriseDB 9.3.2 distribution for Mac OS X and install onto 10.9.1. 2. Use initdb to create a fresh cluster. 3. Update postgresql.conf: - max_wal_senders = 2 - wal_level = hot_standby - archive_mode = on - archive_command = '/usr/bin/true' 4. Update pg_hba.conf to allow local replication connections. 5. Start postgres. 6. Connect pg_receivexlog via local socket file. 7. kill -TERM the postmaster. The WAL sender, logger, and postmaster continue to run indefinitely and pg_receivexlog is not disconnected. WAL sender receives the SIGUSR2 from postmaster but does not disconnect pg_receivexlog and shut down as expected.
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Heikki Linnakangas
Date:
On 02/06/2014 05:08 AM, jhedden@apple.com wrote: > The following bug has been logged on the website: > > Bug reference: 9118 > Logged by: Joel Hedden > Email address: jhedden@apple.com > PostgreSQL version: 9.3.2 > Operating system: Mac OS X 10.9.1 > Description: > > I connect a pg_receivexlog instance and have "hot_standby" archiving > enabled, with "archive_command" defined correctly. When the WAL Sender > process receives a SIGUSR2 from the postmaster (or me), it fails to shut > down and pg_receivexlog remains connected. Upon inspection, it looks like > the test for "sentPtr == MyWalSnd->flush" is always false at > walsender.c:1058 (sentPtr is still non-zero) where the wal sender should be > shutting down. Replication and archiving seem to be working otherwise. > Killing pg_receivexlog allows for the WAL Sender to terminate. Hmm. Before exiting, walsender waits until the client has flushed all the WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to the server, so the server waits forever. The first question is, why does pg_receivexlog not send its "flush" pointer back to the server? It *does* fsync the files to disk. However, currently it only fsyncs when closing a full segment, but when shutting down, the last segment would not be full, so to fix this issue it should be taught to fsync also partial segments. Second question is, does it make sense for the server to wait for all replication clients to flush the WAL? It seems reasonable for a standby server; after shutting down, the standby has all the WAL safely on disk. Although if a standby is not connected at the moment, all bets are off. It also seems reasonable for pg_receivexlog, except for the fact that pg_receivexlog never sends a flush pointer back. > This didn't affect 9.2.4 for me. Yeah, the waiting was introduced by commit bee4a4d361c054c531c3a27024f9ff3efef3635b, and wasn't included in 9.2.4. It is in 9.2.5, however. Quoting the original email Fujii posted about this (http://www.postgresql.org/message-id/CAHGQGwHLjEROTMtSWJd=xg_VFwRe3oJWnTYsyBDUbRYa6rr0DQ@mail.gmail.com): > But there is one problem: though walsender tries to send all the outstanding > WAL records, it doesn't wait for them to be replicated to the standby. IOW, > walsender closes the replication connection as soon as it sends WAL records. > Then, before receiving all the WAL records, walreceiver can detect > the closure of connection and exit. We cannot guarantee that there is no > missing WAL in the standby after clean shutdown of the master. In this case, > backup from new master is required when restarting the stopped master as > new standby. I have experienced this case several times, especially when > enabling WAL archiving. Fujii-san, how can walreceiver detect the closure of the connection, before reading all the buffered WAL from the TCP connection? What kind of log messages do you get when it happens? I tried to reproduce that with commit bee4a4d361c054c531c3a27024f9ff3efef3635b reverted, but couldn't. Although this was with master and standby running on same laptop, and this is essentially a race condition, so it's possible that I just didn't get the timing right to make it happen. - Heikki
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Fujii Masao
Date:
Sorry for the delay... On Thu, Feb 6, 2014 at 5:05 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 02/06/2014 05:08 AM, jhedden@apple.com wrote: >> >> The following bug has been logged on the website: >> >> Bug reference: 9118 >> Logged by: Joel Hedden >> Email address: jhedden@apple.com >> PostgreSQL version: 9.3.2 >> Operating system: Mac OS X 10.9.1 >> Description: >> >> I connect a pg_receivexlog instance and have "hot_standby" archiving >> enabled, with "archive_command" defined correctly. When the WAL Sender >> process receives a SIGUSR2 from the postmaster (or me), it fails to shut >> down and pg_receivexlog remains connected. Upon inspection, it looks like >> the test for "sentPtr == MyWalSnd->flush" is always false at >> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should >> be >> shutting down. Replication and archiving seem to be working otherwise. >> Killing pg_receivexlog allows for the WAL Sender to terminate. > > > Hmm. Before exiting, walsender waits until the client has flushed all the > WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to > the server, so the server waits forever. > > The first question is, why does pg_receivexlog not send its "flush" pointer > back to the server? It *does* fsync the files to disk. However, currently it > only fsyncs when closing a full segment, but when shutting down, the last > segment would not be full, so to fix this issue it should be taught to fsync > also partial segments. Yes. And, pg_receivexlog returns InvalidXLogRecPtr as the flush location, so "sentPtr == MyWalSnd->flush" will never be true when using pg_receivexlog... The quick-fix seems not to wait for that condition to be true whenever the flush location is invalid. > Fujii-san, how can walreceiver detect the closure of the connection, before > reading all the buffered WAL from the TCP connection? What kind of log > messages do you get when it happens? I got the following messages. [MASTER] LOG: database system is shut down [STANDBY] FATAL: could not send data to WAL stream: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. > I tried to reproduce that with commit > bee4a4d361c054c531c3a27024f9ff3efef3635b reverted, but couldn't. Although > this was with master and standby running on same laptop, and this is > essentially a race condition, so it's possible that I just didn't get the > timing right to make it happen. You would need to enable WAL archiving. Whenever I was able to reproduce the problem, I enabled WAL archiving. Regards, -- Fujii Masao
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Fujii Masao
Date:
On Thu, Mar 13, 2014 at 7:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > Sorry for the delay... > > On Thu, Feb 6, 2014 at 5:05 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> On 02/06/2014 05:08 AM, jhedden@apple.com wrote: >>> >>> The following bug has been logged on the website: >>> >>> Bug reference: 9118 >>> Logged by: Joel Hedden >>> Email address: jhedden@apple.com >>> PostgreSQL version: 9.3.2 >>> Operating system: Mac OS X 10.9.1 >>> Description: >>> >>> I connect a pg_receivexlog instance and have "hot_standby" archiving >>> enabled, with "archive_command" defined correctly. When the WAL Sender >>> process receives a SIGUSR2 from the postmaster (or me), it fails to shut >>> down and pg_receivexlog remains connected. Upon inspection, it looks like >>> the test for "sentPtr == MyWalSnd->flush" is always false at >>> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should >>> be >>> shutting down. Replication and archiving seem to be working otherwise. >>> Killing pg_receivexlog allows for the WAL Sender to terminate. >> >> >> Hmm. Before exiting, walsender waits until the client has flushed all the >> WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to >> the server, so the server waits forever. >> >> The first question is, why does pg_receivexlog not send its "flush" pointer >> back to the server? It *does* fsync the files to disk. However, currently it >> only fsyncs when closing a full segment, but when shutting down, the last >> segment would not be full, so to fix this issue it should be taught to fsync >> also partial segments. > > Yes. And, pg_receivexlog returns InvalidXLogRecPtr as the flush location, > so "sentPtr == MyWalSnd->flush" will never be true when using pg_receivexlog... > The quick-fix seems not to wait for that condition to be true whenever the flush > location is invalid. On second thought, I think that it's better to check a write location instead if walsender is connecting to a standby such as pg_receivexlog which always returns an invalid flush location. Attached patch does this. Thought? Regards, -- Fujii Masao
Attachment
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Michael Paquier
Date:
On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On second thought, I think that it's better to check a write location instead > if walsender is connecting to a standby such as pg_receivexlog which > always returns an invalid flush location. Attached patch does this. Thought? Just a small one: could it be possible to put the condition checking for the validity of flush/write position into a separate variable out of this if() for readability? sentPtr == MyWalSnd->flush || ! (XLogRecPtrIsInvalid(MyWalSnd->flush) && ! sentPtr == MyWalSnd->write) Are there risks that a standby sends an invalid flush location and a valid write location? Regards, -- Michael
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Fujii Masao
Date:
On Fri, Mar 14, 2014 at 12:22 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On second thought, I think that it's better to check a write location instead >> if walsender is connecting to a standby such as pg_receivexlog which >> always returns an invalid flush location. Attached patch does this. Thought? > Just a small one: could it be possible to put the condition checking > for the validity of flush/write position into a separate variable out > of this if() for readability? > sentPtr == MyWalSnd->flush || > ! (XLogRecPtrIsInvalid(MyWalSnd->flush) && > ! sentPtr == MyWalSnd->write) Yes, that's possible. What about the attached patch? > Are there risks that a standby sends an invalid flush location and a > valid write location? No unless I'm missing something. /* * We only send regular messages to the client for full decoded * transactions, but a synchronous replication and walsender shutdown * possibly are waiting for a later location. So we send pings * containing the flush location every now and then. */ if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response) { BTW, ISTM that the above condition in walsender.c has the same problem. If the standby is pg_receivexlog, MyWalSnd->flush is always an invalid location and that condition would always be TRUE. We would need the same fix also there. Regards, -- Fujii Masao
Attachment
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Michael Paquier
Date:
On Sun, Mar 16, 2014 at 12:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Mar 14, 2014 at 12:22 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On second thought, I think that it's better to check a write location instead >>> if walsender is connecting to a standby such as pg_receivexlog which >>> always returns an invalid flush location. Attached patch does this. Thought? >> Just a small one: could it be possible to put the condition checking >> for the validity of flush/write position into a separate variable out >> of this if() for readability? >> sentPtr == MyWalSnd->flush || >> ! (XLogRecPtrIsInvalid(MyWalSnd->flush) && >> ! sentPtr == MyWalSnd->write) > > Yes, that's possible. What about the attached patch? It is cleaner, thanks. > /* > * We only send regular messages to the client for full decoded > * transactions, but a synchronous replication and walsender shutdown > * possibly are waiting for a later location. So we send pings > * containing the flush location every now and then. > */ > if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response) > { > > BTW, ISTM that the above condition in walsender.c has the same problem. > If the standby is pg_receivexlog, MyWalSnd->flush is always an invalid > location and that condition would always be TRUE. We would need the > same fix also there. Comments of WalSndWaitForWal would need an update as well in this case. It is written on top of this function that it waits "until WAL < loc is flushed to disk". But this is not the case of pg_receivexlog as you mentioned... -- Michael
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Fujii Masao
Date:
On Sun, Mar 16, 2014 at 10:40 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sun, Mar 16, 2014 at 12:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Fri, Mar 14, 2014 at 12:22 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> On second thought, I think that it's better to check a write location instead >>>> if walsender is connecting to a standby such as pg_receivexlog which >>>> always returns an invalid flush location. Attached patch does this. Thought? >>> Just a small one: could it be possible to put the condition checking >>> for the validity of flush/write position into a separate variable out >>> of this if() for readability? >>> sentPtr == MyWalSnd->flush || >>> ! (XLogRecPtrIsInvalid(MyWalSnd->flush) && >>> ! sentPtr == MyWalSnd->write) >> >> Yes, that's possible. What about the attached patch? > It is cleaner, thanks. > >> /* >> * We only send regular messages to the client for full decoded >> * transactions, but a synchronous replication and walsender shutdown >> * possibly are waiting for a later location. So we send pings >> * containing the flush location every now and then. >> */ >> if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response) >> { >> >> BTW, ISTM that the above condition in walsender.c has the same problem. >> If the standby is pg_receivexlog, MyWalSnd->flush is always an invalid >> location and that condition would always be TRUE. We would need the >> same fix also there. > Comments of WalSndWaitForWal would need an update as well in this > case. It is written on top of this function that it waits "until WAL < > loc is flushed to disk". I don't think that comment needs to be updated because it means that "until WAL < loc is flushed to the *master local* disk". It's not related to the flush location that the standby returns. Anyway, I attached the patch. Barring objection, I will commit this. Regards, -- Fujii Masao
Attachment
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Andres Freund
Date:
On 2014-03-19 02:16:00 +0900, Fujii Masao wrote: > > Comments of WalSndWaitForWal would need an update as well in this > > case. It is written on top of this function that it waits "until WAL < > > loc is flushed to disk". > > I don't think that comment needs to be updated because it means that > "until WAL < loc is flushed to the *master local* disk". It's not related to > the flush location that the standby returns. Anyway, I attached the patch. > Barring objection, I will commit this. I object on behalf of WalSndWaitForWal(). We really need the flush location there because otherwise we can't remove old WAL. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Fujii Masao
Date:
On Wed, Mar 19, 2014 at 2:20 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-03-19 02:16:00 +0900, Fujii Masao wrote: >> > Comments of WalSndWaitForWal would need an update as well in this >> > case. It is written on top of this function that it waits "until WAL < >> > loc is flushed to disk". >> >> I don't think that comment needs to be updated because it means that >> "until WAL < loc is flushed to the *master local* disk". It's not related to >> the flush location that the standby returns. Anyway, I attached the patch. >> Barring objection, I will commit this. > > I object on behalf of WalSndWaitForWal(). We really need the flush > location there because otherwise we can't remove old WAL. MyWalSnd->flush < sentPtr condition in WalSndWaitForWait() is always true if the flush location that the client returns is invalid. Is this OK? Regards, -- Fujii Masao
Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From
Andres Freund
Date:
On 2014-03-19 02:53:18 +0900, Fujii Masao wrote: > On Wed, Mar 19, 2014 at 2:20 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-03-19 02:16:00 +0900, Fujii Masao wrote: > >> > Comments of WalSndWaitForWal would need an update as well in this > >> > case. It is written on top of this function that it waits "until WAL < > >> > loc is flushed to disk". > >> > >> I don't think that comment needs to be updated because it means that > >> "until WAL < loc is flushed to the *master local* disk". It's not related to > >> the flush location that the standby returns. Anyway, I attached the patch. > >> Barring objection, I will commit this. > > > > I object on behalf of WalSndWaitForWal(). We really need the flush > > location there because otherwise we can't remove old WAL. > > MyWalSnd->flush < sentPtr condition in WalSndWaitForWait() is always > true if the flush location that the client returns is invalid. Is this OK? Yes. A client doing so would retain WAL on the master indefinitely. Which isn't a good idea... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services