Thread: BUG #9118: WAL Sender does not disconnect replication clients during shutdown

BUG #9118: WAL Sender does not disconnect replication clients during shutdown

From
jhedden@apple.com
Date:
The following bug has been logged on the website:

Bug reference:      9118
Logged by:          Joel Hedden
Email address:      jhedden@apple.com
PostgreSQL version: 9.3.2
Operating system:   Mac OS X 10.9.1
Description:

I connect a pg_receivexlog instance and have "hot_standby" archiving
enabled, with "archive_command" defined correctly.  When the WAL Sender
process receives a SIGUSR2 from the postmaster (or me), it fails to shut
down and pg_receivexlog remains connected.  Upon inspection, it looks like
the test for "sentPtr == MyWalSnd->flush" is always false at
walsender.c:1058 (sentPtr is still non-zero) where the wal sender should be
shutting down.  Replication and archiving seem to be working otherwise.
Killing pg_receivexlog allows for the WAL Sender to terminate.

This didn't affect 9.2.4 for me.

I've tested the EnterpriseDB 9.3.2 release as well as the HEAD branch for
9.3.

To reproduce:
1.  Download the EnterpriseDB 9.3.2 distribution for Mac OS X and install
onto 10.9.1.
2.  Use initdb to create a fresh cluster.
3.  Update postgresql.conf:
- max_wal_senders = 2
- wal_level = hot_standby
- archive_mode = on
- archive_command = '/usr/bin/true'
4. Update pg_hba.conf to allow local replication connections.
5. Start postgres.
6. Connect pg_receivexlog via local socket file.
7. kill -TERM the postmaster.
The WAL sender, logger, and postmaster continue to run indefinitely and
pg_receivexlog is not disconnected.  WAL sender receives the SIGUSR2 from
postmaster but does not disconnect pg_receivexlog and shut down as expected.

Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown

From
Heikki Linnakangas
Date:
On 02/06/2014 05:08 AM, jhedden@apple.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      9118
> Logged by:          Joel Hedden
> Email address:      jhedden@apple.com
> PostgreSQL version: 9.3.2
> Operating system:   Mac OS X 10.9.1
> Description:
>
> I connect a pg_receivexlog instance and have "hot_standby" archiving
> enabled, with "archive_command" defined correctly.  When the WAL Sender
> process receives a SIGUSR2 from the postmaster (or me), it fails to shut
> down and pg_receivexlog remains connected.  Upon inspection, it looks like
> the test for "sentPtr == MyWalSnd->flush" is always false at
> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should be
> shutting down.  Replication and archiving seem to be working otherwise.
> Killing pg_receivexlog allows for the WAL Sender to terminate.

Hmm. Before exiting, walsender waits until the client has flushed all
the WAL to disk. However, pg_receivexlog never sends a "flush" pointer
back to the server, so the server waits forever.

The first question is, why does pg_receivexlog not send its "flush"
pointer back to the server? It *does* fsync the files to disk. However,
currently it only fsyncs when closing a full segment, but when shutting
down, the last segment would not be full, so to fix this issue it should
be taught to fsync also partial segments.

Second question is, does it make sense for the server to wait for all
replication clients to flush the WAL? It seems reasonable for a standby
server; after shutting down, the standby has all the WAL safely on disk.
Although if a standby is not connected at the moment, all bets are off.
It also seems reasonable for pg_receivexlog, except for the fact that
pg_receivexlog never sends a flush pointer back.

> This didn't affect 9.2.4 for me.

Yeah, the waiting was introduced by commit
bee4a4d361c054c531c3a27024f9ff3efef3635b, and wasn't included in 9.2.4.
It is in 9.2.5, however.

Quoting the original email Fujii posted about this
(http://www.postgresql.org/message-id/CAHGQGwHLjEROTMtSWJd=xg_VFwRe3oJWnTYsyBDUbRYa6rr0DQ@mail.gmail.com):

> But there is one problem: though walsender tries to send all the outstanding
> WAL records, it doesn't wait for them to be replicated to the standby. IOW,
> walsender closes the replication connection as soon as it sends WAL records.
> Then, before receiving all the WAL records, walreceiver can detect
> the closure of connection and exit. We cannot guarantee that there is no
> missing WAL in the standby after clean shutdown of the master. In this case,
> backup from new master is required when restarting the stopped master as
> new standby. I have experienced this case several times, especially when
> enabling WAL archiving.

Fujii-san, how can walreceiver detect the closure of the connection,
before reading all the buffered WAL from the TCP connection? What kind
of log messages do you get when it happens?

I tried to reproduce that with commit
bee4a4d361c054c531c3a27024f9ff3efef3635b reverted, but couldn't.
Although this was with master and standby running on same laptop, and
this is essentially a race condition, so it's possible that I just
didn't get the timing right to make it happen.

- Heikki
Sorry for the delay...

On Thu, Feb 6, 2014 at 5:05 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 02/06/2014 05:08 AM, jhedden@apple.com wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference:      9118
>> Logged by:          Joel Hedden
>> Email address:      jhedden@apple.com
>> PostgreSQL version: 9.3.2
>> Operating system:   Mac OS X 10.9.1
>> Description:
>>
>> I connect a pg_receivexlog instance and have "hot_standby" archiving
>> enabled, with "archive_command" defined correctly.  When the WAL Sender
>> process receives a SIGUSR2 from the postmaster (or me), it fails to shut
>> down and pg_receivexlog remains connected.  Upon inspection, it looks like
>> the test for "sentPtr == MyWalSnd->flush" is always false at
>> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should
>> be
>> shutting down.  Replication and archiving seem to be working otherwise.
>> Killing pg_receivexlog allows for the WAL Sender to terminate.
>
>
> Hmm. Before exiting, walsender waits until the client has flushed all the
> WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to
> the server, so the server waits forever.
>
> The first question is, why does pg_receivexlog not send its "flush" pointer
> back to the server? It *does* fsync the files to disk. However, currently it
> only fsyncs when closing a full segment, but when shutting down, the last
> segment would not be full, so to fix this issue it should be taught to fsync
> also partial segments.

Yes. And, pg_receivexlog returns InvalidXLogRecPtr as the flush location,
so "sentPtr == MyWalSnd->flush" will never be true when using pg_receivexlog...
The quick-fix seems not to wait for that condition to be true whenever the flush
location is invalid.

> Fujii-san, how can walreceiver detect the closure of the connection, before
> reading all the buffered WAL from the TCP connection? What kind of log
> messages do you get when it happens?

I got the following messages.

[MASTER]
LOG:  database system is shut down

[STANDBY]
FATAL:  could not send data to WAL stream: server closed the
connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

> I tried to reproduce that with commit
> bee4a4d361c054c531c3a27024f9ff3efef3635b reverted, but couldn't. Although
> this was with master and standby running on same laptop, and this is
> essentially a race condition, so it's possible that I just didn't get the
> timing right to make it happen.

You would need to enable WAL archiving. Whenever I was able to
reproduce the problem,
I enabled WAL archiving.

Regards,

--
Fujii Masao
On Thu, Mar 13, 2014 at 7:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Sorry for the delay...
>
> On Thu, Feb 6, 2014 at 5:05 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> On 02/06/2014 05:08 AM, jhedden@apple.com wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      9118
>>> Logged by:          Joel Hedden
>>> Email address:      jhedden@apple.com
>>> PostgreSQL version: 9.3.2
>>> Operating system:   Mac OS X 10.9.1
>>> Description:
>>>
>>> I connect a pg_receivexlog instance and have "hot_standby" archiving
>>> enabled, with "archive_command" defined correctly.  When the WAL Sender
>>> process receives a SIGUSR2 from the postmaster (or me), it fails to shut
>>> down and pg_receivexlog remains connected.  Upon inspection, it looks like
>>> the test for "sentPtr == MyWalSnd->flush" is always false at
>>> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should
>>> be
>>> shutting down.  Replication and archiving seem to be working otherwise.
>>> Killing pg_receivexlog allows for the WAL Sender to terminate.
>>
>>
>> Hmm. Before exiting, walsender waits until the client has flushed all the
>> WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to
>> the server, so the server waits forever.
>>
>> The first question is, why does pg_receivexlog not send its "flush" pointer
>> back to the server? It *does* fsync the files to disk. However, currently it
>> only fsyncs when closing a full segment, but when shutting down, the last
>> segment would not be full, so to fix this issue it should be taught to fsync
>> also partial segments.
>
> Yes. And, pg_receivexlog returns InvalidXLogRecPtr as the flush location,
> so "sentPtr == MyWalSnd->flush" will never be true when using pg_receivexlog...
> The quick-fix seems not to wait for that condition to be true whenever the flush
> location is invalid.

On second thought, I think that it's better to check a write location instead
if walsender is connecting to a standby such as pg_receivexlog which
always returns an invalid flush location. Attached patch does this. Thought?

Regards,

--
Fujii Masao

Attachment

Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown

From
Michael Paquier
Date:
On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On second thought, I think that it's better to check a write location instead
> if walsender is connecting to a standby such as pg_receivexlog which
> always returns an invalid flush location. Attached patch does this. Thought?
Just a small one: could it be possible to put the condition checking
for the validity of flush/write position into a separate variable out
of this if() for readability?
sentPtr == MyWalSnd->flush ||
!          (XLogRecPtrIsInvalid(MyWalSnd->flush) &&
!           sentPtr == MyWalSnd->write)
Are there risks that a standby sends an invalid flush location and a
valid write location?
Regards,
--
Michael
On Fri, Mar 14, 2014 at 12:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On second thought, I think that it's better to check a write location instead
>> if walsender is connecting to a standby such as pg_receivexlog which
>> always returns an invalid flush location. Attached patch does this. Thought?
> Just a small one: could it be possible to put the condition checking
> for the validity of flush/write position into a separate variable out
> of this if() for readability?
> sentPtr == MyWalSnd->flush ||
> !          (XLogRecPtrIsInvalid(MyWalSnd->flush) &&
> !           sentPtr == MyWalSnd->write)

Yes, that's possible. What about the attached patch?

> Are there risks that a standby sends an invalid flush location and a
> valid write location?

No unless I'm missing something.

        /*
         * We only send regular messages to the client for full decoded
         * transactions, but a synchronous replication and walsender shutdown
         * possibly are waiting for a later location. So we send pings
         * containing the flush location every now and then.
         */
        if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
        {

BTW, ISTM that the above condition in walsender.c has the same problem.
If the standby is pg_receivexlog, MyWalSnd->flush is always an invalid
location and that condition would always be TRUE. We would need the
same fix also there.

Regards,

--
Fujii Masao

Attachment

Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown

From
Michael Paquier
Date:
On Sun, Mar 16, 2014 at 12:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Mar 14, 2014 at 12:22 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On second thought, I think that it's better to check a write location instead
>>> if walsender is connecting to a standby such as pg_receivexlog which
>>> always returns an invalid flush location. Attached patch does this. Thought?
>> Just a small one: could it be possible to put the condition checking
>> for the validity of flush/write position into a separate variable out
>> of this if() for readability?
>> sentPtr == MyWalSnd->flush ||
>> !          (XLogRecPtrIsInvalid(MyWalSnd->flush) &&
>> !           sentPtr == MyWalSnd->write)
>
> Yes, that's possible. What about the attached patch?
It is cleaner, thanks.

>         /*
>          * We only send regular messages to the client for full decoded
>          * transactions, but a synchronous replication and walsender shutdown
>          * possibly are waiting for a later location. So we send pings
>          * containing the flush location every now and then.
>          */
>         if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
>         {
>
> BTW, ISTM that the above condition in walsender.c has the same problem.
> If the standby is pg_receivexlog, MyWalSnd->flush is always an invalid
> location and that condition would always be TRUE. We would need the
> same fix also there.
Comments of WalSndWaitForWal would need an update as well in this
case. It is written on top of this function that it waits "until WAL <
loc is flushed to disk". But this is not the case of pg_receivexlog as
you mentioned...
--
Michael
On Sun, Mar 16, 2014 at 10:40 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sun, Mar 16, 2014 at 12:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Fri, Mar 14, 2014 at 12:22 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Mar 14, 2014 at 3:59 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> On second thought, I think that it's better to check a write location instead
>>>> if walsender is connecting to a standby such as pg_receivexlog which
>>>> always returns an invalid flush location. Attached patch does this. Thought?
>>> Just a small one: could it be possible to put the condition checking
>>> for the validity of flush/write position into a separate variable out
>>> of this if() for readability?
>>> sentPtr == MyWalSnd->flush ||
>>> !          (XLogRecPtrIsInvalid(MyWalSnd->flush) &&
>>> !           sentPtr == MyWalSnd->write)
>>
>> Yes, that's possible. What about the attached patch?
> It is cleaner, thanks.
>
>>         /*
>>          * We only send regular messages to the client for full decoded
>>          * transactions, but a synchronous replication and walsender shutdown
>>          * possibly are waiting for a later location. So we send pings
>>          * containing the flush location every now and then.
>>          */
>>         if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
>>         {
>>
>> BTW, ISTM that the above condition in walsender.c has the same problem.
>> If the standby is pg_receivexlog, MyWalSnd->flush is always an invalid
>> location and that condition would always be TRUE. We would need the
>> same fix also there.
> Comments of WalSndWaitForWal would need an update as well in this
> case. It is written on top of this function that it waits "until WAL <
> loc is flushed to disk".

I don't think that comment needs to be updated because it means that
"until WAL < loc is flushed to the *master local* disk". It's not related to
the flush location that the standby returns. Anyway, I attached the patch.
Barring objection, I will commit this.

Regards,

--
Fujii Masao

Attachment
On 2014-03-19 02:16:00 +0900, Fujii Masao wrote:
> > Comments of WalSndWaitForWal would need an update as well in this
> > case. It is written on top of this function that it waits "until WAL <
> > loc is flushed to disk".
>
> I don't think that comment needs to be updated because it means that
> "until WAL < loc is flushed to the *master local* disk". It's not related to
> the flush location that the standby returns. Anyway, I attached the patch.
> Barring objection, I will commit this.

I object on behalf of WalSndWaitForWal(). We really need the flush
location there because otherwise we can't remove old WAL.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Mar 19, 2014 at 2:20 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-03-19 02:16:00 +0900, Fujii Masao wrote:
>> > Comments of WalSndWaitForWal would need an update as well in this
>> > case. It is written on top of this function that it waits "until WAL <
>> > loc is flushed to disk".
>>
>> I don't think that comment needs to be updated because it means that
>> "until WAL < loc is flushed to the *master local* disk". It's not related to
>> the flush location that the standby returns. Anyway, I attached the patch.
>> Barring objection, I will commit this.
>
> I object on behalf of WalSndWaitForWal(). We really need the flush
> location there because otherwise we can't remove old WAL.

MyWalSnd->flush < sentPtr condition in WalSndWaitForWait() is always
true if the flush location that the client returns is invalid. Is this OK?

Regards,

--
Fujii Masao
On 2014-03-19 02:53:18 +0900, Fujii Masao wrote:
> On Wed, Mar 19, 2014 at 2:20 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-03-19 02:16:00 +0900, Fujii Masao wrote:
> >> > Comments of WalSndWaitForWal would need an update as well in this
> >> > case. It is written on top of this function that it waits "until WAL <
> >> > loc is flushed to disk".
> >>
> >> I don't think that comment needs to be updated because it means that
> >> "until WAL < loc is flushed to the *master local* disk". It's not related to
> >> the flush location that the standby returns. Anyway, I attached the patch.
> >> Barring objection, I will commit this.
> >
> > I object on behalf of WalSndWaitForWal(). We really need the flush
> > location there because otherwise we can't remove old WAL.
>
> MyWalSnd->flush < sentPtr condition in WalSndWaitForWait() is always
> true if the flush location that the client returns is invalid. Is this OK?

Yes. A client doing so would retain WAL on the master
indefinitely. Which isn't a good idea...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services