Re: walsender vs. XLogBackgroundFlush during shutdown - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: walsender vs. XLogBackgroundFlush during shutdown
Date
Msg-id 20190501150220.z27mt2y6gt3nssdf@development
In response to Re: walsender vs. XLogBackgroundFlush during shutdown  (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses Re: walsender vs. XLogBackgroundFlush during shutdown
List pgsql-hackers
On Wed, May 01, 2019 at 02:13:10PM +0200, Alexander Kukushkin wrote:
> Hi Tomas,
>
>On Wed, 1 May 2019 at 02:28, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>> I see there's an ongoing discussion about race conditions in walreceiver
>> blocking shutdown, so let me start a new thread about a race condition in
>> walsender during shutdown.
>>
>> The symptoms are rather simple - 'pg_ctl -m fast shutdown' gets stuck,
>> waiting indefinitely for walsender processes to catch up and terminate.
>
>I can confirm, during the past couple of years we observed such a
>problem a few times and this is really annoying.
>
>> The reason for that is pretty simple - the walsenders are doing logical
>> decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
>> But per XLogSendLogical() this only happens if
>>
>>     if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
>>     {
>>         WalSndCaughtUp = true;
>>         ...
>>     }
>
>After a couple of days of investigating and debugging I came to a
>slightly different conclusion: WalSndCaughtUp is set to true in my
>case.
>Since I am mostly a python person, I decided to use psycopg2 for my
>investigation. I took an example from
>http://initd.org/psycopg/docs/advanced.html#logical-replication-quick-start
>as a starting point, created a slot and started the script.
>I wasn't running any transactions, so DemoConsumer.__call__ was never
>executed and cursor.send_feedback(flush_lsn=msg.data_start) was never
>called either. Basically, all the psycopg2 internals were doing was
>periodically sending keepalive messages or replying to keepalives sent
>by postgres.

OK, so that seems like a separate issue, somewhat unrelated to the one I
reported. And I'm not sure it's a walsender issue - it seems it might be a
psycopg2 issue, not reporting the flush position properly, no?
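
For anyone following along, the quick-start example from that link boils
down to roughly this (my paraphrase, not the exact script - the DSN and
the 'pytest' slot name are placeholders). Note that send_feedback() with
a flush position is only ever called from the consumer callback, i.e.
only when a decoded data message actually arrives:

    import psycopg2
    import psycopg2.extras

    conn = psycopg2.connect(
        'dbname=test host=publisher',
        connection_factory=psycopg2.extras.LogicalReplicationConnection)
    cur = conn.cursor()

    # assumes the slot already exists (e.g. created with test_decoding)
    cur.start_replication(slot_name='pytest', decode=True)

    class DemoConsumer(object):
        def __call__(self, msg):
            print(msg.payload)
            # the only place the flush position is ever reported
            msg.cursor.send_feedback(flush_lsn=msg.data_start)

    # blocks and handles keepalives internally; with no data messages the
    # consumer never runs, so no flush position is ever confirmed
    cur.consume_stream(DemoConsumer())

So with nothing to decode, only keepalives go back and forth and the
client never confirms any flush position.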

>Actually, the same problem happens even when the consumer script does
>receive some messages, just not very frequently - it is only a bit
>harder to reproduce.
>
>If you attach to the walsender with gdb, you'll see the following picture:
>(gdb) bt
>#0  0x00007fd6623d296a in __libc_send (fd=8, buf=0x55cb958dca08,
>len=94, flags=0) at ../sysdeps/unix/sysv/linux/send.c:28
>#1  0x000055cb93aa7ce9 in secure_raw_write (port=0x55cb958d71e0,
>ptr=0x55cb958dca08, len=94) at be-secure.c:318
>#2  0x000055cb93aa7b87 in secure_write (port=0x55cb958d71e0,
>ptr=0x55cb958dca08, len=94) at be-secure.c:265
>#3  0x000055cb93ab6bf9 in internal_flush () at pqcomm.c:1433
>#4  0x000055cb93ab6b89 in socket_flush () at pqcomm.c:1409
>#5  0x000055cb93dac30b in send_message_to_frontend
>(edata=0x55cb942b4380 <errordata>) at elog.c:3317
>#6  0x000055cb93da8973 in EmitErrorReport () at elog.c:1481
>#7  0x000055cb93da5abf in errfinish (dummy=0) at elog.c:481
>#8  0x000055cb93da852d in elog_finish (elevel=13, fmt=0x55cb93f32de3
>"sending replication keepalive") at elog.c:1376
>#9  0x000055cb93bcae71 in WalSndKeepalive (requestReply=true) at
>walsender.c:3358

Is it stuck in the send() call forever, or did you just happen to grab
the backtrace at that moment?

>
>All of the above probably looks like a brain dump, but I don't think
>it conflicts with Tomas's findings - it rather complements them.
>I am very glad that I now know how to mitigate the problem on the
>client side, but IMHO it is also very important to fix the server
>behavior if at all possible.
>

I think having a report of an issue, with a way to reproduce it, is a first
(and quite important) step. So thanks for doing that.

That being said, I think those are two separate issues, with different
causes and likely different fixes. I don't think fixing the xlog flush
will resolve your issue, and vice versa.
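
For the record, the client-side mitigation you mention presumably amounts
to confirming the server's WAL position even when there is nothing to
decode - something like the untested sketch below. It assumes a psycopg2
that exposes cursor.wal_end (2.8) and reuses the placeholder DSN and slot
name from above:

    import select
    import psycopg2
    import psycopg2.extras

    conn = psycopg2.connect(
        'dbname=test host=publisher',
        connection_factory=psycopg2.extras.LogicalReplicationConnection)
    cur = conn.cursor()
    cur.start_replication(slot_name='pytest', decode=True)

    while True:
        msg = cur.read_message()            # non-blocking
        if msg:
            print(msg.payload)
            cur.send_feedback(flush_lsn=msg.data_start)
        else:
            # nothing to decode - still confirm the last WAL position we
            # have seen (cursor.wal_end is updated by data and keepalive
            # messages), so the walsender can consider us caught up at
            # shutdown; this confirms WAL we never looked at, which is
            # fine here only because we don't care about that data
            if cur.wal_end:
                cur.send_feedback(flush_lsn=cur.wal_end)
            select.select([cur], [], [], 10.0)

Of course that only works around the problem on the client side, it does
not change anything about the walsender behavior.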

FWIW attached is a patch that I used to reliably trigger the xlog flush
issue - it simply adds LogLogicalMessage() to the commit handler in the
built-in output plugin. So all you need to do is create a subscription,
start generating commits and trigger a restart. The message is 8kB, so it's
definitely long enough to overflow to the next xlog page.
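
If it helps, a trivial driver for generating the commits could look like
this (my sketch, not part of the patch - table name and DSN are made up;
the patched pgoutput and a subscription covering the table are assumed to
be in place already):

    import time
    import psycopg2

    pub = psycopg2.connect('dbname=test host=publisher')
    pub.autocommit = True               # one commit per INSERT

    with pub.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS flush_test"
                    " (id serial PRIMARY KEY, v text)")
        # every commit makes the patched pgoutput write an 8kB logical
        # message into WAL, spilling onto the next xlog page
        while True:
            cur.execute("INSERT INTO flush_test (v) VALUES ('x')")
            time.sleep(0.1)

With that running, a fast shutdown of the publisher should reliably end up
waiting for the walsender.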

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment
