Re: walsender bug: stuck during shutdown - Mailing list pgsql-hackers

From: Alvaro Herrera
Subject: Re: walsender bug: stuck during shutdown
Msg-id: 20201204182707.GA8461@alvherre.pgsql
In response to: Re: walsender bug: stuck during shutdown (Fujii Masao <masao.fujii@oss.nttdata.com>)
List: pgsql-hackers

On 2020-Nov-26, Fujii Masao wrote:

> Yes, so the problem here is that walsender goes into the busy loop
> in that case. Seems this happens only in logical replication walsender.
> In physical replication walsender, WaitLatchOrSocket() in WalSndLoop()
> seems to work as expected and prevent it from entering into busy loop
> even in that case.
> 
>         /*
>          * If postmaster asked us to stop, don't wait anymore.
>          *
>          * It's important to do this check after the recomputation of
>          * RecentFlushPtr, so we can send all remaining data before shutting
>          * down.
>          */
>         if (got_STOPPING)
>             break;
> 
> The above code in WalSndWaitForWal() seems to cause this issue. But I've
> not come up with idea about how to fix yet.

With DEBUG1 I observe that walsender is getting a lot of 'r' messages
(standby reply) with all zeroes:

2020-12-01 21:01:24.100 -03 [15307] DEBUG:  write 0/0 flush 0/0 apply 0/0

However, while doing that I also observed that if I send some activity
through the logical replication stream using the provided program, the
'write' pointer is *still* 0/0, while the 'flush' pointer has moved
forward to what was sent.  I'm not clear on what causes the write
pointer to move forward in logical replication.
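
For readers decoding the DEBUG line: an LSN is a 64-bit WAL position that PostgreSQL prints as its high and low 32-bit halves in hexadecimal, separated by a slash, so 0/0 is position zero, i.e. the invalid (unset) pointer. A minimal sketch of that formatting follows; format_lsn is a hypothetical helper written for illustration, not a function from the PostgreSQL source.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL code): write a 64-bit LSN into buf
 * as "HIGH/LOW", each half in uppercase hexadecimal without padding.
 * Returns the value returned by snprintf().
 */
static int
format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32), (uint32_t) lsn);
}
```

Under this scheme a reply reporting write 0/0 carries no write position at all, while a flush value that has advanced would show a nonzero low half.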

Still, the previously proposed patch does resolve the problem in either
case.


