Re: walsender bug: stuck during shutdown - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: walsender bug: stuck during shutdown
Msg-id abd3220d-bf25-6118-7060-5e9cf7cdfc74@oss.nttdata.com
In response to Re: walsender bug: stuck during shutdown  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: walsender bug: stuck during shutdown
List pgsql-hackers

On 2020/11/26 11:45, Alvaro Herrera wrote:
> On 2020-Nov-26, Fujii Masao wrote:
> 
>> On second thought, walsender doesn't wait forever unless
>> wal_sender_timeout is disabled, even in the case under discussion, right?
>> Or, if there is a case where wal_sender_timeout doesn't work as expected,
>> we might need to fix that first.
> 
> Hmm, no, it doesn't wait forever in that sense; tracing with the
> debugger shows that the process is looping continuously.

Yes, so the problem here is that walsender goes into a busy loop
in that case. This seems to happen only in the logical replication
walsender. In the physical replication walsender, WaitLatchOrSocket()
in WalSndLoop() seems to work as expected and prevents it from entering
a busy loop, even in that case.

        /*
         * If postmaster asked us to stop, don't wait anymore.
         *
         * It's important to do this check after the recomputation of
         * RecentFlushPtr, so we can send all remaining data before shutting
         * down.
         */
        if (got_STOPPING)
            break;

The above code in WalSndWaitForWal() seems to cause this issue, but I
haven't come up with an idea of how to fix it yet.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


