Thread: pgsql: When WalSndCaughtUp, sleep only in WalSndWaitForWal().

pgsql: When WalSndCaughtUp, sleep only in WalSndWaitForWal().

From: Noah Misch

When WalSndCaughtUp, sleep only in WalSndWaitForWal().

Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
< sentPtr.  That is important in logical replication.  When the latest
physical LSN yields no logical replication messages (a common case),
that keepalive elicits a reply, and processing the reply updates
pg_stat_replication.replay_lsn.  WalSndLoop() lacks that; when
WalSndLoop() slept, replay_lsn advancement could stall until
wal_receiver_status_interval elapsed.  This sometimes stalled
src/test/subscription/t/001_rep_changes.pl for up to 10s.

Discussion: https://postgr.es/m/20200406063649.GA3738151@rfd.leadboat.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/421685812290406daea58b78dfab0346eb683bbb

Modified Files
--------------
src/backend/replication/walsender.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)


Re: pgsql: When WalSndCaughtUp, sleep only in WalSndWaitForWal().

From: Fujii Masao

On 2020/04/12 2:35, Noah Misch wrote:
> When WalSndCaughtUp, sleep only in WalSndWaitForWal().
> 
> Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
> < sentPtr.  That is important in logical replication.  When the latest
> physical LSN yields no logical replication messages (a common case),
> that keepalive elicits a reply, and processing the reply updates
> pg_stat_replication.replay_lsn.  WalSndLoop() lacks that; when
> WalSndLoop() slept, replay_lsn advancement could stall until
> wal_receiver_status_interval elapsed.  This sometimes stalled
> src/test/subscription/t/001_rep_changes.pl for up to 10s.

Since this commit, walsender has been consuming too much CPU in my environment.

              wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT |
-                WL_SOCKET_READABLE;
+                WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE;

I wonder if this change caused WaitLatchOrSocket() in WalSndLoop() to wake up
more frequently than necessary.
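
The general mechanism behind this hypothesis can be shown outside PostgreSQL.
The sketch below uses plain POSIX poll() on a socketpair and assumes that
waiting for WL_SOCKET_WRITEABLE behaves like waiting for POLLOUT; the helper
names wait_for_events() and count_early_wakeups() are hypothetical. A socket
with free send-buffer space is reported writable at once, so a wait that
includes writability returns without sleeping. Whether this is what happened
in walsender is not established by this sketch.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the requested events on fd.
 * Returns 1 if the socket became ready, 0 on timeout, -1 on error.
 */
static int
wait_for_events(int fd, short events, int timeout_ms)
{
    struct pollfd pfd;
    int         rc;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/*
 * Run `iterations` waits on one end of a fresh, idle socketpair and count
 * how many returned early (socket ready) instead of sleeping for the full
 * timeout.  Returns -1 if the socketpair cannot be created.
 */
static int
count_early_wakeups(short events, int iterations, int timeout_ms)
{
    int         sv[2];
    int         early = 0;

    assert(iterations >= 0 && timeout_ms >= 0);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    for (int i = 0; i < iterations; i++)
    {
        if (wait_for_events(sv[0], events, timeout_ms) == 1)
            early++;
    }

    close(sv[0]);
    close(sv[1]);
    return early;
}
```

Calling count_early_wakeups(POLLIN, 3, 10) sleeps through every wait because
nothing has been written to the peer, while
count_early_wakeups(POLLIN | POLLOUT, 3, 10) returns early every time because
the idle socket is always writable.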

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: pgsql: When WalSndCaughtUp, sleep only in WalSndWaitForWal().

From: Noah Misch

On Fri, Apr 17, 2020 at 04:50:38AM +0900, Fujii Masao wrote:
> On 2020/04/12 2:35, Noah Misch wrote:
> >When WalSndCaughtUp, sleep only in WalSndWaitForWal().
> >
> >Before sleeping, WalSndWaitForWal() sends a keepalive if MyWalSnd->write
> >< sentPtr.  That is important in logical replication.  When the latest
> >physical LSN yields no logical replication messages (a common case),
> >that keepalive elicits a reply, and processing the reply updates
> >pg_stat_replication.replay_lsn.  WalSndLoop() lacks that; when
> >WalSndLoop() slept, replay_lsn advancement could stall until
> >wal_receiver_status_interval elapsed.  This sometimes stalled
> >src/test/subscription/t/001_rep_changes.pl for up to 10s.
> 
> Since this commit, walsender has been consuming too much CPU in my environment.

Confirmed.  I have shared this with the main thread and added details there.

>              wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT |
> -                WL_SOCKET_READABLE;
> +                WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE;
> 
> I wonder if this change caused WaitLatchOrSocket() in WalSndLoop() to wake up
> more frequently than necessary.

I measured fewer wakeups after the commit, not more.  The problem is a
shortage of waits.