Re: 001_rep_changes.pl stalls - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: 001_rep_changes.pl stalls
Date
Msg-id dcdb33ba-4bcc-d41e-4aad-23207e195ccf@oss.nttdata.com
In response to Re: 001_rep_changes.pl stalls  (Noah Misch <noah@leadboat.com>)
Responses Re: 001_rep_changes.pl stalls  (Noah Misch <noah@leadboat.com>)
Re: 001_rep_changes.pl stalls  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers

On 2020/04/18 16:01, Noah Misch wrote:
> On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote:
>> At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
>>> By the way, if the latch is consumed in WalSndLoop, a succeeding call to
>>> WalSndWaitForWal cannot be woken up by the latch-set.  Doesn't that
>>> cause missed wakeups? (in other words, an overlooked latch wakeup).
>>
>> - Since the only source of walsender wakeup other than timeout is the latch,
>> - we should avoid wasteful consumption of the latch. (It is the same issue
>> - with [1]).
>>
>> + Since the walsender is woken up by LSN advancement via the latch, we should
>> + avoid wasteful consumption of the latch. (It is the same issue with [1]).
>>
>>
>>> If the wakeup signal is not remembered in the walsender (like
>>> InterruptPending), WalSndPhysical cannot enter a sleep with
>>> confidence.
> 
> No; per latch.h, "What must be avoided is placing any checks for asynchronous
> events after WaitLatch and before ResetLatch, as that creates a race
> condition."  In other words, the thing to avoid is calling ResetLatch()
> without next examining all pending work that a latch would signal.  Each
> walsender.c WaitLatch call does follow the rules.
> 
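(Tangentially, the pattern latch.h prescribes looks roughly like the sketch
below; the work checks, sleeptime, and wait-event name are placeholders for
illustration, not walsender.c's actual code.)

    /*
     * Minimal sketch of the protocol described in latch.h.
     */
    for (;;)
    {
        /*
         * Reset before checking for work: any SetLatch() arriving after this
         * point makes the later WaitLatch() return immediately, so no wakeup
         * is lost.
         */
        ResetLatch(MyLatch);

        /* Examine *all* pending work that a latch-setter might signal. */
        if (got_work_to_do())
            do_the_work();

        /* Sleep until the latch is set, the timeout expires, etc. */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
    }
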
> On Sat, Apr 18, 2020 at 12:29:58AM +0900, Fujii Masao wrote:
>> On 2020/04/17 14:41, Noah Misch wrote:
>>> 1. Make XLogSendPhysical() more like XLogSendLogical(), calling
>>>     WalSndWaitForWal() when no WAL is available.  A quick version of this
>>>     passes tests, but I'll need to audit WalSndWaitForWal() for things that are
>>>     wrong for physical replication.
>>
>> (1) makes even the physical replication walsender sleep in two places,
>> which seems to make the code for physical replication more complicated
>> than necessary. I'd like to avoid (1) if possible.
> 
> Good point.
> 
>>> 2. Make XLogSendLogical() more like XLogSendPhysical(), returning when
>>>     insufficient WAL is available.  This complicates the xlogreader.h API to
>>>     pass back "wait for this XLogRecPtr", and we'd then persist enough state to
>>>     resume decoding.  This doesn't have any advantages to make up for those.
>>>
>>> 3. Don't avoid waiting in WalSndLoop(); instead, fix the stall by copying the
>>>     WalSndKeepalive() call from WalSndWaitForWal() to WalSndLoop().  This risks
>>>     further drift between the two wait sites; on the other hand, one could
>>>     refactor later to help avoid that.
>>
>> Since the additional call of WalSndKeepalive() is necessary only for
>> logical replication, shouldn't it be copied to, e.g., XLogSendLogical(),
>> instead of WalSndLoop()? For example, when XLogSendLogical() sets
>> WalSndCaughtUp to true, shouldn't it call WalSndKeepalive()?
> 
> We'd send a keepalive even when pq_flush_if_writable() can't empty the output
> buffer.  That could be acceptable, but it's not ideal.
> 
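(To make that alternative concrete, what I had in mind was roughly the sketch
below; it is not a patch, and as you say it would send the keepalive even when
pq_flush_if_writable() cannot empty the output buffer, so it is not ideal.)

    /*
     * Rough sketch only: have XLogSendLogical() emit a keepalive once it
     * decides it has caught up, so pg_stat_replication is refreshed even
     * for WAL records that are no-ops for logical replication.
     */
    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
    {
        WalSndCaughtUp = true;
        WalSndKeepalive(false);     /* do not request a reply */
    }
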
>> The root problem seems to be that when a WAL record that's a no-op for
>> logical rep is processed, no keepalive message is sent immediately,
>> even though we want pg_stat_replication to be updated promptly.
> 
> The degree of promptness should be predictable, at least.  If we removed the
> WalSndKeepalive() from WalSndWaitForWal(), pg_stat_replication updates would
> not be prompt, but they would be predictable.  I do, however, think prompt
> updates are worthwhile.
> 
>> (3) seems to address this problem directly and looks better to me.
>>
>>> 4. Keep the WalSndLoop() wait, but condition it on !logical.  This is the
>>>     minimal fix, but it crudely punches through the abstraction between
>>>     WalSndLoop() and its WalSndSendDataCallback.
>>
>> (4) also looks good because it's simple, provided we can redesign those
>> functions into good shape.
> 
> Let's do that.  I'm attaching the replacement implementation and the revert of
> v1.

Thanks for the patch! Though referencing XLogSendLogical inside WalSndLoop()
might be a bit ugly, I'm fine with this change because it's simple and easier
to understand.

+         * Block if we have unsent data.  XXX For logical replication, let
+         * WalSndWaitForWal() handle any other blocking; idle receivers need
+         * its additional actions.  For physical replication, also block if
+         * caught up; its send_data does not block.

Might it be better to s/WalSndWaitForWal()/send_data()? Because not only
WalSndWaitForWal() but also WalSndWriteData() seems to handle the blocking.
WalSndWriteData() is also called under send_data, i.e., XLogSendLogical(),
as the following backtrace shows:

    frame #2: 0x0000000106bcfa84 postgres`WalSndWriteData(ctx=0x00007fb2a4812d20, lsn=22608080, xid=488, last_write=false) at walsender.c:1247:2
    frame #3: 0x0000000106b98295 postgres`OutputPluginWrite(ctx=0x00007fb2a4812d20, last_write=false) at logical.c:540:2
    frame #4: 0x00000001073fe9b8 pgoutput.so`send_relation_and_attrs(relation=0x00000001073ba2c0, ctx=0x00007fb2a4812d20) at pgoutput.c:353:2
    frame #5: 0x00000001073fe7a0 pgoutput.so`maybe_send_schema(ctx=0x00007fb2a4812d20, relation=0x00000001073ba2c0, relentry=0x00007fb2a483aa60) at pgoutput.c:315:2
    frame #6: 0x00000001073fd4c0 pgoutput.so`pgoutput_change(ctx=0x00007fb2a4812d20, txn=0x00007fb2a502e428, relation=0x00000001073ba2c0, change=0x00007fb2a5030428) at pgoutput.c:394:2
    frame #7: 0x0000000106b99094 postgres`change_cb_wrapper(cache=0x00007fb2a482ed20, txn=0x00007fb2a502e428, relation=0x00000001073ba2c0, change=0x00007fb2a5030428) at logical.c:753:2
    frame #8: 0x0000000106ba2200 postgres`ReorderBufferCommit(rb=0x00007fb2a482ed20, xid=488, commit_lsn=22621088, end_lsn=22621136, commit_time=640675460323211, origin_id=0, origin_lsn=0) at reorderbuffer.c:1653:7
    frame #9: 0x0000000106b93c10 postgres`DecodeCommit(ctx=0x00007fb2a4812d20, buf=0x00007ffee954c0f8, parsed=0x00007ffee954bf90, xid=488) at decode.c:637:2
    frame #10: 0x0000000106b92fa9 postgres`DecodeXactOp(ctx=0x00007fb2a4812d20, buf=0x00007ffee954c0f8) at decode.c:245:5
    frame #11: 0x0000000106b92aee postgres`LogicalDecodingProcessRecord(ctx=0x00007fb2a4812d20, record=0x00007fb2a4812fe0) at decode.c:114:4
    frame #12: 0x0000000106bd2a16 postgres`XLogSendLogical at walsender.c:2863:3
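
For what it's worth, my reading of the quoted comment is that the wait
condition in WalSndLoop() becomes roughly the sketch below; this is only my
paraphrase, and the attached patch may of course differ in detail.

    /*
     * Sketch of the wait condition implied by the comment: always block
     * while there is unsent output; additionally block when caught up, but
     * only for physical replication, because a logical walsender blocks
     * inside send_data (WalSndWaitForWal() / WalSndWriteData()) instead.
     */
    if (pq_is_send_pending() ||
        (WalSndCaughtUp && send_data != XLogSendLogical))
    {
        /*
         * ResetLatch()/WaitLatch() here, adding WL_SOCKET_WRITEABLE while
         * pq_is_send_pending() is true.
         */
    }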

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


