On Thu, Jan 20, 2022 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> It might be not reaching the actual send_keep_alive logic in
> WalSndKeepaliveIfNecessary because of below code:
> {
> ...
> /*
> * Don't send keepalive messages if timeouts are globally disabled or
> * we're doing something not partaking in timeouts.
> */
> if (wal_sender_timeout <= 0 || last_reply_timestamp <= 0) return; ..
> }
>
> I think you can add elog before the above return and before updating progress
> in the below code:
> case REORDER_BUFFER_CHANGE_INSERT:
> if (!relentry->pubactions.pubinsert)
> + {
> + OutputPluginUpdateProgress(ctx);
> return;
>
> This will help us to rule out one possibility.
Thanks for your advices!
According to your advices, I applied 0001,0002 and 0003 to run the test script.
When subscriber timeout, I filter publisher-side log:
$ grep "before invoking update progress" pub.log | wc -l
60373557
$ grep "return because wal_sender_timeout or last_reply_timestamp" pub.log | wc -l
0
$ grep "return because waiting_for_ping_response" pub.log | wc -l
0
Based on this result, I think function WalSndKeepaliveIfNecessary was invoked,
but function WalSndKeepalive was not invoked because (last_processing >=
ping_time) is false.
So I tried to see changes about last_processing and last_reply_timestamp
(because ping_time is based on last_reply_timestamp).
I found last_processing and last_reply_timestamp is set in function
ProcessRepliesIfAny.
last_processing is set to the time when function ProcessRepliesIfAny is
invoked.
Only when publisher receive a response from subscriber, last_reply_timestamp is
set to last_processing and the flag waiting_for_ping_response is reset to
false.
When we are during the loop to skip all the changes of transaction, IIUC, we do
not invoke function ProcessRepliesIfAny. So I think last_processing and
last_reply_timestamp will not be changed in this loop.
Therefore I think about our use case, we should modify the condition of
invoking WalSndKeepalive.(please refer to
0004-Simple-modification-of-timing.patch, and note that this is only a patch
for testing).
At the same time I modify the input of WalSndKeepalive from true to false. This
is because when input is true, waiting_for_ping_response is set to true in
WalSndKeepalive. As mentioned above, ProcessRepliesIfAny is not invoked in the
loop, so I think waiting_for_ping_response will not be reset to false and
keepalive messages will not be sent.
I tested after applying patches(0001 and 0004), I found the timeout was not
printed in subscriber-side log. And the added messages "begin load changes" and
"commit the log" were printed in publisher-side log:
$ grep -ir "begin load changes" pub.log
2022-01-21 11:17:06.934 CST [2577699] LOG: begin load changes
$ grep -ir "commit the log" pub.log
2022-01-21 11:21:15.564 CST [2577699] LOG: commit the log
Attach the patches and test script mentioned above, in case someone wants to
try.
Regards,
Wang wei