On Tue, Mar 8, 2022 at 3:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've looked at the patch and have a question:
Thanks for your review and comments.
> +void
> +SendKeepaliveIfNecessary(LogicalDecodingContext *ctx, bool skipped) {
> + static int skipped_changes_count = 0;
> +
> + /*
> + * skipped_changes_count is reset when processing changes that do not
> + * need to be skipped.
> + */
> + if (!skipped)
> + {
> + skipped_changes_count = 0;
> + return;
> + }
> +
> + /*
> + * After continuously skipping SKIPPED_CHANGES_THRESHOLD
> changes, try to send a
> + * keepalive message.
> + */
> + #define SKIPPED_CHANGES_THRESHOLD 10000
> +
> + if (++skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD)
> + {
> + /* Try to send a keepalive message. */
> + OutputPluginUpdateProgress(ctx, true);
> +
> + /* After trying to send a keepalive message, reset the flag. */
> + skipped_changes_count = 0;
> + }
> +}
>
> Since we send a keepalive after continuously skipping 10000 changes, the
> originally reported issue can still occur if skipping 10000 changes took more than
> the timeout and the walsender didn't send any change while that, is that right?
Yes, theoretically so.
But after testing, I think this value should be conservative enough not to reproduce
this bug.
After the previous discussion[1], it is currently considered that it is better
to directly set a conservative threshold than to calculate the threshold based
on wal_sender_timeout.
[1] -
https://www.postgresql.org/message-id/OS3PR01MB6275FEB9F83081F1C87539B99E019%40OS3PR01MB6275.jpnprd01.prod.outlook.com
Regards,
Wang wei