Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Logical replication timeout problem
Date
Msg-id CAD21AoAo6x3rAQ7VuzPT9paA4Y7uuWPqdQn_XYk1bWpCMF_N5g@mail.gmail.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Mar 25, 2022 at 5:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Mar 25, 2022 at 11:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Mar 25, 2022 at 2:23 PM wangw.fnst@fujitsu.com
> > <wangw.fnst@fujitsu.com> wrote:
> >
> > Since commit 75b1521 added decoding of sequence to logical
> > replication, the patch needs to have pgoutput_sequence() call
> > update_progress().
> >
>
> Yeah, I also think this needs to be addressed. But apart from this, I
> want to know your and other's opinion on the following two points:
> a. Both this and the patch discussed in the nearby thread [1] add an
> additional parameter to
> WalSndUpdateProgress/OutputPluginUpdateProgress and it seems to me
> that both are required. The additional parameter 'last_write' added by
> this patch indicates: "If the last write is skipped then try (if we
> are close to wal_sender_timeout) to send a keepalive message to the
> receiver to avoid timeouts.". This means it can be used after any
> 'write' message. OTOH, the parameter 'skipped_xact' added by another
> patch [1] indicates if we have skipped sending anything for a
> transaction then sendkeepalive for synchronous replication to avoid
> any delays in such a transaction. Does this sound reasonable or can
> you think of a better way to deal with it?

These current approaches look good to me.

> b. Do we want to backpatch the patch in this thread? I am reluctant to
> backpatch because it changes the exposed API which can have an impact
> and second there exists a workaround (user can increase
> wal_sender_timeout/wal_receiver_timeout).

Yeah, we should avoid API changes between minor versions. I feel it's
better to fix it also for back-branches but probably we need another
fix for them. The issue reported on this thread seems quite
confusable; it looks like a network problem but is not true. Also, the
user who faced this issue has to increase wal_sender_timeout due to
the decoded data size, which also means to delay detecting network
problems. It seems an unrelated trade-off.

Regards,
-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Add pg_freespacemap extension sql test
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side