Re: Logical replication timeout problem - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Logical replication timeout problem
Date:
Msg-id: CAA4eK1KYbj+-=2ZdhqcW9m+xS9dHHKc1e5f2LG=R+Qg2zWbNcw@mail.gmail.com
In response to: Re: Logical replication timeout problem (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses: RE: Logical replication timeout problem
List: pgsql-hackers
On Fri, Jan 6, 2023 at 12:35 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> +
> + /*
> +  * We don't want to try sending a keepalive message after processing each
> +  * change as that can have overhead. Tests revealed that there is no
> +  * noticeable overhead in doing it after continuously processing 100 or so
> +  * changes.
> +  */
> +#define CHANGES_THRESHOLD 100
>
> I think a time-based threshold makes more sense. What if the timeout was
> nearing and those 100 changes just took a little more time, causing a
> timeout? We already have a time-based threshold in
> WalSndKeepaliveIfNecessary(), and that function is invoked after reading
> every WAL record in WalSndLoop(). So it does not look like it's an
> expensive function. If it is expensive, we might want to worry about
> WalSndLoop as well. Does it make more sense to remove this threshold?
>

We have previously tried doing this for every change [1] and it brings
noticeable overhead. In fact, even doing it for every 10 changes also had
some overhead, which is why we arrived at this threshold number. I don't
think it can lead to a timeout due to skipping changes, but if we see any
such report we can further fine-tune this setting or try to make it
time-based. For now, I feel it is safe to use this threshold.

> +
> + /*
> +  * After continuously processing CHANGES_THRESHOLD changes, we
> +  * try to send a keepalive message if required.
> +  */
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> +     ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> +     changes_count = 0;
> + }
> +}
> +
>
> On the other thread, I mentioned that we don't have a TAP test for it.
> I agree with Amit's opinion there that it's hard to create a test which
> will time out everywhere. I think what we need is a way to control the
> time required for decoding a transaction.
>
> A rough idea is to induce a small sleep after decoding every change.
> The amount of sleep * number of changes will help us estimate and
> control the amount of time taken to decode a transaction. Then we create
> a transaction which will take longer than the timeout threshold to
> decode. But that's significant code. I don't think PostgreSQL has a
> facility to induce a delay at a particular place in the code.
>

Yeah, I don't know how to induce such a delay while decoding changes.

One more thing: I think it would be better to expose a new callback API
via the reorder buffer, as suggested previously [2], similar to the other
reorder buffer APIs, instead of directly using the reorderbuffer API to
invoke the plugin API.

[1] - https://www.postgresql.org/message-id/OS3PR01MB6275DFFDAC7A59FA148931529E209%40OS3PR01MB6275.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/CAA4eK1%2BfQjndoBOFUn9Wy0hhm3MLyUWEpcT9O7iuCELktfdBiQ%40mail.gmail.com

--
With Regards,
Amit Kapila.