RE: Logical replication timeout problem - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Logical replication timeout problem
Date
Msg-id OS3PR01MB6275982DA7BBA25DE88F6E839EF59@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Logical replication timeout problem  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Mon, Apr 18, 2022 at 00:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Apr 18, 2022 at 9:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 14, 2022 at 5:52 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
> > >
> > > Sawada-San, Euler, do you have any opinion on this approach? I
> > > personally still prefer the approach implemented in v10 [1]
> > > especially due to the latest finding by Wang-San that we can't
> > > update the lag-tracker apart from when it is invoked at the transaction end.
> > > However, I am fine if we like this approach more.
> > >
> > > It seems v15 is simpler and less error prone than v10. v10 has a mix
> > > of
> > > OutputPluginUpdateProgress() and the new function update_progress().
> > > The v10 also calls update_progress() for every change action in
> > > pgoutput_change(). It is not a good approach for maintainability --
> > > new changes like sequences need extra calls.
> > >
> >
> > Okay, let's use the v15 approach as Sawada-San also seems to have a
> > preference for that.
> >
> > > However, as you mentioned there should handle the track lag case.
> > >
> > > Both patches change the OutputPluginUpdateProgress() so it cannot be
> > > backpatched. Are you planning to backpatch it? If so, the boolean
> > > variable (last_write or end_xacts depending of which version you are
> > > considering) could be added to LogicalDecodingContext.
> > >
> >
> > If we add it to LogicalDecodingContext then I think we have to always
> > reset the variable after its use which will make it look ugly and
> > error-prone. I was not thinking to backpatch it because of the API
> > change but I guess if we want to backpatch then we can add it to
> > LogicalDecodingContext for back-branches. I am not sure if that will
> > look committable but surely we can try.
> >
> 
> Even, if we want to add the variable in the struct in back-branches, we need to
> ensure not to change the size of the struct as it is exposed, see email [1] for a
> similar mistake we made in another case.
> 
> [1] - https://www.postgresql.org/message-
> id/2358496.1649168259%40sss.pgh.pa.us
Thanks for your comments.

I did some checks about adding the new variable in LogicalDecodingContext.
I found that because of padding, if we add the new variable at the end of
structure, it dose not make the structure size change. I verified this in
REL_10~REL_14.

So as suggested by Euler-San and Amit-San, I wrote the patch for REL_14. Attach
this patch. To prevent patch confusion, the patch for HEAD is also attached.
The patch for REL_14:
    REL_14_v1-0001-Fix-the-logical-replication-timeout-during-large-.patch
The patch for HEAD:
    v17-0001-Fix-the-logical-replication-timeout-during-large.patch

The following is the details of checks.
On gcc/Linux/x86-64, in REL_14, by looking at the size of each member variable
in the structure LogicalDecodingContext, I found that there are three parts
padding behind the following member variables:
- 7 bytes after fast_forward
- 4 bytes after prepared_write
- 4 bytes after write_xid

If we add the new variable at the end of structure (bool takes one byte), it
means we will only consume one byte of padding after member write_xid. And
then, at the end of the struct, 3 padding are still required. For easy
understanding, please refer to the following simple calculation.
(In REL14, the size of structure LogicalDecodingContext is 304 bytes.)
Before adding new variable (In REL14):
8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4  =  ‭289 (if padding is not considered)
         +7                          +4  +4  =  +15 (the padding)
So, the size of structure LogicalDecodingContext is 289+15=304.
After adding new variable (In REL14 with patch):
8+8+8+8+8+1+168+8+8+8+8+8+8+8+8+1+1+1+1+8+4+1  =  ‭290‬ (if padding is not considered)
         +7                          +4    +3  =  +14 (the padding)
So, the size of structure LogicalDecodingContext is 290+14=304.

BTW, the size of structure LogicalDecodingContext in REL_10~REL_13 is 184, 200,
200,200 respectively. And I found that at the end of the structure
LogicalDecodingContext, there are always the following members:
```
    XLogRecPtr  write_location;   --> 8
    TransactionId write_xid;      --> 4
                                  --> There are 4 padding after write_xid.
```
It means at the end of structure LogicalDecodingContext, there are 4 bytes
padding. So, if we add a new bool type variable (It takes one byte) at the end
of the structure LogicalDecodingContext, I think in the current REL_10~REL_14,
because of padding, the size of the structure LogicalDecodingContext will not
change.

Regards,
Wang wei

Attachment

pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: pg14 psql broke \d datname.nspname.relname
Next
From: Bruce Momjian
Date:
Subject: effective_io_concurrency and NVMe devices