RE: logical replication empty transactions - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: logical replication empty transactions
Date
Msg-id OS0PR01MB571653811244224062F3CA87941A9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: logical replication empty transactions  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses RE: logical replication empty transactions
List pgsql-hackers
On Thursday, March 24, 2022 11:19 AM houzj.fnst@fujitsu.com wrote:
> On Tuesday, March 22, 2022 7:50 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > On Tue, Mar 22, 2022 at 7:25 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > > On Monday, March 21, 2022 6:01 PM Amit Kapila
> > > > <amit.kapila16@gmail.com>
> > > > wrote:
> > >
> > > Oh, sorry, I posted the wrong patch, here is the correct one.
> > >
> >
> > The test change looks good to me. I think additionally we can verify
> > that the record is not reflected in the subscriber table. Apart from
> > that, I had made minor changes mostly in the comments in the attached
> > patch. If those look okay to you, please include those in the next version.
> 
> Thanks, the changes look good to me, I merged the diff patch.
> 
> Attach the new version patch which include the following changes:
> 
> - Fix a typo
> - Change the requestreply flag of the newly added WalSndKeepalive to false,
>   because the subscriber can judge whether it's necessary to post a reply
> based
>   on the received LSN.
> - Add a testcase to make sure there is no data in subscriber side when the
>   transaction is skipped.
> - Change the name of flag skipped_empty_xact to skipped_xact which seems
> more
>   understandable.
> - Merge Amit's suggested changes.
> 

I did some more review for the newly added keepalive message and confirmed that
it's necessary to send this in sync mode.

+    if (skipped_xact &&
+        SyncRepRequested() &&
+        ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
+        WalSndKeepalive(false, ctx->write_location);

Because in sync replication, the publisher need to get the reply from
subscirber to release the waiter. After applying the patch, we don't send empty
transaction to subscriber, so we won't get a reply without this keepalive
message. Although the walsender usually invoke WalSndWaitForWal() which will
also send a keepalive message to subscriber, and we could get a reply and
release the wait. But WalSndWaitForWal() is not always invoked for each record.
When reading the page, we won't invoke WalSndWaitForWal() if we already have
the record in our buffer[1].

[1] ReadPageInternal(
...
    /* check whether we have all the requested data already */
    if (targetSegNo == state->seg.ws_segno &&
        targetPageOff == state->segoff && reqLen <= state->readLen)
        return state->readLen;
...

Based on above, if we don't have the newly added keepalive message in the
patch, the transaction could wait for a bit more time to finish.

For example, I did some experiments to confirm:
1. Set LOG_SNAPSHOT_INTERVAL_MS and checkpoint_timeout to a bigger value to
   make sure it doesn't generate extra WAL which could affect the test.
2. Use debugger to attach the walsender and let it stop in the WalSndWaitForWal()
3. Start two clients and modify un-published table
postgres1 # INSERT INTO not_rep VALUES(1);
----        waiting
postgres2 # INSERT INTO not_rep VALUES(1);
----        waiting
4. Release the walsender, and we can see it won't send a keepalive to
   subscriber until it has handled all the above two transactions, which means
   the two transaction will wait until all of them has been decoded. This
   behavior doesn't looks good and is inconsistent with the current
   behavior(the transaction will finish after decoding it or after sending it
   to sub if necessary).

So, I think the newly add keepalive message makes sense.

Best regards,
Hou zj

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: pg_dump new feature: exporting functions only. Bad or good idea ?
Next
From: Robert Haas
Date:
Subject: Re: Corruption during WAL replay