Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAExHW5uXKDVH9Y1p35PmOs6y-WK-xU82Enr-96OPxnVUkBOhDA@mail.gmail.com
In response to Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Single transaction in the tablesync worker?
List pgsql-hackers
On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> The tablesync worker in logical replication performs the table data
> sync in a single transaction which means it will copy the initial data
> and then catch up with apply worker in the same transaction. There is
> a comment in LogicalRepSyncTableStart ("We want to do the table data
> sync in a single transaction.") saying so but I can't find the
> concrete theory behind the same. Is there any fundamental problem if
> we commit the transaction after initial copy and slot creation in
> LogicalRepSyncTableStart and then allow the apply of transactions as
> it happens in apply worker? I have tried doing so in the attached (a
> quick prototype to test) and didn't find any problems with regression
> tests. I have tried a few manual tests as well to see if it works and
> didn't find any problem. Now, it is quite possible that it is
> mandatory to do the way we are doing currently, or maybe something
> else is required to remove this requirement but I think we can do
> better with respect to comments in this area.

If we commit the initial copy, the data up to the initial copy's
snapshot will be visible downstream. If we then apply the changes by
committing them per transaction, the data visible to other
transactions will differ as the apply progresses. You haven't
clarified whether we will respect the transaction boundaries in the
apply log or not; I assume we will. Whereas if we apply all the
changes in one go, other transactions see the data either as it was
before the resync or as it is after it, without any intermediate
states. That will not violate consistency, I think.
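
To make the difference concrete, here is a rough toy model in Python. It
is not PostgreSQL code: the "table" is just a dict, and the function name
visible_states and its parameters are made up for illustration. It only
lists which committed states a concurrent reader could observe under
each commit strategy.

```python
# Toy model, not PostgreSQL code: a concurrent reader can only ever
# observe a state that has been committed.

def visible_states(initial_copy, changes, single_transaction):
    """Return the committed states a concurrent reader could observe.

    initial_copy: dict of rows as of the copy snapshot (hypothetical data)
    changes: list of upstream transactions, each a dict of row updates
    single_transaction: if True, commit once after copy plus catch-up
    """
    committed = [{}]                  # state before the resync starts
    working = dict(initial_copy)      # uncommitted working state

    if not single_transaction:
        committed.append(dict(working))      # commit right after the copy

    for txn in changes:
        working.update(txn)                  # apply one upstream transaction
        if not single_transaction:
            committed.append(dict(working))  # commit at each txn boundary

    if single_transaction:
        committed.append(dict(working))      # one commit at the very end

    return committed


copy = {"a": 1}
changes = [{"a": 2}, {"b": 3}]
single = visible_states(copy, changes, single_transaction=True)
multi = visible_states(copy, changes, single_transaction=False)
```

With a single transaction the reader sees only the empty table and the
final state; with a commit after the copy and after each applied
transaction it also sees every state in between. Both strategies end in
the same final state.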

That's all I can think of as the reason behind doing a whole resync as
a single transaction.

-- 
Best Wishes,
Ashutosh Bapat


