Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers
| From | Amit Kapila |
|---|---|
| Subject | Re: Single transaction in the tablesync worker? |
| Date | |
| Msg-id | CAA4eK1K+TuF7u_VQK4rUfz8VaSP+jnxkTqG6qQ0cdJ4=MM8Mww@mail.gmail.com |
| In response to | Re: Single transaction in the tablesync worker? (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
| Responses | Re: Single transaction in the tablesync worker? |
| List | pgsql-hackers |
On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > The tablesync worker in logical replication performs the table data
> > sync in a single transaction which means it will copy the initial data
> > and then catch up with apply worker in the same transaction. There is
> > a comment in LogicalRepSyncTableStart ("We want to do the table data
> > sync in a single transaction.") saying so but I can't find the
> > concrete theory behind the same. Is there any fundamental problem if
> > we commit the transaction after initial copy and slot creation in
> > LogicalRepSyncTableStart and then allow the apply of transactions as
> > it happens in apply worker? I have tried doing so in the attached (a
> > quick prototype to test) and didn't find any problems with regression
> > tests. I have tried a few manual tests as well to see if it works and
> > didn't find any problem. Now, it is quite possible that it is
> > mandatory to do the way we are doing currently, or maybe something
> > else is required to remove this requirement but I think we can do
> > better with respect to comments in this area.
>
> If we commit the initial copy, the data up to the initial copy's
> snapshot will be visible downstream. If we apply the changes by
> committing changes per transaction, the data visible to the other
> transactions will differ as the apply progresses.
>
It is not clear what you mean by the above. The way you have written
it, it appears you think I am proposing to copy the initial data
transaction-by-transaction. But that is not the case. I am proposing
that we copy the initial data using the REPEATABLE READ isolation
level as we do now, commit it, and then process the changes
transaction-by-transaction until we reach the sync point (the point up
to which the apply worker has already received the data).
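
To make the proposed flow concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL code: the names Txn, Subscriber,
tablesync, snapshot_lsn and sync_lsn are hypothetical, and LSNs are
plain integers. It assumes the initial copy sees exactly the upstream
transactions committed at or before snapshot_lsn.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """One upstream transaction (hypothetical model, not a PostgreSQL struct)."""
    commit_lsn: int
    rows: dict  # key -> value upserts made by this transaction


@dataclass
class Subscriber:
    """Subscriber-side table plus a log of every state made visible by a commit."""
    table: dict = field(default_factory=dict)
    commits: list = field(default_factory=list)

    def commit(self, staged):
        # After a commit, other sessions can see exactly this state.
        self.table = dict(staged)
        self.commits.append(dict(staged))


def tablesync(sub, snapshot_rows, snapshot_lsn, stream, sync_lsn):
    """Copy the snapshot, commit, then apply one upstream transaction per commit."""
    # Step 1: initial copy taken from a single consistent snapshot
    # (REPEATABLE READ in the real worker), committed on its own.
    sub.commit(snapshot_rows)

    # Step 2: catch up transaction-by-transaction until the sync point.
    for txn in sorted(stream, key=lambda t: t.commit_lsn):
        if txn.commit_lsn <= snapshot_lsn:
            continue  # assumed to be already contained in the copied snapshot
        if txn.commit_lsn > sync_lsn:
            break  # the apply worker takes over from here
        staged = dict(sub.table)
        staged.update(txn.rows)
        sub.commit(staged)
    return sub
```

With a snapshot at LSN 100 and a sync point at LSN 120, a stream with
commits at 90, 110, 120 and 130 produces three local commits: the
initial copy, then one for 110 and one for 120. Each intermediate state
is visible to other sessions, but each corresponds to a state that
existed on the publisher at some commit boundary.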
> You haven't
> clarified whether we will respect the transaction boundaries in the
> apply log or not. I assume we will.
>
It will be transaction-by-transaction.
> Whereas if we apply all the
> changes in one go, other transactions either see the data before
> resync or after it without any intermediate states.
>
What is the problem even if the user is able to see the data after the
initial copy?
> That will not
> violate consistency, I think.
>
I am not sure how consistency will be broken.
> That's all I can think of as the reason behind doing a whole resync as
> a single transaction.
>
Thanks for sharing your thoughts.
--
With Regards,
Amit Kapila.