Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAA4eK1JYoxoa=GdQ73G1Ohz+b2jfAECFFAHtvkj-+qPJKUsNgA@mail.gmail.com
In response to Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Single transaction in the tablesync worker?
Re: Single transaction in the tablesync worker?
List pgsql-hackers
On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
> <craig.ringer@enterprisedb.com> wrote:
> >
>
> >>
> >> I am not sure why but it seems acceptable to original authors that the
> >> data of transactions are visibly partially during the initial
> >> synchronization phase for a subscription.
> >
> >
> > I don't think there's much alternative there.
> >
>
> I am not sure about this. I think it is primarily to allow some more
> parallelism among apply and sync workers. One primitive way to achieve
> parallelism and don't have this problem is to allow apply worker to
> wait till all the tablesync workers are in DONE state.
>

Since the apply worker's slot is created before those of all the
tablesync workers, it should never miss any LSN that the tablesync
workers would have processed. Also, the tablesync workers should not
need to process any xact if the apply worker has not processed
anything. I think tablesync currently always processes one transaction
(because we call process_sync_tables at the commit of a txn), even when
that is not required to get in sync with the apply worker. Having the
apply worker wait this way should solve both problems: (a) the
visibility of partial transactions, and (b) allowing prepared
transactions, because the tablesync worker no longer needs to combine
the data of multiple transactions.
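
To make the waiting rule concrete, here is a toy model of it in Python.
It is only a sketch of the idea, not PostgreSQL code: the class names,
the state strings, and the apply() method are all made up for
illustration, and the model assumes a tablesync worker does nothing but
the initial copy.

```python
from dataclasses import dataclass, field

# Hypothetical state names for this sketch; they are not the values
# PostgreSQL stores for a subscription's relation state.
COPYING = "copying"
DONE = "done"


@dataclass
class TableSync:
    """Toy tablesync worker: it only performs the initial copy."""
    table: str
    state: str = COPYING

    def finish_copy(self):
        # Assumption of this model: the worker never applies decoded
        # transactions itself, it just reports that the copy is done.
        self.state = DONE


@dataclass
class ApplyWorker:
    """Toy apply worker that waits for every tablesync worker."""
    syncs: list
    applied: list = field(default_factory=list)

    def can_apply(self):
        # True only once all tablesync workers have reached DONE.
        return all(s.state == DONE for s in self.syncs)

    def apply(self, txn):
        # Refuse to apply while any table is still being copied, so a
        # transaction is never applied to only some of its tables.
        if not self.can_apply():
            return False
        self.applied.append(txn)
        return True


syncs = [TableSync("t1"), TableSync("t2")]
worker = ApplyWorker(syncs)
print(worker.apply("txn-1"))   # both tables still copying
syncs[0].finish_copy()
print(worker.apply("txn-1"))   # t2 still copying
syncs[1].finish_copy()
print(worker.apply("txn-1"))   # all tablesync workers are DONE
```

In this model only the apply worker ever applies a transaction, which
is why a tablesync worker never has to combine the data of several
transactions.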

I think another advantage of this would be reduced load (both CPU and
I/O) on the publisher side, since the data would be decoded only once
instead of once for each tablesync worker and then separately again for
the apply worker. I think it will use fewer resources to finish the
work.

Is there any flaw in this idea that I am missing?

-- 
With Regards,
Amit Kapila.


