Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1JYFXEoFhJAvg1qU=nZrZLw_87X=2YWQGFBbcBGirAUwA@mail.gmail.com
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Oct 11, 2022 at 5:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Oct 7, 2022 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > About your point that having different partition structures for
> > publisher and subscriber, I don't know how common it will be once we
> > have DDL replication. Also, the default value of
> > publish_via_partition_root is false which doesn't seem to indicate
> > that this is a quite common case.
>
> So how can we consider these concurrent issues that could happen only
> when streaming = 'parallel'? Can we restrict some use cases to avoid
> the problem or can we have a safeguard against these conflicts?
>

Yeah, right now the strategy is to disallow parallel apply for such
cases, as you can see in the *0003* patch.

> We
> could find a new problematic scenario in the future and if it happens,
> logical replication gets stuck, it cannot be resolved only by apply
> workers themselves.
>

I think users can change the streaming option to on/off, and internally
the parallel apply worker can detect that and restart to allow
replication to proceed. Having said that, I think that would be a bug in
the code and we should try to fix it. We may need to disable parallel
apply in the problematic case.

The other ideas that occurred to me in this regard are:

(a) Provide a reloption (say parallel_apply) at the table level that
can be used to bypass various checks, such as a different unique key
between publisher and subscriber, constraints/expressions having
mutable functions, foreign keys (when enabled on the subscriber), and
operations on partitioned tables. We can't detect whether those cases
are safe (primarily because the structure can differ between publisher
and subscriber), so we prohibit parallel apply, but if users set this
option, we can allow it even in those cases.

(b) While enabling the parallel option in the subscription, we can try
to match all the table information of the publisher and subscriber. It
will be tricky to make this work because, say, even if we match a
trigger function name, we won't be able to match the function body.
The other thing is that when the table definition is changed on the
subscriber at a later point, we need to validate the information
between publisher and subscriber again, which I think would be
difficult as we would already be in the middle of processing some
message, and getting information from the publisher at that stage
won't be possible.
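
To make idea (a) a bit more concrete, here is a rough model of the
decision it implies, written in Python purely for illustration. This is
not PostgreSQL code and not what the patch does: the TableInfo fields,
the function names, and the parallel_apply reloption itself are all
assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableInfo:
    """Hypothetical summary of a subscriber-side table (not a PostgreSQL struct)."""
    name: str
    unique_keys_match_publisher: bool = True
    has_mutable_constraint_functions: bool = False
    has_enabled_foreign_keys: bool = False
    is_partitioned: bool = False
    # Hypothetical table-level reloption proposed in idea (a).
    parallel_apply_reloption: bool = False


def unsafe_reasons(table: TableInfo) -> List[str]:
    """List the conditions under which parallel apply would be prohibited."""
    reasons = []
    if not table.unique_keys_match_publisher:
        reasons.append("unique key differs between publisher and subscriber")
    if table.has_mutable_constraint_functions:
        reasons.append("constraint/expression uses a mutable function")
    if table.has_enabled_foreign_keys:
        reasons.append("foreign key enabled on subscriber")
    if table.is_partitioned:
        reasons.append("operations on a partitioned table")
    return reasons


def can_parallel_apply(table: TableInfo) -> bool:
    """Allow parallel apply if the user opted in, or if no check fails."""
    if table.parallel_apply_reloption:
        # The user asserts the table is safe, so the checks are bypassed.
        return True
    return not unsafe_reasons(table)
```

The point of the sketch is only that the reloption acts as an explicit
user override: the checks stay conservative by default, and the
responsibility for safety moves to the user once the option is set.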

Thoughts?

-- 
With Regards,
Amit Kapila.


