Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Msg-id CAD21AoCWovvhGBD2uKcQqbk6px6apswuBrs6dR9+WhP1j2LdsQ@mail.gmail.com
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Mon, Oct 24, 2022 at 8:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Oct 12, 2022 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 11, 2022 at 5:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Oct 7, 2022 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > About your point that having different partition structures for
> > > > publisher and subscriber, I don't know how common it will be once we
> > > > have DDL replication. Also, the default value of
> > > > publish_via_partition_root is false which doesn't seem to indicate
> > > > that this is a quite common case.
> > >
> > > So how can we consider these concurrent issues that could happen only
> > > when streaming = 'parallel'? Can we restrict some use cases to avoid
> > > the problem or can we have a safeguard against these conflicts?
> > >
> >
> > Yeah, right now the strategy is to disallow parallel apply for such
> > cases as you can see in *0003* patch.
>
> Tightening the restrictions could work in some cases but there might
> still be corner cases, and it could reduce usability. I'm not really
> sure that we can ensure such a deadlock won't happen with the current
> restrictions. I think we need some safeguard just in case. For
> example, if the leader apply worker is waiting for a lock acquired by
> its parallel worker, it cancels the parallel worker's transaction,
> commits its own transaction, and restarts logical replication. Or the
> leader can log the deadlock to let the user know.
>

As another direction, we could make the parallel apply feature robust
by detecting deadlocks that happen among the leader worker and the
parallel workers. I'd like to summarize the idea discussed off-list
(with Amit, Hou-San, and Kuroda-San) for discussion. The basic idea is
that when the leader worker or a parallel worker needs to wait for
something (e.g., transaction completion, messages), we use lmgr
functionality so that we can create wait-for edges and detect
deadlocks in lmgr.
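
To sketch what I mean: once every wait is represented as a wait-for
edge, deadlock detection reduces to finding a cycle in the wait-for
graph. A toy model (process names and the dict representation are
made up for illustration; lmgr's real detector walks the lock queues):

```python
# Toy wait-for graph: maps a process to the process it is waiting for.
# This only illustrates the cycle check, not lmgr's actual algorithm.
def find_deadlock(waits_for):
    for start in waits_for:
        seen, cur = set(), start
        while cur in waits_for:      # follow the wait-for chain
            if cur in seen:
                return True          # revisited a node: cycle = deadlock
            seen.add(cur)
            cur = waits_for[cur]
    return False

# LA waits for PA-1 (unique-key conflict); PA-1 waits for LA (next message):
assert find_deadlock({"LA": "PA-1", "PA-1": "LA"})
# A one-way wait is not a deadlock:
assert not find_deadlock({"LA": "PA-1"})
```

The point of the proposal below is to make every such wait visible as
an edge of this graph, so lmgr's existing detector can find the cycle.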

For example, a scenario where a deadlock occurs is the following:

[Publisher]
create table tab1(a int);
create publication pub for table tab1;

[Subscriber]
create table tab1(a int primary key);
create subscription sub connection 'port=10000 dbname=postgres'
publication pub with (streaming = parallel);

TX-1:
BEGIN;
INSERT INTO tab1 SELECT i FROM generate_series(1, 5000) s(i); -- streamed
    TX-2:
    BEGIN;
    INSERT INTO tab1 SELECT i FROM generate_series(1, 5000) s(i); -- streamed
    COMMIT;
COMMIT;

Suppose a parallel apply worker (PA-1) is executing TX-1 and the
leader apply worker (LA) is executing TX-2 concurrently on the
subscriber. Now, LA is waiting for PA-1 because of the unique key of
tab1 while PA-1 is waiting for LA to send further messages. There is a
deadlock between PA-1 and LA but lmgr cannot detect it.

One idea to resolve this issue is to have LA acquire a session
lock on a shared object (by LockSharedObjectForSession()) and have
PA-1 wait on that lock before trying to receive messages. IOW, LA
acquires the lock before sending STREAM_STOP and releases it, if
already acquired, before sending STREAM_START, STREAM_PREPARE, and
STREAM_COMMIT. PA-1, in turn, always acquires the lock after
processing STREAM_STOP and releases it immediately after acquiring
it. That way, when PA-1 is waiting for LA, there is a wait-for edge
from PA-1 to LA in lmgr, which lets lmgr see a deadlock like:

LA (waiting to acquire lock) -> PA-1 (waiting to acquire the shared
object) -> LA

We would need the shared objects per parallel apply worker.
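
To illustrate the handshake, here is a toy model with a Python lock
standing in for the per-PA shared object lock taken via
LockSharedObjectForSession(). All function names are hypothetical and
this is not the patch code; it only shows when the lock is held and
where the PA -> LA wait-for edge appears:

```python
import threading

# Stand-in for the per-PA shared object lock (one per parallel apply worker).
stream_lock = threading.Lock()

def la_send_stream_stop(send):
    stream_lock.acquire()          # LA takes the lock *before* STREAM_STOP
    send("STREAM_STOP")

def la_send_next(send, msg):
    # Only LA touches the lock on this side, so this check is race-free here.
    if stream_lock.locked():
        stream_lock.release()      # released before STREAM_START /
    send(msg)                      # STREAM_PREPARE / STREAM_COMMIT

def pa_after_stream_stop():
    # PA blocks here while LA still holds the lock: that blocked acquire is
    # the PA -> LA wait-for edge lmgr can see. Then release immediately.
    stream_lock.acquire()
    stream_lock.release()
```

So PA-1 only ever blocks on the lock while LA is between STREAM_STOP
and the next message, which is exactly the window in which PA-1 is
"waiting for LA to send further messages".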

After detecting a deadlock, we can restart logical replication,
temporarily disabling parallel apply, which is done by the 0005 patch.

Another scenario is similar to the previous case but TX-1 and TX-2 are
executed by two parallel apply workers (PA-1 and PA-2 respectively).
In this scenario, PA-2 is waiting for PA-1 to complete its transaction
while PA-1 is waiting for subsequent input from LA. Also, LA is
waiting for PA-2 to complete its transaction in order to preserve the
commit order. There is a deadlock among three processes but it cannot
be detected in lmgr because the fact that LA is waiting for PA-2 to
complete its transaction doesn't appear in lmgr (see
parallel_apply_wait_for_xact_finish()). To fix it, we can use
XactLockTableWait() instead.
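
The reason XactLockTableWait() helps is that in PostgreSQL every
transaction holds an exclusive lock on its own XID for its lifetime,
and XactLockTableWait() blocks by taking a conflicting lock on that
XID, so the wait shows up in lmgr. A toy model of that semantics (an
Event stands in for "the XID lock was released"; names are
illustrative):

```python
import threading

class Xact:
    """Model of a transaction's XID lock: held for the transaction's
    whole lifetime, released at COMMIT or ABORT."""
    def __init__(self, xid):
        self.xid = xid
        self._finished = threading.Event()

    def finish(self):
        # COMMIT or ABORT releases the XID lock. Note that PREPARE does
        # not: a prepared transaction still counts as in progress, which
        # is why the extra trick below is needed for that case.
        self._finished.set()

    def xact_lock_table_wait(self, timeout=None):
        # Blocks until the transaction is no longer in progress;
        # returns True once it has finished.
        return self._finished.wait(timeout)
```

With this, LA's "wait for PA-2's transaction to finish" becomes a lock
wait that participates in deadlock detection, unlike the current
parallel_apply_wait_for_xact_finish() wait.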

However, since XactLockTableWait() considers a PREPARED TRANSACTION as
still in progress, we probably need a trick similar to the above for
the case where a transaction is prepared. For example, suppose that
TX-2 were prepared instead of committed in the above scenario; PA-2
acquires another shared lock at STREAM_START and releases it at
STREAM_COMMIT/STREAM_PREPARE. LA can wait on that lock.

Yet another scenario where LA has to wait is the case where the shm_mq
buffer is full. In the above scenario (i.e., PA-1 and PA-2 are
executing transactions concurrently), if the shm_mq buffer between LA
and PA-2 is full, LA has to wait to send messages, and this wait
doesn't appear in lmgr. To fix it, we probably have to use a
non-blocking write and wait with a timeout. If the timeout is
exceeded, LA will write to a file and indicate to PA-2 that it needs
to read the file for the remaining messages. Then LA will start
waiting for the commit, which will detect a deadlock if any.
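
A toy model of that fallback, with a bounded in-memory queue standing
in for the shm_mq and a list standing in for the spill file (the
function name and timeout are illustrative, not from the patch):

```python
import queue

def send_or_spool(mq, spool, msg, timeout):
    """Try the shared queue first; if it stays full past the timeout
    (the consumer may itself be blocked), divert the message to overflow
    storage instead of letting LA block indefinitely. LA can then go on
    to wait for the commit, where deadlock detection applies."""
    try:
        mq.put(msg, timeout=timeout)
        return "queued"
    except queue.Full:
        spool.append(msg)   # PA later reads the remaining messages here
        return "spooled"
```

For example, with a queue of capacity 1, the first message is queued
and the second one is spooled rather than blocking the leader.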

If we can detect deadlocks with such functionality or in some other
way, then we don't need to tighten the restrictions on the subscribed
tables' schemas etc.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


