Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1LJMpBYbgDVz=g4qig8C2bM10PK=DgW7o1zn8X426vJ+Q@mail.gmail.com
Whole thread Raw
In response to RE: Perform streaming logical transactions by background workers and parallel apply  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Thu, Dec 8, 2022 at 12:37 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>

Review comments
==============
1. Currently, we don't release the stream lock in LA (leade apply
worker) for "rollback to savepoint" and the reason is mentioned in
comments of apply_handle_stream_abort() in the patch. But, today,
while testing, I found that can lead to deadlock which otherwise,
won't happen on the publisher. The key point is rollback to savepoint
releases the locks acquired by the particular subtransaction, so
parallel apply worker should also do the same. Consider the following
example where the transaction in session-1 is being performed by the
parallel apply worker and the transaction in session-2 is being
performed by the leader apply worker. I have simulated it by using GUC
force_stream_mode.

Publisher
==========
Session-1
postgres=# begin;
BEGIN
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# truncate t1;
TRUNCATE TABLE

Session-2
postgres=# begin;
BEGIN
postgres=*# insert into t1 values(4);

Session-1
postgres=*# rollback to savepoint s1;
ROLLBACK

Session-2
Commit;

With or without commit of Session-2, this scenario will lead to
deadlock on the subscriber because PA (parallel apply worker) is
waiting for LA to send the next command, and LA is blocked by
Exclusive of PA. There is no deadlock on the publisher because
rollback to savepoint will release the lock acquired by truncate.

To solve this, How about if we do three things before sending abort of
sub-transaction (a) unlock the stream lock, (b) increment
pending_stream_count, (c) take the stream lock again?

Now, if the PA is not already waiting on the stop, it will not wait at
stream_stop but will wait after applying abort of sub-transaction and
if it is already waiting at stream_stop, the wait will be released. If
this works then probably we should try to do (b) before (a) to match
the steps with stream_start.

2. There seems to be another general problem in the way the patch
waits for stream_stop in PA (parallel apply worker). Currently, PA
checks, if there are no more pending streams then it tries to wait for
the next stream by waiting on a stream lock. However, it is possible
after PA checks there is no pending stream and before it actually
starts waiting on a lock, the LA sends another stream for which even
stream_stop is sent, in this case, PA will start waiting for the next
stream whereas there is actually a pending stream available. In this
case, it won't lead to any problem apart from delay in applying the
changes in such cases but for the case mentioned in the previous point
(Pont 1), it can lead to deadlock even after we implement the solution
proposed to solve it.

3. The other point to consider is that for
stream_commit/prepare/abort, in LA, we release the stream lock after
sending the message whereas for stream_start we release it before
sending the message. I think for the earlier cases
(stream_commit/prepare/abort), the patch has done like this because
pa_send_data() may need to require the lock again when it times out
and start serializing, so there will be no sense in first releasing
it, then re-acquiring it, and then again releasing it. Can't we also
release the lock for stream_start after pa_send_data() only if it is
not switched to serialize mode?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: on placeholder entries in view rule action query's range table
Next
From: Peter Eisentraut
Date:
Subject: static assert cleanup