RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | houzj.fnst@fujitsu.com |
---|---|
Subject | RE: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | OS0PR01MB5716410E7551394B730D433294389@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
RE: Perform streaming logical transactions by background workers and parallel apply
Re: Perform streaming logical transactions by background workers and parallel apply Re: Perform streaming logical transactions by background workers and parallel apply Re: Perform streaming logical transactions by background workers and parallel apply RE: Perform streaming logical transactions by background workers and parallel apply |
List | pgsql-hackers |
On Wednesday, November 2, 2022 10:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 24, 2022 at 8:42 PM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > > > On Wed, Oct 12, 2022 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > On Tue, Oct 11, 2022 at 5:52 AM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > > > > > > > On Fri, Oct 7, 2022 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > > > > > About your point that having different partition structures for > > > > > publisher and subscriber, I don't know how common it will be once we > > > > > have DDL replication. Also, the default value of > > > > > publish_via_partition_root is false which doesn't seem to indicate > > > > > that this is a quite common case. > > > > > > > > So how can we consider these concurrent issues that could happen only > > > > when streaming = 'parallel'? Can we restrict some use cases to avoid > > > > the problem or can we have a safeguard against these conflicts? > > > > > > > > > > Yeah, right now the strategy is to disallow parallel apply for such > > > cases as you can see in *0003* patch. > > > > Tightening the restrictions could work in some cases but there might > > still be coner cases and it could reduce the usability. I'm not really > > sure that we can ensure such a deadlock won't happen with the current > > restrictions. I think we need something safeguard just in case. For > > example, if the leader apply worker is waiting for a lock acquired by > > its parallel worker, it cancels the parallel worker's transaction, > > commits its transaction, and restarts logical replication. Or the > > leader can log the deadlock to let the user know. > > > > As another direction, we could make the parallel apply feature robust > if we can detect deadlocks that happen among the leader worker and > parallel workers. I'd like to summarize the idea discussed off-list > (with Amit, Hou-San, and Kuroda-San) for discussion. The basic idea is > that when the leader worker or parallel worker needs to wait for > something (eg. transaction completion, messages) we use lmgr > functionality so that we can create wait-for edges and detect > deadlocks in lmgr. > > For example, a scenario where a deadlock occurs is the following: > > [Publisher] > create table tab1(a int); > create publication pub for table tab1; > > [Subcriber] > creat table tab1(a int primary key); > create subscription sub connection 'port=10000 dbname=postgres' > publication pub with (streaming = parallel); > > TX1: > BEGIN; > INSERT INTO tab1 SELECT i FROM generate_series(1, 5000) s(i); -- streamed > Tx2: > BEGIN; > INSERT INTO tab1 SELECT i FROM generate_series(1, 5000) s(i); -- streamed > COMMIT; > COMMIT; > > Suppose a parallel apply worker (PA-1) is executing TX-1 and the > leader apply worker (LA) is executing TX-2 concurrently on the > subscriber. Now, LA is waiting for PA-1 because of the unique key of > tab1 while PA-1 is waiting for LA to send further messages. There is a > deadlock between PA-1 and LA but lmgr cannot detect it. > > One idea to resolve this issue is that we have LA acquire a session > lock on a shared object (by LockSharedObjectForSession()) and have > PA-1 wait on the lock before trying to receive messages. IOW, LA > acquires the lock before sending STREAM_STOP and releases it if > already acquired before sending STREAM_START, STREAM_PREPARE and > STREAM_COMMIT. For PA-1, it always needs to acquire the lock after > processing STREAM_STOP and then release immediately after acquiring > it. That way, when PA-1 is waiting for LA, we can have a wait-edge > from PA-1 to LA in lmgr, which will make a deadlock in lmgr like: > > LA (waiting to acquire lock) -> PA-1 (waiting to acquire the shared > object) -> LA > > We would need the shared objects per parallel apply worker. > > After detecting a deadlock, we can restart logical replication with > temporarily disabling the parallel apply, which is done by 0005 patch. > > Another scenario is similar to the previous case but TX-1 and TX-2 are > executed by two parallel apply workers (PA-1 and PA-2 respectively). > In this scenario, PA-2 is waiting for PA-1 to complete its transaction > while PA-1 is waiting for subsequent input from LA. Also, LA is > waiting for PA-2 to complete its transaction in order to preserve the > commit order. There is a deadlock among three processes but it cannot > be detected in lmgr because the fact that LA is waiting for PA-2 to > complete its transaction doesn't appear in lmgr (see > parallel_apply_wait_for_xact_finish()). To fix it, we can use > XactLockTableWait() instead. > > However, since XactLockTableWait() considers PREPARED TRANSACTION as > still in progress, probably we need a similar trick as above in case > where a transaction is prepared. For example, suppose that TX-2 was > prepared instead of committed in the above scenario, PA-2 acquires > another shared lock at START_STREAM and releases it at > STREAM_COMMIT/PREPARE. LA can wait on the lock. > > Yet another scenario where LA has to wait is the case where the shm_mq > buffer is full. In the above scenario (ie. PA-1 and PA-2 are executing > transactions concurrently), if the shm_mq buffer between LA and PA-2 > is full, LA has to wait to send messages, and this wait doesn't appear > in lmgr. To fix it, probably we have to use non-blocking write and > wait with a timeout. If timeout is exceeded, the LA will write to file > and indicate PA-2 that it needs to read file for remaining messages. > Then LA will start waiting for commit which will detect deadlock if > any. > > If we can detect deadlocks by having such a functionality or some > other way then we don't need to tighten the restrictions of subscribed > tables' schemas etc. Thanks for the analysis and summary ! I tried to implement the above idea and here is the patch set. I have done some basic tests for the new codes and it work fine. But I am going to test some conner cases to make sure all the codes work fine. I removed the old 0003 patch which was used to check the parallel apply safety because now we can detect the deadlock problem. Besides, there are few tasks left which I will handle soon and update the patch set: * Address previous comment from Amit[1], Shi-san[2] and Peter[3] (Already done but haven't merged them). * Rebase the original 0005 patch which is "retry to apply streaming xact only in leader apply worker". * Adjust some comments and documentation related to new codes. [1] https://www.postgresql.org/message-id/CAA4eK1Lsn%3D_gz1-3LqZ-wEDQDmChUsOX8LvHS8WV39wC1iRR%3DQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/OSZPR01MB631042582805A8E8615BC413FD329%40OSZPR01MB6310.jpnprd01.prod.outlook.com [3] https://www.postgresql.org/message-id/CAHut%2BPsJWHRoRzXtMrJ1RaxmkS2LkiMR_4S2pSionxXmYsyOww%40mail.gmail.com Best regards, Hou zj
Attachment
pgsql-hackers by date: