Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | CAD21AoBQ39RWUdX-4Rm1AAxNiiYWYf-oX2CRmyXr4EHs4XKDQQ@mail.gmail.com Whole thread Raw |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Perform streaming logical transactions by background workers and parallel apply
|
List | pgsql-hackers |
On Wed, May 4, 2022 at 12:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, May 3, 2022 at 9:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, May 2, 2022 at 5:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Mon, May 2, 2022 at 6:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Mon, May 2, 2022 at 11:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > Are you planning to support "Transaction dependency" Amit mentioned in > > > > > his first mail in this patch? IIUC since the background apply worker > > > > > applies the streamed changes as soon as receiving them from the main > > > > > apply worker, a conflict that doesn't happen in the current streaming > > > > > logical replication could happen. > > > > > > > > > > > > > This patch seems to be waiting for stream_stop to finish, so I don't > > > > see how the issues related to "Transaction dependency" can arise? What > > > > type of conflict/issues you have in mind? > > > > > > Suppose we set both publisher and subscriber: > > > > > > On publisher: > > > create table test (i int); > > > insert into test values (0); > > > create publication test_pub for table test; > > > > > > On subscriber: > > > create table test (i int primary key); > > > create subscription test_sub connection '...' publication test_pub; -- > > > value 0 is replicated via initial sync > > > > > > Now, both 'test' tables have value 0. > > > > > > And suppose two concurrent transactions are executed on the publisher > > > in following order: > > > > > > TX-1: > > > begin; > > > insert into test select generate_series(0, 10000); -- changes will be streamed; > > > > > > TX-2: > > > begin; > > > delete from test where c = 0; > > > commit; > > > > > > TX-1: > > > commit; > > > > > > With the current streaming logical replication, these changes will be > > > applied successfully since the deletion is applied before the > > > (streamed) insertion. Whereas with the apply bgworker, it fails due to > > > an unique constraint violation since the insertion is applied first. > > > I've confirmed that it happens with v5 patch. > > > > > > > Good point but I am not completely sure if doing transaction > > dependency tracking for such cases is really worth it. I feel for such > > concurrent cases users can anyway now also get conflicts, it is just a > > matter of timing. One more thing to check transaction dependency, we > > might need to spill the data for streaming transactions in which case > > we might lose all the benefits of doing it via a background worker. Do > > we see any simple way to avoid this? > > I agree that it is just a matter of timing. I think new issues that haven't happened on the current streaming logical replication depending on the timing could happen with this feature and vice versa. > > I think the other kind of problem that can happen here is delete > followed by an insert. If in the example provided by you, TX-1 > performs delete (say it is large enough to cause streaming) and TX-2 > performs insert then I think it will block the apply worker because > insert will start waiting infinitely. Currently, I think it will lead > to conflict due to insert but that is still solvable by allowing users > to remove conflicting rows. > > It seems both these problems are due to the reason that the table on > publisher and subscriber has different constraints otherwise, we would > have seen the same behavior on the publisher as well. > > There could be a few ways to avoid these and similar problems: > a. detect the difference in constraints between publisher and > subscribers like primary key and probably others (like whether there > is any volatile function present in index expression) when applying > the change and then we give ERROR to the user that she must change the > streaming mode to 'spill' instead of 'apply' (aka parallel apply). > b. Same as (a) but instead of ERROR just LOG this information and > change the mode to spill for the transactions that operate on that > particular relation. Given that it doesn't introduce a new kind of problem I don't think we need special treatment for that at least in this feature. If we want such modes we can discuss it separately. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: