Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1LYWZ+reJ-jSz7naM6GwdS2VqydmJaHmL_mpjbtMC9vgg@mail.gmail.com
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, May 2, 2022 at 5:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, May 2, 2022 at 6:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 2, 2022 at 11:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > Are you planning to support the "Transaction dependency" that Amit
> > > mentioned in his first mail in this patch? IIUC, since the background
> > > apply worker applies the streamed changes as soon as it receives them
> > > from the main apply worker, a conflict that doesn't happen in the
> > > current streaming logical replication could happen.
> > >
> >
> > This patch seems to be waiting for stream_stop to finish, so I don't
> > see how the issues related to "Transaction dependency" can arise. What
> > type of conflicts/issues do you have in mind?
>
> Suppose we set up both the publisher and the subscriber as follows:
>
> On publisher:
> create table test (i int);
> insert into test values (0);
> create publication test_pub for table test;
>
> On subscriber:
> create table test (i int primary key);
> create subscription test_sub connection '...' publication test_pub;
> -- value 0 is replicated via initial sync
>
> Now, both 'test' tables have value 0.
>
> And suppose two concurrent transactions are executed on the publisher
> in the following order:
>
> TX-1:
> begin;
> insert into test select generate_series(0, 10000); -- changes will be streamed;
>
>     TX-2:
>     begin;
>     delete from test where i = 0;
>     commit;
>
> TX-1:
> commit;
>
> With the current streaming logical replication, these changes will be
> applied successfully since the deletion is applied before the
> (streamed) insertion. Whereas with the apply bgworker, it fails due to
> a unique constraint violation since the insertion is applied first.
> I've confirmed that it happens with the v5 patch.
>

Good point, but I am not completely sure that tracking transaction
dependencies for such cases is really worth it. Users can already hit
conflicts in such concurrent cases today; it is just a matter of
timing. Also, to check transaction dependencies we might need to spill
the data for streaming transactions, in which case we might lose all
the benefits of applying them via a background worker. Do we see any
simple way to avoid this?
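
To make the ordering issue concrete, here is a rough sketch that replays
the quoted scenario against an in-memory SQLite table. This is only a
model and rests on several assumptions: SQLite stands in for the
subscriber, a single connection stands in for the apply workers, and the
helper names (apply_in_order, tx1_insert, tx2_delete) are made up for
illustration. It is not the patch's code and does not exercise logical
replication itself; it only shows that the same two changes succeed in
commit order and fail when the streamed insert is applied first.

```python
import sqlite3


def apply_in_order(steps):
    """Apply steps to a fresh subscriber-like table.

    Returns None on success, or the constraint error message on failure.
    """
    conn = sqlite3.connect(":memory:")
    # Subscriber-side table has a primary key, unlike the publisher's.
    conn.execute("CREATE TABLE test (i INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO test VALUES (0)")  # row from initial sync
    try:
        for step in steps:
            step(conn)
        conn.commit()
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()


def tx1_insert(conn):
    # Models TX-1: insert into test select generate_series(0, 10000)
    conn.executemany(
        "INSERT INTO test VALUES (?)", ((n,) for n in range(0, 10001))
    )


def tx2_delete(conn):
    # Models TX-2: delete the row with i = 0
    conn.execute("DELETE FROM test WHERE i = 0")


# Commit order (TX-2 commits first, so its delete is applied first).
commit_order_result = apply_in_order([tx2_delete, tx1_insert])

# Eager order (streamed TX-1 changes applied as soon as they arrive).
eager_order_result = apply_in_order([tx1_insert, tx2_delete])

print("commit order:", commit_order_result)
print("eager order: ", eager_order_result)
```

In this model the commit-order run returns no error, while the eager run
stops at the primary key violation on the existing row 0, which matches
the failure reported above for the v5 patch.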


-- 
With Regards,
Amit Kapila.


