Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Msg-id CAFiTN-taQYnSn_p040o1k=GWDytMrX+dGU_D5uU1ZmAf3VdZsQ@mail.gmail.com
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Tue, Jul 26, 2022 at 2:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Jul 22, 2022 at 8:27 AM wangw.fnst@fujitsu.com
> <wangw.fnst@fujitsu.com> wrote:
> >
> > On Tue, Jul 19, 2022 at 10:29 AM I wrote:
> > > Attach the new patches.
> >
> > Not able to apply patches cleanly because the change in HEAD (366283961a).
> > Therefore, I rebased the patch based on the changes in HEAD.
> >
> > Attach the new patches.
>
> +    /* Check the foreign keys. */
> +    fkeys = RelationGetFKeyList(entry->localrel);
> +    if (fkeys)
> +        entry->parallel_apply = PARALLEL_APPLY_UNSAFE;
>
> So if there is a foreign key on any of the tables that are part of a
> subscription, then we do not allow changes for that subscription to be
> applied in parallel?  I think this is a big limitation, because having
> a foreign key on a table is very normal, right?  I agree that if we
> allow them then there could be failures due to out-of-order apply,
> but IMHO we should not impose the restriction; instead, let it fail
> if such a conflict ever occurs, because on a conflict the transaction
> will be sent again.  Do we see that there could be wrong or
> inconsistent results if we allow such things to be executed in
> parallel?  If not, then IMHO we are restricting very normal cases
> just to avoid some corner-case failures.

Some more comments:
1.
+            /*
+             * If we have found a free worker or if we are already applying
+             * this transaction in an apply background worker, then we pass
+             * the data to that worker.
+             */
+            if (first_segment)
+                apply_bgworker_send_data(stream_apply_worker, s->len, s->data);

The comment says that if we have found a free worker, or if we are
already applying in a worker, then we pass the changes to that worker;
but as per the code here we are only passing the data in the
first_segment case?

I think what you are trying to say is that if it is the first segment
then we send the data to the worker we have just started?

2.
+        /*
+         * This is the main apply worker. Check if there is any free apply
+         * background worker we can use to process this transaction.
+         */
+        if (first_segment)
+            stream_apply_worker = apply_bgworker_start(stream_xid);
+        else
+            stream_apply_worker = apply_bgworker_find(stream_xid);

So currently, whenever we get a new streamed transaction we try to
start a new background worker for it.  Why do we need to start and
stop a background apply worker every time we get a new streamed
transaction?  I mean, we could keep the worker in a pool for the time
being, and a new transaction looking for a worker could find one
there.  Starting a worker is a costly operation, and since we are
using parallelism here we are expecting frequent streamed transactions
that need a parallel apply worker.  So why not let the worker wait for
a certain amount of time: if the load is low it will stop anyway, and
if the load is high it will be reused for the next streamed
transaction.


3.
Why are we restricting parallel apply workers to streamed transactions
only?  Streaming depends on the size of logical_decoding_work_mem, so
coupling streaming and parallel apply this tightly seems too
restrictive to me.  Do we see any obvious problems in applying other
(non-streamed) transactions in parallel?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


