Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1+4cSdeF=04jcj5c9KhftR=Hnx1AhooVwvOvnvqR4RaLg@mail.gmail.com
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Tue, Aug 9, 2022 at 5:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Aug 9, 2022 at 11:09 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Some more comments
> >
> > +    /*
> > +     * Exit if any relation is not in the READY state and if any worker is
> > +     * handling the streaming transaction at the same time. Because for
> > +     * streaming transactions that is being applied in apply background
> > +     * worker, we cannot decide whether to apply the change for a relation
> > +     * that is not in the READY state (see should_apply_changes_for_rel) as we
> > +     * won't know remote_final_lsn by that time.
> > +     */
> > +    if (list_length(ApplyBgworkersFreeList) != list_length(ApplyBgworkersList) &&
> > +        !AllTablesyncsReady())
> > +    {
> > +        ereport(LOG,
> > +                (errmsg("logical replication apply workers for subscription \"%s\" will restart",
> > +                        MySubscription->name),
> > +                 errdetail("Cannot handle streamed replication transaction by apply "
> > +                           "background workers until all tables are synchronized")));
> > +
> > +        proc_exit(0);
> > +    }
> >
> > How can this situation occur? I mean, while starting a background
> > worker we can check whether all tables are sync-ready or not,
> > right?
> >
>
> We already check at the start in apply_bgworker_can_start(), but
> I think we also need to check at a later point because new rels
> can be added to pg_subscription_rel via ALTER SUBSCRIPTION ...
> REFRESH PUBLICATION. If that reasoning is correct, then we can
> probably expand the comments to make it clear.
>
> > +    /* Check the status of apply background worker if any. */
> > +    apply_bgworker_check_status();
> > +
> >
> > Why do we need to check each worker's status on every commit? If
> > there are a lot of small transactions along with some streaming
> > transactions, won't this affect apply performance for those small
> > transactions?
> >
>
> I don't think performance will be a concern because this won't do any
> costly operation unless an invalidation happens, in which case it will
> access system catalogs. However, if my understanding above is correct
> that new tables can be added during the apply process, then I am not
> sure that doing it at commit time is sufficient/correct, because the
> set of relations can change even during the transaction.
>

One idea that may handle this cleanly is to check for the
SUBREL_STATE_SYNCDONE state in should_apply_changes_for_rel() and
error out for apply_bg_worker(). For the SUBREL_STATE_READY state, it
should return true, and for any other state, it can return false.
One advantage of this approach is that the parallel apply worker
will raise an error only if the corresponding transaction has performed
an operation on a relation that has reached the SYNCDONE state.
OTOH, checking at each transaction end can cause workers to error out
even if the parallel apply transaction doesn't perform any operation
on a relation that is not in the READY state.
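
To make the proposed per-relation decision concrete, here is a minimal
standalone sketch. It is not the patch and not the real
should_apply_changes_for_rel(). RelState, ApplyDecision, and
bgworker_should_apply() are hypothetical names used only for
illustration, and the enum values merely stand in for the
SUBREL_STATE_* states.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SUBREL_STATE_* relation states. */
typedef enum RelState
{
    REL_INIT,
    REL_DATASYNC,
    REL_SYNCDONE,
    REL_READY
} RelState;

/* Hypothetical three-way result: apply, skip, or raise an error. */
typedef enum ApplyDecision
{
    APPLY_YES,      /* READY: apply the change */
    APPLY_NO,       /* any other state: skip the change */
    APPLY_ERROR     /* SYNCDONE: cannot decide without remote_final_lsn */
} ApplyDecision;

/*
 * Decision an apply background worker would make for one change,
 * based only on the state of the relation that the change touches.
 */
static ApplyDecision
bgworker_should_apply(RelState state)
{
    switch (state)
    {
        case REL_READY:
            return APPLY_YES;
        case REL_SYNCDONE:
            return APPLY_ERROR;
        default:
            return APPLY_NO;
    }
}

/*
 * A streamed transaction fails only if at least one of the relations
 * it actually touched is in the SYNCDONE state.
 */
static bool
transaction_must_error(const RelState *touched, int ntouched)
{
    for (int i = 0; i < ntouched; i++)
    {
        if (bgworker_should_apply(touched[i]) == APPLY_ERROR)
            return true;
    }
    return false;
}
```

In this sketch, a transaction that touches only READY relations never
errors out, even if some unrelated relation of the subscription is
still in SYNCDONE. A check at transaction end cannot make that
distinction.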

-- 
With Regards,
Amit Kapila.


