Re: Deadlock between logrep apply worker and tablesync worker - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Deadlock between logrep apply worker and tablesync worker |
Date | |
Msg-id | CALDaNm2yFD3yC4G+1WXdywmf=iEYCw-zqmBtJC93Yj1LoQY99A@mail.gmail.com Whole thread Raw |
In response to | Re: Deadlock between logrep apply worker and tablesync worker (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Deadlock between logrep apply worker and tablesync worker
|
List | pgsql-hackers |
On Sat, 28 Jan 2023 at 11:26, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 28, 2023 at 9:36 AM houzj.fnst@fujitsu.com > <houzj.fnst@fujitsu.com> wrote: > > > > On Friday, January 27, 2023 8:16 PM Amit Kapila <amit.kapila16@gmail.com> > > > > > > On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com> > > > wrote: > > > > > > > > > > IIRC, this is done to prevent concurrent drops of origin drop say by > > > > > exposed API pg_replication_origin_drop(). See the discussion in [1] > > > > > related to it. If we want we can optimize it so that we can acquire > > > > > the lock on the specific origin as mentioned in comments > > > > > replorigin_drop_by_name() but it was not clear that this operation > > > > > would be frequent enough. > > > > > > > > Here is an attached patch to lock the replication origin record using > > > > LockSharedObject instead of locking pg_replication_origin relation in > > > > ExclusiveLock mode. Now tablesync worker will wait only if the > > > > tablesync worker is trying to drop the same replication origin which > > > > has already been dropped by the apply worker, the other tablesync > > > > workers will be able to successfully drop the replication origin > > > > without any wait. > > > > > > > > > > There is a code in the function replorigin_drop_guts() that uses the > > > functionality introduced by replorigin_exists(). Can we reuse this function for > > > the same? > > > > Maybe we can use SearchSysCacheExists1 to check the existence instead of > > adding a new function. > > > > Yeah, I think that would be better. > > > One comment about the patch. > > > > @@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait) > > ... > > + /* Drop the replication origin if it has not been dropped already */ > > + if (replorigin_exists(roident)) > > replorigin_drop_guts(rel, roident, nowait); > > > > If developer pass missing_ok as false, should we report an ERROR here > > instead of silently return ? > > > > One thing that looks a bit odd is that we will anyway have a similar > check in replorigin_drop_guts() which is a static function and called > from only one place, so, will it be required to check at both places? There is a possibility that the initial check to verify if replication origin exists in replorigin_drop_by_name was successful but later one of either table sync worker or apply worker process might have dropped the replication origin, so it is better to check again before calling replorigin_drop_guts, ideally the tuple should be valid in replorigin_drop_guts, but can we keep the check as it is to maintain the consistency before calling CatalogTupleDelete. Regards, Vignesh
pgsql-hackers by date: