Re: Deadlock between logrep apply worker and tablesync worker - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Deadlock between logrep apply worker and tablesync worker
Date
Msg-id CAA4eK1Lf3QAq1NmmhkX1OYjT8HLtpaz0XmX=+CLdpipyEYgwYw@mail.gmail.com
Whole thread Raw
In response to RE: Deadlock between logrep apply worker and tablesync worker  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses Re: Deadlock between logrep apply worker and tablesync worker
List pgsql-hackers
On Sat, Jan 28, 2023 at 9:36 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, January 27, 2023 8:16 PM Amit Kapila <amit.kapila16@gmail.com>
> >
> > On Fri, Jan 27, 2023 at 3:45 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Mon, 23 Jan 2023 at 10:52, Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > >
> > > > IIRC, this is done to prevent concurrent drops of origin drop say by
> > > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > > related to it. If we want we can optimize it so that we can acquire
> > > > the lock on the specific origin as mentioned in comments
> > > > replorigin_drop_by_name() but it was not clear that this operation
> > > > would be frequent enough.
> > >
> > > Here is an attached patch to lock the replication origin record using
> > > LockSharedObject instead of locking pg_replication_origin relation in
> > > ExclusiveLock mode. Now tablesync worker will wait only if the
> > > tablesync worker is trying to drop the same replication origin which
> > > has already been dropped by the apply worker, the other tablesync
> > > workers will be able to successfully drop the replication origin
> > > without any wait.
> > >
> >
> > There is a code in the function replorigin_drop_guts() that uses the
> > functionality introduced by replorigin_exists(). Can we reuse this function for
> > the same?
>
> Maybe we can use SearchSysCacheExists1 to check the existence instead of
> adding a new function.
>

Yeah, I think that would be better.

> One comment about the patch.
>
> @@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
> ...
> +       /* Drop the replication origin if it has not been dropped already */
> +       if (replorigin_exists(roident))
>                 replorigin_drop_guts(rel, roident, nowait);
>
> If developer pass missing_ok as false, should we report an ERROR here
> instead of silently return ?
>

One thing that looks a bit odd is that we will anyway have a similar
check in replorigin_drop_guts() which is a static function and called
from only one place, so, will it be required to check at both places?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Next
From: Nathan Bossart
Date:
Subject: Re: recovery modules