Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAHut+PtN18hcM_jiD8PZb_j-CUZzNHgW8om37MsZqt_qGkRRpQ@mail.gmail.com
Whole thread Raw
In response to Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Jan 4, 2021 at 8:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 30, 2020 at 11:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Dec 23, 2020 at 8:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 1.
> > > + * Rarely, the DropSubscription may be issued when a tablesync still
> > > + * is in SYNCDONE but not yet in READY state. If this happens then
> > > + * the drop slot could fail because it is already dropped.
> > > + * In this case suppress and drop slot error.
> > > + *
> > > + * FIXME - Is there a better way than this?
> > > + */
> > > + if (rstate->state != SUBREL_STATE_SYNCDONE)
> > > + PG_RE_THROW();
> > >
> > > So, does this situation happens when we try to drop subscription after
> > > the state is changed to syncdone but not syncready. If so, then can't
> > > we write a function GetSubscriptionNotDoneRelations similar to
> > > GetSubscriptionNotReadyRelations where we get a list of relations that
> > > are not in done stage. I think this should be safe because once we are
> > > here we shouldn't be allowed to start a new worker and old workers are
> > > already stopped by this function.
> >
> > Yes, but I don't see how adding such a function is an improvement over
> > the existing code:
> >
>
> The advantage is that we don't need to use try..catch to deal with
> such conditions which I don't think is a good way to deal with such
> cases. Also, even after using try...catch, still, we can leak the
> slots because the patch drops the slot after changing the state to
> syncdone and if there is any error while dropping the slot, it simply
> skips it. So, it is possible that the rel state is syncdone but the
> slot still exists and we get an error due to some different reason,
> and then we will silently skip it again and allow the subscription to
> be dropped.
>
> I think instead what we should do is to drop the slot before we change
> the rel state to syncdone. Also, if the apply workers fail to drop the
> slot, it should try to again drop it after restart. In
> DropSubscription, we can then check if the rel state is not SYNC or
> READY, we can drop the corresponding slots.
>

Fixed as suggested in latest patch [v12]

----

[v12] = https://www.postgresql.org/message-id/CAHut%2BPsonJzarxSBWkOM%3DMjoEpaq53ShBJoTT9LHJskwP3OvZA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From:
Date:
Subject: RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.
Next
From: Peter Smith
Date:
Subject: Re: Single transaction in the tablesync worker?