Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAHut+PvkPQ5yugxWe4JhhqtxvW5HrhnRaTm=X3dKc8TZ4vZoAA@mail.gmail.com
Whole thread Raw
In response to Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jan 13, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 13, 2021 at 11:18 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Mon, Jan 4, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > 7.
> > > @@ -905,7 +905,7 @@ replorigin_advance(RepOriginId node,
> > >   LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
> > >
> > >   /* Make sure it's not used by somebody else */
> > > - if (replication_state->acquired_by != 0)
> > > + if (replication_state->acquired_by != 0 &&
> > > replication_state->acquired_by != MyProcPid)
> > >   {
> > >
> > > I think you won't need this change if you do replorigin_advance before
> > > replorigin_session_setup in your patch.
> > >
> >
> > As you know the replorigin_session_setup sets the
> > replication_state->acquired_by to be the current PID. So without this
> > change the replorigin_advance rejects that same slot state thinking
> > that it is already active for a different process. Root problem is
> > that the same process/PID calling both functions would hang.
> >
>
> I think the hang happens only if we call unchanged replorigin_advance
> after session_setup API, right?
>
> > So this
> > patch change allows replorigin_advance code to be called by self.
> >
> > IIUC that acquired_by check condition is like a sanity check for the
> > originid passed. The patched code only does just like what the comment
> > says:
> > "/* Make sure it's not used by somebody else */"
> > Doesn't "somebody else" means "anyone but me" (i.e. anyone but MyProcPid).
> >
> > Also, “setup” of a thing generally comes before usage of that thing,
> > so won't it seem strange to do (like the suggestion) and deliberately
> > call the "setup" function 2nd instead of 1st?
> >
> > Can you please explain why is it better to do it the suggested way
> > (switch the calls around) than keep the patch code? Probably there is
> > a good reason but I am just not understanding it.
> >
>
> Because there is no requirement for origin_advance API to be called
> after session setup. Session setup is required to mark the node as
> replaying from a remote node, see [1] whereas origin_advance is used
> for setting up the initial location or setting a new location, see [2]
> (pg_replication_origin_advance).
>
> Now here, after creating the origin, we need to set up the initial
> location and it seems fine to call origin_advance before
> session_setup. In short, as such, I don't see any problem with your
> change in replorigin_advance but OTOH, I don't see the need for the
> same as well. So, let's try to avoid that change unless we can't do
> without it.
>
> Also, another thing is we need to take RowExclusiveLock on
> pg_replication_origin as written in comments atop replorigin_advance
> before calling it. See its usage in pg_replication_origin_advance.
> Also, write comments on why we need to use replorigin_advance here
> (... something, like we need to WAL log this for the purpose of
> recovery...).
>

Modified in latest patch [v15].

----
[v15] = https://www.postgresql.org/message-id/CAHut%2BPu3he2rOWjbXcNUO6z3aH2LYzW03KV%2BfiMWim49qW9etQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
Next
From: Masahiko Sawada
Date:
Subject: Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion