Re: [HACKERS] some review comments on logical rep code - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] some review comments on logical rep code
Date
Msg-id 20170417.193955.42456174.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] some review comments on logical rep code  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] some review comments on logical rep code  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > Hi,
> >
> > Though I've read only a part of the logical rep code yet, I'd like to
> > share some (relatively minor) review comments that I got so far.
> 
> It seems nobody is working on dealing with these review comments, so
> I've attached patches. Since there are lots of review comment I
> numbered each review comment. The prefix of patches I attached is
> corresponding to review comment number.
> 
> 1.
> >
> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> > value from the argument, instead of DatumGetObjectId().
> 
> Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

> logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
>              Oid relid)
> {
>     int          slot;
...
>     for (slot = 0; slot < max_logical_replication_workers; slot++)
...
>     bgw.bgw_main_arg = slot;



> 2.
> >
> > No one resets on_commit_launcher_wakeup flag to false.
> 
> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
> up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

> 3.
> >
> > ApplyLauncherWakeup() should be static function.
> 
> Attached 003 patch fixes it (and also fixes #5 review comment).
> 
> 4.
> >
> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
> 
> This is also reported by Horiguchi-san on another thread[1], and patch exists.
> 
> 5.
> >
> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
> 
> I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

> 6.
> >
> > Normal statements that the walsender for logical rep runs are logged
> > by log_replication_commands. So if log_statement is also set,
> > those commands are logged twice.
> 
> Attached 006 patch fixes it by adding  "-c log_statement = none" to
> connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

> 7.
> >
> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> > an error. The callback function to reset it needs to be registered
> > via on_shmem_exit().
> 
> Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.
> 
> 8.
> >
> > Typo: "an subscriber" should be "a subscriber" in some places.
> 
> It seems that there is no longer these typo.
> 
> 9.
> >     /* Get conninfo */
> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
> >         tup,
> >         Anum_pg_subscription_subconninfo,
> >         &isnull);
> >     Assert(!isnull);
> >
> > This assertion makes me think that pg_subscription.subconnifo should
> > have NOT NULL constraint. Also subslotname and subpublications
> > should have NOT NULL constraint because of the same reason.
> 
> Agreed. Attached 009 patch add NOT NULL constraint to all other
> columns of pg_subscription. pg_subscription.subsynccommit should have
> it as well.

I'm not sure the policy here, but I suppose that the assertions
there are still required irrelevantly from the nun-nullness of
the attribute.

> 10.
> >
> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> >     MyLogicalRepWorker->relstate =
> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >             MyLogicalRepWorker->relid,
> >             &MyLogicalRepWorker->relstate_lsn,
> >             false);
> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >
> > Non-"short-term" function like GetSubscriptionRelState() should not
> > be called while spinlock is being held.
> >
> 
> One option is adding new LWLock for the relation state change but this
> lock is used at many locations where the operation actually doesn't
> take time. I think that the discussion would be needed to get
> consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock.  (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)


> [1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Nikhil Sontakke
Date:
Subject: Re: [HACKERS] Failed recovery with new faster 2PC code
Next
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] pgbench - allow to store select results intovariables