Thread: [HACKERS] some review comments on logical rep code

[HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
Hi,

Though I've read only a part of the logical rep code yet, I'd like to
share some (relatively minor) review comments that I got so far.

In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
value from the argument, instead of DatumGetObjectId().

No one resets on_commit_launcher_wakeup flag to false.

ApplyLauncherWakeup() should be static function.

Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
subcommands (e.g., ENABLED one) should wake the launcher up on commit.

Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

Normal statements that the walsender for logical rep runs are logged
by log_replication_commands. So if log_statement is also set,
those commands are logged twice.

LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
an error. The callback function to reset it needs to be registered
via on_shmem_exit().

Typo: "an subscriber" should be "a subscriber" in some places.
   /* Get conninfo */   datum = SysCacheGetAttr(SUBSCRIPTIONOID,       tup,       Anum_pg_subscription_subconninfo,
 &isnull);   Assert(!isnull);
 

This assertion makes me think that pg_subscription.subconnifo should
have NOT NULL constraint. Also subslotname and subpublications
should have NOT NULL constraint because of the same reason.
   SpinLockAcquire(&MyLogicalRepWorker->relmutex);   MyLogicalRepWorker->relstate =
GetSubscriptionRelState(MyLogicalRepWorker->subid,          MyLogicalRepWorker->relid,
&MyLogicalRepWorker->relstate_lsn,          false);   SpinLockRelease(&MyLogicalRepWorker->relmutex);
 

Non-"short-term" function like GetSubscriptionRelState() should not
be called while spinlock is being held.

Regards,

-- 
Fujii Masao



Re: [HACKERS] some review comments on logical rep code

From
Noah Misch
Date:
On Fri, Apr 14, 2017 at 04:47:12AM +0900, Fujii Masao wrote:
> Though I've read only a part of the logical rep code yet, I'd like to
> share some (relatively minor) review comments that I got so far.
> 
> In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> value from the argument, instead of DatumGetObjectId().
> 
> No one resets on_commit_launcher_wakeup flag to false.
> 
> ApplyLauncherWakeup() should be static function.
> 
> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> subcommands (e.g., ENABLED one) should wake the launcher up on commit.
> 
> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
> 
> Normal statements that the walsender for logical rep runs are logged
> by log_replication_commands. So if log_statement is also set,
> those commands are logged twice.
> 
> LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> an error. The callback function to reset it needs to be registered
> via on_shmem_exit().
> 
> Typo: "an subscriber" should be "a subscriber" in some places.
> 
>     /* Get conninfo */
>     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
>         tup,
>         Anum_pg_subscription_subconninfo,
>         &isnull);
>     Assert(!isnull);
> 
> This assertion makes me think that pg_subscription.subconnifo should
> have NOT NULL constraint. Also subslotname and subpublications
> should have NOT NULL constraint because of the same reason.
> 
>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>     MyLogicalRepWorker->relstate =
>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>             MyLogicalRepWorker->relid,
>             &MyLogicalRepWorker->relstate_lsn,
>             false);
>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
> 
> Non-"short-term" function like GetSubscriptionRelState() should not
> be called while spinlock is being held.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi,
>
> Though I've read only a part of the logical rep code yet, I'd like to
> share some (relatively minor) review comments that I got so far.

It seems nobody is working on dealing with these review comments, so
I've attached patches. Since there are lots of review comment I
numbered each review comment. The prefix of patches I attached is
corresponding to review comment number.

1.
>
> In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> value from the argument, instead of DatumGetObjectId().

Attached 001 patch fixes it.

2.
>
> No one resets on_commit_launcher_wakeup flag to false.

Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
up the launcher process.

3.
>
> ApplyLauncherWakeup() should be static function.

Attached 003 patch fixes it (and also fixes #5 review comment).

4.
>
> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> subcommands (e.g., ENABLED one) should wake the launcher up on commit.

This is also reported by Horiguchi-san on another thread[1], and patch exists.

5.
>
> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?

I also guess it's necessary. This change is included in #3 patch.

6.
>
> Normal statements that the walsender for logical rep runs are logged
> by log_replication_commands. So if log_statement is also set,
> those commands are logged twice.

Attached 006 patch fixes it by adding  "-c log_statement = none" to
connection parameter of libpqrcv if logical = true.

7.
>
> LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> an error. The callback function to reset it needs to be registered
> via on_shmem_exit().

Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.

8.
>
> Typo: "an subscriber" should be "a subscriber" in some places.

It seems that there is no longer these typo.

9.
>     /* Get conninfo */
>     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
>         tup,
>         Anum_pg_subscription_subconninfo,
>         &isnull);
>     Assert(!isnull);
>
> This assertion makes me think that pg_subscription.subconnifo should
> have NOT NULL constraint. Also subslotname and subpublications
> should have NOT NULL constraint because of the same reason.

Agreed. Attached 009 patch add NOT NULL constraint to all other
columns of pg_subscription. pg_subscription.subsynccommit should have
it as well.

10.
>
>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>     MyLogicalRepWorker->relstate =
>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>             MyLogicalRepWorker->relid,
>             &MyLogicalRepWorker->relstate_lsn,
>             false);
>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>
> Non-"short-term" function like GetSubscriptionRelState() should not
> be called while spinlock is being held.
>

One option is adding new LWLock for the relation state change but this
lock is used at many locations where the operation actually doesn't
take time. I think that the discussion would be needed to get
consensus, so patch for it is not attached.

[1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > Hi,
> >
> > Though I've read only a part of the logical rep code yet, I'd like to
> > share some (relatively minor) review comments that I got so far.
> 
> It seems nobody is working on dealing with these review comments, so
> I've attached patches. Since there are lots of review comment I
> numbered each review comment. The prefix of patches I attached is
> corresponding to review comment number.
> 
> 1.
> >
> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> > value from the argument, instead of DatumGetObjectId().
> 
> Attached 001 patch fixes it.

Hmm. I looked at the launcher side and found that it assigns bare
integer to bgw_main_arg. It also should use Int32GetDatum.

> logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
>              Oid relid)
> {
>     int          slot;
...
>     for (slot = 0; slot < max_logical_replication_workers; slot++)
...
>     bgw.bgw_main_arg = slot;



> 2.
> >
> > No one resets on_commit_launcher_wakeup flag to false.
> 
> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
> up the launcher process.

It is omitting the abort case. Maybe it should be
AtEOXact_ApplyLauncher(bool commit)?

> 3.
> >
> > ApplyLauncherWakeup() should be static function.
> 
> Attached 003 patch fixes it (and also fixes #5 review comment).
> 
> 4.
> >
> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
> 
> This is also reported by Horiguchi-san on another thread[1], and patch exists.
> 
> 5.
> >
> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
> 
> I also guess it's necessary. This change is included in #3 patch.

IsBackendPid() is not free, I suppose that just ignoring failure
with ESRCH is enough.

> 6.
> >
> > Normal statements that the walsender for logical rep runs are logged
> > by log_replication_commands. So if log_statement is also set,
> > those commands are logged twice.
> 
> Attached 006 patch fixes it by adding  "-c log_statement = none" to
> connection parameter of libpqrcv if logical = true.

The control by log_replication_commands is performed on
walsender, so this also shuld be done on the same place. Anyway
log_statement is irrelevant to walsender.

> 7.
> >
> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> > an error. The callback function to reset it needs to be registered
> > via on_shmem_exit().
> 
> Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.
> 
> 8.
> >
> > Typo: "an subscriber" should be "a subscriber" in some places.
> 
> It seems that there is no longer these typo.
> 
> 9.
> >     /* Get conninfo */
> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
> >         tup,
> >         Anum_pg_subscription_subconninfo,
> >         &isnull);
> >     Assert(!isnull);
> >
> > This assertion makes me think that pg_subscription.subconnifo should
> > have NOT NULL constraint. Also subslotname and subpublications
> > should have NOT NULL constraint because of the same reason.
> 
> Agreed. Attached 009 patch add NOT NULL constraint to all other
> columns of pg_subscription. pg_subscription.subsynccommit should have
> it as well.

I'm not sure the policy here, but I suppose that the assertions
there are still required irrelevantly from the nun-nullness of
the attribute.

> 10.
> >
> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> >     MyLogicalRepWorker->relstate =
> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >             MyLogicalRepWorker->relid,
> >             &MyLogicalRepWorker->relstate_lsn,
> >             false);
> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >
> > Non-"short-term" function like GetSubscriptionRelState() should not
> > be called while spinlock is being held.
> >
> 
> One option is adding new LWLock for the relation state change but this
> lock is used at many locations where the operation actually doesn't
> take time. I think that the discussion would be needed to get
> consensus, so patch for it is not attached.

From the point of view of time, I agree that it doesn't seem to
harm. Bt I thing it as more significant problem that
potentially-throwable function is called within the mutex
region. It potentially causes a kind of dead lock.  (That said,
theoretically dosn't occur and I'm not sure what happens by the
dead lock..)


> [1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> > Hi,
>> >
>> > Though I've read only a part of the logical rep code yet, I'd like to
>> > share some (relatively minor) review comments that I got so far.
>>
>> It seems nobody is working on dealing with these review comments, so
>> I've attached patches. Since there are lots of review comment I
>> numbered each review comment. The prefix of patches I attached is
>> corresponding to review comment number.
>>

Thank you for reviewing.

>> 1.
>> >
>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>> > value from the argument, instead of DatumGetObjectId().
>>
>> Attached 001 patch fixes it.
>
> Hmm. I looked at the launcher side and found that it assigns bare
> integer to bgw_main_arg. It also should use Int32GetDatum.

Yeah, I agreed. Will fix it.

>
>> logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
>>              Oid relid)
>> {
>>     int          slot;
> ...
>>     for (slot = 0; slot < max_logical_replication_workers; slot++)
> ...
>>     bgw.bgw_main_arg = slot;
>
>
>
>> 2.
>> >
>> > No one resets on_commit_launcher_wakeup flag to false.
>>
>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>> up the launcher process.
>
> It is omitting the abort case. Maybe it should be
> AtEOXact_ApplyLauncher(bool commit)?

Should we wake up the launcher process even when abort?

>
>> 3.
>> >
>> > ApplyLauncherWakeup() should be static function.
>>
>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>
>> 4.
>> >
>> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>
>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>>
>> 5.
>> >
>> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>
>> I also guess it's necessary. This change is included in #3 patch.
>
> IsBackendPid() is not free, I suppose that just ignoring failure
> with ESRCH is enough.

After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
check if launcher_pid != 0?

>
>> 6.
>> >
>> > Normal statements that the walsender for logical rep runs are logged
>> > by log_replication_commands. So if log_statement is also set,
>> > those commands are logged twice.
>>
>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>> connection parameter of libpqrcv if logical = true.
>
> The control by log_replication_commands is performed on
> walsender, so this also shuld be done on the same place. Anyway
> log_statement is irrelevant to walsender.

Patch 006 emits all logs executed by the apply worker as a log of
log_replication_command but perhaps you're right. It would be better
to leave the log of log_statement if the command executed by the apply
worker is a SQL command. Will fix.

>
>> 7.
>> >
>> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
>> > an error. The callback function to reset it needs to be registered
>> > via on_shmem_exit().
>>
>> Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.
>>
>> 8.
>> >
>> > Typo: "an subscriber" should be "a subscriber" in some places.
>>
>> It seems that there is no longer these typo.
>>
>> 9.
>> >     /* Get conninfo */
>> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
>> >         tup,
>> >         Anum_pg_subscription_subconninfo,
>> >         &isnull);
>> >     Assert(!isnull);
>> >
>> > This assertion makes me think that pg_subscription.subconnifo should
>> > have NOT NULL constraint. Also subslotname and subpublications
>> > should have NOT NULL constraint because of the same reason.
>>
>> Agreed. Attached 009 patch add NOT NULL constraint to all other
>> columns of pg_subscription. pg_subscription.subsynccommit should have
>> it as well.
>
> I'm not sure the policy here, but I suppose that the assertions
> there are still required irrelevantly from the nun-nullness of
> the attribute.

You're right. Will fix it.

>> 10.
>> >
>> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>> >     MyLogicalRepWorker->relstate =
>> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>> >             MyLogicalRepWorker->relid,
>> >             &MyLogicalRepWorker->relstate_lsn,
>> >             false);
>> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>> >
>> > Non-"short-term" function like GetSubscriptionRelState() should not
>> > be called while spinlock is being held.
>> >
>>
>> One option is adding new LWLock for the relation state change but this
>> lock is used at many locations where the operation actually doesn't
>> take time. I think that the discussion would be needed to get
>> consensus, so patch for it is not attached.
>
> From the point of view of time, I agree that it doesn't seem to
> harm. Bt I thing it as more significant problem that
> potentially-throwable function is called within the mutex
> region. It potentially causes a kind of dead lock.  (That said,
> theoretically dosn't occur and I'm not sure what happens by the
> dead lock..)
>
>
>> [1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > Though I've read only a part of the logical rep code yet, I'd like to
>>> > share some (relatively minor) review comments that I got so far.
>>>
>>> It seems nobody is working on dealing with these review comments, so
>>> I've attached patches. Since there are lots of review comment I
>>> numbered each review comment. The prefix of patches I attached is
>>> corresponding to review comment number.
>>>
>
> Thank you for reviewing.
>
>>> 1.
>>> >
>>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>>> > value from the argument, instead of DatumGetObjectId().
>>>
>>> Attached 001 patch fixes it.
>>
>> Hmm. I looked at the launcher side and found that it assigns bare
>> integer to bgw_main_arg. It also should use Int32GetDatum.
>
> Yeah, I agreed. Will fix it.
>
>>
>>> logicalrep_worker_launch(Oid dbid, Oid subid, const char *subname, Oid userid,
>>>              Oid relid)
>>> {
>>>     int          slot;
>> ...
>>>     for (slot = 0; slot < max_logical_replication_workers; slot++)
>> ...
>>>     bgw.bgw_main_arg = slot;
>>
>>
>>
>>> 2.
>>> >
>>> > No one resets on_commit_launcher_wakeup flag to false.
>>>
>>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>>> up the launcher process.
>>
>> It is omitting the abort case. Maybe it should be
>> AtEOXact_ApplyLauncher(bool commit)?
>
> Should we wake up the launcher process even when abort?
>
>>
>>> 3.
>>> >
>>> > ApplyLauncherWakeup() should be static function.
>>>
>>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>>
>>> 4.
>>> >
>>> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>>> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>>
>>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>>>
>>> 5.
>>> >
>>> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>>
>>> I also guess it's necessary. This change is included in #3 patch.
>>
>> IsBackendPid() is not free, I suppose that just ignoring failure
>> with ESRCH is enough.
>
> After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
> check if launcher_pid != 0?
>
>>
>>> 6.
>>> >
>>> > Normal statements that the walsender for logical rep runs are logged
>>> > by log_replication_commands. So if log_statement is also set,
>>> > those commands are logged twice.
>>>
>>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>>> connection parameter of libpqrcv if logical = true.
>>
>> The control by log_replication_commands is performed on
>> walsender, so this also shuld be done on the same place. Anyway
>> log_statement is irrelevant to walsender.
>
> Patch 006 emits all logs executed by the apply worker as a log of
> log_replication_command but perhaps you're right. It would be better
> to leave the log of log_statement if the command executed by the apply
> worker is a SQL command. Will fix.
>
>>
>>> 7.
>>> >
>>> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
>>> > an error. The callback function to reset it needs to be registered
>>> > via on_shmem_exit().
>>>
>>> Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.
>>>
>>> 8.
>>> >
>>> > Typo: "an subscriber" should be "a subscriber" in some places.
>>>
>>> It seems that there is no longer these typo.
>>>
>>> 9.
>>> >     /* Get conninfo */
>>> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
>>> >         tup,
>>> >         Anum_pg_subscription_subconninfo,
>>> >         &isnull);
>>> >     Assert(!isnull);
>>> >
>>> > This assertion makes me think that pg_subscription.subconnifo should
>>> > have NOT NULL constraint. Also subslotname and subpublications
>>> > should have NOT NULL constraint because of the same reason.
>>>
>>> Agreed. Attached 009 patch add NOT NULL constraint to all other
>>> columns of pg_subscription. pg_subscription.subsynccommit should have
>>> it as well.
>>
>> I'm not sure the policy here, but I suppose that the assertions
>> there are still required irrelevantly from the nun-nullness of
>> the attribute.
>
> You're right. Will fix it.
>
>>> 10.
>>> >
>>> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>> >     MyLogicalRepWorker->relstate =
>>> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>> >             MyLogicalRepWorker->relid,
>>> >             &MyLogicalRepWorker->relstate_lsn,
>>> >             false);
>>> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>> >
>>> > Non-"short-term" function like GetSubscriptionRelState() should not
>>> > be called while spinlock is being held.
>>> >
>>>
>>> One option is adding new LWLock for the relation state change but this
>>> lock is used at many locations where the operation actually doesn't
>>> take time. I think that the discussion would be needed to get
>>> consensus, so patch for it is not attached.
>>
>> From the point of view of time, I agree that it doesn't seem to
>> harm. Bt I thing it as more significant problem that
>> potentially-throwable function is called within the mutex
>> region. It potentially causes a kind of dead lock.  (That said,
>> theoretically dosn't occur and I'm not sure what happens by the
>> dead lock..)
>>
>>
>>> [1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp
>>
>> --
>> Kyotaro Horiguchi
>> NTT Open Source Software Center
>>
>
>

I've attached latest v2 three patches; 001, 006 and 009. The review
comments I got so far are incorporated.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>
> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
> >>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> >>> 1.
> >>> >
> >>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> >>> > value from the argument, instead of DatumGetObjectId().
> >>>
> >>> Attached 001 patch fixes it.
> >>
> >> Hmm. I looked at the launcher side and found that it assigns bare
> >> integer to bgw_main_arg. It also should use Int32GetDatum.
> >
> > Yeah, I agreed. Will fix it.

Thanks.

> >>> 2.
> >>> >
> >>> > No one resets on_commit_launcher_wakeup flag to false.
> >>>
> >>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
> >>> up the launcher process.
> >>
> >> It is omitting the abort case. Maybe it should be
> >> AtEOXact_ApplyLauncher(bool commit)?
> >
> > Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

> >>> 3.
> >>> >
> >>> > ApplyLauncherWakeup() should be static function.
> >>>
> >>> Attached 003 patch fixes it (and also fixes #5 review comment).
> >>>
> >>> 4.
> >>> >
> >>> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> >>> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
> >>>
> >>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
> >>>
> >>> 5.
> >>> >
> >>> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
> >>>
> >>> I also guess it's necessary. This change is included in #3 patch.
> >>
> >> IsBackendPid() is not free, I suppose that just ignoring failure
> >> with ESRCH is enough.
> >
> > After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
> > check if launcher_pid != 0?

Yes. For example, code to send a signal to autovacuum launcher
looks as the follows. I don't think there's a reason to do
different thing here.

|       avlauncher_needs_signal = false;
|       if (AutoVacPID != 0)
|         kill(AutoVacPID, SIGUSR2);


> >>> 6.
> >>> >
> >>> > Normal statements that the walsender for logical rep runs are logged
> >>> > by log_replication_commands. So if log_statement is also set,
> >>> > those commands are logged twice.
> >>>
> >>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
> >>> connection parameter of libpqrcv if logical = true.
> >>
> >> The control by log_replication_commands is performed on
> >> walsender, so this also shuld be done on the same place. Anyway
> >> log_statement is irrelevant to walsender.
> >
> > Patch 006 emits all logs executed by the apply worker as a log of
> > log_replication_command but perhaps you're right. It would be better
> > to leave the log of log_statement if the command executed by the apply
> > worker is a SQL command. Will fix.

Here, we can choose whether a SQL command is a part of
replication commands or not. The previous fix thought it is and
this doesn't. (They are emitted in different forms) Is this ok?

I'm not sure why you have moved the location of the code, but if
it should show all received commands, it might be better to be
placed before CHECK_FOR_INTERRUPTS..

The comment for the code seems should be more clearly.

- * compatibility. To prevent the command is logged doubly, we doesn't
- * log it when the command is a SQL command.
+ * compatibility. SQL command are logged later according
+ * to log_statement setting.

> >>> 7.
> >>> >
> >>> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> >>> > an error. The callback function to reset it needs to be registered
> >>> > via on_shmem_exit().
> >>>
> >>> Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.
> >>>
> >>> 8.
> >>> >
> >>> > Typo: "an subscriber" should be "a subscriber" in some places.
> >>>
> >>> It seems that there is no longer these typo.
> >>>
> >>> 9.
> >>> >     /* Get conninfo */
> >>> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
> >>> >         tup,
> >>> >         Anum_pg_subscription_subconninfo,
> >>> >         &isnull);
> >>> >     Assert(!isnull);
> >>> >
> >>> > This assertion makes me think that pg_subscription.subconnifo should
> >>> > have NOT NULL constraint. Also subslotname and subpublications
> >>> > should have NOT NULL constraint because of the same reason.
> >>>
> >>> Agreed. Attached 009 patch add NOT NULL constraint to all other
> >>> columns of pg_subscription. pg_subscription.subsynccommit should have
> >>> it as well.
> >>
> >> I'm not sure the policy here, but I suppose that the assertions
> >> there are still required irrelevantly from the nun-nullness of
> >> the attribute.
> >
> > You're right. Will fix it.
> >
> >>> 10.
> >>> >
> >>> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> >>> >     MyLogicalRepWorker->relstate =
> >>> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >>> >             MyLogicalRepWorker->relid,
> >>> >             &MyLogicalRepWorker->relstate_lsn,
> >>> >             false);
> >>> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >>> >
> >>> > Non-"short-term" function like GetSubscriptionRelState() should not
> >>> > be called while spinlock is being held.
> >>> >
> >>>
> >>> One option is adding new LWLock for the relation state change but this
> >>> lock is used at many locations where the operation actually doesn't
> >>> take time. I think that the discussion would be needed to get
> >>> consensus, so patch for it is not attached.
> >>
> >> From the point of view of time, I agree that it doesn't seem to
> >> harm. Bt I thing it as more significant problem that
> >> potentially-throwable function is called within the mutex
> >> region. It potentially causes a kind of dead lock.  (That said,
> >> theoretically dosn't occur and I'm not sure what happens by the
> >> dead lock..)
> >>
> >>
> >>> [1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp
> >>
> >> --
> >> Kyotaro Horiguchi
> >> NTT Open Source Software Center
> >>
> >
> >
> 
> I've attached latest v2 three patches; 001, 006 and 009. The review
> comments I got so far are incorporated.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hi,
>
> Thank you for the revised version.
>
> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>
>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
>> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>> >>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> >>> 1.
>> >>> >
>> >>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>> >>> > value from the argument, instead of DatumGetObjectId().
>> >>>
>> >>> Attached 001 patch fixes it.
>> >>
>> >> Hmm. I looked at the launcher side and found that it assigns bare
>> >> integer to bgw_main_arg. It also should use Int32GetDatum.
>> >
>> > Yeah, I agreed. Will fix it.
>
> Thanks.
>
>> >>> 2.
>> >>> >
>> >>> > No one resets on_commit_launcher_wakeup flag to false.
>> >>>
>> >>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>> >>> up the launcher process.
>> >>
>> >> It is omitting the abort case. Maybe it should be
>> >> AtEOXact_ApplyLauncher(bool commit)?
>> >
>> > Should we wake up the launcher process even when abort?
>
> No, I meant that on_commit_launcher_wakeup should be just reset
> without waking up launcher on abort.

I understood. Sounds reasonable.

>
>> >>> 3.
>> >>> >
>> >>> > ApplyLauncherWakeup() should be static function.
>> >>>
>> >>> Attached 003 patch fixes it (and also fixes #5 review comment).
>> >>>
>> >>> 4.
>> >>> >
>> >>> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>> >>> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>> >>>
>> >>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>> >>>
>> >>> 5.
>> >>> >
>> >>> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>> >>>
>> >>> I also guess it's necessary. This change is included in #3 patch.
>> >>
>> >> IsBackendPid() is not free, I suppose that just ignoring failure
>> >> with ESRCH is enough.
>> >
>> > After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
>> > check if launcher_pid != 0?
>
> Yes. For example, code to send a signal to autovacuum launcher
> looks as the follows. I don't think there's a reason to do
> different thing here.
>
> |       avlauncher_needs_signal = false;
> |       if (AutoVacPID != 0)
> |         kill(AutoVacPID, SIGUSR2);
>

Fixed.

>
>> >>> 6.
>> >>> >
>> >>> > Normal statements that the walsender for logical rep runs are logged
>> >>> > by log_replication_commands. So if log_statement is also set,
>> >>> > those commands are logged twice.
>> >>>
>> >>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>> >>> connection parameter of libpqrcv if logical = true.
>> >>
>> >> The control by log_replication_commands is performed on
>> >> walsender, so this also shuld be done on the same place. Anyway
>> >> log_statement is irrelevant to walsender.
>> >
>> > Patch 006 emits all logs executed by the apply worker as a log of
>> > log_replication_command but perhaps you're right. It would be better
>> > to leave the log of log_statement if the command executed by the apply
>> > worker is a SQL command. Will fix.
>
> Here, we can choose whether a SQL command is a part of
> replication commands or not. The previous fix thought it is and
> this doesn't. (They are emitted in different forms) Is this ok?

Yes, v2 patch logs other than T_SQLCmd as a replication command, and
T_SQLCmd is logged later in exec_simple_query. The
log_replication_command logs only replication commands[1], it doesn't
mean to log commands executed on replication connection. I think such
behavior is right.

> I'm not sure why you have moved the location of the code, but if
> it should show all received commands, it might be better to be
> placed before CHECK_FOR_INTERRUPTS..

Hmm, the reason why I've moved it is that we cannot know whether given
cmd_string is a SQL command or a replication command before parsing.

>
> The comment for the code seems should be more clearly.
>
> - * compatibility. To prevent the command is logged doubly, we doesn't
> - * log it when the command is a SQL command.
> + * compatibility. SQL command are logged later according
> + * to log_statement setting.

Fixed.

>> >>> 7.
>> >>> >
>> >>> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
>> >>> > an error. The callback function to reset it needs to be registered
>> >>> > via on_shmem_exit().
>> >>>
>> >>> Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.
>> >>>
>> >>> 8.
>> >>> >
>> >>> > Typo: "an subscriber" should be "a subscriber" in some places.
>> >>>
>> >>> It seems that there is no longer these typo.
>> >>>
>> >>> 9.
>> >>> >     /* Get conninfo */
>> >>> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
>> >>> >         tup,
>> >>> >         Anum_pg_subscription_subconninfo,
>> >>> >         &isnull);
>> >>> >     Assert(!isnull);
>> >>> >
>> >>> > This assertion makes me think that pg_subscription.subconnifo should
>> >>> > have NOT NULL constraint. Also subslotname and subpublications
>> >>> > should have NOT NULL constraint because of the same reason.
>> >>>
>> >>> Agreed. Attached 009 patch add NOT NULL constraint to all other
>> >>> columns of pg_subscription. pg_subscription.subsynccommit should have
>> >>> it as well.
>> >>
>> >> I'm not sure the policy here, but I suppose that the assertions
>> >> there are still required irrelevantly from the nun-nullness of
>> >> the attribute.
>> >
>> > You're right. Will fix it.
>> >
>> >>> 10.
>> >>> >
>> >>> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>> >>> >     MyLogicalRepWorker->relstate =
>> >>> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>> >>> >             MyLogicalRepWorker->relid,
>> >>> >             &MyLogicalRepWorker->relstate_lsn,
>> >>> >             false);
>> >>> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>> >>> >
>> >>> > Non-"short-term" function like GetSubscriptionRelState() should not
>> >>> > be called while spinlock is being held.
>> >>> >
>> >>>
>> >>> One option is adding new LWLock for the relation state change but this
>> >>> lock is used at many locations where the operation actually doesn't
>> >>> take time. I think that the discussion would be needed to get
>> >>> consensus, so patch for it is not attached.
>> >>
>> >> From the point of view of time, I agree that it doesn't seem to
>> >> harm. Bt I thing it as more significant problem that
>> >> potentially-throwable function is called within the mutex
>> >> region. It potentially causes a kind of dead lock.  (That said,
>> >> theoretically dosn't occur and I'm not sure what happens by the
>> >> dead lock..)
>> >>
>> >>
>> >>> [1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp
>> >>
>> >> --
>> >> Kyotaro Horiguchi
>> >> NTT Open Source Software Center
>> >>
>> >
>> >
>>
>> I've attached latest v2 three patches; 001, 006 and 009. The review
>> comments I got so far are incorporated.
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

Attached current latest all patches.

[1] https://www.postgresql.org/docs/devel/static/runtime-config-logging.html#guc-log-replication-commands

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Petr Jelinek
Date:
Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:
> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>>>>>> 3.
>>>>>>>
>>>>>>> ApplyLauncherWakeup() should be static function.
>>>>>>
>>>>>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>>>>>
>>>>>> 4.
>>>>>>>
>>>>>>> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>>>>>>> subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>>>>>
>>>>>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>>>>>>
>>>>>> 5.
>>>>>>>
>>>>>>> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>>>>>
>>>>>> I also guess it's necessary. This change is included in #3 patch.
>>>>>
>>>>> IsBackendPid() is not free, I suppose that just ignoring failure
>>>>> with ESRCH is enough.
>>>>
>>>> After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
>>>> check if launcher_pid != 0?
>>
>> Yes. For example, code to send a signal to autovacuum launcher
>> looks as the follows. I don't think there's a reason to do
>> different thing here.
>>
>> |       avlauncher_needs_signal = false;
>> |       if (AutoVacPID != 0)
>> |         kill(AutoVacPID, SIGUSR2);
>>
> 
> Fixed.

Yes the IsBackendPid was needed only because we didn't reset
launcher_pid and it was causing issues otherwise obviously (and I didn't
realize we are not resetting it...)

> 
>>
>>>>>> 6.
>>>>>>>
>>>>>>> Normal statements that the walsender for logical rep runs are logged
>>>>>>> by log_replication_commands. So if log_statement is also set,
>>>>>>> those commands are logged twice.
>>>>>>
>>>>>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>>>>>> connection parameter of libpqrcv if logical = true.
>>>>>
>>>>> The control by log_replication_commands is performed on
>>>>> walsender, so this also shuld be done on the same place. Anyway
>>>>> log_statement is irrelevant to walsender.
>>>>
>>>> Patch 006 emits all logs executed by the apply worker as a log of
>>>> log_replication_command but perhaps you're right. It would be better
>>>> to leave the log of log_statement if the command executed by the apply
>>>> worker is a SQL command. Will fix.
>>
>> Here, we can choose whether a SQL command is a part of
>> replication commands or not. The previous fix thought it is and
>> this doesn't. (They are emitted in different forms) Is this ok?
> 
> Yes, v2 patch logs other than T_SQLCmd as a replication command, and
> T_SQLCmd is logged later in exec_simple_query. The
> log_replication_command logs only replication commands[1], it doesn't
> mean to log commands executed on replication connection. I think such
> behavior is right.
> 
>> I'm not sure why you have moved the location of the code, but if
>> it should show all received commands, it might be better to be
>> placed before CHECK_FOR_INTERRUPTS..
> 
> Hmm, the reason why I've moved it is that we cannot know whether given
> cmd_string is a SQL command or a replication command before parsing.
> 

Well the issue is that then the query is not logged if there was parsing
issue even when logging was specifically requested. I don't know what's
good solution here, maybe teaching exec_simple_query to not log the
query if it came from walsender.

BTW reading that function in walsender, there is :
>     /*
>      * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
>      * command arrives. Clean up the old stuff if there's anything.
>      */
>     SnapBuildClearExportedSnapshot();

and then

>     /*
>      * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
>      * called outside of transaction the snapshot should be cleared here.
>      */
>     if (!IsTransactionBlock())
>         SnapBuildClearExportedSnapshot();

The first one should not be there, it looks like a result of some merge
conflict being solved wrongly. Maybe your patch could fix that too?

>>>>
>>>>>> 10.
>>>>>>>
>>>>>>>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>>>>>>     MyLogicalRepWorker->relstate =
>>>>>>>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>>>>>             MyLogicalRepWorker->relid,
>>>>>>>             &MyLogicalRepWorker->relstate_lsn,
>>>>>>>             false);
>>>>>>>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>>>>>
>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>>>>>> be called while spinlock is being held.
>>>>>>>
>>>>>>
>>>>>> One option is adding new LWLock for the relation state change but this
>>>>>> lock is used at many locations where the operation actually doesn't
>>>>>> take time. I think that the discussion would be needed to get
>>>>>> consensus, so patch for it is not attached.
>>>>>
>>>>> From the point of view of time, I agree that it doesn't seem to
>>>>> harm. Bt I thing it as more significant problem that
>>>>> potentially-throwable function is called within the mutex
>>>>> region. It potentially causes a kind of dead lock.  (That said,
>>>>> theoretically dosn't occur and I'm not sure what happens by the
>>>>> dead lock..)
>>>>>

Hmm, I think doing what I attached should be fine here. We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.


-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> Thank you for working on this!
>
> On 18/04/17 10:16, Masahiko Sawada wrote:
>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>
>>>>>>> 3.
>>>>>>>>
>>>>>>>> ApplyLauncherWakeup() should be static function.
>>>>>>>
>>>>>>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>>>>>>
>>>>>>> 4.
>>>>>>>>
>>>>>>>> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>>>>>>>> subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>>>>>>
>>>>>>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>>>>>>>
>>>>>>> 5.
>>>>>>>>
>>>>>>>> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>>>>>>
>>>>>>> I also guess it's necessary. This change is included in #3 patch.
>>>>>>
>>>>>> IsBackendPid() is not free, I suppose that just ignoring failure
>>>>>> with ESRCH is enough.
>>>>>
>>>>> After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
>>>>> check if launcher_pid != 0?
>>>
>>> Yes. For example, code to send a signal to autovacuum launcher
>>> looks as the follows. I don't think there's a reason to do
>>> different thing here.
>>>
>>> |       avlauncher_needs_signal = false;
>>> |       if (AutoVacPID != 0)
>>> |         kill(AutoVacPID, SIGUSR2);
>>>
>>
>> Fixed.
>
> Yes the IsBackendPid was needed only because we didn't reset
> launcher_pid and it was causing issues otherwise obviously (and I didn't
> realize we are not resetting it...)
>
>>
>>>
>>>>>>> 6.
>>>>>>>>
>>>>>>>> Normal statements that the walsender for logical rep runs are logged
>>>>>>>> by log_replication_commands. So if log_statement is also set,
>>>>>>>> those commands are logged twice.
>>>>>>>
>>>>>>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>>>>>>> connection parameter of libpqrcv if logical = true.
>>>>>>
>>>>>> The control by log_replication_commands is performed on
>>>>>> walsender, so this also shuld be done on the same place. Anyway
>>>>>> log_statement is irrelevant to walsender.
>>>>>
>>>>> Patch 006 emits all logs executed by the apply worker as a log of
>>>>> log_replication_command but perhaps you're right. It would be better
>>>>> to leave the log of log_statement if the command executed by the apply
>>>>> worker is a SQL command. Will fix.
>>>
>>> Here, we can choose whether a SQL command is a part of
>>> replication commands or not. The previous fix thought it is and
>>> this doesn't. (They are emitted in different forms) Is this ok?
>>
>> Yes, v2 patch logs other than T_SQLCmd as a replication command, and
>> T_SQLCmd is logged later in exec_simple_query. The
>> log_replication_command logs only replication commands[1], it doesn't
>> mean to log commands executed on replication connection. I think such
>> behavior is right.
>>
>>> I'm not sure why you have moved the location of the code, but if
>>> it should show all received commands, it might be better to be
>>> placed before CHECK_FOR_INTERRUPTS..
>>
>> Hmm, the reason why I've moved it is that we cannot know whether given
>> cmd_string is a SQL command or a replication command before parsing.
>>
>
> Well the issue is that then the query is not logged if there was parsing
> issue even when logging was specifically requested. I don't know what's
> good solution here, maybe teaching exec_simple_query to not log the
> query if it came from walsender.
>
> BTW reading that function in walsender, there is :
>>       /*
>>        * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
>>        * command arrives. Clean up the old stuff if there's anything.
>>        */
>>       SnapBuildClearExportedSnapshot();
>
> and then
>
>>       /*
>>        * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
>>        * called outside of transaction the snapshot should be cleared here.
>>        */
>>       if (!IsTransactionBlock())
>>               SnapBuildClearExportedSnapshot();
>
> The first one should not be there, it looks like a result of some merge
> conflict being solved wrongly. Maybe your patch could fix that too?
>
>>>>>
>>>>>>> 10.
>>>>>>>>
>>>>>>>>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>>>>>>>     MyLogicalRepWorker->relstate =
>>>>>>>>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>>>>>>             MyLogicalRepWorker->relid,
>>>>>>>>             &MyLogicalRepWorker->relstate_lsn,
>>>>>>>>             false);
>>>>>>>>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>>>>>>
>>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>>>>>>> be called while spinlock is being held.
>>>>>>>>
>>>>>>>
>>>>>>> One option is adding new LWLock for the relation state change but this
>>>>>>> lock is used at many locations where the operation actually doesn't
>>>>>>> take time. I think that the discussion would be needed to get
>>>>>>> consensus, so patch for it is not attached.
>>>>>>
>>>>>> From the point of view of time, I agree that it doesn't seem to
>>>>>> harm. Bt I thing it as more significant problem that
>>>>>> potentially-throwable function is called within the mutex
>>>>>> region. It potentially causes a kind of dead lock.  (That said,
>>>>>> theoretically dosn't occur and I'm not sure what happens by the
>>>>>> dead lock..)
>>>>>>
>
> Hmm, I think doing what I attached should be fine here.

Thanks for the patch!

> We don't need to
> hold spinlock for table read, just for changing the
> MyLogicalRepWorker->relstate.

Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
be protected with the spinlock.

Regards,

-- 
Fujii Masao



Re: [HACKERS] some review comments on logical rep code

From
Petr Jelinek
Date:
On 18/04/17 19:27, Fujii Masao wrote:
> On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> Thank you for working on this!
>>
>> On 18/04/17 10:16, Masahiko Sawada wrote:
>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>>>
>>>>>>>> 10.
>>>>>>>>>
>>>>>>>>>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>>>>>>>>     MyLogicalRepWorker->relstate =
>>>>>>>>>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>>>>>>>             MyLogicalRepWorker->relid,
>>>>>>>>>             &MyLogicalRepWorker->relstate_lsn,
>>>>>>>>>             false);
>>>>>>>>>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>>>>>>>
>>>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>>>>>>>> be called while spinlock is being held.
>>>>>>>>>
>>>>>>>>
>>>>>>>> One option is adding new LWLock for the relation state change but this
>>>>>>>> lock is used at many locations where the operation actually doesn't
>>>>>>>> take time. I think that the discussion would be needed to get
>>>>>>>> consensus, so patch for it is not attached.
>>>>>>>
>>>>>>> From the point of view of time, I agree that it doesn't seem to
>>>>>>> harm. Bt I thing it as more significant problem that
>>>>>>> potentially-throwable function is called within the mutex
>>>>>>> region. It potentially causes a kind of dead lock.  (That said,
>>>>>>> theoretically dosn't occur and I'm not sure what happens by the
>>>>>>> dead lock..)
>>>>>>>
>>
>> Hmm, I think doing what I attached should be fine here.
> 
> Thanks for the patch!
> 
>> We don't need to
>> hold spinlock for table read, just for changing the
>> MyLogicalRepWorker->relstate.
> 
> Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
> be protected with the spinlock.
> 

Yes, sorry tired/blind, fixed.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
At Wed, 19 Apr 2017 04:18:18 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<c2cfda3b-9335-2b57-e9ee-a55a8646afcd@2ndquadrant.com>
> On 18/04/17 19:27, Fujii Masao wrote:
> > On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
> > <petr.jelinek@2ndquadrant.com> wrote:
> >> Thank you for working on this!
> >>
> >> On 18/04/17 10:16, Masahiko Sawada wrote:
> >>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
> >>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >>>>>>
> >>>>>>>> 10.
> >>>>>>>>>
> >>>>>>>>>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> >>>>>>>>>     MyLogicalRepWorker->relstate =
> >>>>>>>>>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >>>>>>>>>             MyLogicalRepWorker->relid,
> >>>>>>>>>             &MyLogicalRepWorker->relstate_lsn,
> >>>>>>>>>             false);
> >>>>>>>>>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >>>>>>>>>
> >>>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
> >>>>>>>>> be called while spinlock is being held.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> One option is adding new LWLock for the relation state change but this
> >>>>>>>> lock is used at many locations where the operation actually doesn't
> >>>>>>>> take time. I think that the discussion would be needed to get
> >>>>>>>> consensus, so patch for it is not attached.
> >>>>>>>
> >>>>>>> From the point of view of time, I agree that it doesn't seem to
> >>>>>>> harm. Bt I thing it as more significant problem that
> >>>>>>> potentially-throwable function is called within the mutex
> >>>>>>> region. It potentially causes a kind of dead lock.  (That said,
> >>>>>>> theoretically dosn't occur and I'm not sure what happens by the
> >>>>>>> dead lock..)
> >>>>>>>
> >>
> >> Hmm, I think doing what I attached should be fine here.
> > 
> > Thanks for the patch!
> > 
> >> We don't need to
> >> hold spinlock for table read, just for changing the
> >> MyLogicalRepWorker->relstate.
> > 
> > Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
> > be protected with the spinlock.
> > 
> 
> Yes, sorry tired/blind, fixed.

Commit has been moved from after to before of the lock section.
This causes potential race condition. (As the same as the
potential dead-lock, I'm not sure it can cause realistic problem,
though..) Isn't it better to be after the lock section?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Petr Jelinek
Date:
On 19/04/17 10:25, Kyotaro HORIGUCHI wrote:
> At Wed, 19 Apr 2017 04:18:18 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<c2cfda3b-9335-2b57-e9ee-a55a8646afcd@2ndquadrant.com>
>> On 18/04/17 19:27, Fujii Masao wrote:
>>> On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
>>> <petr.jelinek@2ndquadrant.com> wrote:
>>>> Thank you for working on this!
>>>>
>>>> On 18/04/17 10:16, Masahiko Sawada wrote:
>>>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>>>>>
>>>>>>>>>> 10.
>>>>>>>>>>>
>>>>>>>>>>>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>>>>>>>>>>     MyLogicalRepWorker->relstate =
>>>>>>>>>>>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>>>>>>>>>             MyLogicalRepWorker->relid,
>>>>>>>>>>>             &MyLogicalRepWorker->relstate_lsn,
>>>>>>>>>>>             false);
>>>>>>>>>>>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>>>>>>>>>
>>>>>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>>>>>>>>>> be called while spinlock is being held.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> One option is adding new LWLock for the relation state change but this
>>>>>>>>>> lock is used at many locations where the operation actually doesn't
>>>>>>>>>> take time. I think that the discussion would be needed to get
>>>>>>>>>> consensus, so patch for it is not attached.
>>>>>>>>>
>>>>>>>>> From the point of view of time, I agree that it doesn't seem to
>>>>>>>>> harm. Bt I thing it as more significant problem that
>>>>>>>>> potentially-throwable function is called within the mutex
>>>>>>>>> region. It potentially causes a kind of dead lock.  (That said,
>>>>>>>>> theoretically dosn't occur and I'm not sure what happens by the
>>>>>>>>> dead lock..)
>>>>>>>>>
>>>>
>>>> Hmm, I think doing what I attached should be fine here.
>>>
>>> Thanks for the patch!
>>>
>>>> We don't need to
>>>> hold spinlock for table read, just for changing the
>>>> MyLogicalRepWorker->relstate.
>>>
>>> Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
>>> be protected with the spinlock.
>>>
>>
>> Yes, sorry tired/blind, fixed.
> 
> Commit has been moved from after to before of the lock section.
> This causes potential race condition. (As the same as the
> potential dead-lock, I'm not sure it can cause realistic problem,
> though..) Isn't it better to be after the lock section?
> 

We just read catalogs so there should not be locking issues.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>
> > Commit has been moved from after to before of the lock section.
> > This causes potential race condition. (As the same as the
> > potential dead-lock, I'm not sure it can cause realistic problem,
> > though..) Isn't it better to be after the lock section?
> > 
> 
> We just read catalogs so there should not be locking issues.

Some other process may modify it then go to there. With a kind of
priority inversion, the process may modify the data on the memory
*before* we modify it. Of course this is rather unrealistic,
though.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20170419.174317.114509231.horiguchi.kyotaro@lab.ntt.co.jp>
> At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>
> > > Commit has been moved from after to before of the lock section.
> > > This causes potential race condition. (As the same as the
> > > potential dead-lock, I'm not sure it can cause realistic problem,
> > > though..) Isn't it better to be after the lock section?
> > > 
> > 
> > We just read catalogs so there should not be locking issues.
> 
> Some other process may modify it then go to there. With a kind of
> priority inversion, the process may modify the data on the memory
> *before* we modify it. Of course this is rather unrealistic,
> though.

A bit short.

Some other process may modify it *after* we read it then go to
there. With a kind of priority inversion, the process may modify
the data on the memory *before* we modify it. Of course this is
rather unrealistic, though.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Petr Jelinek
Date:
On 19/04/17 10:45, Kyotaro HORIGUCHI wrote:
> At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
in<20170419.174317.114509231.horiguchi.kyotaro@lab.ntt.co.jp>
 
>> At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>
>>>> Commit has been moved from after to before of the lock section.
>>>> This causes potential race condition. (As the same as the
>>>> potential dead-lock, I'm not sure it can cause realistic problem,
>>>> though..) Isn't it better to be after the lock section?
>>>>
>>>
>>> We just read catalogs so there should not be locking issues.
>>
>> Some other process may modify it then go to there. With a kind of
>> priority inversion, the process may modify the data on the memory
>> *before* we modify it. Of course this is rather unrealistic,
>> though.
> 
> A bit short.
> 
> Some other process may modify it *after* we read it then go to
> there. With a kind of priority inversion, the process may modify
> the data on the memory *before* we modify it. Of course this is
> rather unrealistic, though.
> 

Yeah but I think that's risk anyway in MVCC read committed database, the
only way to protect against that would be to hold lock over the catalogs
access which was what we wanted to get rid of :)

BTW the whole sync coordination is explicitly written in a way that
unless you mess with catalogs manually only the tablesync process should
do UPDATEs to that catalog (with the exception of s->r state switch
which can happen in apply and has no effect on sync because both states
mean that synchronization is done, only makes apply stop tracking the
table individually).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] some review comments on logical rep code

From
Noah Misch
Date:
On Sun, Apr 16, 2017 at 06:14:49AM +0000, Noah Misch wrote:
> On Fri, Apr 14, 2017 at 04:47:12AM +0900, Fujii Masao wrote:
> > Though I've read only a part of the logical rep code yet, I'd like to
> > share some (relatively minor) review comments that I got so far.
> > 
> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> > value from the argument, instead of DatumGetObjectId().
> > 
> > No one resets on_commit_launcher_wakeup flag to false.
> > 
> > ApplyLauncherWakeup() should be static function.
> > 
> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
> > 
> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
> > 
> > Normal statements that the walsender for logical rep runs are logged
> > by log_replication_commands. So if log_statement is also set,
> > those commands are logged twice.
> > 
> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> > an error. The callback function to reset it needs to be registered
> > via on_shmem_exit().
> > 
> > Typo: "an subscriber" should be "a subscriber" in some places.
> > 
> >     /* Get conninfo */
> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
> >         tup,
> >         Anum_pg_subscription_subconninfo,
> >         &isnull);
> >     Assert(!isnull);
> > 
> > This assertion makes me think that pg_subscription.subconnifo should
> > have NOT NULL constraint. Also subslotname and subpublications
> > should have NOT NULL constraint because of the same reason.
> > 
> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> >     MyLogicalRepWorker->relstate =
> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >             MyLogicalRepWorker->relid,
> >             &MyLogicalRepWorker->relstate_lsn,
> >             false);
> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
> > 
> > Non-"short-term" function like GetSubscriptionRelState() should not
> > be called while spinlock is being held.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Hi,
>>
>> Thank you for the revised version.
>>
>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>
>>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
>>> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>>> >>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> >>> 1.
>>> >>> >
>>> >>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>>> >>> > value from the argument, instead of DatumGetObjectId().
>>> >>>
>>> >>> Attached 001 patch fixes it.
>>> >>
>>> >> Hmm. I looked at the launcher side and found that it assigns bare
>>> >> integer to bgw_main_arg. It also should use Int32GetDatum.
>>> >
>>> > Yeah, I agreed. Will fix it.
>>
>> Thanks.
>>
>>> >>> 2.
>>> >>> >
>>> >>> > No one resets on_commit_launcher_wakeup flag to false.
>>> >>>
>>> >>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>>> >>> up the launcher process.
>>> >>
>>> >> It is omitting the abort case. Maybe it should be
>>> >> AtEOXact_ApplyLauncher(bool commit)?
>>> >
>>> > Should we wake up the launcher process even when abort?
>>
>> No, I meant that on_commit_launcher_wakeup should be just reset
>> without waking up launcher on abort.
>
> I understood. Sounds reasonable.

ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.
To fix this issue, we should call AtEOXact_ApplyLauncher() in
FinishPreparedTransaction() or somewhere?

Regards,

-- 
Fujii Masao



Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Thu, Apr 20, 2017 at 12:05 PM, Noah Misch <noah@leadboat.com> wrote:
> On Sun, Apr 16, 2017 at 06:14:49AM +0000, Noah Misch wrote:
>> On Fri, Apr 14, 2017 at 04:47:12AM +0900, Fujii Masao wrote:
>> > Though I've read only a part of the logical rep code yet, I'd like to
>> > share some (relatively minor) review comments that I got so far.
>> >
>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>> > value from the argument, instead of DatumGetObjectId().
>> >
>> > No one resets on_commit_launcher_wakeup flag to false.
>> >
>> > ApplyLauncherWakeup() should be static function.
>> >
>> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>> >
>> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>> >
>> > Normal statements that the walsender for logical rep runs are logged
>> > by log_replication_commands. So if log_statement is also set,
>> > those commands are logged twice.
>> >
>> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
>> > an error. The callback function to reset it needs to be registered
>> > via on_shmem_exit().
>> >
>> > Typo: "an subscriber" should be "a subscriber" in some places.
>> >
>> >     /* Get conninfo */
>> >     datum = SysCacheGetAttr(SUBSCRIPTIONOID,
>> >         tup,
>> >         Anum_pg_subscription_subconninfo,
>> >         &isnull);
>> >     Assert(!isnull);
>> >
>> > This assertion makes me think that pg_subscription.subconnifo should
>> > have NOT NULL constraint. Also subslotname and subpublications
>> > should have NOT NULL constraint because of the same reason.
>> >
>> >     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>> >     MyLogicalRepWorker->relstate =
>> >       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>> >             MyLogicalRepWorker->relid,
>> >             &MyLogicalRepWorker->relstate_lsn,
>> >             false);
>> >     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>> >
>> > Non-"short-term" function like GetSubscriptionRelState() should not
>> > be called while spinlock is being held.
>>
>> [Action required within three days.  This is a generic notification.]
>>
>> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
>> since you committed the patch believed to have created it, you own this open
>> item.  If some other commit is more relevant or if this does not belong as a
>> v10 open item, please let us know.  Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within three calendar days of
>> this message.  Include a date for your subsequent status update.  Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping v10.  Consequently, I will appreciate your efforts
>> toward speedy resolution.  Thanks.
>>
>> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I'm not the owner of this item, but the current status is;

I've pushed several patches, and there is now only one remaining patch.
I posted the review comment on that patch, and I'm expecting that
Masahiko-san will update the patch. So what about waiting for the updated
version of the patch by next Monday (April 24th)?

Regards,

-- 
Fujii Masao



Re: [HACKERS] some review comments on logical rep code

From
Peter Eisentraut
Date:
On 4/20/17 12:30, Fujii Masao wrote:
> I've pushed several patches, and there is now only one remaining patch.
> I posted the review comment on that patch, and I'm expecting that
> Masahiko-san will update the patch. So what about waiting for the updated
> version of the patch by next Monday (April 24th)?

Thank you for pitching in.

Where is your latest patch?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Fri, Apr 21, 2017 at 4:41 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 4/20/17 12:30, Fujii Masao wrote:
>> I've pushed several patches, and there is now only one remaining patch.
>> I posted the review comment on that patch, and I'm expecting that
>> Masahiko-san will update the patch. So what about waiting for the updated
>> version of the patch by next Monday (April 24th)?
>
> Thank you for pitching in.
>
> Where is your latest patch?
>

The remaining patch is 002_Reset_on_commit_launcher_wakeup_v3.patch
attached on this mail[1]. I'll update this patch and send it.

[1]
https://www.postgresql.org/message-id/CAD21AoB1HiZV%2B%2Bah%3DVrQjVZoH%2Bb6Rn8Atv6Miz%2BHCp0j%2BL6GZQ%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
At Wed, 19 Apr 2017 10:59:00 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<3ef9c831-0508-51a9-5ded-c2e31e958a9f@2ndquadrant.com>
> On 19/04/17 10:45, Kyotaro HORIGUCHI wrote:
> > At Wed, 19 Apr 2017 17:43:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
in<20170419.174317.114509231.horiguchi.kyotaro@lab.ntt.co.jp>
 
> >> At Wed, 19 Apr 2017 10:33:29 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<ed73a706-9e15-f137-2d55-f05361f2de9a@2ndquadrant.com>
> >> Some other process may modify it then go to there. With a kind of
> >> priority inversion, the process may modify the data on the memory
> >> *before* we modify it. Of course this is rather unrealistic,
> >> though.
> > 
> > A bit short.
> > 
> > Some other process may modify it *after* we read it then go to
> > there. With a kind of priority inversion, the process may modify
> > the data on the memory *before* we modify it. Of course this is
> > rather unrealistic, though.
> > 
> 
> Yeah but I think that's risk anyway in MVCC read committed database, the
> only way to protect against that would be to hold lock over the catalogs
> access which was what we wanted to get rid of :)

Agreed, or, I'm not sure about the real risks.

> BTW the whole sync coordination is explicitly written in a way that
> unless you mess with catalogs manually only the tablesync process should
> do UPDATEs to that catalog (with the exception of s->r state switch
> which can happen in apply and has no effect on sync because both states
> mean that synchronization is done, only makes apply stop tracking the
> table individually).

Hmm. Inhibiting retrospective updates of the on-memory data would
save from some kind of priority inversions but such kind of
manual operation is similar to overwriting of pg_control and I
think is not worth bothering about:p

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> Hi,
>>>
>>> Thank you for the revised version.
>>>
>>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>
>>>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
>>>> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>>>> >>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> >>> 1.
>>>> >>> >
>>>> >>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>>>> >>> > value from the argument, instead of DatumGetObjectId().
>>>> >>>
>>>> >>> Attached 001 patch fixes it.
>>>> >>
>>>> >> Hmm. I looked at the launcher side and found that it assigns bare
>>>> >> integer to bgw_main_arg. It also should use Int32GetDatum.
>>>> >
>>>> > Yeah, I agreed. Will fix it.
>>>
>>> Thanks.
>>>
>>>> >>> 2.
>>>> >>> >
>>>> >>> > No one resets on_commit_launcher_wakeup flag to false.
>>>> >>>
>>>> >>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>>>> >>> up the launcher process.
>>>> >>
>>>> >> It is omitting the abort case. Maybe it should be
>>>> >> AtEOXact_ApplyLauncher(bool commit)?
>>>> >
>>>> > Should we wake up the launcher process even when abort?
>>>
>>> No, I meant that on_commit_launcher_wakeup should be just reset
>>> without waking up launcher on abort.
>>
>> I understood. Sounds reasonable.
>
> ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.

Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
to a prepared transaction. Even COMMIT PREPARED might wake up the
launcher process but might not wake up it. ROLLBACK PREPARED is also
the same. For example, in the following situation we wake up the
launcher at COMMIT, not at COMMIT PREPARED.

(BTW, CreateSubscription, is the only one function that sets
on_commit_launcher_wakeup for now, cannot be called in a transaction
block. So it doesn't actually happen that we wake up the launcher when
a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
SUBSCRIPTION ENABLE, we need to deal with it.)

BEGIN;
ALTER SUBSCRIPTION hoge_sub ENABLE;
PREPARE TRANSACTION 'g';
BEGIN;
SELECT 1;
COMMIT;  -- wake up the launcher at this point.
COMMIT PREPARED 'g';

Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
not a big deal to the launcher process actually. Compared with the
complexly of changing the logic so that on_commit_launcher_wakeup
corresponds to prepared transaction, we might want to accept this
behavior.

> To fix this issue, we should call AtEOXact_ApplyLauncher() in
> FinishPreparedTransaction() or somewhere?
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Fri, Apr 21, 2017 at 11:55 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Apr 21, 2017 at 4:41 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 4/20/17 12:30, Fujii Masao wrote:
>>> I've pushed several patches, and there is now only one remaining patch.
>>> I posted the review comment on that patch, and I'm expecting that
>>> Masahiko-san will update the patch. So what about waiting for the updated
>>> version of the patch by next Monday (April 24th)?
>>
>> Thank you for pitching in.
>>
>> Where is your latest patch?
>>
>
> The remaining patch is 002_Reset_on_commit_launcher_wakeup_v3.patch

Oops, there is one more remaining issue that reported on this thread
and another one[1]. The latest patch is attached on [1].

[1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro%40lab.ntt.co.jp

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>> Hi,
>>>>
>>>> Thank you for the revised version.
>>>>
>>>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>
>>>>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
>>>>> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>>>>> >>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>> >>> 1.
>>>>> >>> >
>>>>> >>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>>>>> >>> > value from the argument, instead of DatumGetObjectId().
>>>>> >>>
>>>>> >>> Attached 001 patch fixes it.
>>>>> >>
>>>>> >> Hmm. I looked at the launcher side and found that it assigns bare
>>>>> >> integer to bgw_main_arg. It also should use Int32GetDatum.
>>>>> >
>>>>> > Yeah, I agreed. Will fix it.
>>>>
>>>> Thanks.
>>>>
>>>>> >>> 2.
>>>>> >>> >
>>>>> >>> > No one resets on_commit_launcher_wakeup flag to false.
>>>>> >>>
>>>>> >>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>>>>> >>> up the launcher process.
>>>>> >>
>>>>> >> It is omitting the abort case. Maybe it should be
>>>>> >> AtEOXact_ApplyLauncher(bool commit)?
>>>>> >
>>>>> > Should we wake up the launcher process even when abort?
>>>>
>>>> No, I meant that on_commit_launcher_wakeup should be just reset
>>>> without waking up launcher on abort.
>>>
>>> I understood. Sounds reasonable.
>>
>> ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.
>
> Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
> to a prepared transaction. Even COMMIT PREPARED might wake up the
> launcher process but might not wake up it. ROLLBACK PREPARED is also
> the same. For example, in the following situation we wake up the
> launcher at COMMIT, not at COMMIT PREPARED.
>
> (BTW, CreateSubscription, is the only one function that sets
> on_commit_launcher_wakeup for now, cannot be called in a transaction
> block. So it doesn't actually happen that we wake up the launcher when
> a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
> SUBSCRIPTION ENABLE, we need to deal with it.)

We can run CREATE SUBSCRIPTION with NOCREATE SLOT option
within the transaction block.

>
> BEGIN;
> ALTER SUBSCRIPTION hoge_sub ENABLE;
> PREPARE TRANSACTION 'g';
> BEGIN;
> SELECT 1;
> COMMIT;  -- wake up the launcher at this point.
> COMMIT PREPARED 'g';
>
> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
> not a big deal to the launcher process actually. Compared with the
> complexly of changing the logic so that on_commit_launcher_wakeup
> corresponds to prepared transaction, we might want to accept this
> behavior.

on_commit_launcher_wakeup needs to be recoreded in 2PC state
file to handle this issue properly. But I agree with you, that would
be overkill for small gain. So I'm thinking to add the following
comment into your patch and push it. Thought?

-------------------------
Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
For example, COMMIT PREPARED on the transaction enabling the flag
doesn't wake up
the logical replication launcher if ROLLBACK on another transaction
runs before it.
To handle this case properly, the flag needs to be recorded in 2PC
state file so that
COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
the launcher. However this is overkill for small gain and false wakeup
of the launcher
is not so harmful (probably we can live with that), so we do nothing
here for this issue.
-------------------------

Regards,

-- 
Fujii Masao



Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Sat, Apr 22, 2017 at 4:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>> Hi,
>>>>>
>>>>> Thank you for the revised version.
>>>>>
>>>>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>
>>>>>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
>>>>>> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>>> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>>>>>> >>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>> >>> 1.
>>>>>> >>> >
>>>>>> >>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>>>>>> >>> > value from the argument, instead of DatumGetObjectId().
>>>>>> >>>
>>>>>> >>> Attached 001 patch fixes it.
>>>>>> >>
>>>>>> >> Hmm. I looked at the launcher side and found that it assigns bare
>>>>>> >> integer to bgw_main_arg. It also should use Int32GetDatum.
>>>>>> >
>>>>>> > Yeah, I agreed. Will fix it.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>> >>> 2.
>>>>>> >>> >
>>>>>> >>> > No one resets on_commit_launcher_wakeup flag to false.
>>>>>> >>>
>>>>>> >>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>>>>>> >>> up the launcher process.
>>>>>> >>
>>>>>> >> It is omitting the abort case. Maybe it should be
>>>>>> >> AtEOXact_ApplyLauncher(bool commit)?
>>>>>> >
>>>>>> > Should we wake up the launcher process even when abort?
>>>>>
>>>>> No, I meant that on_commit_launcher_wakeup should be just reset
>>>>> without waking up launcher on abort.
>>>>
>>>> I understood. Sounds reasonable.
>>>
>>> ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.
>>
>> Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
>> to a prepared transaction. Even COMMIT PREPARED might wake up the
>> launcher process but might not wake up it. ROLLBACK PREPARED is also
>> the same. For example, in the following situation we wake up the
>> launcher at COMMIT, not at COMMIT PREPARED.
>>
>> (BTW, CreateSubscription, is the only one function that sets
>> on_commit_launcher_wakeup for now, cannot be called in a transaction
>> block. So it doesn't actually happen that we wake up the launcher when
>> a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
>> SUBSCRIPTION ENABLE, we need to deal with it.)
>
> We can run CREATE SUBSCRIPTION with NOCREATE SLOT option
> within the transaction block.

Oops, I'd missed it.

>
>>
>> BEGIN;
>> ALTER SUBSCRIPTION hoge_sub ENABLE;
>> PREPARE TRANSACTION 'g';
>> BEGIN;
>> SELECT 1;
>> COMMIT;  -- wake up the launcher at this point.
>> COMMIT PREPARED 'g';
>>
>> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
>> not a big deal to the launcher process actually. Compared with the
>> complexly of changing the logic so that on_commit_launcher_wakeup
>> corresponds to prepared transaction, we might want to accept this
>> behavior.
>
> on_commit_launcher_wakeup needs to be recoreded in 2PC state
> file to handle this issue properly. But I agree with you, that would
> be overkill for small gain. So I'm thinking to add the following
> comment into your patch and push it. Thought?
>
> -------------------------
> Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
> For example, COMMIT PREPARED on the transaction enabling the flag
> doesn't wake up
> the logical replication launcher if ROLLBACK on another transaction
> runs before it.
> To handle this case properly, the flag needs to be recorded in 2PC
> state file so that
> COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
> the launcher. However this is overkill for small gain and false wakeup
> of the launcher
> is not so harmful (probably we can live with that), so we do nothing
> here for this issue.
> -------------------------
>

Agreed.

I added this comment to the patch and attached it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
Hello,

At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>
> >> BEGIN;
> >> ALTER SUBSCRIPTION hoge_sub ENABLE;
> >> PREPARE TRANSACTION 'g';
> >> BEGIN;
> >> SELECT 1;
> >> COMMIT;  -- wake up the launcher at this point.
> >> COMMIT PREPARED 'g';
> >>
> >> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
> >> not a big deal to the launcher process actually. Compared with the
> >> complexly of changing the logic so that on_commit_launcher_wakeup
> >> corresponds to prepared transaction, we might want to accept this
> >> behavior.
> >
> > on_commit_launcher_wakeup needs to be recoreded in 2PC state
> > file to handle this issue properly. But I agree with you, that would
> > be overkill for small gain. So I'm thinking to add the following
> > comment into your patch and push it. Thought?
> >
> > -------------------------
> > Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
> > For example, COMMIT PREPARED on the transaction enabling the flag
> > doesn't wake up
> > the logical replication launcher if ROLLBACK on another transaction
> > runs before it.
> > To handle this case properly, the flag needs to be recorded in 2PC
> > state file so that
> > COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
> > the launcher. However this is overkill for small gain and false wakeup
> > of the launcher
> > is not so harmful (probably we can live with that), so we do nothing
> > here for this issue.
> > -------------------------
> >
> 
> Agreed.
> 
> I added this comment to the patch and attached it.

The following "However" may need a follwoing comma.

> However this is overkill for small gain and false wakeup of the
> launcher is not so harmful (probably we can live with that), so
> we do nothing here for this issue.

I agree this as a whole. But I think that the main issue here is
not false wakeups, but 'possible delay of launching new workers
by 3 minutes at most' (this is centainly a kind of false wakeups,
though). We can live with this failure when using two-paase
commmit, but I think it shouldn't happen silently.


How about providing AtPrepare_ApplyLauncher(void) like the
follows and calling it in PrepareTransaction?


void
AtPrepare_ApplyLauncher(void)
{ if (on_commit_launcher_wakeup)   ereport(WARNING,       (errmsg ("logical replication may not start immediately"),
        errhint("PREPARE TRANSACTION can hinder immediate lanching of logical replication worker.")));
}


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>
>> >> BEGIN;
>> >> ALTER SUBSCRIPTION hoge_sub ENABLE;
>> >> PREPARE TRANSACTION 'g';
>> >> BEGIN;
>> >> SELECT 1;
>> >> COMMIT;  -- wake up the launcher at this point.
>> >> COMMIT PREPARED 'g';
>> >>
>> >> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
>> >> not a big deal to the launcher process actually. Compared with the
>> >> complexly of changing the logic so that on_commit_launcher_wakeup
>> >> corresponds to prepared transaction, we might want to accept this
>> >> behavior.
>> >
>> > on_commit_launcher_wakeup needs to be recoreded in 2PC state
>> > file to handle this issue properly. But I agree with you, that would
>> > be overkill for small gain. So I'm thinking to add the following
>> > comment into your patch and push it. Thought?
>> >
>> > -------------------------
>> > Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
>> > For example, COMMIT PREPARED on the transaction enabling the flag
>> > doesn't wake up
>> > the logical replication launcher if ROLLBACK on another transaction
>> > runs before it.
>> > To handle this case properly, the flag needs to be recorded in 2PC
>> > state file so that
>> > COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
>> > the launcher. However this is overkill for small gain and false wakeup
>> > of the launcher
>> > is not so harmful (probably we can live with that), so we do nothing
>> > here for this issue.
>> > -------------------------
>> >
>>
>> Agreed.
>>
>> I added this comment to the patch and attached it.
>
> The following "However" may need a follwoing comma.
>
>> However this is overkill for small gain and false wakeup of the
>> launcher is not so harmful (probably we can live with that), so
>> we do nothing here for this issue.
>
> I agree this as a whole. But I think that the main issue here is
> not false wakeups, but 'possible delay of launching new workers
> by 3 minutes at most'

In what case does launching of the apply worker delay 3 minutes at most?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
At Tue, 25 Apr 2017 10:11:06 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBAXRMBBaH5-mYXxksPaTis0xt_-yvvb2Y+jG2m-EWAoQ@mail.gmail.com>
> On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Hello,
> >
> > At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>
> >> >> BEGIN;
> >> >> ALTER SUBSCRIPTION hoge_sub ENABLE;
> >> >> PREPARE TRANSACTION 'g';
> >> >> BEGIN;
> >> >> SELECT 1;
> >> >> COMMIT;  -- wake up the launcher at this point.
> >> >> COMMIT PREPARED 'g';
> >> >>
> >> >> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
> >> >> not a big deal to the launcher process actually. Compared with the
> >> >> complexly of changing the logic so that on_commit_launcher_wakeup
> >> >> corresponds to prepared transaction, we might want to accept this
> >> >> behavior.
> >> >
> >> > on_commit_launcher_wakeup needs to be recoreded in 2PC state
> >> > file to handle this issue properly. But I agree with you, that would
> >> > be overkill for small gain. So I'm thinking to add the following
> >> > comment into your patch and push it. Thought?
> >> >
> >> > -------------------------
> >> > Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
> >> > For example, COMMIT PREPARED on the transaction enabling the flag
> >> > doesn't wake up
> >> > the logical replication launcher if ROLLBACK on another transaction
> >> > runs before it.
> >> > To handle this case properly, the flag needs to be recorded in 2PC
> >> > state file so that
> >> > COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
> >> > the launcher. However this is overkill for small gain and false wakeup
> >> > of the launcher
> >> > is not so harmful (probably we can live with that), so we do nothing
> >> > here for this issue.
> >> > -------------------------
> >> >
> >>
> >> Agreed.
> >>
> >> I added this comment to the patch and attached it.
> >
> > The following "However" may need a follwoing comma.
> >
> >> However this is overkill for small gain and false wakeup of the
> >> launcher is not so harmful (probably we can live with that), so
> >> we do nothing here for this issue.
> >
> > I agree this as a whole. But I think that the main issue here is
> > not false wakeups, but 'possible delay of launching new workers
> > by 3 minutes at most'
> 
> In what case does launching of the apply worker delay 3 minutes at most?

In the "while (!got_SIGTERM)" loop in ApplyLauncherMain, if we
tried to run enabled subscriptions but no subscription is
enabled, it waits for 3 minutes. If we turn on a subscription but
the trigger is consumed by the false wakeup, the loop sees no
enabled subscription and will sleep for 3 minutes unless any
other trigger comes.

# I haven't tried that, though.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>
>> >> BEGIN;
>> >> ALTER SUBSCRIPTION hoge_sub ENABLE;
>> >> PREPARE TRANSACTION 'g';
>> >> BEGIN;
>> >> SELECT 1;
>> >> COMMIT;  -- wake up the launcher at this point.
>> >> COMMIT PREPARED 'g';
>> >>
>> >> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
>> >> not a big deal to the launcher process actually. Compared with the
>> >> complexly of changing the logic so that on_commit_launcher_wakeup
>> >> corresponds to prepared transaction, we might want to accept this
>> >> behavior.
>> >
>> > on_commit_launcher_wakeup needs to be recoreded in 2PC state
>> > file to handle this issue properly. But I agree with you, that would
>> > be overkill for small gain. So I'm thinking to add the following
>> > comment into your patch and push it. Thought?
>> >
>> > -------------------------
>> > Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
>> > For example, COMMIT PREPARED on the transaction enabling the flag
>> > doesn't wake up
>> > the logical replication launcher if ROLLBACK on another transaction
>> > runs before it.
>> > To handle this case properly, the flag needs to be recorded in 2PC
>> > state file so that
>> > COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
>> > the launcher. However this is overkill for small gain and false wakeup
>> > of the launcher
>> > is not so harmful (probably we can live with that), so we do nothing
>> > here for this issue.
>> > -------------------------
>> >
>>
>> Agreed.
>>
>> I added this comment to the patch and attached it.
>
> The following "However" may need a follwoing comma.
>
>> However this is overkill for small gain and false wakeup of the
>> launcher is not so harmful (probably we can live with that), so
>> we do nothing here for this issue.
>
> I agree this as a whole. But I think that the main issue here is
> not false wakeups, but 'possible delay of launching new workers
> by 3 minutes at most' (this is centainly a kind of false wakeups,
> though). We can live with this failure when using two-paase
> commmit, but I think it shouldn't happen silently.
>
>
> How about providing AtPrepare_ApplyLauncher(void) like the
> follows and calling it in PrepareTransaction?

Or we should apply the attached patch and handle the 2PC case properly?
I was thinking that it's overkill more than necessary, but that seems not true
as far as I implement that.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Petr Jelinek
Date:
On 26/04/17 01:01, Fujii Masao wrote:
> On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Hello,
>>
>> At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>
>>>>> BEGIN;
>>>>> ALTER SUBSCRIPTION hoge_sub ENABLE;
>>>>> PREPARE TRANSACTION 'g';
>>>>> BEGIN;
>>>>> SELECT 1;
>>>>> COMMIT;  -- wake up the launcher at this point.
>>>>> COMMIT PREPARED 'g';
>>>>>
>>>>> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
>>>>> not a big deal to the launcher process actually. Compared with the
>>>>> complexly of changing the logic so that on_commit_launcher_wakeup
>>>>> corresponds to prepared transaction, we might want to accept this
>>>>> behavior.
>>>>
>>>> on_commit_launcher_wakeup needs to be recoreded in 2PC state
>>>> file to handle this issue properly. But I agree with you, that would
>>>> be overkill for small gain. So I'm thinking to add the following
>>>> comment into your patch and push it. Thought?
>>>>
>>>> -------------------------
>>>> Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
>>>> For example, COMMIT PREPARED on the transaction enabling the flag
>>>> doesn't wake up
>>>> the logical replication launcher if ROLLBACK on another transaction
>>>> runs before it.
>>>> To handle this case properly, the flag needs to be recorded in 2PC
>>>> state file so that
>>>> COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
>>>> the launcher. However this is overkill for small gain and false wakeup
>>>> of the launcher
>>>> is not so harmful (probably we can live with that), so we do nothing
>>>> here for this issue.
>>>> -------------------------
>>>>
>>>
>>> Agreed.
>>>
>>> I added this comment to the patch and attached it.
>>
>> The following "However" may need a follwoing comma.
>>
>>> However this is overkill for small gain and false wakeup of the
>>> launcher is not so harmful (probably we can live with that), so
>>> we do nothing here for this issue.
>>
>> I agree this as a whole. But I think that the main issue here is
>> not false wakeups, but 'possible delay of launching new workers
>> by 3 minutes at most' (this is centainly a kind of false wakeups,
>> though). We can live with this failure when using two-paase
>> commmit, but I think it shouldn't happen silently.
>>
>>
>> How about providing AtPrepare_ApplyLauncher(void) like the
>> follows and calling it in PrepareTransaction?
> 
> Or we should apply the attached patch and handle the 2PC case properly?
> I was thinking that it's overkill more than necessary, but that seems not true
> as far as I implement that.
> 

Looks like it does not even increase size of the 2pc file, +1 for this.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 26/04/17 01:01, Fujii Masao wrote:
>> On Mon, Apr 24, 2017 at 7:57 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> Hello,
>>>
>>> At Mon, 24 Apr 2017 11:18:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoBU8mZdGx-sTJ3u+qkaej5RPnOuotW4kfeXc4XdDNF8cw@mail.gmail.com>
>>>>>> BEGIN;
>>>>>> ALTER SUBSCRIPTION hoge_sub ENABLE;
>>>>>> PREPARE TRANSACTION 'g';
>>>>>> BEGIN;
>>>>>> SELECT 1;
>>>>>> COMMIT;  -- wake up the launcher at this point.
>>>>>> COMMIT PREPARED 'g';
>>>>>>
>>>>>> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
>>>>>> not a big deal to the launcher process actually. Compared with the
>>>>>> complexly of changing the logic so that on_commit_launcher_wakeup
>>>>>> corresponds to prepared transaction, we might want to accept this
>>>>>> behavior.
>>>>>
>>>>> on_commit_launcher_wakeup needs to be recoreded in 2PC state
>>>>> file to handle this issue properly. But I agree with you, that would
>>>>> be overkill for small gain. So I'm thinking to add the following
>>>>> comment into your patch and push it. Thought?
>>>>>
>>>>> -------------------------
>>>>> Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
>>>>> For example, COMMIT PREPARED on the transaction enabling the flag
>>>>> doesn't wake up
>>>>> the logical replication launcher if ROLLBACK on another transaction
>>>>> runs before it.
>>>>> To handle this case properly, the flag needs to be recorded in 2PC
>>>>> state file so that
>>>>> COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
>>>>> the launcher. However this is overkill for small gain and false wakeup
>>>>> of the launcher
>>>>> is not so harmful (probably we can live with that), so we do nothing
>>>>> here for this issue.
>>>>> -------------------------
>>>>>
>>>>
>>>> Agreed.
>>>>
>>>> I added this comment to the patch and attached it.
>>>
>>> The following "However" may need a follwoing comma.
>>>
>>>> However this is overkill for small gain and false wakeup of the
>>>> launcher is not so harmful (probably we can live with that), so
>>>> we do nothing here for this issue.
>>>
>>> I agree this as a whole. But I think that the main issue here is
>>> not false wakeups, but 'possible delay of launching new workers
>>> by 3 minutes at most' (this is centainly a kind of false wakeups,
>>> though). We can live with this failure when using two-paase
>>> commmit, but I think it shouldn't happen silently.
>>>
>>>
>>> How about providing AtPrepare_ApplyLauncher(void) like the
>>> follows and calling it in PrepareTransaction?
>>
>> Or we should apply the attached patch and handle the 2PC case properly?
>> I was thinking that it's overkill more than necessary, but that seems not true
>> as far as I implement that.
>>

In my honest opinion, I didn't have a big will that we should handle
even two-phase commit case, because this case is very rare (I could
not image such case) and doesn't mean to lead a harmful result such as
crash of server and returning inconsistent result. it just delays the
launching worker for at most 3 minutes. We also can deal with this for
example by making maximum nap time of apply launcher user-configurable
and document it.
But if we can deal with it by minimum changes like attached your patch I agree.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] some review comments on logical rep code

From
Kyotaro HORIGUCHI
Date:
At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>
> On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
> > On 26/04/17 01:01, Fujii Masao wrote:
> >>>> However this is overkill for small gain and false wakeup of the
> >>>> launcher is not so harmful (probably we can live with that), so
> >>>> we do nothing here for this issue.
> >>>
> >>> I agree this as a whole. But I think that the main issue here is
> >>> not false wakeups, but 'possible delay of launching new workers
> >>> by 3 minutes at most' (this is centainly a kind of false wakeups,
> >>> though). We can live with this failure when using two-paase
> >>> commmit, but I think it shouldn't happen silently.
> >>>
> >>>
> >>> How about providing AtPrepare_ApplyLauncher(void) like the
> >>> follows and calling it in PrepareTransaction?
> >>
> >> Or we should apply the attached patch and handle the 2PC case properly?
> >> I was thinking that it's overkill more than necessary, but that seems not true
> >> as far as I implement that.
> >>
> > Looks like it does not even increase size of the 2pc file, +1 for this.
> 
> In my honest opinion, I didn't have a big will that we should handle
> even two-phase commit case, because this case is very rare (I could
> not image such case) and doesn't mean to lead a harmful result such as
> crash of server and returning inconsistent result. it just delays the
> launching worker for at most 3 minutes. We also can deal with this for
> example by making maximum nap time of apply launcher user-configurable
> and document it.
> But if we can deal with it by minimum changes like attached your patch I agree.

This change looks reasonable to me, +1 from me to this.

The patch reads on_commit_launcher_wakeup directly then updates
it via ApplyALuncherWakeupAtCommit() but it's too much to add a
function for the sake of this.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>
>> On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
>> <petr.jelinek@2ndquadrant.com> wrote:
>> > On 26/04/17 01:01, Fujii Masao wrote:
>> >>>> However this is overkill for small gain and false wakeup of the
>> >>>> launcher is not so harmful (probably we can live with that), so
>> >>>> we do nothing here for this issue.
>> >>>
>> >>> I agree this as a whole. But I think that the main issue here is
>> >>> not false wakeups, but 'possible delay of launching new workers
>> >>> by 3 minutes at most' (this is centainly a kind of false wakeups,
>> >>> though). We can live with this failure when using two-paase
>> >>> commmit, but I think it shouldn't happen silently.
>> >>>
>> >>>
>> >>> How about providing AtPrepare_ApplyLauncher(void) like the
>> >>> follows and calling it in PrepareTransaction?
>> >>
>> >> Or we should apply the attached patch and handle the 2PC case properly?
>> >> I was thinking that it's overkill more than necessary, but that seems not true
>> >> as far as I implement that.
>> >>
>> > Looks like it does not even increase size of the 2pc file, +1 for this.
>>
>> In my honest opinion, I didn't have a big will that we should handle
>> even two-phase commit case, because this case is very rare (I could
>> not image such case) and doesn't mean to lead a harmful result such as
>> crash of server and returning inconsistent result. it just delays the
>> launching worker for at most 3 minutes. We also can deal with this for
>> example by making maximum nap time of apply launcher user-configurable
>> and document it.
>> But if we can deal with it by minimum changes like attached your patch I agree.
>
> This change looks reasonable to me, +1 from me to this.
>
> The patch reads on_commit_launcher_wakeup directly then updates
> it via ApplyALuncherWakeupAtCommit() but it's too much to add a
> function for the sake of this.

OK, so what about the attached patch? I replaced all the calls to
ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>
>>> On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
>>> <petr.jelinek@2ndquadrant.com> wrote:
>>> > On 26/04/17 01:01, Fujii Masao wrote:
>>> >>>> However this is overkill for small gain and false wakeup of the
>>> >>>> launcher is not so harmful (probably we can live with that), so
>>> >>>> we do nothing here for this issue.
>>> >>>
>>> >>> I agree this as a whole. But I think that the main issue here is
>>> >>> not false wakeups, but 'possible delay of launching new workers
>>> >>> by 3 minutes at most' (this is centainly a kind of false wakeups,
>>> >>> though). We can live with this failure when using two-paase
>>> >>> commmit, but I think it shouldn't happen silently.
>>> >>>
>>> >>>
>>> >>> How about providing AtPrepare_ApplyLauncher(void) like the
>>> >>> follows and calling it in PrepareTransaction?
>>> >>
>>> >> Or we should apply the attached patch and handle the 2PC case properly?
>>> >> I was thinking that it's overkill more than necessary, but that seems not true
>>> >> as far as I implement that.
>>> >>
>>> > Looks like it does not even increase size of the 2pc file, +1 for this.
>>>
>>> In my honest opinion, I didn't have a big will that we should handle
>>> even two-phase commit case, because this case is very rare (I could
>>> not image such case) and doesn't mean to lead a harmful result such as
>>> crash of server and returning inconsistent result. it just delays the
>>> launching worker for at most 3 minutes. We also can deal with this for
>>> example by making maximum nap time of apply launcher user-configurable
>>> and document it.
>>> But if we can deal with it by minimum changes like attached your patch I agree.
>>
>> This change looks reasonable to me, +1 from me to this.
>>
>> The patch reads on_commit_launcher_wakeup directly then updates
>> it via ApplyALuncherWakeupAtCommit() but it's too much to add a
>> function for the sake of this.
>
> OK, so what about the attached patch? I replaced all the calls to
> ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".

BTW, while I was reading the code to implement the patch that
I posted upthread, I found that the following condition doesn't
work as expected. This is because "last_start_time" is always 0.

        /* Limit the start retry to once a wal_retrieve_retry_interval */
        if (TimestampDifferenceExceeds(last_start_time, now,
          wal_retrieve_retry_interval))

"last_start_time" is set to "now" when the launcher starts up new
worker. But "last_start_time" is defined and always initialized with 0
at the beginning of the main loop in ApplyLauncherMain(), so
the above condition becomes always true. This is obviously a bug.
Attached patch fixes this issue.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] some review comments on logical rep code

From
Petr Jelinek
Date:
On 26/04/17 18:36, Fujii Masao wrote:
> On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>
>>>> On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
>>>> <petr.jelinek@2ndquadrant.com> wrote:
>>>>> On 26/04/17 01:01, Fujii Masao wrote:
>>>>>>>> However this is overkill for small gain and false wakeup of the
>>>>>>>> launcher is not so harmful (probably we can live with that), so
>>>>>>>> we do nothing here for this issue.
>>>>>>>
>>>>>>> I agree this as a whole. But I think that the main issue here is
>>>>>>> not false wakeups, but 'possible delay of launching new workers
>>>>>>> by 3 minutes at most' (this is centainly a kind of false wakeups,
>>>>>>> though). We can live with this failure when using two-paase
>>>>>>> commmit, but I think it shouldn't happen silently.
>>>>>>>
>>>>>>>
>>>>>>> How about providing AtPrepare_ApplyLauncher(void) like the
>>>>>>> follows and calling it in PrepareTransaction?
>>>>>>
>>>>>> Or we should apply the attached patch and handle the 2PC case properly?
>>>>>> I was thinking that it's overkill more than necessary, but that seems not true
>>>>>> as far as I implement that.
>>>>>>
>>>>> Looks like it does not even increase size of the 2pc file, +1 for this.
>>>>
>>>> In my honest opinion, I didn't have a big will that we should handle
>>>> even two-phase commit case, because this case is very rare (I could
>>>> not image such case) and doesn't mean to lead a harmful result such as
>>>> crash of server and returning inconsistent result. it just delays the
>>>> launching worker for at most 3 minutes. We also can deal with this for
>>>> example by making maximum nap time of apply launcher user-configurable
>>>> and document it.
>>>> But if we can deal with it by minimum changes like attached your patch I agree.
>>>
>>> This change looks reasonable to me, +1 from me to this.
>>>
>>> The patch reads on_commit_launcher_wakeup directly then updates
>>> it via ApplyALuncherWakeupAtCommit() but it's too much to add a
>>> function for the sake of this.
>>
>> OK, so what about the attached patch? I replaced all the calls to
>> ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".
> 
> BTW, while I was reading the code to implement the patch that
> I posted upthread, I found that the following condition doesn't
> work as expected. This is because "last_start_time" is always 0.
> 
>         /* Limit the start retry to once a wal_retrieve_retry_interval */
>         if (TimestampDifferenceExceeds(last_start_time, now,
>           wal_retrieve_retry_interval))
> 
> "last_start_time" is set to "now" when the launcher starts up new
> worker. But "last_start_time" is defined and always initialized with 0
> at the beginning of the main loop in ApplyLauncherMain(), so
> the above condition becomes always true. This is obviously a bug.
> Attached patch fixes this issue.
> 

Yes, please go ahead and commit this.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] some review comments on logical rep code

From
Fujii Masao
Date:
On Thu, Apr 27, 2017 at 5:37 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 26/04/17 18:36, Fujii Masao wrote:
>> On Thu, Apr 27, 2017 at 1:28 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Wed, Apr 26, 2017 at 3:47 PM, Kyotaro HORIGUCHI
>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>> At Wed, 26 Apr 2017 14:31:12 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoDMy8a6UwMrRh8pigQbDC+JAOQ4m_tXT41VRP4SYp23=w@mail.gmail.com>
>>>>> On Wed, Apr 26, 2017 at 12:35 PM, Petr Jelinek
>>>>> <petr.jelinek@2ndquadrant.com> wrote:
>>>>>> On 26/04/17 01:01, Fujii Masao wrote:
>>>>>>>>> However this is overkill for small gain and false wakeup of the
>>>>>>>>> launcher is not so harmful (probably we can live with that), so
>>>>>>>>> we do nothing here for this issue.
>>>>>>>>
>>>>>>>> I agree this as a whole. But I think that the main issue here is
>>>>>>>> not false wakeups, but 'possible delay of launching new workers
>>>>>>>> by 3 minutes at most' (this is centainly a kind of false wakeups,
>>>>>>>> though). We can live with this failure when using two-paase
>>>>>>>> commmit, but I think it shouldn't happen silently.
>>>>>>>>
>>>>>>>>
>>>>>>>> How about providing AtPrepare_ApplyLauncher(void) like the
>>>>>>>> follows and calling it in PrepareTransaction?
>>>>>>>
>>>>>>> Or we should apply the attached patch and handle the 2PC case properly?
>>>>>>> I was thinking that it's overkill more than necessary, but that seems not true
>>>>>>> as far as I implement that.
>>>>>>>
>>>>>> Looks like it does not even increase size of the 2pc file, +1 for this.
>>>>>
>>>>> In my honest opinion, I didn't have a big will that we should handle
>>>>> even two-phase commit case, because this case is very rare (I could
>>>>> not image such case) and doesn't mean to lead a harmful result such as
>>>>> crash of server and returning inconsistent result. it just delays the
>>>>> launching worker for at most 3 minutes. We also can deal with this for
>>>>> example by making maximum nap time of apply launcher user-configurable
>>>>> and document it.
>>>>> But if we can deal with it by minimum changes like attached your patch I agree.
>>>>
>>>> This change looks reasonable to me, +1 from me to this.
>>>>
>>>> The patch reads on_commit_launcher_wakeup directly then updates
>>>> it via ApplyALuncherWakeupAtCommit() but it's too much to add a
>>>> function for the sake of this.
>>>
>>> OK, so what about the attached patch? I replaced all the calls to
>>> ApplyLauncherWakeupAtCommit() with the code "on_commit_launcher_wakeup = true".
>>
>> BTW, while I was reading the code to implement the patch that
>> I posted upthread, I found that the following condition doesn't
>> work as expected. This is because "last_start_time" is always 0.
>>
>>         /* Limit the start retry to once a wal_retrieve_retry_interval */
>>         if (TimestampDifferenceExceeds(last_start_time, now,
>>           wal_retrieve_retry_interval))
>>
>> "last_start_time" is set to "now" when the launcher starts up new
>> worker. But "last_start_time" is defined and always initialized with 0
>> at the beginning of the main loop in ApplyLauncherMain(), so
>> the above condition becomes always true. This is obviously a bug.
>> Attached patch fixes this issue.
>>
>
> Yes, please go ahead and commit this.

Pushed. Thanks!

Regards,

-- 
Fujii Masao



Re: [HACKERS] some review comments on logical rep code

From
Noah Misch
Date:
On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
> Pushed. Thanks!

Does this close the open item, or is there more to do?



Re: [HACKERS] some review comments on logical rep code

From
Masahiko Sawada
Date:
On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>> Pushed. Thanks!
>
> Does this close the open item, or is there more to do?

There is only one item remaining, and the patch is attached on
here[1]. I guess reviewing that patch is almost done.

[1] https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] some review comments on logical rep code

From
Noah Misch
Date:
On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
> >> Pushed. Thanks!
> >
> > Does this close the open item, or is there more to do?
> 
> There is only one item remaining, and the patch is attached on
> here[1]. I guess reviewing that patch is almost done.
> 
> [1] https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com

Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
update.  Kindly send a status update within 24 hours, and include a date for
your subsequent status update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] some review comments on logical rep code

From
Peter Eisentraut
Date:
On 4/28/17 01:01, Noah Misch wrote:
> On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
>> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:
>>> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>>>> Pushed. Thanks!
>>>
>>> Does this close the open item, or is there more to do?
>>
>> There is only one item remaining, and the patch is attached on
>> here[1]. I guess reviewing that patch is almost done.
>>
>> [1] https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com
> 
> Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
> update.  Kindly send a status update within 24 hours, and include a date for
> your subsequent status update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I think the patch that Fujii Masao has proposed has found general
agreement.  I would recommend that he commits it as he sees fit.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] some review comments on logical rep code

From
Noah Misch
Date:
On Fri, Apr 28, 2017 at 02:13:48PM -0400, Peter Eisentraut wrote:
> On 4/28/17 01:01, Noah Misch wrote:
> > On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
> >> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:
> >>> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
> >>>> Pushed. Thanks!
> >>>
> >>> Does this close the open item, or is there more to do?
> >>
> >> There is only one item remaining, and the patch is attached on
> >> here[1]. I guess reviewing that patch is almost done.
> >>
> >> [1] https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com
> > 
> > Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
> > update.  Kindly send a status update within 24 hours, and include a date for
> > your subsequent status update.  Refer to the policy on open item ownership:
> > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I think the patch that Fujii Masao has proposed has found general
> agreement.  I would recommend that he commits it as he sees fit.

This is not a conforming status update, because it does not specify a date for
your next update.



Re: [HACKERS] some review comments on logical rep code

From
Robert Haas
Date:
On Sat, Apr 29, 2017 at 12:33 AM, Noah Misch <noah@leadboat.com> wrote:
>> I think the patch that Fujii Masao has proposed has found general
>> agreement.  I would recommend that he commits it as he sees fit.
>
> This is not a conforming status update, because it does not specify a date for
> your next update.

If Peter thinks that the issues fixed by this patch don't really need
to be corrected, then we could interpret this as a statement that it
should not be listed as an open item but that he does not object to
someone else fixing it anyway.

If these are real issues, then Peter should take responsibility for
them because they were created by his commit.  If Fujii Masao is
willing to help by committing the patch in question, great; but if
not, then Peter should be responsible for committing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] some review comments on logical rep code

From
Peter Eisentraut
Date:
I have committed this version.

I have omitted all the talk about 2PC.  There are discussions ongoing
about changing the transaction behavior of CREATE SUBSCRIPTION, which
might interfere with that.  If someone wants to rebase and propose the
parts about 2PC separately, I don't object, but it can be handled
separately.

I will close this open item now, since I think we have covered
everything that was originally reported.  Please start new threads if
there are any issues that were missed or that have been discovered
during the discussion.


On 4/23/17 22:18, Masahiko Sawada wrote:
> On Sat, Apr 22, 2017 at 4:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Fri, Apr 21, 2017 at 4:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Fri, Apr 21, 2017 at 1:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> On Tue, Apr 18, 2017 at 5:16 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thank you for the revised version.
>>>>>>
>>>>>> At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ@mail.gmail.com>
>>>>>>> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>>>> On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
>>>>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>>>>>> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q@mail.gmail.com>
>>>>>>>>>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>>>>>>>> 1.
>>>>>>>>>>>
>>>>>>>>>>> In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
>>>>>>>>>>> value from the argument, instead of DatumGetObjectId().
>>>>>>>>>>
>>>>>>>>>> Attached 001 patch fixes it.
>>>>>>>>>
>>>>>>>>> Hmm. I looked at the launcher side and found that it assigns bare
>>>>>>>>> integer to bgw_main_arg. It also should use Int32GetDatum.
>>>>>>>>
>>>>>>>> Yeah, I agreed. Will fix it.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>>>>> 2.
>>>>>>>>>>>
>>>>>>>>>>> No one resets on_commit_launcher_wakeup flag to false.
>>>>>>>>>>
>>>>>>>>>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
>>>>>>>>>> up the launcher process.
>>>>>>>>>
>>>>>>>>> It is omitting the abort case. Maybe it should be
>>>>>>>>> AtEOXact_ApplyLauncher(bool commit)?
>>>>>>>>
>>>>>>>> Should we wake up the launcher process even when abort?
>>>>>>
>>>>>> No, I meant that on_commit_launcher_wakeup should be just reset
>>>>>> without waking up launcher on abort.
>>>>>
>>>>> I understood. Sounds reasonable.
>>>>
>>>> ROLLBACK PREPARED should not wake up the launcher, but not even with the patch.
>>>
>>> Yeah, I've missed it. But on_commit_launcher_wakeup dosen't correspond
>>> to a prepared transaction. Even COMMIT PREPARED might wake up the
>>> launcher process but might not wake up it. ROLLBACK PREPARED is also
>>> the same. For example, in the following situation we wake up the
>>> launcher at COMMIT, not at COMMIT PREPARED.
>>>
>>> (BTW, CreateSubscription, is the only one function that sets
>>> on_commit_launcher_wakeup for now, cannot be called in a transaction
>>> block. So it doesn't actually happen that we wake up the launcher when
>>> a ROLLBACK PREPARED. But considering waking up the launcher by ALTER
>>> SUBSCRIPTION ENABLE, we need to deal with it.)
>>
>> We can run CREATE SUBSCRIPTION with NOCREATE SLOT option
>> within the transaction block.
> 
> Oops, I'd missed it.
> 
>>
>>>
>>> BEGIN;
>>> ALTER SUBSCRIPTION hoge_sub ENABLE;
>>> PREPARE TRANSACTION 'g';
>>> BEGIN;
>>> SELECT 1;
>>> COMMIT;  -- wake up the launcher at this point.
>>> COMMIT PREPARED 'g';
>>>
>>> Even if we wake up the launcher even when a ROLLBACK PREPARED, it's
>>> not a big deal to the launcher process actually. Compared with the
>>> complexly of changing the logic so that on_commit_launcher_wakeup
>>> corresponds to prepared transaction, we might want to accept this
>>> behavior.
>>
>> on_commit_launcher_wakeup needs to be recoreded in 2PC state
>> file to handle this issue properly. But I agree with you, that would
>> be overkill for small gain. So I'm thinking to add the following
>> comment into your patch and push it. Thought?
>>
>> -------------------------
>> Note that on_commit_launcher_wakeup flag doesn't work as expected in 2PC case.
>> For example, COMMIT PREPARED on the transaction enabling the flag
>> doesn't wake up
>> the logical replication launcher if ROLLBACK on another transaction
>> runs before it.
>> To handle this case properly, the flag needs to be recorded in 2PC
>> state file so that
>> COMMIT PREPARED and ROLLBACK PREPARED can determine whether to wake up
>> the launcher. However this is overkill for small gain and false wakeup
>> of the launcher
>> is not so harmful (probably we can live with that), so we do nothing
>> here for this issue.
>> -------------------------
>>
> 
> Agreed.
> 
> I added this comment to the patch and attached it.
> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
> 
> 
> 
> 


-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] some review comments on logical rep code

From
Peter Eisentraut
Date:
On 4/29/17 00:33, Noah Misch wrote:
> On Fri, Apr 28, 2017 at 02:13:48PM -0400, Peter Eisentraut wrote:
>> On 4/28/17 01:01, Noah Misch wrote:
>>> On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
>>>> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch <noah@leadboat.com> wrote:
>>>>> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>>>>>> Pushed. Thanks!
>>>>>
>>>>> Does this close the open item, or is there more to do?
>>>>
>>>> There is only one item remaining, and the patch is attached on
>>>> here[1]. I guess reviewing that patch is almost done.
>>>>
>>>> [1] https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com
>>>
>>> Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
>>> update.  Kindly send a status update within 24 hours, and include a date for
>>> your subsequent status update.  Refer to the policy on open item ownership:
>>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>> I think the patch that Fujii Masao has proposed has found general
>> agreement.  I would recommend that he commits it as he sees fit.
> 
> This is not a conforming status update, because it does not specify a date for
> your next update.

Everything related to this is fixed now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services