Re: [HACKERS] some review comments on logical rep code - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [HACKERS] some review comments on logical rep code
Date
Msg-id CAHGQGwFQ1mARau+0h1_3EeO+3hmfiN8Z28WqEdN2M0S8Jtu5oQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] some review comments on logical rep code  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Responses Re: [HACKERS] some review comments on logical rep code  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List pgsql-hackers
On Wed, Apr 19, 2017 at 1:35 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> Thank you for working on this!
>
> On 18/04/17 10:16, Masahiko Sawada wrote:
>> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>
>>>>>>> 3.
>>>>>>>>
>>>>>>>> ApplyLauncherWakeup() should be static function.
>>>>>>>
>>>>>>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>>>>>>
>>>>>>> 4.
>>>>>>>>
>>>>>>>> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>>>>>>>> subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>>>>>>
>>>>>>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>>>>>>>
>>>>>>> 5.
>>>>>>>>
>>>>>>>> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>>>>>>
>>>>>>> I also guess it's necessary. This change is included in #3 patch.
>>>>>>
>>>>>> IsBackendPid() is not free, I suppose that just ignoring failure
>>>>>> with ESRCH is enough.
>>>>>
>>>>> After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
>>>>> check if launcher_pid != 0?
>>>
>>> Yes. For example, code to send a signal to autovacuum launcher
>>> looks as the follows. I don't think there's a reason to do
>>> different thing here.
>>>
>>> |       avlauncher_needs_signal = false;
>>> |       if (AutoVacPID != 0)
>>> |         kill(AutoVacPID, SIGUSR2);
>>>
>>
>> Fixed.
>
> Yes the IsBackendPid was needed only because we didn't reset
> launcher_pid and it was causing issues otherwise obviously (and I didn't
> realize we are not resetting it...)
>
>>
>>>
>>>>>>> 6.
>>>>>>>>
>>>>>>>> Normal statements that the walsender for logical rep runs are logged
>>>>>>>> by log_replication_commands. So if log_statement is also set,
>>>>>>>> those commands are logged twice.
>>>>>>>
>>>>>>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>>>>>>> connection parameter of libpqrcv if logical = true.
>>>>>>
>>>>>> The control by log_replication_commands is performed on
>>>>>> walsender, so this also shuld be done on the same place. Anyway
>>>>>> log_statement is irrelevant to walsender.
>>>>>
>>>>> Patch 006 emits all logs executed by the apply worker as a log of
>>>>> log_replication_command but perhaps you're right. It would be better
>>>>> to leave the log of log_statement if the command executed by the apply
>>>>> worker is a SQL command. Will fix.
>>>
>>> Here, we can choose whether a SQL command is a part of
>>> replication commands or not. The previous fix thought it is and
>>> this doesn't. (They are emitted in different forms) Is this ok?
>>
>> Yes, v2 patch logs other than T_SQLCmd as a replication command, and
>> T_SQLCmd is logged later in exec_simple_query. The
>> log_replication_command logs only replication commands[1], it doesn't
>> mean to log commands executed on replication connection. I think such
>> behavior is right.
>>
>>> I'm not sure why you have moved the location of the code, but if
>>> it should show all received commands, it might be better to be
>>> placed before CHECK_FOR_INTERRUPTS..
>>
>> Hmm, the reason why I've moved it is that we cannot know whether given
>> cmd_string is a SQL command or a replication command before parsing.
>>
>
> Well the issue is that then the query is not logged if there was parsing
> issue even when logging was specifically requested. I don't know what's
> good solution here, maybe teaching exec_simple_query to not log the
> query if it came from walsender.
>
> BTW reading that function in walsender, there is :
>>       /*
>>        * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
>>        * command arrives. Clean up the old stuff if there's anything.
>>        */
>>       SnapBuildClearExportedSnapshot();
>
> and then
>
>>       /*
>>        * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
>>        * called outside of transaction the snapshot should be cleared here.
>>        */
>>       if (!IsTransactionBlock())
>>               SnapBuildClearExportedSnapshot();
>
> The first one should not be there, it looks like a result of some merge
> conflict being solved wrongly. Maybe your patch could fix that too?
>
>>>>>
>>>>>>> 10.
>>>>>>>>
>>>>>>>>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>>>>>>>     MyLogicalRepWorker->relstate =
>>>>>>>>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>>>>>>             MyLogicalRepWorker->relid,
>>>>>>>>             &MyLogicalRepWorker->relstate_lsn,
>>>>>>>>             false);
>>>>>>>>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>>>>>>
>>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>>>>>>> be called while spinlock is being held.
>>>>>>>>
>>>>>>>
>>>>>>> One option is adding new LWLock for the relation state change but this
>>>>>>> lock is used at many locations where the operation actually doesn't
>>>>>>> take time. I think that the discussion would be needed to get
>>>>>>> consensus, so patch for it is not attached.
>>>>>>
>>>>>> From the point of view of time, I agree that it doesn't seem to
>>>>>> harm. Bt I thing it as more significant problem that
>>>>>> potentially-throwable function is called within the mutex
>>>>>> region. It potentially causes a kind of dead lock.  (That said,
>>>>>> theoretically dosn't occur and I'm not sure what happens by the
>>>>>> dead lock..)
>>>>>>
>
> Hmm, I think doing what I attached should be fine here.

Thanks for the patch!

> We don't need to
> hold spinlock for table read, just for changing the
> MyLogicalRepWorker->relstate.

Yes, but the update of MyLogicalRepWorker->relstate_lsn also should
be protected with the spinlock.

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Why does logical replication launcher set application_name?
Next
From: Jan Michálek
Date:
Subject: Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki