Re: [HACKERS] some review comments on logical rep code - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] some review comments on logical rep code
Date
Msg-id 54949025-ce80-3d77-1e97-e4d6716907c5@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] some review comments on logical rep code  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] some review comments on logical rep code
List pgsql-hackers
Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:
> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>>>>>> 3.
>>>>>>>
>>>>>>> ApplyLauncherWakeup() should be static function.
>>>>>>
>>>>>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>>>>>
>>>>>> 4.
>>>>>>>
>>>>>>> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>>>>>>> subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>>>>>
>>>>>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>>>>>>
>>>>>> 5.
>>>>>>>
>>>>>>> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>>>>>
>>>>>> I also guess it's necessary. This change is included in #3 patch.
>>>>>
>>>>> IsBackendPid() is not free, I suppose that just ignoring failure
>>>>> with ESRCH is enough.
>>>>
>>>> After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
>>>> check if launcher_pid != 0?
>>
>> Yes. For example, code to send a signal to autovacuum launcher
>> looks as the follows. I don't think there's a reason to do
>> different thing here.
>>
>> |       avlauncher_needs_signal = false;
>> |       if (AutoVacPID != 0)
>> |         kill(AutoVacPID, SIGUSR2);
>>
> 
> Fixed.

Yes the IsBackendPid was needed only because we didn't reset
launcher_pid and it was causing issues otherwise obviously (and I didn't
realize we are not resetting it...)

> 
>>
>>>>>> 6.
>>>>>>>
>>>>>>> Normal statements that the walsender for logical rep runs are logged
>>>>>>> by log_replication_commands. So if log_statement is also set,
>>>>>>> those commands are logged twice.
>>>>>>
>>>>>> Attached 006 patch fixes it by adding  "-c log_statement = none" to
>>>>>> connection parameter of libpqrcv if logical = true.
>>>>>
>>>>> The control by log_replication_commands is performed on
>>>>> walsender, so this also shuld be done on the same place. Anyway
>>>>> log_statement is irrelevant to walsender.
>>>>
>>>> Patch 006 emits all logs executed by the apply worker as a log of
>>>> log_replication_command but perhaps you're right. It would be better
>>>> to leave the log of log_statement if the command executed by the apply
>>>> worker is a SQL command. Will fix.
>>
>> Here, we can choose whether a SQL command is a part of
>> replication commands or not. The previous fix thought it is and
>> this doesn't. (They are emitted in different forms) Is this ok?
> 
> Yes, v2 patch logs other than T_SQLCmd as a replication command, and
> T_SQLCmd is logged later in exec_simple_query. The
> log_replication_command logs only replication commands[1], it doesn't
> mean to log commands executed on replication connection. I think such
> behavior is right.
> 
>> I'm not sure why you have moved the location of the code, but if
>> it should show all received commands, it might be better to be
>> placed before CHECK_FOR_INTERRUPTS..
> 
> Hmm, the reason why I've moved it is that we cannot know whether given
> cmd_string is a SQL command or a replication command before parsing.
> 

Well the issue is that then the query is not logged if there was parsing
issue even when logging was specifically requested. I don't know what's
good solution here, maybe teaching exec_simple_query to not log the
query if it came from walsender.

BTW reading that function in walsender, there is :
>     /*
>      * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
>      * command arrives. Clean up the old stuff if there's anything.
>      */
>     SnapBuildClearExportedSnapshot();

and then

>     /*
>      * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
>      * called outside of transaction the snapshot should be cleared here.
>      */
>     if (!IsTransactionBlock())
>         SnapBuildClearExportedSnapshot();

The first one should not be there, it looks like a result of some merge
conflict being solved wrongly. Maybe your patch could fix that too?

>>>>
>>>>>> 10.
>>>>>>>
>>>>>>>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>>>>>>     MyLogicalRepWorker->relstate =
>>>>>>>       GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>>>>>             MyLogicalRepWorker->relid,
>>>>>>>             &MyLogicalRepWorker->relstate_lsn,
>>>>>>>             false);
>>>>>>>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>>>>>
>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>>>>>> be called while spinlock is being held.
>>>>>>>
>>>>>>
>>>>>> One option is adding new LWLock for the relation state change but this
>>>>>> lock is used at many locations where the operation actually doesn't
>>>>>> take time. I think that the discussion would be needed to get
>>>>>> consensus, so patch for it is not attached.
>>>>>
>>>>> From the point of view of time, I agree that it doesn't seem to
>>>>> harm. Bt I thing it as more significant problem that
>>>>> potentially-throwable function is called within the mutex
>>>>> region. It potentially causes a kind of dead lock.  (That said,
>>>>> theoretically dosn't occur and I'm not sure what happens by the
>>>>> dead lock..)
>>>>>

Hmm, I think doing what I attached should be fine here. We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.


-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] Passing values to a dynamic background worker
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Why does logical replication launcher set application_name?