Home > mailing lists

Re: Problem while setting the fpw with SIGHUP - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Problem while setting the fpw with SIGHUP
Date	March 31, 2018 15:13:58
Msg-id	CAA4eK1JSHSPhkHZXj5q2yf3x1MgBN0oYHb9JvcoVh9ZYqB5g+w@mail.gmail.com Whole thread Raw
In response to	Re: Problem while setting the fpw with SIGHUP (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses	Re: Problem while setting the fpw with SIGHUP
List	pgsql-hackers

Tree view

On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier <michael@paquier.xyz> wrote in
<20180327130226.GA1105@paquier.xyz>
>> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
>> > The current UpdateFullPageWrites is safe on standby and promotion
>> > so what we should consider is only the non-standby case. I think
>> > what we should do is just calling RecoveryInProgress() at the
>> > beginning of CheckPointerMain, which is just the same thing with
>> > InitPostgres, but before setting up signal handler to avoid
>> > processing SIGHUP before being ready to insert xlog.
>>
>> Your proposal does not fix the issue for a checkpointer process started
>> on a standby.  After a promotion, if SIGHUP is issued with a change in
>> full_page_writes, then the initialization of InitXLogInsert() would
>> happen again in the critical section of UpdateFullPageWrites().  The
>> window is rather small for normal promotions as the startup process
>> requests a checkpoint which would do the initialization, and much larger
>> for fallback_promote where the startup process is in charge of doing the
>> end-of-recovery checkpoint.
>
> Yeah. I realized that after sending the mail.
>
> I looked closer and found several problems there.
>
> - On standby, StartupXLOG calls UpdateFullPageWrites and
>   checkpointer can call the same function simultaneously, but it
>   doesn't assume concurrent call.
>
> - StartupXLOG can make a concurrent write to
>   Insert->fullPageWrite so it needs to be locked.
>
> - At the time of the very end of recovery, the startup process
>   ignores possible change of full_page_writes GUC. It sticks with
>   the startup value. It leads to loss of
>   XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
>   reload)
>

Won't this be covered by checkpointer process?  Basically, the next
time checkpointer sees that it has received the sighup, it will call
UpdateFullPageWrites which will log the record if required.

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable?  I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages.  Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Magnus Hagander
Date: 31 March 2018, 15:02:54
Subject: Re: Online enabling of checksums

From: CharSyam
Date: 31 March 2018, 15:38:55
Subject: Re: [HACKERS][PATCH] adding simple sock check for windows

Re: Problem while setting the fpw with SIGHUP - Mailing list pgsql-hackers

Previous

Next