Re: Problem while setting the fpw with SIGHUP - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Problem while setting the fpw with SIGHUP
Date
Msg-id CAA4eK1JvKxsHfM6GCHoKNas-7vDSniLgaAm=cG_OaQaOYRNc3w@mail.gmail.com
Whole thread Raw
In response to Re: Problem while setting the fpw with SIGHUP  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Apr 20, 2018 at 6:06 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Apr 20, 2018 at 11:40 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> By the way, I think I found a bug of FPW.
>>
>> The following steps yields INSERT record that doesn't have a FPI
>> after a checkpoint.
>>
>> (Start server with full_page_writes = off)
>> CREATE TABLE t (a int);
>> CHECKPOINT;
>> INSERT INTO t VALUES (1);
>> ALTER SYSTEM SET full_page_writes TO on;
>> SELECT pg_reload_conf();
>> CHECKPOINT;
>> INSERT INTO t VALUES (1);
>>
>> The last insert is expected to write a record with FPI but it
>> doesn't actually. No FPI will be written for the page after that.
>>
>> It seems that the reason is that XLogInsertRecord is forgetting
>> to check doPageWrites' update.
>>
>> In the failure case, fpw_lsn has been set by XLogRecordAssemble
>> but doPageWrite is false at the time and it considers that no FPI
>> is required then it sets fpw_lsn to InvalidXLogRecPtr.
>>
>> After that, XLogInsertRecord receives the record but it thinks
>> that doPageWrites is true since it looks the shared value.
>>
>>> if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
>>
>> So this line thinks that "no FPI is omitted in this record" but
>> actually the record is just forgotten to attach them.
>>
>> The attached patch lets XLogInsertRecord check if doPageWrites
>> has been turned on after the last call and cause recomputation in
>> the case.
>>
>
> I think you have correctly spotted the problem and your fix looks good
> to me.  As this is a separate problem and fix is different from what
> we are discussing here, I think this can be committed it separately.
>

AFAICS, this problem has been introduced by commit
2c03216d831160bedd72d45f712601b6f7d03f1c, so we should backpatch till
9.5.  Please find attached the slightly modified patch for this bug.
I have modified one of the comments in the patch and the proposed
commit message.  Can you please once cross-verify if the problem exits
till 9.5 and that this patch fixes it?  Also, I don't see an easy way
to write a test for it, do you have anything in mind?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Quan Tran
Date:
Subject: Re: Segfault logical replication PG 10.4
Next
From: Amit Kapila
Date:
Subject: Re: Problem while setting the fpw with SIGHUP