Re: FSM corruption leading to errors - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: FSM corruption leading to errors
Date
Msg-id CABOikdO=Tryjc9CiKBdbXP3KjcRGZgNY9mT=AKLFszUCRpEgQw@mail.gmail.com
In response to Re: FSM corruption leading to errors  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: FSM corruption leading to errors  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers


On Wed, Oct 19, 2016 at 2:37 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:


Actually, this is still not 100% safe. Flushing the WAL before modifying the FSM page is not enough. We also need to WAL-log a full-page image of the FSM page, otherwise we are still vulnerable to the torn page problem.

I came up with the attached. This is fortunately much simpler than my previous attempt. I replaced the MarkBufferDirtyHint() calls with MarkBufferDirty(), to fix the original issue, plus WAL-logging a full-page image to fix the torn page issue.
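
For readers following the thread, the torn-page problem and the role of a full-page image can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not PostgreSQL code: the page and sector sizes, the in-memory "disk" and "WAL" buffers, and every function name below are hypothetical. It models a crash that interrupts a page write after a few sectors, then shows that replaying a previously logged full-page image overwrites the mixed old/new contents.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE   8192   /* assumed page size for this toy model */
#define SECTOR_SIZE 512    /* assumed atomic write unit of the "disk" */

/* Hypothetical in-memory stand-ins for the on-disk page and the WAL. */
static unsigned char disk_page[PAGE_SIZE];
static unsigned char wal_full_page_image[PAGE_SIZE];
static int wal_has_image = 0;

/* Write only the first `sectors` sectors, as if a crash cut the write short. */
static void write_page_partially(const unsigned char *buf, int sectors)
{
    assert(sectors >= 0 && sectors <= PAGE_SIZE / SECTOR_SIZE);
    memcpy(disk_page, buf, (size_t) sectors * SECTOR_SIZE);
}

/* Record a complete copy of the page in the toy WAL before it is written. */
static void wal_log_full_page_image(const unsigned char *buf)
{
    memcpy(wal_full_page_image, buf, PAGE_SIZE);
    wal_has_image = 1;
}

/* Toy crash recovery: if a full-page image was logged, restore it wholesale. */
static void crash_recovery(void)
{
    if (wal_has_image)
        memcpy(disk_page, wal_full_page_image, PAGE_SIZE);
}

/* The page is torn if its first and last bytes come from different versions. */
static int page_is_torn(void)
{
    return disk_page[0] != disk_page[PAGE_SIZE - 1];
}

/*
 * Run one crash scenario. Returns 1 if the page is still torn after
 * recovery, 0 if it is consistent.
 */
int simulate(int log_full_page_image)
{
    unsigned char new_version[PAGE_SIZE];

    memset(disk_page, 'A', PAGE_SIZE);      /* old page version on disk */
    memset(new_version, 'B', PAGE_SIZE);    /* modified page in memory */
    wal_has_image = 0;

    if (log_full_page_image)
        wal_log_full_page_image(new_version);

    write_page_partially(new_version, 4);   /* crash after 4 of 16 sectors */
    crash_recovery();

    return page_is_torn();
}
```

In this model, simulate(0) returns 1 because nothing in the toy WAL can repair the half-written page, while simulate(1) returns 0 because recovery replaces the page with the logged image. The model leaves out LSNs, checksums, and buffer locking entirely.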


Looks good to me.
 
BTW, any thoughts on a race condition on the primary? The comments at
MarkBufferDirtyHint() seem to suggest that a race condition is possible
which might leave the buffer without the DIRTY flag, but I'm not sure if
that can only happen when the page is locked in shared mode.

I think the race condition can only happen when the page is locked in shared mode. In any case, with this proposed fix, we'll use MarkBufferDirty() rather than MarkBufferDirtyHint(), so it's moot.


Yes, the fix will cover that problem (if it exists). I was curious because there have been several reports of similar errors in the past, and some of them did not involve a standby. Those reports mostly remained unresolved, and I wondered whether this race could explain them. But yes, my conclusion was that the race is not possible with the page locked in EXCLUSIVE mode. So maybe there is another problem somewhere, or a crash recovery may have left the FSM in an inconsistent state.

Anyway, we seem good to go with the patch.

Thanks,
Pavan
--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
