Re: Patch for fail-back without fresh backup - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Patch for fail-back without fresh backup
Date
Msg-id 20130614140111.GE19500@alap2.anarazel.de
In response to Re: Patch for fail-back without fresh backup  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Patch for fail-back without fresh backup
List pgsql-hackers
On 2013-06-14 09:21:52 -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > On 14.06.2013 16:08, Tom Lane wrote:
> >> Refresh my memory as to why we need to WAL-log hints for checksumming?
> 
> > Torn pages:
> 
> So it's not that we actually need to log the individual hint bit
> changes, it's that we need to WAL-log a full page image on the first
> update after a checkpoint, so as to recover from torn-page cases.
> Which one are we doing?


From quickly looking at the code again I think the MarkBufferDirtyHint()
code makes at least one assumption that isn't correct in the face of
checksums.

It tests for the need to dirty the page with:

    if ((bufHdr->flags & (BM_DIRTY | BM_JUST_DIRTIED)) !=
        (BM_DIRTY | BM_JUST_DIRTIED))

*before* taking a lock. A comment explains why that is safe:
     * Since we make this test unlocked, there's a chance we
     * might fail to notice that the flags have just been cleared, and failed
     * to reset them, due to memory-ordering issues.
 

That's fine for the classical use case without checksums, but what about
the following scenario:

1) page is dirtied, FPI is logged
2) SetHintBits gets called on the same page, holding only a share lock
3) checkpointer/bgwriter/... writes out the page, clearing the dirty
   flag
4) checkpoint finishes, updates redo ptr
5) SetHintBits actually modifies the hint bits
6) SetHintBits calls MarkBufferDirtyHint which doesn't notice that the
   page isn't dirty anymore and thus doesn't check whether something
   needs to get logged.
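
To make the interleaving above concrete, here is a rough single-threaded
sketch of steps 1-6. It is not the bufmgr code: SketchBuffer, the sketch_*
functions and the flag values are made up for illustration, and the
memory-ordering problem is modelled, as an assumption, by handing step 6 a
copy of the flags taken before the checkpoint cleared them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names borrowed from the mail; the bit values are illustrative only. */
#define BM_DIRTY        (1u << 0)
#define BM_JUST_DIRTIED (1u << 1)

/* Hypothetical, heavily simplified stand-in for a buffer header. */
typedef struct SketchBuffer
{
    uint32_t flags;       /* BM_* bits */
    bool     fpi_logged;  /* FPI logged since the current redo pointer? */
    bool     hint_set;    /* page contents changed by a hint-bit update */
} SketchBuffer;

/* Step 1: a normal modification dirties the page and logs an FPI. */
void sketch_dirty_with_fpi(SketchBuffer *buf)
{
    buf->flags |= BM_DIRTY | BM_JUST_DIRTIED;
    buf->fpi_logged = true;
}

/* Steps 3 and 4: page is written out, flags cleared, redo pointer moves. */
void sketch_checkpoint(SketchBuffer *buf)
{
    buf->flags &= ~(BM_DIRTY | BM_JUST_DIRTIED);
    buf->fpi_logged = false;  /* the next change would need a fresh FPI */
}

/*
 * Step 6: the unlocked test. observed_flags is what the unlocked read
 * returned, which may be stale. Returns true if the slow path ran.
 */
bool sketch_mark_dirty_hint(SketchBuffer *buf, uint32_t observed_flags)
{
    if ((observed_flags & (BM_DIRTY | BM_JUST_DIRTIED)) !=
        (BM_DIRTY | BM_JUST_DIRTIED))
    {
        buf->fpi_logged = true;  /* stands in for "check whether to log" */
        buf->flags |= BM_DIRTY | BM_JUST_DIRTIED;
        return true;
    }
    return false;                /* looked dirty already: nothing is done */
}

/* Runs steps 1-6; stale_read selects whether step 6 sees the old flags. */
SketchBuffer sketch_run(bool stale_read)
{
    SketchBuffer buf = {0, false, false};
    uint32_t     before;

    sketch_dirty_with_fpi(&buf);   /* step 1 */
    before = buf.flags;            /* step 2: SetHintBits starts, share lock */
    sketch_checkpoint(&buf);       /* steps 3 and 4 */
    buf.hint_set = true;           /* step 5 */
    sketch_mark_dirty_hint(&buf, stale_read ? before : buf.flags); /* step 6 */
    return buf;
}
```

With sketch_run(true) the buffer ends up with the hint set, no FPI since the
redo pointer and no dirty flag, which is the state described below. With
sketch_run(false) the test sees the cleared flags and takes the slow path.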
 

At this point we have a page that has been modified without an FPI. But
it's not marked dirty, so it won't be written out without further
cause. Which might be fine since there's no cause to write out the page
and there probably won't be anyone doing that without logging an FPI
independently.
Can anybody see a scenario where this is actually dangerous?

Since

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


