Thread: Re: [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c

On Sun, Sep 16, 2012 at 2:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown checkpoint.
> Recovery code documents clearly that a shutdown checkpoint is executed at
> end of recovery - a shutdown checkpoint WAL record is written but the buffer
> manager had been altered to treat end of recovery as a normal checkpoint.
> This bug exacerbates the bufmgr relpersistence bug.
>
> Bug spotted by Andres Freund, patch by me.

I am confused by this patch.  It seems to me that the effect of this
patch is to force unlogged buffers to be written at end-of-recovery as
well as at shutdown.  But, barring bugs elsewhere, there shouldn't be
any unlogged buffers in shared_buffers at end-of-recovery, so this
won't make any difference at all.  Am I missing something?
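
For concreteness, here is a minimal sketch in C of the decision being discussed. It is not the actual bufmgr code: the flag values are illustrative rather than copied from the PostgreSQL headers, sync_mask() and buffer_is_written() are hypothetical helpers, and the "patched" switch models the new behaviour only as the commit message describes it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; not copied from PostgreSQL's headers. */
#define CHECKPOINT_IS_SHUTDOWN      0x0001
#define CHECKPOINT_END_OF_RECOVERY  0x0002
#define BM_DIRTY                    0x0001
#define BM_PERMANENT                0x0100

/*
 * Hypothetical helper: the set of buffer flag bits that must all be set
 * for a buffer to be written by a checkpoint with the given flags.
 */
static uint32_t
sync_mask(int checkpoint_flags, bool patched)
{
    uint32_t mask = BM_DIRTY;
    int      write_all = CHECKPOINT_IS_SHUTDOWN;

    /* Assumed post-patch behaviour: end of recovery counts as shutdown. */
    if (patched)
        write_all |= CHECKPOINT_END_OF_RECOVERY;

    /* Any other checkpoint skips buffers of unlogged relations. */
    if (!(checkpoint_flags & write_all))
        mask |= BM_PERMANENT;

    return mask;
}

/* Hypothetical helper: would this checkpoint write a buffer with these flags? */
static bool
buffer_is_written(uint32_t buf_flags, int checkpoint_flags, bool patched)
{
    uint32_t mask = sync_mask(checkpoint_flags, patched);

    return (buf_flags & mask) == mask;
}
```

In this model the only case that changes is a dirty buffer without BM_PERMANENT seen by an end-of-recovery checkpoint: it is skipped before the patch and written after it. If no such buffer can exist at that point, the two variants are indistinguishable.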

Maybe what we should do is - if this is an end-of-recovery checkpoint
- *assert* that the BM_PERMANENT bit is set on every buffer we find.
That would provide a useful cross-check that we don't have a bug
similar to the one Jeff already fixed in any other code path.
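
A rough sketch of what such a cross-check could look like, in C. This is only an illustration under stated assumptions: the flag values are made up, count_dirty_at_end_of_recovery() is a hypothetical stand-in for the buffer scan, and "every buffer we find" is read here as "every dirty buffer the checkpoint is about to write".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; not copied from PostgreSQL's headers. */
#define BM_DIRTY      0x0001
#define BM_PERMANENT  0x0100

/*
 * Hypothetical model of the buffer scan done by an end-of-recovery
 * checkpoint.  Returns the number of buffers that would be written.
 */
static size_t
count_dirty_at_end_of_recovery(const uint32_t *buf_flags, size_t nbuffers)
{
    size_t to_write = 0;

    for (size_t i = 0; i < nbuffers; i++)
    {
        /* Clean buffers are never written, so they are not checked. */
        if (!(buf_flags[i] & BM_DIRTY))
            continue;

        /*
         * Cross-check: recovery should never have dirtied a buffer that
         * belongs to an unlogged relation.
         */
        assert(buf_flags[i] & BM_PERMANENT);

        to_write++;
    }

    return to_write;
}
```

Note that a plain assert() like this disappears when NDEBUG is defined, so it documents and tests the invariant in debug builds but does nothing in a production build.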

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Monday, September 17, 2012 04:59:06 PM Robert Haas wrote:
> On Sun, Sep 16, 2012 at 2:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown
> > checkpoint. Recovery code documents clearly that a shutdown checkpoint
> > is executed at end of recovery - a shutdown checkpoint WAL record is
> > written but the buffer manager had been altered to treat end of recovery
> > as a normal checkpoint. This bug exacerbates the bufmgr relpersistence
> > bug.
> > 
> > Bug spotted by Andres Freund, patch by me.
> 
> I am confused by this patch.  It seems to me that the effect of this
> patch is to force unlogged buffers to be written at end-of-recovery as
> well as at shutdown.  But, barring bugs elsewhere, there shouldn't be
> any unlogged buffers in shared_buffers at end-of-recovery, so this
> won't make any difference at all.  Am I missing something?
While investigating the impact of the fakerelcache bug, I noticed that, 
contrary to what is claimed in several places, END_OF_RECOVERY checkpoints do 
*not* behave the same way CHECKPOINT_IS_SHUTDOWN ones do. That doesn't seem 
like a good idea: e.g. the impact of this bug would have been smaller if they 
were really treated the same. Unless I missed something, that's the only place 
of relevance that treats them differently.
IMO, treating them differently in some remote place (2 calls away) is a good 
way to introduce further bugs.

> Maybe what we should do is - if this is an end-of-recovery checkpoint
> - *assert* that the BM_PERMANENT bit is set on every buffer we find.
> That would provide a useful cross-check that we don't have a bug
> similar to the one Jeff already fixed in any other code path.
I haven't looked into the details, but can't a new unlogged relation be created 
since the last checkpoint and thus have pages in s_b?

Greetings,

Andres

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On 17 September 2012 15:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Sep 16, 2012 at 2:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown checkpoint.
>> Recovery code documents clearly that a shutdown checkpoint is executed at
>> end of recovery - a shutdown checkpoint WAL record is written but the buffer
>> manager had been altered to treat end of recovery as a normal checkpoint.
>> This bug exacerbates the bufmgr relpersistence bug.
>>
>> Bug spotted by Andres Freund, patch by me.
>
> I am confused by this patch.  It seems to me that the effect of this
> patch is to force unlogged buffers to be written at end-of-recovery as
> well as at shutdown.  But, barring bugs elsewhere, there shouldn't be
> any unlogged buffers in shared_buffers at end-of-recovery, so this
> won't make any difference at all.

There shouldn't be, but this coding is the fail-safe way.

> Am I missing something?

If you or others do, this will save us.

> Maybe what we should do is - if this is an end-of-recovery checkpoint
> - *assert* that the BM_PERMANENT bit is set on every buffer we find.
> That would provide a useful cross-check that we don't have a bug
> similar to the one Jeff already fixed in any other code path.

A safety net is needed there, not an Assert.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



On Mon, Sep 17, 2012 at 11:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> While investigating the impact of the fakerelcache bug, I noticed that,
> contrary to what is claimed in several places, END_OF_RECOVERY checkpoints do
> *not* behave the same way CHECKPOINT_IS_SHUTDOWN ones do. That doesn't seem
> like a good idea: e.g. the impact of this bug would have been smaller if they
> were really treated the same. Unless I missed something, that's the only place
> of relevance that treats them differently.
> IMO, treating them differently in some remote place (2 calls away) is a good
> way to introduce further bugs.

OK, I can agree with that.  As a backstop against future mistakes, it
makes some sense to me.

>> Maybe what we should do is - if this is an end-of-recovery checkpoint
>> - *assert* that the BM_PERMANENT bit is set on every buffer we find.
>> That would provide a useful cross-check that we don't have a bug
>> similar to the one Jeff already fixed in any other code path.
> I haven't looked into the details, but can't a new unlogged relation be created
> since the last checkpoint and thus have pages in s_b?

Data changes to unlogged relations are not WAL-logged, so there's no
reason for recovery to ever read them.  Even if such a reason existed,
there wouldn't be anything to read, because the backing files are
unlinked before WAL replay begins.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tuesday, September 18, 2012 04:18:01 AM Robert Haas wrote:
> >> Maybe what we should do is - if this is an end-of-recovery checkpoint
> >> - *assert* that the BM_PERMANENT bit is set on every buffer we find.
> >> That would provide a useful cross-check that we don't have a bug
> >> similar to the one Jeff already fixed in any other code path.
> > 
> > I haven't looked into the details, but can't a new unlogged relation be
> > created since the last checkpoint and thus have pages in s_b?
> 
> Data changes to unlogged relations are not WAL-logged, so there's no
> reason for recovery to ever read them.  Even if such a reason existed,
> there wouldn't be anything to read, because the backing files are
> unlinked before WAL replay begins.
Back then I thought that resetting the relation by copying the init fork might 
use the buffer cache. It doesn't at the moment...

Andres
-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services