What's the condition of bug "PANIC: WAL contains references to invalid pages"? - Mailing list pgsql-hackers

From MauMau
Subject What's the condition of bug "PANIC: WAL contains references to invalid pages"?
Msg-id 1C1E6786B6704DD5B39B52CF244045A2@maumau
List pgsql-hackers
Hello,

Please tell me a bit about the following bug, which has just been fixed.  I
hope this is exactly the problem that has been troubling us for a year.

Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
http://www.postgresql.org/message-id/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com

I've read the discussion, but I'm wondering under what conditions this
failure happens.  My understanding is that the following conditions must all
hold.  Are there any other conditions?

* The database server crashes while a btree index is being extended (by page 
split).
* Hot standby is used.
* The standby is rebuilt and started.

When I last investigated the bug, the user was doing repeated failover 
testing --- stop the master by running "pg_ctl stop -mi" while some 
application was performing database updates, promote the standby, rebuild 
the standby with pg_basebackup, and start the new standby.  In one of those 
iterations, the newly rebuilt standby crashed with "WAL contains references 
to invalid pages".  This seems to match the above mail thread.
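
To make the test procedure concrete, one iteration could be sketched roughly as below. This is only an illustration, not the user's actual script: the function names, directory arguments, host and port are hypothetical, and the choice of "-X stream" for pg_basebackup is my assumption. On 9.2 a recovery.conf also has to be placed in the rebuilt data directory before the final start, and that step is not shown.

```python
import subprocess


def failover_iteration_commands(master_dir, standby_dir, new_standby_dir,
                                new_master_host, new_master_port=5432):
    """Return the commands for one failover-test iteration, in order.

    All arguments are hypothetical placeholders for illustration.
    """
    return [
        # Stop the master abruptly, i.e. "pg_ctl stop -mi".
        ["pg_ctl", "stop", "-D", master_dir, "-m", "immediate"],
        # Promote the standby so it becomes the new master.
        ["pg_ctl", "promote", "-D", standby_dir],
        # Rebuild a standby from the new master (target directory must be empty).
        ["pg_basebackup", "-h", new_master_host, "-p", str(new_master_port),
         "-D", new_standby_dir, "-X", "stream"],
        # Start the rebuilt standby; recovery.conf must already be in place.
        ["pg_ctl", "start", "-D", new_standby_dir],
    ]


def run_iteration(commands, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling run_iteration() with dry_run=True only prints the commands, which is enough to check the ordering before running it against real servers.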

However, I don't understand why btree_xlog_vacuum() encountered an all-zero 
page.  How did the all-zero page appear on the standby?  Was it transferred
from the master by pg_basebackup?  FYI, the server log didn't contain any
messages related to disk full, nor any ERROR messages.
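
One way to check whether an all-zero page was already present in the files copied by pg_basebackup would be to scan the index's relation file block by block. The following is a minimal sketch, assuming the default 8192-byte block size; find_zero_pages is a hypothetical helper, not a PostgreSQL tool, and the block numbers it reports are relative to the single segment file it reads.

```python
BLCKSZ = 8192  # assumption: PostgreSQL built with the default block size


def find_zero_pages(path, block_size=BLCKSZ):
    """Return block numbers (within this file) whose pages are all zero bytes."""
    zero_page = bytes(block_size)
    zero_blocks = []
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(block_size)
            if not page:
                break
            # Only full-size pages are compared; a short trailing read is skipped.
            if page == zero_page:
                zero_blocks.append(blkno)
            blkno += 1
    return zero_blocks
```

Running this against the same relation file on the new master and on the freshly rebuilt standby, before the standby is started, would show whether the zero page came across with the base backup.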

Regards
MauMau



