Thread: BUG #9169: Replica (v 9.3.2) crashed with "PANIC: WAL contains references to invalid pages"

From: mcassiano@manord.com

The following bug has been logged on the website:

Bug reference:      9169
Logged by:          Marco Cassiano
Email address:      mcassiano@manord.com
PostgreSQL version: 9.3.2
Operating system:   Centos 6.4
Description:

Hello everybody,

This weekend both replicas of our main db crashed at the same time with this
error :

2014-02-09 11:42:51 GMT    0 52c671da.14da - PANIC:  WAL contains references
to invalid pages
2014-02-09 11:42:51 GMT    0 52c671da.14da - CONTEXT:  xlog redo vacuum: rel
1663/16433/29449; blk 181466, lastBlockVacuumed 181463
2014-02-09 11:42:52 GMT    0 52c671d9.14d1 - LOG:  startup process (PID
5338) was terminated by signal 6: Aborted
2014-02-09 11:42:52 GMT    0 52c671d9.14d1 - LOG:  terminating any other
active server processes


All three servers (main + two replicas) are on v. 9.3.2, running on CentOS
6.4.

One month ago we upgraded the main db from v 9.2.6 to 9.3.2 with
pg_upgrade and rebuilt the replicas on 9.3.2.

I searched the mailing lists and found someone who had the same problem in
the past, but it seems that their problem was fixed by already-released
patches.

( see thread
http://www.postgresql.org/message-id/675b7cee-b7f0-4e32-8e34-1efaf3ca5fe9@email.android.com)

So it seems that our problem is a new one, since we are running the latest
version.

Thank you for your help

Marco

On 02/10/2014 10:31 AM, mcassiano@manord.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      9169
> Logged by:          Marco Cassiano
> Email address:      mcassiano@manord.com
> PostgreSQL version: 9.3.2
> Operating system:   Centos 6.4
> Description:
>
> Hello everybody,
>
> This weekend both replicas of our main db crashed at the same time with this
> error :
>
> 2014-02-09 11:42:51 GMT    0 52c671da.14da - PANIC:  WAL contains references
> to invalid pages
> 2014-02-09 11:42:51 GMT    0 52c671da.14da - CONTEXT:  xlog redo vacuum: rel
> 1663/16433/29449; blk 181466, lastBlockVacuumed 181463
> 2014-02-09 11:42:52 GMT    0 52c671d9.14d1 - LOG:  startup process (PID
> 5338) was terminated by signal 6: Aborted
> 2014-02-09 11:42:52 GMT    0 52c671d9.14d1 - LOG:  terminating any other
> active server processes
>
>
> All three servers (main + two replicas) are on v. 9.3.2 running on Centos
> 6.4
>
> We upgraded one month ago the main db from v 9.2.6 to 9.3.2 through
> pg_upgrade and had the replicas rebuilt on 9.3.2
>
> I searched the mailing lists and found someone that had the same problem in
> the past but it seems that their problem was fixed by already released
> patches.
>
> ( see thread
> http://www.postgresql.org/message-id/675b7cee-b7f0-4e32-8e34-1efaf3ca5fe9@email.android.com)
>
> So it seems that our problem is a new one since we are running the latest
> version…….

There have unfortunately been several bugs with similar-looking symptoms
lately. This looks like the bug reported here:
http://www.postgresql.org/message-id/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com.
That was fixed only recently, and the fix is not included in 9.3.2.
It will be included in 9.3.3, which is scheduled for next week.

As a workaround, recovery should be able to get past that point if you
disable hot standby. Once it has recovered past that point, and past the
next checkpoint, you can re-enable hot standby and restart.

- Heikki
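
For readers applying the workaround described above, it could be sketched
roughly as follows. This is only an illustration, not part of the original
reply. It assumes the replica is configured through postgresql.conf and
that hot standby is controlled by the hot_standby setting, which takes
effect only after a server restart. While it is off, the replica will not
accept read-only queries.

```
# postgresql.conf on the affected replica (sketch; adjust to your setup)

# Step 1: disable hot standby, then restart the replica so that recovery
# can replay past the WAL record that triggered the PANIC.
hot_standby = off

# Step 2: once recovery has passed that point and the next checkpoint,
# switch the setting back and restart the replica again.
# hot_standby = on
```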