Re: WAL contains references to invalid pages - Mailing list pgsql-general

From JotaComm
Subject Re: WAL contains references to invalid pages
Date
Msg-id CAA8OQ69LbF7gd=RqMeGaK-oboX2Yvome4OVeYQy3NEkkx2yY9w@mail.gmail.com
In response to WAL contains references to invalid pages  (JotaComm <jota.comm@gmail.com>)
List pgsql-general



Hello,




2013/5/21 Adarsh Sharma <eddy.adarsh@gmail.com>
Try to take backups of that table and index only. If that succeeds, drop and recreate them. Maybe that will fix your issue.

On Monday night I rebuilt the slave server. Yesterday, while analyzing the log files, I found the following messages.

2013-05-21 15:13:48 BRT [30686]: [25-1] user=,db= WARNING:  page 136714 of relation base/79251/79262 is uninitialized
2013-05-21 15:13:48 BRT [30686]: [26-1] user=,db= CONTEXT:  xlog redo visible: rel 1663/79251/79262; blk 136714
2013-05-21 15:13:48 BRT [30686]: [27-1] user=,db= PANIC:  WAL contains references to invalid pages
2013-05-21 15:13:48 BRT [30686]: [28-1] user=,db= CONTEXT:  xlog redo visible: rel 1663/79251/79262; blk 136714
2013-05-21 15:13:49 BRT [30684]: [2-1] user=,db= LOG:  startup process (PID 30686) was terminated by signal 6: Aborted
2013-05-21 15:13:49 BRT [30684]: [3-1] user=,db= LOG:  terminating any other active server processes

It's the same problem, but now it is in another table.

According to the documentation (9.2.3 release notes): http://www.postgresql.org/docs/9.2/interactive/release-9-2-3.html

  • Fix multiple problems in detection of when a consistent database state has been reached during WAL replay (Fujii Masao, Heikki Linnakangas, Simon Riggs, Andres Freund)

  • Fix detection of end-of-backup point when no actual redo work is required (Heikki Linnakangas)

    This mistake could result in incorrect "WAL ends before end of online backup" errors.


I believe my problem is the one described here. What do you think?






On Thu, May 16, 2013 at 11:14 PM, JotaComm <jota.comm@gmail.com> wrote:
Hello, Fabrízio


2013/5/16 Fabrízio de Royes Mello <fabriziomello@gmail.com>

On Thu, May 16, 2013 at 11:12 AM, JotaComm <jota.comm@gmail.com> wrote:

[...]

Yesterday I identified the following messages in my log file (slave):

user=,db= WARNING:  page 6629 of relation base/20449/24818 is uninitialized
user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; blk 6631, lastBlockVacuumed 6626
user=,db= PANIC:  WAL contains references to invalid pages
user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; blk 6631, lastBlockVacuumed 6626
user=,db= LOG:  startup process (PID 26293) was terminated by signal 6: Aborted
user=,db= LOG:  terminating any other active server processes
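The CONTEXT lines above name the failing target as "rel <tablespace OID>/<database OID>/<relfilenode>; blk <block number>", which is how the database and relation can be identified on the master. The small Python sketch below pulls those numbers out of a log line; it assumes the 9.2 log format quoted in this thread, and the helper names (parse_redo_context, data_file_path) are hypothetical. 1663 is the OID of the built-in pg_default tablespace. The third number is a relfilenode, which can be matched against pg_class.relfilenode in the database whose OID is the middle number; it is not necessarily the same as the relation's OID.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for "rel <tablespace>/<database>/<relfilenode>; blk <block>"
# as printed in the xlog redo CONTEXT lines of the logs quoted in this thread.
REDO_TARGET = re.compile(r"rel (\d+)/(\d+)/(\d+); blk (\d+)")

PG_DEFAULT_TABLESPACE_OID = 1663  # OID of the built-in pg_default tablespace


class RedoTarget(NamedTuple):
    tablespace_oid: int
    database_oid: int
    relfilenode: int
    block: int


def parse_redo_context(line: str) -> Optional[RedoTarget]:
    """Return the relation/block a redo CONTEXT line refers to, or None."""
    m = REDO_TARGET.search(line)
    if m is None:
        return None
    return RedoTarget(*(int(g) for g in m.groups()))


def data_file_path(t: RedoTarget) -> str:
    """Relation path relative to the data directory, as printed in the WARNING
    lines (no segment suffix). Only pg_default is handled in this sketch."""
    if t.tablespace_oid != PG_DEFAULT_TABLESPACE_OID:
        raise ValueError("only the pg_default tablespace is handled in this sketch")
    return f"base/{t.database_oid}/{t.relfilenode}"


if __name__ == "__main__":
    line = ("user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; "
            "blk 6631, lastBlockVacuumed 6626")
    target = parse_redo_context(line)
    print(target)
    print(data_file_path(target))
```

Run against the CONTEXT line above, it yields database 20449, relfilenode 24818 and block 6631, and the path base/20449/24818 that also appears in the WARNING line.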

Information:

PostgreSQL 9.2.3 (master and slave)
Operating system: CentOS release 6.3 (Final)
The parameter full_page_writes is enabled on both servers.

By analyzing the objects in my cluster (master), I identified the database [20449] and the relation [24818]. Relation 24818 is an index, so I ran REINDEX to try to solve the problem. Immediately afterwards, I tried to bring the slave up, but I received the same errors.

user=,db= WARNING:  page 6629 of relation base/20449/24818 is uninitialized
user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; blk 6631, lastBlockVacuumed 6626
user=,db= PANIC:  WAL contains references to invalid pages
user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; blk 6631, lastBlockVacuumed 6626
user=,db= LOG:  startup process (PID 26293) was terminated by signal 6: Aborted
user=,db= LOG:  terminating any other active server processes

Since the problem is in the WAL file, the procedure above did not work as I had hoped.

Any idea?


Hi JotaComm,

IMHO, as it is your slave, you could just rebuild it.

However, if you want to attempt a recovery, you can:

1) make a physical backup of this cluster
2) in your postgresql.conf set 'zero_damaged_pages = on' [1] 
3) start your cluster

I really don't know if it will work, but you can try... :-)

Thanks for your suggestion :)

I tried it and got the same errors. I believe it will be necessary to rebuild the cluster, because the problem is in the WAL file.






Thanks a lot


Thank you

Regards
