Thread: WAL contains references to invalid pages

WAL contains references to invalid pages

From
JotaComm
Date:
Hello, guys

Yesterday I identified the following messages in my log file (slave):

user=,db= WARNING:  page 6629 of relation base/20449/24818 is uninitialized
user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; blk 6631, lastBlockVacuumed 6626
user=,db= PANIC:  WAL contains references to invalid pages
user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; blk 6631, lastBlockVacuumed 6626
user=,db= LOG:  startup process (PID 26293) was terminated by signal 6: Aborted
user=,db= LOG:  terminating any other active server processes

Information:

PostgreSQL 9.2.3 (master and slave)

Operating system: CentOS release 6.3 (Final)

The parameter full_page_writes is enabled on both servers.

Analyzing the objects in my cluster (master), I identified the database [20449] and the relation [24818]. Relation 24818 is an index, so I ran REINDEX on it to try to solve the problem. Immediately afterwards I tried to start the slave, but I received the same errors.
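
For anyone mapping these log numbers to objects: in a redo CONTEXT line, `rel 1663/20449/24818` reads as tablespace OID / database OID / relfilenode (1663 is the built-in pg_default tablespace), and `blk` is the block number; the WARNING's `base/20449/24818` is the matching file path under the default tablespace. One caveat: the third number is a relfilenode, which equals the relation's OID only until the relation is rewritten (REINDEX, for instance, assigns a new relfilenode), so looking it up via `pg_class.relfilenode` on the master is the more reliable route. Below is a minimal Python sketch that extracts those fields from a log line; the regex, `RelFileLocator`, and `parse_redo_context` are hypothetical helpers, not part of PostgreSQL.

```python
import re
from typing import NamedTuple


class RelFileLocator(NamedTuple):
    """Components of a relation file reference as printed in WAL redo messages."""
    tablespace_oid: int  # 1663 is the OID of the built-in pg_default tablespace
    database_oid: int
    relfilenode: int     # not necessarily the relation's OID after a rewrite
    block: int


# Hypothetical pattern for CONTEXT lines such as
# "xlog redo vacuum: rel 1663/20449/24818; blk 6631, lastBlockVacuumed 6626"
_REDO_RE = re.compile(r"rel (\d+)/(\d+)/(\d+); blk (\d+)")


def parse_redo_context(line: str) -> RelFileLocator:
    """Extract tablespace/database/relfilenode/block from a redo CONTEXT line."""
    m = _REDO_RE.search(line)
    if m is None:
        raise ValueError(f"no 'rel T/D/R; blk N' reference in: {line!r}")
    return RelFileLocator(*(int(g) for g in m.groups()))


if __name__ == "__main__":
    ctx = ("user=,db= CONTEXT:  xlog redo vacuum: rel 1663/20449/24818; "
           "blk 6631, lastBlockVacuumed 6626")
    loc = parse_redo_context(ctx)
    print(loc)
    # The WARNING names the same file as base/<database_oid>/<relfilenode>
    print(f"base/{loc.database_oid}/{loc.relfilenode}")
```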

Since the problem is in the WAL file, the procedure above did not work as I had hoped.

Any idea?

Thanks a lot.

Regards

Re: WAL contains references to invalid pages

From
Fabrízio de Royes Mello
Date:

On Thu, May 16, 2013 at 11:12 AM, JotaComm <jota.comm@gmail.com> wrote:

[...]

Hi JotaComm,

IMHO, since it is your slave, you could just rebuild it.

However, if you want to attempt a recovery, you can:

1) make a physical backup of this cluster
2) in your postgresql.conf set 'zero_damaged_pages = on' [1] 
3) start your cluster

I really don't know if it will work, but you can try... :-)


Regards,

-- 
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching
>> IT blog: http://fabriziomello.blogspot.com
>> LinkedIn profile: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

Re: WAL contains references to invalid pages

From
JotaComm
Date:
Hello, Fabrízio


2013/5/16 Fabrízio de Royes Mello <fabriziomello@gmail.com>

Thanks for your suggestion :)

I tried it and got the same errors. I believe it will be necessary to rebuild the cluster, because the problem is in the WAL file.



Regards

Re: WAL contains references to invalid pages

From
Adarsh Sharma
Date:
Try taking backups of that table and index only. If that succeeds, drop and recreate them. Maybe that will fix your issue.

Thanks


On Thu, May 16, 2013 at 11:14 PM, JotaComm <jota.comm@gmail.com> wrote:

Re: WAL contains references to invalid pages

From
JotaComm
Date:
Hello,

2013/5/21 Adarsh Sharma <eddy.adarsh@gmail.com>
Try taking backups of that table and index only. If that succeeds, drop and recreate them. Maybe that will fix your issue.

On Monday night I rebuilt the slave server. Yesterday, while analyzing the log files, I found the following messages:

2013-05-21 15:13:48 BRT [30686]: [25-1] user=,db= WARNING:  page 136714 of relation base/79251/79262 is uninitialized
2013-05-21 15:13:48 BRT [30686]: [26-1] user=,db= CONTEXT:  xlog redo visible: rel 1663/79251/79262; blk 136714
2013-05-21 15:13:48 BRT [30686]: [27-1] user=,db= PANIC:  WAL contains references to invalid pages
2013-05-21 15:13:48 BRT [30686]: [28-1] user=,db= CONTEXT:  xlog redo visible: rel 1663/79251/79262; blk 136714
2013-05-21 15:13:49 BRT [30684]: [2-1] user=,db= LOG:  startup process (PID 30686) was terminated by signal 6: Aborted
2013-05-21 15:13:49 BRT [30684]: [3-1] user=,db= LOG:  terminating any other active server processes
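
To check which file on the slave holds the page named in this new warning, the block number can be converted into a segment file and byte offset. This is a minimal Python sketch assuming the default build settings of 8 kB pages (BLCKSZ = 8192) and 1 GB segment files (RELSEG_SIZE = 131072 blocks); a server compiled with other values would need different constants, and `locate_page` is a hypothetical helper, not a PostgreSQL function. Under those assumptions, page 136714 falls in the second segment, base/79251/79262.1.

```python
from typing import NamedTuple

# Assumed build-time defaults (not read from the server):
BLCKSZ = 8192         # bytes per page
RELSEG_SIZE = 131072  # blocks per segment file (1 GiB / 8 KiB)


class PageLocation(NamedTuple):
    path: str         # segment file, relative to the data directory
    byte_offset: int  # offset of the page within that segment file


def locate_page(database_oid: int, relfilenode: int, block: int) -> PageLocation:
    """Map a block of a relation in pg_default to its segment file and offset."""
    segno, block_in_seg = divmod(block, RELSEG_SIZE)
    path = f"base/{database_oid}/{relfilenode}"
    if segno > 0:
        # Segments after the first one carry a .1, .2, ... suffix
        path += f".{segno}"
    return PageLocation(path, block_in_seg * BLCKSZ)


if __name__ == "__main__":
    print(locate_page(79251, 79262, 136714))  # page from the second report
    print(locate_page(20449, 24818, 6629))    # page from the first report
```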

It's the same problem, but now it's in another table.

According to the 9.2.3 release notes (http://www.postgresql.org/docs/9.2/interactive/release-9-2-3.html):

  • Fix multiple problems in detection of when a consistent database state has been reached during WAL replay (Fujii Masao, Heikki Linnakangas, Simon Riggs, Andres Freund)

  • Fix detection of end-of-backup point when no actual redo work is required (Heikki Linnakangas)

    This mistake could result in incorrect "WAL ends before end of online backup" errors.


I believe my problem is the one described here. What do you think?

Thanks a lot


Thank you

Regards