Re: Completely broken replica after PANIC: WAL contains references to invalid pages - Mailing list pgsql-bugs

From Sergey Konoplev
Subject Re: Completely broken replica after PANIC: WAL contains references to invalid pages
Date
Msg-id CAL_0b1tyfAYxg0u7U6vhKn6e1PbBkib3hVh_o7Wqg0Cz4xTn1Q@mail.gmail.com
In response to Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs
On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> The attached patch fixes this although I don't like the way it knowledge of the
> point up to which StartupSUBTRANS zeroes pages is handled.

So, after half a year, the same failure has happened again on the same
replica, which is now running 9.2.4 plus Andres' patch that was
supposed to fix it.

Here is the link to the full conversation.

http://www.postgresql.org/message-id/flat/CAL_0b1t=WuM6roO8dki=w8DhH8P8whhohbPjReymmQUrOcNT2A@mail.gmail.com

Here are the logs.

2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] WARNING:
page 27415 of relation base/16436/3220672275 is uninitialized
2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-10-31 22:51:44 MSK 30711 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-10-31 22:51:44 MSK 30708 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 30711) was terminated by signal 6: Aborted
2013-10-31 22:51:44 MSK 30708 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

I saved the base/16436/3220672275* files and pg_xlog directory, just in case.
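
For anyone who wants to look at the saved relation files, the position of the reported block can be worked out with simple arithmetic. This is only a sketch: it assumes the 8192-byte block size and 131072 blocks per segment shown in the pg_controldata output in this message, and the helper name locate_block is made up for illustration (it is not a PostgreSQL function).

```python
BLOCK_SIZE = 8192            # "Database block size" from pg_controldata
BLOCKS_PER_SEGMENT = 131072  # "Blocks per segment of large relation"


def locate_block(relpath, blkno):
    """Return (segment file name, byte offset) for a relation block number.

    The first segment file has no suffix; later segments are named
    <relpath>.1, <relpath>.2, and so on.
    """
    segno, blk_in_seg = divmod(blkno, BLOCKS_PER_SEGMENT)
    filename = relpath if segno == 0 else f"{relpath}.{segno}"
    return filename, blk_in_seg * BLOCK_SIZE


# Block 27415 of the relation named in the WARNING line above.
filename, offset = locate_block("base/16436/3220672275", 27415)
print(filename, offset)
```

Block 27415 is below 131072, so it lives in the first segment file; reading 8192 bytes at the printed offset of the saved copy would show what the startup process saw on that page.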

On an attempt to restart, it printed the same messages in the logs and didn't start.

2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] LOG:
consistent recovery state reached at 2F02/2774CA28
2013-11-01 08:15:25 MSK 764 @ from  [vxid: txid:0] [] LOG:  database
system is ready to accept read only connections
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] WARNING:
page 27415 of relation base/16436/3220672275 is uninitialized
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-11-01 08:15:25 MSK 767 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo visible: rel 1663/16436/3220672275; blk 27415
2013-11-01 08:15:25 MSK 764 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 767) was terminated by signal 6: Aborted
2013-11-01 08:15:25 MSK 764 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

Here is the pg_controldata output.

pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5858109675396804534
Database cluster state:               in archive recovery
pg_control last modified:             Fri 01 Nov 2013 07:52:08
Latest checkpoint location:           2F00/C9BCE828
Prior checkpoint location:            2F00/C9BCE828
Latest checkpoint's REDO location:    2F00/32F59B70
Latest checkpoint's TimeLineID:       2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          3/805663702
Latest checkpoint's NextOID:          3227099776
Latest checkpoint's NextMultiXactId:  4809163
Latest checkpoint's NextMultiOffset:  21342992
Latest checkpoint's oldestXID:        605734616
Latest checkpoint's oldestXID's DB:   16436
Latest checkpoint's oldestActiveXID:  805262681
Time of latest checkpoint:            Thu 31 Oct 2013 21:00:02
Minimum recovery ending location:     2F02/2774CA28
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      550
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
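
To get a rough idea of how much WAL lies between the latest checkpoint's REDO location and the minimum recovery ending location above, the two positions can be compared numerically. This is a sketch under an assumption: it treats each position as a single 64-bit number (high half before the slash, low half after it), which is the layout used from 9.3 onwards. On 9.2 the last 16 MB segment of every log file is skipped, so the result here is an approximate upper bound rather than an exact figure. The helper name parse_lsn is hypothetical.

```python
WAL_SEGMENT_SIZE = 16777216  # "Bytes per WAL segment" from pg_controldata


def parse_lsn(text):
    """Convert an 'XXXXXXXX/XXXXXXXX' WAL position to one integer.

    Assumption: the part before the slash is the high 32 bits and the
    part after it is the low 32 bits of a 64-bit byte position.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


redo = parse_lsn("2F00/32F59B70")          # Latest checkpoint's REDO location
min_recovery = parse_lsn("2F02/2774CA28")  # Minimum recovery ending location

wal_bytes = min_recovery - redo
print(wal_bytes, "bytes, about", wal_bytes // WAL_SEGMENT_SIZE, "WAL segments")
```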

Any thoughts?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com
