BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery - Mailing list pgsql-bugs

From maciek@heroku.com
Subject BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery
Date
Msg-id E1U6AQu-0002rq-4b@wrigleys.postgresql.org
Responses Re: BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      7883
Logged by:          Maciek Sakrejda
Email address:      maciek@heroku.com
PostgreSQL version: 9.1.8
Operating system:   Ubuntu 12.04 64-bit
Description:


We ran into a customer database giving us the error above when replicating
from 9.1.7 to 9.1.8 and attempting to fail over to the 9.1.8 replica. I
noticed several fixes to WAL replay in 9.1.8--could they be a factor in this
case? We're trying again with a fresh replica; hopefully that will just work.
Logs from the incident are below.

Thanks,
Maciek

Feb 15 00:49:12 wal_e.worker.s3_worker INFO     MSG: could not locate object
while performing wal restore#012        DETAIL: The absolute URI that could
not be located is
s3://wal-e-[redacted]/wal-e-backups/timeline-0e0b390f-cb3f-4192-8cdb-fced4d54b0a2/wal_005/000000010000003C00000094.lzo.#012
       HINT: This can be normal when Postgres is trying to detect what
timelines are available during restoration.
Feb 15 00:49:12 [1300-1]  [COPPER] LOG:  invalid magic number 0000 in log
file 60, segment 148, offset 0
Feb 15 00:49:13 wal_e.worker.s3_worker INFO     MSG: could not locate object
while performing wal restore#012        DETAIL: The absolute URI that could
not be located is
s3://wal-e-[redacted]/wal-e-backups/timeline-0e0b390f-cb3f-4192-8cdb-fced4d54b0a2/wal_005/000000010000003C00000094.lzo.#012
       HINT: This can be normal when Postgres is trying to detect what
timelines are available during restoration.
Feb 15 00:49:13 [1301-1]  [COPPER] LOG:  redo done at 3C/930000B0
Feb 15 00:49:13 [1302-1]  [COPPER] LOG:  last completed transaction was at
log time 2013-02-14 22:35:05.338681+00
Feb 15 00:49:15 wal_e.worker.s3_worker INFO     MSG: completed download and
decompression#012        DETAIL: Downloaded and decompressed
"s3://wal-e-[redacted]/wal-e-backups/timeline-0e0b390f-cb3f-4192-8cdb-fced4d54b0a2/wal_005/000000010000003C00000093.lzo"
to "pg_xlog/RECOVERYXLOG"
Feb 15 00:49:15 [1303-1]  [COPPER] LOG:  restored log file
"000000010000003C00000093" from archive
Feb 15 00:49:15 wal_e.worker.s3_worker INFO     MSG: could not locate object
while performing wal restore#012        DETAIL: The absolute URI that could
not be located is
s3://wal-e-[redacted]/wal-e-backups/timeline-0e0b390f-cb3f-4192-8cdb-fced4d54b0a2/wal_005/00000002.history.lzo.#012
       HINT: This can be normal when Postgres is trying to detect what
timelines are available during restoration.
Feb 15 00:49:16 [1304-1]  [COPPER] LOG:  selected new timeline ID: 2
Feb 15 00:49:16 wal_e.worker.s3_worker INFO     MSG: could not locate object
while performing wal restore#012        DETAIL: The absolute URI that could
not be located is
s3://wal-e-[redacted]/wal-e-backups/timeline-0e0b390f-cb3f-4192-8cdb-fced4d54b0a2/wal_005/00000001.history.lzo.#012
       HINT: This can be normal when Postgres is trying to detect what
timelines are available during restoration.
Feb 15 00:49:16 [1305-1]  [COPPER] LOG:  archive recovery complete
Feb 15 00:49:16 [1306-1]  [COPPER] WARNING:  page 37956 of relation
base/16385/16430 was uninitialized
Feb 15 00:49:16 [1307-1]  [COPPER] PANIC:  WAL contains references to
invalid pages
Feb 15 00:49:17 [3-1]  [COPPER] LOG:  startup process (PID 7) was terminated
by signal 6: Aborted
Feb 15 00:49:17 [4-1]  [COPPER] LOG:  terminating any other active server
processes
Feb 15 00:49:17 [37-1] collectd [COPPER] WARNING:  terminating connection
because of crash of another server process
Feb 15 00:49:17 [37-2] collectd [COPPER] DETAIL:  The postmaster has
commanded this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted
shared memory.
Feb 15 00:49:17 [37-3] collectd [COPPER] HINT:  In a moment you should be
able to reconnect to the database and repeat your command.
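
For anyone cross-referencing the log lines above, here is a rough sketch of
how the WAL file names line up with the numbers PostgreSQL prints. It assumes
the 9.1-era naming scheme (8 hex digits each for timeline, log file, and
segment) and the default 16 MB segment size; the helper names are made up for
illustration and are not part of PostgreSQL or WAL-E.

```python
import re

# Assumption: default WAL segment size of 16 MB (a compile-time setting).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_filename(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if not re.fullmatch(r"[0-9A-Fa-f]{24}", name):
        raise ValueError("not a WAL segment file name: %r" % name)
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment


def segment_for_location(location, segment_size=WAL_SEGMENT_SIZE):
    """Map a 'log/offset' location such as '3C/930000B0' to (log, segment)."""
    log_hex, offset_hex = location.split("/")
    return int(log_hex, 16), int(offset_hex, 16) // segment_size


# The file that could not be fetched: timeline 1, log file 60, segment 148,
# which matches "invalid magic number 0000 in log file 60, segment 148".
print(parse_wal_filename("000000010000003C00000094"))

# "redo done at 3C/930000B0" falls in log file 60, segment 147 (hex 93),
# i.e. the last segment that was restored from the archive.
print(segment_for_location("3C/930000B0"))
```

Read this way, the logs suggest that replay ended inside segment
000000010000003C00000093 and that the next segment was never available from
the archive; that is only an interpretation of the output above, not a
diagnosis of the PANIC.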
