Re: BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery
Msg-id 51260D39.5020505@vmware.com
In response to Re: BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery  (Maciek Sakrejda <maciek@heroku.com>)
Responses Re: BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery  (Maciek Sakrejda <maciek@heroku.com>)
List pgsql-bugs

On 19.02.2013 00:19, Maciek Sakrejda wrote:
> On Mon, Feb 18, 2013 at 12:57 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
>> On 16.02.2013 01:49, Daniel Farina wrote:
>>
>>> I guess that means Ubuntu (and probably Debian?) libpq-dev breaks
>>> PG_VERSION_NUM for PGXS=1.
>>>
>>
>> That obviously needs to be fixed in debian. Meanwhile, Maciek, I'd suggest
>> that you build PostgreSQL from sources, install it to some temporary
>> location, and then build xlogdump against that.
>
> That worked, thanks. I have a working xlogdump. Any pointers as to what I
> should look for? This is the contents of the pg_xlog directory:
>
> total 49160
> -rw------- 1 udrehggpif7kft postgres 16777216 Feb 15 00:00 000000010000003C00000093
> -rw------- 1 udrehggpif7kft postgres 16777216 Feb 15 00:47 000000010000003C00000094
> -rw------- 1 udrehggpif7kft postgres 16777216 Feb 15 00:49 000000020000003C00000093
> -rw------- 1 udrehggpif7kft postgres       56 Feb 15 00:49 00000002.history
> drwx------ 2 udrehggpif7kft postgres     4096 Feb 15 00:49 archive_status

I'd like to see the contents of the WAL, starting from the last
checkpoint, up to the point where failover happened. In particular, any
actions on the relation base/16385/16430, which caused the error.
pg_controldata output on the base backup would also be interesting, as well
as the contents of the backup_label file.
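
To pick out the records for that relation from a large dump, a small text filter may help. This is only a sketch: it assumes the dump is plain text with one record per line, and that each record names the relation with the database OID and relfilenode separated by a slash (16385/16430 here). The exact xlogdump output format is not shown in this thread, so check the pattern against a few real lines first. The function names are made up for illustration.

```python
import re
import sys
from typing import Iterable, Iterator


def relation_pattern(db_oid: int, relfilenode: int) -> re.Pattern:
    # Match "<db_oid>/<relfilenode>" only as whole numbers, so that
    # 16385/16430 does not also match 16385/164301 or 116385/16430.
    return re.compile(rf"(?<!\d){db_oid}/{relfilenode}(?!\d)")


def lines_for_relation(
    lines: Iterable[str], db_oid: int, relfilenode: int
) -> Iterator[str]:
    # Yield every input line that mentions the relation.
    # Assumption: one WAL record per line in the dump text.
    pattern = relation_pattern(db_oid, relfilenode)
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Reads dump text on stdin; the OIDs come from base/16385/16430.
    for hit in lines_for_relation(sys.stdin, 16385, 16430):
        print(hit)
```

Saved under a name of your choosing, it would read the dump text on standard input and print only the matching records.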

How long did the standby run between the base backup and the failover?
How many WAL segments?
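
For counting segments, the file names themselves carry the answer: each 24-character name is three 8-digit hexadecimal fields (timeline ID, log ID, segment number). A minimal sketch that decodes them and counts the segments between two names follows. The helper names are hypothetical, and `segs_per_log` is an assumption you need to set for your release: releases before 9.3 skip segment FF in each log ID (so 0xFF segments per log), while later ones use all 0x100.

```python
import re
from typing import NamedTuple

WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


class WalSegment(NamedTuple):
    timeline: int
    log: int
    seg: int


def parse_wal_name(name: str) -> WalSegment:
    # Split a WAL segment file name into its three hex fields.
    if not WAL_NAME.match(name):
        raise ValueError(f"not a WAL segment file name: {name!r}")
    return WalSegment(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        seg=int(name[16:24], 16),
    )


def segments_between(first: str, last: str, segs_per_log: int = 0x100) -> int:
    # Inclusive count of segments from `first` to `last`, ignoring timeline.
    # segs_per_log is an assumption: use 0xFF for releases before 9.3.
    a = parse_wal_name(first)
    b = parse_wal_name(last)
    return (b.log * segs_per_log + b.seg) - (a.log * segs_per_log + a.seg) + 1


if __name__ == "__main__":
    # File names taken from the pg_xlog listing quoted above.
    for name in (
        "000000010000003C00000093",
        "000000010000003C00000094",
        "000000020000003C00000093",
    ):
        print(name, parse_wal_name(name))
```

Given the name of the first segment replayed after the base backup and the last one before failover, `segments_between` gives the number asked for here.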

One more thing you could try to narrow down the error: restore from the
base backup, and let it run up to the point of failover, but shut it
down just before the failover with "pg_ctl stop -m fast". That should
create a restartpoint, at the latest checkpoint record. Then restart,
and perform failover. If it still throws the same error, we know that
the WAL record that touched the page that doesn't exist was after the
last checkpoint.
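
The sequence above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the data directory path is a placeholder, and `pg_ctl promote` is assumed as the way to trigger failover, since the thread does not say how failover is triggered on this setup (a trigger file would replace that step). By default the script only prints the commands.

```python
import subprocess
from typing import List


def narrowing_commands(data_dir: str) -> List[List[str]]:
    # Commands for the experiment, in order. The standby is assumed to be
    # already restored from the base backup and caught up to just before
    # the failover point.
    return [
        # Fast shutdown; on a standby this should leave a restartpoint
        # at the latest checkpoint record.
        ["pg_ctl", "-D", data_dir, "stop", "-m", "fast"],
        # Restart the standby from that restartpoint.
        ["pg_ctl", "-D", data_dir, "start"],
        # Assumption: failover is triggered with "pg_ctl promote".
        ["pg_ctl", "-D", data_dir, "promote"],
    ]


def run_all(data_dir: str, dry_run: bool = True) -> None:
    # Print each command; execute it only when dry_run is False.
    # In a real run, wait until the standby is up before promoting.
    for cmd in narrowing_commands(data_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder path; dry run only prints the commands.
    run_all("/path/to/standby/data")
```

If the PANIC shows up again after the promote step, the offending record lies after the last checkpoint, and the WAL dump only needs to cover that range.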

- Heikki
