Re: Tracking down log segment corruption - Mailing list pgsql-general

From Gordon Shannon
Subject Re: Tracking down log segment corruption
Date
Msg-id x2ub2dd93301005021443t3c13fb57m7fc5b53e7f0f466b@mail.gmail.com
In response to Re: Tracking down log segment corruption  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Tracking down log segment corruption  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Sounds like you're on it.  Just wanted to share one additional piece, in case it helps.

Just before the ALTER INDEX SET TABLESPACE was issued, there were some writes to the table in question inside a serializable transaction. The transaction committed at 11:11:58 EDT and consisted of, among a couple thousand writes to sibling tables, 4 writes (unknown combination of inserts and updates) to cts_20100501, which definitely affected the index in question.

In any case, I will cease and desist from ALTER SET TABLESPACE for a while!

Thanks!
Gordon

Between 11:11:56 and 11:11:58 EDT (11 sec before the crash), there were

On Sun, May 2, 2010 at 3:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Found it, I think.  ATExecSetTableSpace transfers the copied data to the
slave by means of XLOG_HEAP_NEWPAGE WAL records.  The replay function
for this (heap_xlog_newpage) is failing to pay any attention to the
forkNum field of the WAL record.  This means it will happily write FSM
and visibility-map pages into the main fork of the relation.  So if the
index had any such pages on the master, it would immediately become
corrupted on the slave.  Now indexes don't have a visibility-map fork,
but they could have FSM pages.  And an FSM page would have the right
header information to look like an empty index page.  So dropping an
index FSM page into the main fork of the index would produce the
observed symptom.
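
[Editor's illustration, not part of the original message.] The failure mode described above can be sketched with a toy model. This is not PostgreSQL source: the dictionary-based "storage", the record layout, and the function names replay_newpage_buggy and replay_newpage_honoring_fork are all hypothetical, and the order of records in the toy WAL (main fork first, then FSM) is an assumption made only so the overwrite is visible. The second function shows a replay that honors the fork number, which is presumably the general shape of a fix rather than the actual patch.

```python
# Toy model of WAL "new page" replay. Hypothetical names throughout;
# only the idea (ignoring the fork number on replay) comes from the email.
MAIN_FORKNUM = 0
FSM_FORKNUM = 1


def replay_newpage_buggy(storage, record):
    """Mimics the described bug: the record's fork number is ignored,
    so every page is written into the main fork."""
    storage[(record["rel"], MAIN_FORKNUM, record["blkno"])] = record["page"]


def replay_newpage_honoring_fork(storage, record):
    """Writes the page into the fork named by the record."""
    storage[(record["rel"], record["forknum"], record["blkno"])] = record["page"]


def replay(wal, apply_record):
    """Replays a list of records into a fresh, empty storage dict."""
    storage = {}
    for record in wal:
        apply_record(storage, record)
    return storage


# Assumed record order: the main fork is copied first, then the FSM fork.
wal = [
    {"rel": "idx", "forknum": MAIN_FORKNUM, "blkno": 0, "page": "index page 0"},
    {"rel": "idx", "forknum": FSM_FORKNUM, "blkno": 0, "page": "FSM page 0"},
]

buggy = replay(wal, replay_newpage_buggy)
honoring = replay(wal, replay_newpage_honoring_fork)

# In the buggy replay the FSM page has replaced block 0 of the main fork.
print(buggy[("idx", MAIN_FORKNUM, 0)])
print(honoring[("idx", MAIN_FORKNUM, 0)])
```

In the buggy replay, block 0 of the index's main fork ends up holding the FSM page and no FSM fork is written at all; in the fork-aware replay each page stays in its own fork.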

I'm not 100% sure that this is what bit you, but it's clearly a bug and
AFAICS it could produce the observed symptoms.

This is a seriously, seriously nasty data corruption bug.  The only bit
of good news is that ALTER SET TABLESPACE seems to be the only operation
that can emit XLOG_HEAP_NEWPAGE records with forkNum different from
MAIN_FORKNUM, so that's the only operation that's at risk.  But if you
do do that, not only are standby slaves going to get clobbered, but the
master could get corrupted too if you were unlucky enough to have a
crash and replay from WAL shortly after completing the ALTER.  And it's
not only indexes that are at risk --- tables could get clobbered the
same way.

My crystal ball says there will be update releases in the very near
future.

                       regards, tom lane

