Re: Tracking down log segment corruption - Mailing list pgsql-general

From Tom Lane
Subject Re: Tracking down log segment corruption
Msg-id 17352.1272834971@sss.pgh.pa.us
In response to Re: Tracking down log segment corruption  (Gordon Shannon <gordo169@gmail.com>)
List pgsql-general
Gordon Shannon <gordo169@gmail.com> writes:
> [ corruption on a standby slave after an ALTER SET TABLESPACE operation ]

Found it, I think.  ATExecSetTableSpace transfers the copied data to the
slave by means of XLOG_HEAP_NEWPAGE WAL records.  The replay function
for this (heap_xlog_newpage) is failing to pay any attention to the
forkNum field of the WAL record.  This means it will happily write FSM
and visibility-map pages into the main fork of the relation.  So if the
index had any such pages on the master, it would immediately become
corrupted on the slave.  Now indexes don't have a visibility-map fork,
but they could have FSM pages.  And an FSM page would have the right
header information to look like an empty index page.  So dropping an
index FSM page into the main fork of the index would produce the
observed symptom.
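
To make the failure mode concrete, here is a minimal toy model of it in Python. It is only a sketch, not PostgreSQL code: the names NewPageRecord, ToyStorage, replay_ignoring_fork and replay_honoring_fork are made up for illustration, and the record layout is an assumption, not the real WAL struct. It shows how a replay routine that never looks at the fork number drops an FSM page on top of a main-fork page with the same block number.

```python
from dataclasses import dataclass
from enum import IntEnum


class ForkNumber(IntEnum):
    # Toy fork identifiers for this sketch.
    MAIN = 0
    FSM = 1
    VISIBILITYMAP = 2


@dataclass(frozen=True)
class NewPageRecord:
    """Hypothetical stand-in for a 'new page' WAL record (not the real layout)."""
    relation: str
    fork: ForkNumber
    block: int
    page: bytes


class ToyStorage:
    """Hypothetical storage: maps (relation, fork, block) to page contents."""

    def __init__(self):
        self.pages = {}

    def write(self, relation, fork, block, page):
        self.pages[(relation, fork, block)] = page


def replay_ignoring_fork(storage, rec):
    # Models the bug: the fork stored in the record is never consulted,
    # so every page is written into the main fork.
    storage.write(rec.relation, ForkNumber.MAIN, rec.block, rec.page)


def replay_honoring_fork(storage, rec):
    # Models the intended behavior: the page goes to the fork the record names.
    storage.write(rec.relation, rec.fork, rec.block, rec.page)


# The copy of an index emits its main-fork page first, then its FSM page.
records = [
    NewPageRecord("idx", ForkNumber.MAIN, 0, b"index page 0"),
    NewPageRecord("idx", ForkNumber.FSM, 0, b"fsm page 0"),
]

buggy, fixed = ToyStorage(), ToyStorage()
for rec in records:
    replay_ignoring_fork(buggy, rec)
    replay_honoring_fork(fixed, rec)
```

In this model, buggy ends up with the FSM page sitting at block 0 of the main fork and no FSM fork at all, while fixed keeps the two pages apart. The real replay code is more involved, but the overwrite is the same shape.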

I'm not 100% sure that this is what bit you, but it's clearly a bug and
AFAICS it could produce the observed symptoms.

This is a seriously, seriously nasty data corruption bug.  The only bit
of good news is that ALTER SET TABLESPACE seems to be the only operation
that can emit XLOG_HEAP_NEWPAGE records with forkNum different from
MAIN_FORKNUM, so that's the only operation that's at risk.  But if you
do do that, not only are standby slaves going to get clobbered, but the
master could get corrupted too if you were unlucky enough to have a
crash and replay from WAL shortly after completing the ALTER.  And it's
not only indexes that are at risk --- tables could get clobbered the
same way.

My crystal ball says there will be update releases in the very near
future.

            regards, tom lane
