Re: Fwd: index corruption in PG 8.3.13 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Fwd: index corruption in PG 8.3.13
Msg-id AANLkTi=yC=ePCx5zD-___W0BR9OpoAvTPtvkUTw1DX36@mail.gmail.com
In response to Re: Fwd: index corruption in PG 8.3.13  (Nikhil Sontakke <nikhil.sontakke@enterprisedb.com>)
Responses Re: Fwd: index corruption in PG 8.3.13
List pgsql-hackers
On Wed, Mar 16, 2011 at 7:51 AM, Nikhil Sontakke
<nikhil.sontakke@enterprisedb.com> wrote:
> Hi,
>
>> To summarize, as I see it - the zeroed out block 523 should have been
>> the second left-most leaf and should have pointed to 522, thus
>> re-establishing the index chain
>>
>> 524 -> 523 -> 522 -> 277 -> ...
>>
>>> Was there a machine restart in the picture as well?
>>
>
> It seems there might have been a machine restart involved too. So I
> guess even WAL writing could have been impacted.
>
> But even if VF was ongoing at the time of restart, the WAL replay on
> restart should not do anything since this will be a non-committed
> transaction?

That's not how it works.  Replaying an uncommitted transaction
shouldn't result in any user-visible changes, but replay still applies
that transaction's WAL records to the data pages.
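
To make that concrete, here is a toy model in Python.  It is only a
sketch: the record layout and the names replay and visible_rows are
made up for illustration and are not PostgreSQL structures.  Redo
applies every page change it finds; commit status only decides what a
reader is allowed to see afterwards.

```python
def replay(wal):
    """Apply every record in order, whether or not its transaction committed."""
    pages = {}         # page number -> list of (xid, row): the physical page contents
    committed = set()  # xids for which a commit record was replayed
    for rec in wal:
        if rec[0] == "insert":
            _, xid, page, row = rec
            # Applied unconditionally: redo does not look at commit status.
            pages.setdefault(page, []).append((xid, row))
        elif rec[0] == "commit":
            committed.add(rec[1])
    return pages, committed


def visible_rows(pages, committed):
    """Only rows written by committed transactions are user-visible."""
    return [row
            for tuples in pages.values()
            for xid, row in tuples
            if xid in committed]


# Transaction 2 never commits, yet its row is physically on page 0 after replay.
wal = [("insert", 1, 0, "a"), ("commit", 1), ("insert", 2, 0, "b")]
pages, committed = replay(wal)
```

In this model pages[0] holds the rows of both transactions, while
visible_rows(pages, committed) returns only the committed one.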

> Also I was looking at ReadRecord and saw that it logs a message for
> failed CRC blocks but the WAL replay just stops at that point since it
> returns a NULL. Is there a way to find out if more blocks follow in
> the wake of this failed block (should be a matter of calling
> ReadRecord with NULL as a first argument I think)? If so maybe we can
> warn further that an error was encountered in the middle of WAL replay.
> However the last block too could be a CRC check-fail candidate...

In general, when we WAL-log, we're writing over a previous WAL segment
that's been recycled.  So a failed CRC is indistinguishable from
end-of-WAL, because we expect there to be arbitrary garbage bytes in
the file after the end-of-WAL position.  Anything that follows a failed
record may just be left over from the segment's previous use.
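
A small Python sketch of why the reader can't tell the two cases apart.
The record format here (length plus CRC-32 header) and the function
names are invented for the example; PostgreSQL's real WAL record layout
is different.  The point is only that the reader stops at the first
record that fails validation, whatever the cause.

```python
import struct
import zlib

# Toy header: payload length and CRC-32 of the payload (not PostgreSQL's format).
HEADER = struct.Struct("<II")


def append_record(buf: bytearray, offset: int, payload: bytes) -> int:
    """Overwrite the bytes at offset with one record; return the next offset."""
    record = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    buf[offset:offset + len(record)] = record
    return offset + len(record)


def read_records(buf) -> list:
    """Return payloads up to, but not including, the first invalid record."""
    out = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        payload = bytes(buf[start:start + length])
        if length == 0 or len(payload) != length or zlib.crc32(payload) != crc:
            # Stale bytes from the segment's previous use, a torn write, or
            # real corruption: all three look the same from here.
            break
        out.append(payload)
        offset = start + length
    return out


# A "recycled" segment full of old bytes, with three new records written over it.
segment = bytearray(b"\xab" * 256)
end = 0
for payload in (b"insert", b"update", b"commit"):
    end = append_record(segment, end, payload)
```

Reading the segment as written returns all three payloads and then
stops at the leftover bytes.  Flip one byte inside the second record
and the reader returns only the first payload, exactly as it would if
the WAL had genuinely ended there.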

> BTW, is there a possibility to encounter trailing blocks with CRC
> failures regularly? For transactions that were ongoing at the time of
> shutdown and did not get a chance to commit or WAL log properly?

Well, you might have a torn page if there was a *system* crash in the
middle of recovery, but in theory even that shouldn't break anything,
because the system shouldn't rely on the fsync being complete until it
actually is.  Of course, as you mentioned earlier, it's not impossible
there's a bug in the recovery code.  But if an OS crash is involved,
another possibility is that something went wrong with the fsync -
maybe there's a lying writeback cache between PG and the platter, for
example.
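
For reference, this is roughly all an application can do on its side,
sketched in Python with a hypothetical helper called durable_write.
Nothing in it can detect a cache that acknowledges the fsync before the
data is actually on stable storage; the caller only learns that the
kernel reported success.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only after fsync reports success."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to push the file's data to the storage device.  If a
        # layer below lies about completion, this call still returns normally.
        os.fsync(fd)
    finally:
        os.close(fd)
```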

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

