Thread: PANIC: corrupted item lengths

PANIC: corrupted item lengths

From
Simon Riggs
Date:
I've seen a couple of "corrupted item lengths" issues lately.

What seems strange about the various errors generated in bufpage.c is
that they are marked as ERRORs, yet are executed within a critical
section causing the system to PANIC.
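
For anyone following along, here is a minimal stand-alone sketch of that
behaviour. This is not the real elog.c code, just an illustration of how
an ERROR raised while a critical section is open gets promoted to PANIC
(the severity values and handling below are simplified):

#include <stdio.h>
#include <stdlib.h>

#define ERROR 20
#define PANIC 22

static int CritSectionCount = 0;

#define START_CRIT_SECTION()    (CritSectionCount++)
#define END_CRIT_SECTION()      (CritSectionCount--)

static void
report(int elevel, const char *msg)
{
    /* an ERROR cannot be recovered from inside a critical section */
    if (CritSectionCount > 0 && elevel >= ERROR)
        elevel = PANIC;

    if (elevel == PANIC)
    {
        fprintf(stderr, "PANIC: %s\n", msg);
        abort();        /* the real server reinitialises after this */
    }

    fprintf(stderr, "ERROR: %s\n", msg);
    /* the real code would longjmp back to the transaction abort path */
}

int
main(void)
{
    report(ERROR, "corrupted item lengths");    /* stays an ERROR */

    START_CRIT_SECTION();
    report(ERROR, "corrupted item lengths");    /* escalates to PANIC */
    END_CRIT_SECTION();

    return 0;
}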

There are a number of sanity checks that are made prior to any changes
taking place, so they need not generate PANICs. "corrupted item lengths"
is one such error message, but there are others. There is no need to take
down the whole server just because one block has a corruption. I'm not
advocating that corruptions are acceptable, but the server provides no
way to remove the corruption, which is a problem, and the server really
should keep a better grip on its towel in any case.

I would much prefer: 

* VACUUMs seeing these errors would perform as with zero_damaged_pages.
* other backends seeing those errors should just ERROR out. 

We can do this by having a new function: boolean PageIsValid() which
performs the sanity checks. This can then be executed by
heap_page_prune() prior to entering the critical section. That will then
be called correctly by both VACUUM and other code. VACUUM can then
optionally zero out the block, as is done with PageHeaderIsValid().
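
Roughly what I have in mind, as a stand-alone sketch only. The structures
below are simplified stand-ins rather than the real bufpage.h definitions,
and the checks are illustrative:

#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192

/* simplified stand-ins for the page header and line pointer layout */
typedef struct
{
    uint16_t    pd_lower;       /* offset to start of free space */
    uint16_t    pd_upper;       /* offset to end of free space */
    uint16_t    pd_special;     /* offset to start of special space */
} PageHeader;

typedef struct
{
    uint16_t    lp_off;         /* offset of the item on the page */
    uint16_t    lp_len;         /* length of the item */
} ItemId;

/*
 * Run the same kind of sanity checks bufpage.c applies, but return false
 * instead of raising an ERROR.  heap_page_prune() could call this before
 * START_CRIT_SECTION(), and VACUUM could treat a false return the way
 * zero_damaged_pages treats a bad page header.
 */
bool
PageIsValid(const char *page)
{
    const PageHeader *hdr = (const PageHeader *) page;
    const ItemId *itemids;
    int         nitems;
    int         i;

    /* header offsets must be ordered and lie inside the block */
    if (hdr->pd_lower < sizeof(PageHeader) ||
        hdr->pd_lower > hdr->pd_upper ||
        hdr->pd_upper > hdr->pd_special ||
        hdr->pd_special > BLCKSZ)
        return false;

    itemids = (const ItemId *) (page + sizeof(PageHeader));
    nitems = (hdr->pd_lower - sizeof(PageHeader)) / sizeof(ItemId);

    /* every used line pointer must point at data within the page body */
    for (i = 0; i < nitems; i++)
    {
        uint16_t    off = itemids[i].lp_off;
        uint16_t    len = itemids[i].lp_len;

        if (len == 0)
            continue;           /* unused pointer, nothing to check */
        if (off < hdr->pd_upper || off + len > hdr->pd_special)
            return false;       /* the "corrupted item lengths" case */
    }

    return true;
}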

Votes?

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: PANIC: corrupted item lengths

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> I would much prefer: 
> 
> * VACUUMs seeing these errors would perform as with zero_damaged_pages.
> * other backends seeing those errors should just ERROR out. 
> 
> We can do this by having a new function: boolean PageIsValid() which
> performs the sanity checks. This can then be executed by
> heap_page_prune() prior to entering the critical section. That will then
> be called correctly by both VACUUM and other code. VACUUM can then
> optionally zero out the block, as is done with PageHeaderIsValid().

I tend to hate automatic zeroing of pages because there's no way to get
the contents later for forensics.  I would support your proposal if we
had a way of saving the block elsewhere before zeroing it (say create a
directory corrupted+zeroed similar to lost+found in the database dir and
save it annotated with the OID of the table and the block number).
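
To make that concrete, a very rough sketch of the save-aside step, using
plain POSIX calls rather than the backend's file routines (the function
name and layout are invented):

#include <stdio.h>
#include <stdint.h>
#include <sys/stat.h>

#define BLCKSZ 8192

/*
 * Dump a corrupted block into a "corrupted+zeroed" directory under the
 * database directory, named after the relation OID and block number,
 * before the block gets zeroed.  Error handling is mostly omitted.
 */
int
save_corrupted_block(const char *dbdir, uint32_t reloid,
                     uint32_t blkno, const char *page)
{
    char        path[1024];
    FILE       *f;

    snprintf(path, sizeof(path), "%s/corrupted+zeroed", dbdir);
    mkdir(path, 0700);          /* ignore EEXIST for this sketch */

    snprintf(path, sizeof(path), "%s/corrupted+zeroed/%u.%u",
             dbdir, reloid, blkno);
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    fwrite(page, 1, BLCKSZ, f);
    fclose(f);
    return 0;
}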

The main problem I see with this is that if you don't immediately act to
examine the data, some of the pg_clog files that you need to be able to
read these files may be gone.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: PANIC: corrupted item lengths

From
Greg Stark
Date:
On Thu, Jun 4, 2009 at 2:55 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

> I tend to hate automatic zeroing of pages because there's no way to get
> the contents later for forensics.

That's why we default to zero_damaged_pages=false. Saving the pages
somewhere seems like it would be a good feature to add for
zero_damaged_pages in general but an orthogonal issue.

Simon's suggestion adds a few more cases we could catch and send through
the zero_damaged_pages code path rather than crashing. If his analysis of
the existing code is correct -- I haven't looked but I assume so -- then
that sounds like the right thing to do.

I would be concerned about the overhead of checking all the line pointers
if we're just trying to look up a single item. But from the sound of his
description it would only fire during HOT pruning, for which we already
control the frequency.

-- 
greg


Re: PANIC: corrupted item lengths

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> What seems strange about the various errors generated in bufpage.c is
> that they are marked as ERRORs, yet are executed within a critical
> section causing the system to PANIC.

The reason we PANIC there is to reduce the probability that bad data
will be written back to disk.  Of course, if the bad data was read off
disk in the first place, there's no hope --- but we have checks on
incoming pages for that.  What seems significantly more likely if we
detect a problem here is that we somehow corrupted the page while it
sits in shared buffers.  So, there's some hope that the corruption will
not get back to disk, so long as we PANIC and thereby cause
shared-memory contents to be flushed.

> Votes?

+1 for no change.

We could make the page-read-time validation checks stricter, if there's
some specific pattern you're seeing that gets past those checks now.

        regards, tom lane


Re: PANIC: corrupted item lengths

From
Simon Riggs
Date:
On Thu, 2009-06-04 at 10:02 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > What seems strange about the various errors generated in bufpage.c is
> > that they are marked as ERRORs, yet are executed within a critical
> > section causing the system to PANIC.
> 
> The reason we PANIC there is to reduce the probability that bad data
> will be written back to disk.  Of course, if the bad data was read off
> disk in the first place, there's no hope --- but we have checks on
> incoming pages for that.

We don't have checks for this on incoming pages: we only ever check the
header. I think there is still hope if we do have corruption. We don't
need to PANIC; we can do what I've suggested instead.

>   What seems significantly more likely if we
> detect a problem here is that we somehow corrupted the page while it
> sits in shared buffers.  So, there's some hope that the corruption will
> not get back to disk, so long as we PANIC and thereby cause
> shared-memory contents to be flushed.

If the block is marked as dirty, then yes, I can see your point. I would
prefer to PANIC rather than lose data. If the block is *not* dirty, i.e.
it has been trashed in some other way, then it is not likely to go to
disk. Anybody re-reading the block will see the same corruption and die
long before they can make the page dirty, so a corrupt yet clean block is
no reason to PANIC. If the block *is* dirty, it might only be because of
a hint bit change, and there are various other ways to dirty a block that
don't trigger full page validity checks.

> > Votes?
> 
> +1 for no change.
> 
> We could make the page-read-time validation checks stricter, if there's
> some specific pattern you're seeing that gets past those checks now.

I don't know what the pattern is because the bloody things keep PANICing.
Seriously, it does look like some kind of memory corruption issue, but
I'm still not sure whether it is hardware or software related. But this
thread is not about that issue, it's about how we respond in the face of
such issues.

The main problem is that there is no easy way to get rid of the corrupt
block. You have to select out the good data somehow and then truncate. I
like Alvaro's suggestion for the future, but mostly we need to be able to
surgically remove the data block. Yes, I can do this without backend
changes via the usual non-user-level skullduggery.

It would be good to have a check_page_validity option so that we do full
validation before we do anything to a page, for example in PageAddItem().
With that turned on, I wouldn't mind PANICing, because at least you'd
have reasonable proof that the corruption happened recently and that
PANICing may actually prevent the horse from escaping.
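
As a sketch only -- the GUC is the one named above, but the helper and
the hook point are hypothetical, and it reuses the PageIsValid() sketch
from my first message:

#include <stdbool.h>
#include <stdio.h>

/* the PageIsValid() sketch from earlier in the thread */
extern bool PageIsValid(const char *page);

/* hypothetical GUC, off by default because of the extra checking cost */
bool        check_page_validity = false;

/*
 * Validate the whole page before modifying it, e.g. at the top of
 * PageAddItem().  If this fires, the corruption must be recent, so a
 * PANIC at that point actually tells you something useful.
 */
bool
page_modification_precheck(const char *page)
{
    if (!check_page_validity)
        return true;            /* option off: behave exactly as today */

    if (!PageIsValid(page))
    {
        fprintf(stderr, "page failed full validation before modification\n");
        return false;           /* caller raises ERROR or PANIC */
    }

    return true;
}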

Thanks,

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: PANIC: corrupted item lengths

From
Dimitri Fontaine
Date:
Hi,

On 4 Jun 09, at 15:55, Alvaro Herrera wrote:
> I tend to hate automatic zeroing of pages because there's no way to get
> the contents later for forensics.  I would support your proposal if we
> had a way of saving the block elsewhere before zeroing it (say create a
> directory corrupted+zeroed similar to lost+found in the database dir and
> save it annotated with the OID of the table and the block number).

What about creating a special purpose fork for this? It could contain
some metadata plus the original (maybe corrupted) block content.

> The main problem I see with this is that if you don't immediately act to
> examine the data, some of the pg_clog files that you need to be able to
> read these files may be gone.

Maybe the necessary clogs could be part of the special fork metadata
associated with each saved-aside corrupted block?

Regards,
--
dim