Thread: Finding corrupt data

Finding corrupt data

From
Matthew Hagerty
Date:
Greetings,

If there was corrupt data in a table, how would one go about finding it?
Is it possible that corrupt data could cause a backend crash?

Thank you,
Matthew


Re: [HACKERS] Finding corrupt data

From
Tom Lane
Date:
Matthew Hagerty <matthew@venux.net> writes:
> Is it possible that corrupt data could cause a backend crash?

Absolutely.  The scenario I've seen most is that the length word of a
variable-length field value (a "varlena" value in pghackers-speak)
contains garbage.  The backend comes along and tries to allocate space
equal to the claimed field length in order to copy the value to
someplace, and the usual result is that the backend process exceeds
its allowed memory usage and is summarily killed by the kernel.

> If there was corrupt data in a table, how would one go about finding it?

The brute-force way is to do a SELECT * or COPY TO and see if the
backend survives ;-).  If not, narrowing down which record is bad
is left as an exercise for the student...
        regards, tom lane


Re: [HACKERS] Finding corrupt data

From
Ed Loehr
Date:
Tom Lane wrote:

> > If there was corrupt data in a table, how would one go about finding it?
>
> The brute-force way is to do a SELECT * or COPY TO and see if the
> backend survives ;-).  If not, narrowing down which record is bad
> is left as an exercise for the student...

One RDBMS I used had a utility called 'dbcheck' which did some sort of
examination of indices, tables, etc., and issued an 'OK' or 'CORRUPT' for
each examined object.  Such a utility for pgsql might simply do some
combination of SELECT * or COPY TO as you suggest above.

Would it be reasonable to put such a tool make its way onto the todo list, in
the absence of better alternatives?  I'd argue it's important for pgsql's
future popular prospects to be able to be _operated_  (i.e., live dbs backed
up, diagnosed as corrupted, and restored) by folks who may know very little
about the internals or the design of the schema/code.  Quick and correct
diagnosis of the problem is the key for them.  Such a tool would seem to go a
long way toward that end.

Cheers,
Ed Loehr



Re: [HACKERS] Finding corrupt data

From
Tom Lane
Date:
Ed Loehr <ELOEHR@austin.rr.com> writes:
> One RDBMS I used had a utility called 'dbcheck' which did some sort of
> examination of indices, tables, etc., and issued an 'OK' or 'CORRUPT' for
> each examined object.  Such a utility for pgsql might simply do some
> combination of SELECT * or COPY TO as you suggest above.

> Would it be reasonable to put such a tool make its way onto the todo list, in
> the absence of better alternatives?

What'd be really nice is some kind of 'fsck' for databases.  But it'd be
a lot of work to write one, and more work to keep it up to date in the
face of continuing changes to the database representation.

One simpler thing that I'd like to see is for VACUUM to recreate indexes
from scratch instead of trying to compact them.  This would provide a
very simple recovery procedure for corrupted indexes, and it seems
possible that it'd actually be faster than what VACUUM does now.
        regards, tom lane


Re: [HACKERS] Finding corrupt data

From
Bruce Momjian
Date:
> Tom Lane wrote:
> 
> > > If there was corrupt data in a table, how would one go about finding it?
> >
> > The brute-force way is to do a SELECT * or COPY TO and see if the
> > backend survives ;-).  If not, narrowing down which record is bad
> > is left as an exercise for the student...
> 
> One RDBMS I used had a utility called 'dbcheck' which did some sort of
> examination of indices, tables, etc., and issued an 'OK' or 'CORRUPT' for
> each examined object.  Such a utility for pgsql might simply do some
> combination of SELECT * or COPY TO as you suggest above.

Does vacuum already do that?


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Finding corrupt data

From
Ed Loehr
Date:
Bruce Momjian wrote:

> > One RDBMS I used had a utility called 'dbcheck' which did some sort of
> > examination of indices, tables, etc., and issued an 'OK' or 'CORRUPT' for
> > each examined object.  Such a utility for pgsql might simply do some
> > combination of SELECT * or COPY TO as you suggest above.
>
> Does vacuum already do that?

Not as far as I can tell.   Here's the kind of output I see from vacuum:

DEBUG:  --Relation pg_class--
DEBUG:  Pages 10: Changed 0, Reapped 1, Empty 0, New 0; Tup 695: Vac 0, Keep/VTL
0/0, Crash 0, UnUsed 35, MinLen 102, MaxLen 132; Re-using: Free/Avail. Space
3828/0; EndEmpty/Avail. Pages 0/0. Elapsed 0/0 sec.
DEBUG:  Index pg_class_relname_index: Pages 16; Tuples 695: Deleted 0. Elapsed
0/0 sec.
DEBUG:  Index pg_class_oid_index: Pages 7; Tuples 695: Deleted 0. Elapsed 0/0
sec.

Am I missing something?

Cheers,
Ed Loehr



Re: [HACKERS] Finding corrupt data

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> 
> > > One RDBMS I used had a utility called 'dbcheck' which did some sort of
> > > examination of indices, tables, etc., and issued an 'OK' or 'CORRUPT' for
> > > each examined object.  Such a utility for pgsql might simply do some
> > > combination of SELECT * or COPY TO as you suggest above.
> >
> > Does vacuum already do that?
> 
> Not as far as I can tell.   Here's the kind of output I see from vacuum:
> 
> DEBUG:  --Relation pg_class--
> DEBUG:  Pages 10: Changed 0, Reapped 1, Empty 0, New 0; Tup 695: Vac 0, Keep/VTL
> 0/0, Crash 0, UnUsed 35, MinLen 102, MaxLen 132; Re-using: Free/Avail. Space
> 3828/0; EndEmpty/Avail. Pages 0/0. Elapsed 0/0 sec.
> DEBUG:  Index pg_class_relname_index: Pages 16; Tuples 695: Deleted 0. Elapsed
> 0/0 sec.
> DEBUG:  Index pg_class_oid_index: Pages 7; Tuples 695: Deleted 0. Elapsed 0/0
> sec.
> 
> Am I missing something?

Vacuum does catch some problems, not all of them.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Finding corrupt data

From
Ed Loehr
Date:
Bruce Momjian wrote:

> > Bruce Momjian wrote:
> >
> > > > One RDBMS I used had a utility called 'dbcheck' which did some sort of
> > > > examination of indices, tables, etc., and issued an 'OK' or 'CORRUPT' for
> > > > each examined object.  Such a utility for pgsql might simply do some
> > > > combination of SELECT * or COPY TO as you suggest above.
> > >
> > > Does vacuum already do that?
> >
> > Not as far as I can tell.   Here's the kind of output I see from vacuum:
> >
> > DEBUG:  --Relation pg_class--
> > DEBUG:  Pages 10: Changed 0, Reapped 1, Empty 0, New 0; Tup 695: Vac 0, Keep/VTL
> > 0/0, Crash 0, UnUsed 35, MinLen 102, MaxLen 132; Re-using: Free/Avail. Space
> > 3828/0; EndEmpty/Avail. Pages 0/0. Elapsed 0/0 sec.
> > DEBUG:  Index pg_class_relname_index: Pages 16; Tuples 695: Deleted 0. Elapsed
> > 0/0 sec.
> > DEBUG:  Index pg_class_oid_index: Pages 7; Tuples 695: Deleted 0. Elapsed 0/0
> > sec.
> >
> > Am I missing something?
>
> Vacuum does catch some problems, not all of them.

Yes, and vacuum appears to be the only known remedy to my current postgresql
showstopper.  For that, I'm grateful.  However, I think that misses the point I'm
trying to convey...

There are a three basic tasks critically important to an operationally viable
database, from my perspective.  First, I need to be able to easily create a backup of
the database at any point.  The pg_dump appears to serve that function.

Second, I need to be able to restore from a backup copy if something goes terribly
wrong.  Psql coupled with pg_dump output seems to support that.  So far, so good.

Third, and most importantly, I need to be able to tell *when* I need to restore from
a backup.  A restoration from a backup copy usually involves a likely loss of data,
and that can be a Very Big Deal.  "Is this database corrupt?", is a critically
important question.  And I need to be able to answer it without knowing the details
of postgresql C code.  If I can't somehow answer that question when a problem arises,
the total cost of ownership of the database jumps pretty dramatically due to wasted
time and data loss, and the operability/viability drops in tandem.

Cheers,
Ed Loehr