Re: Corrupt Table - Mailing list pgsql-general

From Bryan White
Subject Re: Corrupt Table
Date
Msg-id 007c01c01e74$42d57960$2dd260d1@arcamax.com
Whole thread Raw
In response to Corrupt Table  ("Bryan White" <bryan@arcamax.com>)
Responses Re: Corrupt Table  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
> > Server process (pid 2864) exited with status 139 at Thu Sep 14 10:13:11
2000
>
> That should produce a coredump --- can you get a backtrace?

I found a core file.  I am not all that familiar with gdb but the backtrace
looks useless:
#0  0x8064fb4 in ?? ()
#1  0x809da10 in ?? ()
#2  0x809e538 in ?? ()
#3  0x809e0c6 in ?? ()
#4  0x809e176 in ?? ()
#5  0x809e2dc in ?? ()
#6  0x809e5ab in ?? ()
#7  0x809e666 in ?? ()
#8  0x809eb00 in ?? ()
#9  0x80a1f90 in ?? ()
#10 0x809d3ae in ?? ()
#11 0x80a31a4 in ?? ()
#12 0x809d3b7 in ?? ()
#13 0x809c6b9 in ?? ()
#14 0x809bd0e in ?? ()
#15 0x80ec132 in ?? ()
#16 0x80ec19c in ?? ()
#17 0x80eadc7 in ?? ()
#18 0x80eaca7 in ?? ()
#19 0x80ebba2 in ?? ()
#20 0x80d61f2 in ?? ()
#21 0x80d5dd1 in ?? ()
#22 0x80d518a in ?? ()
#23 0x80d4c14 in ?? ()
#24 0x80ab736 in ?? ()
#25 0x401029cb in ?? ()

> Cool; want to submit it as a contrib item?  This sounds like something
> that could be gradually improved into a "fsck" kind of thing...

Right now it is sort of 'hack it up as needed'.  I will try and polish it up
and add command line options to control it.

>Hmm.  The all-zero pages (I expect you'll find that 25878 etc are *all*
>zeroes, not just their header fields) look like an old bug wherein
>a newly-added page might not get initialized if the transaction that's
>adding the page aborts just after allocating the page.  I thought I'd
>fixed that in 7.0 though.  You are running 7.0.2 I hope?

Yes I am running 7.0.2.  The 4 pages in question have 0's in the first 16
bytes but other data after that.  I see some text that look like real data.

> A VACUUM should patch up zero pages.  I'm guessing that you haven't
> vacuumed this table in a long time...

Vacuum occurs nightly just after the backup.  I checked and it ran fine the
night before.  Last nights vacuum reported: psql: The Data Base System is in
recovery mode

>
> Page 71958 looks pretty badly corrupted --- you'll have to look at that
> and see if you can clean it up by hand.  Something fishy about 71453
> as well.  Worst case, you could set these pages to all-zero by hand,
> and just lose the tuples thereon rather than the whole table.
>
> How fast does your app add/update tuples in this table?  If you are
> lucky, the tuples in pages 71453 and 71958 might be available from your
> last successful backup, in which case trying to patch up the page
> contents by hand is probably a waste of effort.  Zero those pages,
> dump out the current contents of the file with COPY, and start comparing
> that to your last backup.  The fact that you haven't vacuumed will make
> this pretty easy, because the tuple ordering should be the same.
>
> If you do choose to recover by zeroing pages, it'd be a good idea to
> drop and recreate the indexes on the table.  Sooner or later you should
> do a vacuum to fix the zero pages, but not just yet --- you want to
> leave the tuples in their current ordering for comparison to your
> backup ...

I suspect diff will produce more output that I want to deal with.  Customer
records are never deleted from this table so I think the thing to do is copy
all customers from the previous good backup that are not in a cleaned up
customer table.  I will lose some edits but it should not be too bad.

Ok here is my plan:
1) Stop the server
2) Backup the physical customer file
3) Zero out all the corrupt pages.
4) Restart the database for localhost access only.
5) Dump the customer table
6) Reload the customer table from the dump (I know it is now clean)
7) Recreate the customer indexes.
8) Vacuum the customer table
9) Restart the database for normal access
10) Load the last good backup into a test database
11) Rename the test database customer table to custbackup.
12) Load the customer dump from above into the test database
13) Run a program on the test database to produce insert customer statements
for records in the custback but not in the customer table.
14) Apply the above insert statements to the live database.




pgsql-general by date:

Previous
From: James Hall
Date:
Subject: Unable to install DBD module
Next
From: Börkur Sigurbjörnsson
Date:
Subject: OR working strangely.