Thread: Corrupt Table
I have apparently picked up a corrupt record in a table.

What happened: Yesterday at one point the database seemed to hang.  There
were three backend processes consuming large amounts of CPU time.  I
stopped the server and rebooted (3 months since the last reboot).  The
database restarted and seemed to be fine.  Then last night the nightly
backup failed, apparently while reading the 'customer' table, and the
database restarted itself.  There have been a couple of database restarts
since then.  As far as I can tell it is the customer table that is the
problem.

Here is what a failure looks like in the log file:
--------------------
Server process (pid 2864) exited with status 139 at Thu Sep 14 10:13:11 2000
Terminating any active server processes...
000914.10:13:11.425 [5879] NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally....
-------------------
The last entry is repeated multiple times.

I have written a small utility program (pganal).  It looks for
inconsistencies in page layout and tuple layout.  My original intent was
to parse the tuple internal structure as well, but that proved to be more
complex than I was ready to handle at the time.

Anyway, I stopped the database, copied the customer file to another
directory, and restarted the database.
Here is the pganal output from this copy:
--------------------------
Analyzing customer
Page 25878 ERROR: pd_lower is too small
  pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0
Page 31395 ERROR: pd_lower is too small
  pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0
Page 32950 ERROR: pd_lower is too small
  pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0
Tuple 71453.0 Ofs=8029 Len=164 Flags=1  Error: tuple overwrites pd_special
Tuple 71453.4 Ofs=7346 Len=208 Flags=1  Error: tuple overlaps another
Tuple 71453.40 Ofs=1365 Len=160 Flags=1  Error: tuple overlaps another
Page 71958 ERROR: pd_lower has odd value
  pd_lower=11886 pd_upper=24239 pd_special=109 pd_opaque.od_pagesize=0
Page 73622 ERROR: pd_lower is too small
  pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0
Page Summary
  Data Pages   = 76555
  Unused Pages = 0
  New Pages    = 0
  Empty Pages  = 0
  Bad Pages    = 5
  Total Pages  = 76560
Tuple Summary
  O/L Error Tuples = 1
  Overlaped Tuples = 2
  Unused Tuples    = 47994
  Used Tuples      = 3698495
  Total Tuples     = 3746492
--------------------------

I suspect the 'pd_lower is too small' may be just my misunderstanding of
the page layout.  The three tuple errors (all on the same page) and the
'pd_lower has odd value' error seem to be real.  'pd_lower has odd value'
comes from:

    int nitems = (phd->pd_lower - sizeof(*phd)) / sizeof(ItemIdData);
    if (nitems * sizeof(ItemIdData) != phd->pd_lower - sizeof(*phd))
        pderr = "pd_lower has odd value";

Basically it means pd_lower did not leave room for an integral number of
ItemIdData structures.  I seem to have two separate corrupt pages.  I can
post the full source to pganal if anyone is interested; it's about 300
lines.

My question is how do I proceed from here.  Going back to the previous
day's backup would be very painful in terms of lost data.  I suspect the
answer is to perform surgery on the bad pages and then rebuild indexes,
but this is a scary idea.  Has anyone else created tools to deal with
this kind of problem?
Bryan White, ArcaMax.com, VP of Technology
"You can't deny that it is not impossible, can you."
Here is a follow up.  I did a hex/ASCII dump of the 3 bad tuples.  In the
dump I could pick out an email address; this is an indexed field.  I did a
select on each of them in the live database.  The 1st and 3rd were not
found.  The second worked OK if I only selected the customer id (an int4,
and the first field in the record), but the custid reported seems to be
nonsense.  The backend crashed if I selected the whole record.
"Bryan White" <bryan@arcamax.com> writes:
> Here is what a failure looks like in the log file:
> --------------------
> Server process (pid 2864) exited with status 139 at Thu Sep 14 10:13:11 2000

That should produce a coredump --- can you get a backtrace?

> I have written a small utility program (pganal).  It looks for
> inconsistencies in page layout and tuple layout.  My original intent was
> to parse the tuple internal structure as well but that proved to be more
> complex than I was ready to handle at the time.

Cool; want to submit it as a contrib item?  This sounds like something
that could be gradually improved into a "fsck" kind of thing...

> Anyway I stopped the database, copied the customer file to another
> directory and restarted the database.  Here is the pganal output from
> this copy:
> --------------------------
> Analyzing customer
> Page 25878 ERROR: pd_lower is too small
>   pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0
> Page 31395 ERROR: pd_lower is too small
>   pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0
> Page 32950 ERROR: pd_lower is too small
>   pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0
> Tuple 71453.0 Ofs=8029 Len=164 Flags=1  Error: tuple overwrites pd_special
> Tuple 71453.4 Ofs=7346 Len=208 Flags=1  Error: tuple overlaps another
> Tuple 71453.40 Ofs=1365 Len=160 Flags=1  Error: tuple overlaps another
> Page 71958 ERROR: pd_lower has odd value
>   pd_lower=11886 pd_upper=24239 pd_special=109 pd_opaque.od_pagesize=0
> Page 73622 ERROR: pd_lower is too small
>   pd_lower=0 pd_upper=0 pd_special=0 pd_opaque.od_pagesize=0

Hmm.  The all-zero pages (I expect you'll find that 25878 etc are *all*
zeroes, not just their header fields) look like an old bug wherein a
newly-added page might not get initialized if the transaction that's
adding the page aborts just after allocating the page.  I thought I'd
fixed that in 7.0 though.  You are running 7.0.2 I hope?

A VACUUM should patch up zero pages.
I'm guessing that you haven't vacuumed this table in a long time...

Page 71958 looks pretty badly corrupted --- you'll have to look at that
and see if you can clean it up by hand.  Something fishy about 71453 as
well.  Worst case, you could set these pages to all-zero by hand, and
just lose the tuples thereon rather than the whole table.

> My question is how do I proceed from here.  Going back to the previous
> day's backup would be very painful in terms of lost data.  I suspect the
> answer is to perform surgery on the bad pages and then rebuild indexes
> but this is a scary idea.

How fast does your app add/update tuples in this table?  If you are
lucky, the tuples in pages 71453 and 71958 might be available from your
last successful backup, in which case trying to patch up the page
contents by hand is probably a waste of effort.  Zero those pages, dump
out the current contents of the file with COPY, and start comparing that
to your last backup.  The fact that you haven't vacuumed will make this
pretty easy, because the tuple ordering should be the same.

If you do choose to recover by zeroing pages, it'd be a good idea to drop
and recreate the indexes on the table.  Sooner or later you should do a
vacuum to fix the zero pages, but not just yet --- you want to leave the
tuples in their current ordering for comparison to your backup ...

			regards, tom lane
> > Server process (pid 2864) exited with status 139 at Thu Sep 14 10:13:11 2000
>
> That should produce a coredump --- can you get a backtrace?

I found a core file.  I am not all that familiar with gdb, but the
backtrace looks useless:
#0  0x8064fb4 in ?? ()
#1  0x809da10 in ?? ()
#2  0x809e538 in ?? ()
#3  0x809e0c6 in ?? ()
#4  0x809e176 in ?? ()
#5  0x809e2dc in ?? ()
#6  0x809e5ab in ?? ()
#7  0x809e666 in ?? ()
#8  0x809eb00 in ?? ()
#9  0x80a1f90 in ?? ()
#10 0x809d3ae in ?? ()
#11 0x80a31a4 in ?? ()
#12 0x809d3b7 in ?? ()
#13 0x809c6b9 in ?? ()
#14 0x809bd0e in ?? ()
#15 0x80ec132 in ?? ()
#16 0x80ec19c in ?? ()
#17 0x80eadc7 in ?? ()
#18 0x80eaca7 in ?? ()
#19 0x80ebba2 in ?? ()
#20 0x80d61f2 in ?? ()
#21 0x80d5dd1 in ?? ()
#22 0x80d518a in ?? ()
#23 0x80d4c14 in ?? ()
#24 0x80ab736 in ?? ()
#25 0x401029cb in ?? ()

> Cool; want to submit it as a contrib item?  This sounds like something
> that could be gradually improved into a "fsck" kind of thing...

Right now it is sort of 'hack it up as needed'.  I will try and polish it
up and add command line options to control it.

> Hmm.  The all-zero pages (I expect you'll find that 25878 etc are *all*
> zeroes, not just their header fields) look like an old bug wherein a
> newly-added page might not get initialized if the transaction that's
> adding the page aborts just after allocating the page.  I thought I'd
> fixed that in 7.0 though.  You are running 7.0.2 I hope?

Yes, I am running 7.0.2.  The 4 pages in question have 0's in the first
16 bytes but other data after that.  I see some text that looks like real
data.

> A VACUUM should patch up zero pages.  I'm guessing that you haven't
> vacuumed this table in a long time...

Vacuum occurs nightly, just after the backup.  I checked and it ran fine
the night before.  Last night's vacuum reported:
psql: The Data Base System is in recovery mode

> Page 71958 looks pretty badly corrupted --- you'll have to look at that
> and see if you can clean it up by hand.  Something fishy about 71453 as
> well.
> Worst case, you could set these pages to all-zero by hand, and just lose
> the tuples thereon rather than the whole table.
>
> How fast does your app add/update tuples in this table?  If you are
> lucky, the tuples in pages 71453 and 71958 might be available from your
> last successful backup, in which case trying to patch up the page
> contents by hand is probably a waste of effort.  Zero those pages, dump
> out the current contents of the file with COPY, and start comparing that
> to your last backup.  The fact that you haven't vacuumed will make this
> pretty easy, because the tuple ordering should be the same.
>
> If you do choose to recover by zeroing pages, it'd be a good idea to
> drop and recreate the indexes on the table.  Sooner or later you should
> do a vacuum to fix the zero pages, but not just yet --- you want to
> leave the tuples in their current ordering for comparison to your
> backup ...

I suspect diff will produce more output than I want to deal with.
Customer records are never deleted from this table, so I think the thing
to do is copy all customers from the previous good backup that are not in
a cleaned-up customer table.  I will lose some edits but it should not be
too bad.

Ok, here is my plan:
1) Stop the server
2) Backup the physical customer file
3) Zero out all the corrupt pages
4) Restart the database for localhost access only
5) Dump the customer table
6) Reload the customer table from the dump (I know it is now clean)
7) Recreate the customer indexes
8) Vacuum the customer table
9) Restart the database for normal access
10) Load the last good backup into a test database
11) Rename the test database customer table to custbackup
12) Load the customer dump from above into the test database
13) Run a program on the test database to produce insert statements for
    customer records in custbackup but not in the customer table
14) Apply the above insert statements to the live database
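Step 3 of a plan like this ("zero out the corrupt pages") amounts to a few lines of C run against the heap file while the server is stopped.  A minimal sketch, not a vetted tool: BLCKSZ assumes the default 8 KB page size, the function name is mine, and the postmaster must be fully shut down (with a copy of the file kept, as in step 2) before the file is touched.

```c
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192   /* default PostgreSQL page size */

/* Overwrite block 'blkno' of the given heap file with zeroes, so a later
 * VACUUM can treat the page as uninitialized and repair it.  Returns 0
 * on success, -1 on any I/O failure. */
int zero_page(const char *path, long blkno)
{
    char zeros[BLCKSZ];
    FILE *f = fopen(path, "r+b");

    if (f == NULL)
        return -1;
    memset(zeros, 0, sizeof zeros);
    if (fseek(f, blkno * (long) BLCKSZ, SEEK_SET) != 0 ||
        fwrite(zeros, 1, BLCKSZ, f) != BLCKSZ) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, zero_page("customer", 71958) would wipe the page pganal flagged (the path here is hypothetical; the real file lives under the database's data directory).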
"Bryan White" <bryan@arcamax.com> writes:
> I found a core file.  I am not all that familiar with gdb, but the
> backtrace looks useless:
> #0  0x8064fb4 in ?? ()
> #1  0x809da10 in ?? ()

Looks like you are running a stripped executable :-(.  You might want to
consider recompiling with debug symbols so we can get more info if this
happens again.

> Yes, I am running 7.0.2.  The 4 pages in question have 0's in the first
> 16 bytes but other data after that.  I see some text that looks like
> real data.

Oh, that's interesting.  This isn't a previously known kind of failure.
These dropouts must have occurred since your last vacuum, since vacuum
would have thought that the pages are uninitialized and "fixed" them.
Perhaps they have the same cause as the problems in the other two pages.

I recall once having seen a similar kind of failure (aligned segments of
pages suddenly becoming zeroes) that turned out to be from a hardware
problem --- the disk controller wasn't quite compatible with the
motherboard, or something like that, and would occasionally transfer bad
data to/from memory.  I'm not ready to blame the hardware yet, but it's a
possibility to keep in mind, particularly if you've changed the hardware
setup recently.

> Ok, here is my plan:

Seems reasonable.  Good luck!

			regards, tom lane
Ok, I nulled out the bad pages.  A pg_dump still fails.

I just noticed there are 21000 files in my database directory, most of
the form INDEXNAME.NUMBER, where INDEXNAME is the name of one of my
indexes and NUMBER is a sequential number.  There are 4 or 5 different
indexes involved.  All of these files are 0 bytes in size, and all are
dated in the last day or two.

When I did the pg_dump I got this in the log file:
000914.18:00:07.600 [10406] FATAL 1:  Memory exhausted in AllocSetAlloc()
Smart Shutdown request at Thu Sep 14 18:07:15 2000

The dump died after putting 100MB in the output file.  My guess is that
the internal structure of one of the tuples is corrupt.  I have no idea
what all the other files are for, or whether the one problem is the cause
of the other.

At this moment I am doing a tar of the database directory before I screw
anything else up.  Please help me.
"Bryan White" <bryan@arcamax.com> writes:
> Ok, I nulled out the bad pages.  A pg_dump still fails.  I just noticed
> there are 21000 files in my database directory, most of the form
> INDEXNAME.NUMBER, where INDEXNAME is the name of one of my indexes and
> NUMBER is a sequential number.  There are 4 or 5 different indexes
> involved.  All of these files are 0 bytes in size, and all are dated in
> the last day or two.

This suggests corrupted pointers inside the indexes.  I wouldn't worry
too much about it, you have bigger problems :-(.  The indexes are not
what's keeping you from dumping the database, anyway.

> When I did the pg_dump I got this in the log file:
> 000914.18:00:07.600 [10406] FATAL 1:  Memory exhausted in AllocSetAlloc()
> Smart Shutdown request at Thu Sep 14 18:07:15 2000
>
> The dump died after putting 100MB in the output file.
>
> My guess is the internal structure of one of the tuples is corrupt.

So it would seem.  Evidently there's at least one more corrupted page
besides the ones you were able to identify before.

What I did the last time I had to identify a corrupted tuple was to try
	SELECT tid,* FROM table LIMIT 1 OFFSET n
and experiment with different values of n to home in on the corrupted
tuple.  The last tuple you can print this way without a crash is the one
before the damaged tuple.  The TID of that tuple gives you the block
number it's in.

			regards, tom lane
> -----Original Message-----
> From: Tom Lane
>
> So it would seem.  Evidently there's at least one more corrupted page
> besides the ones you were able to identify before.
>
> What I did the last time I had to identify a corrupted tuple was to try
> 	SELECT tid,* FROM table LIMIT 1 OFFSET n
  	       ^^^^
ctid instead of tid?

Hiroshi Inoue
I said:
> What I did the last time I had to identify a corrupted tuple was to try
> 	SELECT tid,* FROM table LIMIT 1 OFFSET n

Er, make that "ctid" ... sorry for the error ...

			regards, tom lane
> > My guess is the internal structure of one of the tuples is corrupt.
>
> So it would seem.  Evidently there's at least one more corrupted page
> besides the ones you were able to identify before.

I punted last night and reloaded the customer table from a backup.  It
turns out I could recreate the most critical records from other sources.
Life goes on.

I would still like to refine my pganal tool to look inside of tuples.
Where should I look to find information about the internal structure?
Is it parsable at some level on its own, or do I have to consult the
system tables to determine structure?  I suspect this might get more
complicated once TOAST is available.

Bryan White
"Bryan White" <bryan@arcamax.com> writes:
> I would still like to refine my pganal tool to look inside of tuples.
> Where should I look to find information about the internal structure?
> Is it parsable at some level on its own, or do I have to consult the
> system tables to determine structure?  I suspect this might get more
> complicated once TOAST is available.

The tuple layout is basically
	header
	bitmap of which fields are null
	values for non-null fields

The header is type HeapTupleHeaderData, defined in
src/include/access/htup.h.  The bitmap is omitted if the header's
infomask shows the tuple contains no nulls; otherwise its length in bits
is the same as the t_natts field of the header.  The data values are
uninterpretable without looking up the set of column datatypes for the
table...

			regards, tom lane
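The null-bitmap part of that layout can be illustrated with a cut-down sketch.  Assumptions to note: the struct below is a stand-in, not the real HeapTupleHeaderData (which carries many more fields), the function name is mine, and the convention shown (a *set* bit means the field is not null) follows att_isnull() in the real include files.

```c
#include <stdint.h>

#define HEAP_HASNULL 0x0001   /* infomask bit: tuple has null fields */

/* Cut-down stand-in for HeapTupleHeaderData; the real struct in
 * src/include/access/htup.h has many more fields. */
typedef struct {
    uint16_t t_infomask;  /* flag bits, e.g. HEAP_HASNULL */
    int16_t  t_natts;     /* number of attributes in the tuple */
    uint8_t  t_bits[32];  /* null bitmap, meaningful only if HEAP_HASNULL */
} TupleSketch;

/* Is attribute attnum (0-based) null?  A set bit in the bitmap means
 * "not null", following the att_isnull() convention. */
int att_is_null_sketch(const TupleSketch *tup, int attnum)
{
    if (!(tup->t_infomask & HEAP_HASNULL))
        return 0;   /* no bitmap at all: every field is non-null */
    return !(tup->t_bits[attnum >> 3] & (1 << (attnum & 7)));
}
```

So a page-analysis tool only needs the header to find which fields are present; as noted above, interpreting the values themselves still requires the column datatypes from the system tables.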