Thread: Database corruption?
Hello: I didn't think I would live to see a corrupted database, but now
I have. I don't know how it happened; it looks like some table got
corrupted during VACUUM and now the database won't start. The serverlog
shows:

invoking IpcMemoryCreate(size=3203072)
FindExec: found "/usr/local/pgsql/bin/postmaster" using argv[0]
DEBUG: database system shutdown was interrupted at 2001-10-22 01:03:37 CLST
DEBUG: CheckPoint record at (13, 3399750448)
DEBUG: Redo record at (13, 3399750448); Undo record at (0, 0); Shutdown TRUE
DEBUG: NextTransactionId: 20960076; NextOid: 41447617
DEBUG: database system was not properly shut down; automatic recovery in progress...
DEBUG: redo starts at (13, 3399750512)
REDO @ 13/3399750512; LSN 13/3399750548: prev 13/3399750448; xprev 0/0; xid 20960086: XLOG - nextOid: 41455809
REDO @ 13/3399750548; LSN 13/3399758820: prev 13/3399750512; xprev 0/0; xid 20960086; bkpb 1: Heap - insert: node 16283895/16287107; tid 333/97
REDO @ 13/3399758820; LSN 13/3399767092: prev 13/3399750548; xprev 13/3399750548; xid 20960086; bkpb 1: Btree - insert: node 16283895/23651833; tid 195/257
REDO @ 13/3399767092; LSN 13/3399767164: prev 13/3399758820; xprev 13/3399758820; xid 20960086: Heap - insert: node 16283895/16287107; tid 333/98
REDO @ 13/3399767164; LSN 13/3399767228: prev 13/3399767092; xprev 13/3399767092; xid 20960086: Btree - insert: node 16283895/23651833; tid 195/258
REDO @ 13/3399767228; LSN 13/3399767300: prev 13/3399767164; xprev 13/3399767164; xid 20960086: Heap - insert: node 16283895/16287107; tid 333/99
REDO @ 13/3399767300; LSN 13/3399767364: prev 13/3399767228; xprev 13/3399767228; xid 20960086: Btree - insert: node 16283895/23651833; tid 195/259
REDO @ 13/3399767364; LSN 13/3399767448: prev 13/3399767300; xprev 13/3399767300; xid 20960086: Heap - update: node 16283895/16287620; tid 428/79; new 428/129
REDO @ 13/3399767448; LSN 13/3399775720: prev 13/3399767364; xprev 13/3399767364; xid 20960086; bkpb 1: Btree - insert: node 16283895/23651923; tid 4/2
REDO @ 13/3399775720; LSN 13/3399775780: prev 13/3399767448; xprev 13/3399767448; xid 20960086: Btree - insert: node 16283895/23651926; tid 233/6
DEBUG: ReadRecord: record with zero len at (13, 3399775780)
DEBUG: redo done at (13, 3399775720)
XLogFlush: rqst 13/3399767300; wrt 13/3399775780; flsh 13/3399775780
XLogFlush: rqst 13/3399767364; wrt 13/3399775780; flsh 13/3399775780
XLogFlush: rqst 13/3400103600; wrt 13/3399775780; flsh 13/3399775780
FATAL 2: XLogFlush: request is not satisfied
DEBUG: proc_exit(2)
DEBUG: shmem_exit(2)
DEBUG: exit(2)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: Startup proc 3855 exited with status 512 - abort
/usr/local/pgsql/bin/postmaster: PostmasterMain: initial environ dump:
[blah]

And here is a backtrace taken from a core file I found lying around,
whose timestamp makes me think it has something to say:

(gdb) bt
#0  0x4018cbf4 in memmove () from /lib/libc.so.6
#1  0x08100f85 in PageRepairFragmentation ()
#2  0x080ae9a7 in scan_heap ()
#3  0x080adfb4 in vacuum_rel ()
#4  0x080adbee in vac_vacuum ()
#5  0x080adb68 in vacuum ()
#6  0x08105c72 in ProcessUtility ()
#7  0x081039d9 in pg_exec_query_string ()
#8  0x08104adb in PostgresMain ()
#9  0x080ee7e4 in DoBackend ()
#10 0x080ee3c5 in BackendStartup ()
#11 0x080ed599 in ServerLoop ()
#12 0x080ecfa6 in PostmasterMain ()
#13 0x080ccb8f in main ()
#14 0x401231f0 in __libc_start_main () from /lib/libc.so.6

The database has been running for months without trouble.
I'm now trying desperate measures, but I fear I will have to restore
from backup (a week old). I have taken a tarball of the complete
location (pg_xlog included and all that stuff) if anyone wants to see
it (but it's 2 GB).

I think I know which table is dead, but I don't know what to do with
that information :-(  In the serverlog I see:

DEBUG: --Relation delay_171--
NOTICE: Rel delay_171: TID 15502/4279: OID IS INVALID. TUPGONE 0.
NOTICE: Rel delay_171: TID 15502/4291: OID IS INVALID. TUPGONE 1.
NOTICE: Rel delay_171: TID 15502/4315: OID IS INVALID. TUPGONE 1.
NOTICE: Rel delay_171: TID 15502/4375: OID IS INVALID. TUPGONE 0.
NOTICE: Rel delay_171: TID 15502/4723: OID IS INVALID. TUPGONE 1.
NOTICE: Rel delay_171: TID 15502/4771: OID IS INVALID. TUPGONE 0.
NOTICE: Rel delay_171: TID 15502/4783: OID IS INVALID. TUPGONE 0.
NOTICE: Rel delay_171: TID 15502/4831: OID IS INVALID. TUPGONE 1.
NOTICE: Rel delay_171: TID 15502/4843: OID IS INVALID. TUPGONE 0.
NOTICE: Rel delay_171: TID 15502/4867: InsertTransactionInProgress 0 - can't shrink relation
NOTICE: Rel delay_171: TID 15502/4867: OID IS INVALID. TUPGONE 0.
[a lot of similar-looking lines]
NOTICE: Rel delay_171: TID 15502/6067: OID IS INVALID. TUPGONE 0.
Server process (pid 22773) exited with status 139 at Sun Oct 21 02:30:27 2001
Terminating any active server processes...
NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died
        abnormally and possibly corrupted shared memory. I have rolled
        back the current transaction and am going to terminate your
        database system connection and exit. Please reconnect to the
        database system and repeat your query.

(This is from well before the serverlog lines I showed earlier.)

I really don't know what to do from here.

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Rebellion is the original virtue of man" (Arthur Schopenhauer)
Alvaro Herrera <alvherre@atentus.com> writes:
> FATAL 2: XLogFlush: request is not satisfied

We had a previous report of this same failure message --- see
the thread starting at
http://fts.postgresql.org/db/mw/msg.html?mid=1033586

> And here is a backtrace taken from a core file I found lying around,
> whose timestamp makes me think it has something to say:
> (gdb) bt
> #0  0x4018cbf4 in memmove () from /lib/libc.so.6
> #1  0x08100f85 in PageRepairFragmentation ()
> #2  0x080ae9a7 in scan_heap ()
> #3  0x080adfb4 in vacuum_rel ()
> #4  0x080adbee in vac_vacuum ()
> #5  0x080adb68 in vacuum ()

It would be useful to look into that too, for sure, but I think it is
probably not related to your XLog problem.

> The database has been running for months without trouble. I'm now trying
> desperate measures, but I fear I will have to restore from backup (a week
> old). I have taken a tarball of the complete location (pg_xlog included and
> all that stuff) if anyone wants to see it (but it's 2 GB).

As I said to Denis in the earlier thread, it would be good to try to
track down which page is corrupted and maybe then we'd understand how
it got that way. Since you have the database tarball, you have the
raw material to look into it --- you'd need to rebuild Postgres with
debug symbols enabled and trace back from the failure points to learn
more. Are you up to that, or could you grant access to your machine to
someone who is?

As for your immediate problem, I'd counsel reducing that elog(STOP) to
elog(DEBUG) so that you can bring the database up, and then you can
try to pg_dump your current data. You'll probably still want to
re-initdb and restore once you get a consistent dump.

Um, Vadim?  Still of the opinion that elog(STOP) is a good idea here?
That's two people now for whom that decision has turned localized
corruption into complete database failure. I don't think it's a good
tradeoff.

			regards, tom lane
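The elog(STOP) in question is the one raised by XLogFlush() in
src/backend/access/transam/xlog.c. A minimal sketch of the suggested
change, assuming the 7.1-era variable names (they may differ slightly
in your tree):

    /*
     * XLogFlush(): the stock code stops the server when the requested
     * flush point lies beyond what was actually written.  Demoting the
     * elog level lets startup finish so the data can be pg_dump'ed.
     */
    if (XLByteLT(LogwrtResult.Flush, record))
        elog(DEBUG, "XLogFlush: request is not satisfied");  /* was elog(STOP, ...) */

After dumping, a re-initdb and restore is still advisable, since the
underlying page damage remains.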
> The database has been running for months without trouble. I'm
> now trying desperate measures, but I fear I will have to restore
> from backup (a week old). I have taken a tarball of the complete
> location (pg_xlog included and all that stuff) if anyone wants
> to see it (but it's 2 GB).

1. PostgreSQL version?
2. OS + hardware?

Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> Um, Vadim? Still of the opinion that elog(STOP) is a good idea here? >> That's two people now for whom that decision has turned localized >> corruption into complete database failure. I don't think it's a good >> tradeoff. > One is able to use pg_resetxlog so I don't see point in removing elog(STOP) > there. What do you think? Well, pg_resetxlog would get around the symptom, but at the cost of possibly losing updates that are further along in the xlog than the update for the corrupted page. (I'm assuming that the problem here is a page with a corrupt LSN.) I think it's better to treat flush request past end of log as a DEBUG or NOTICE condition and keep going. Sure, it indicates badness somewhere, but we should try to have some robustness in the face of that badness. I do not see any reason why XLOG has to declare defeat and go home because of this condition. regards, tom lane
> As I said to Denis in the earlier thread, it would be good to try to
> track down which page is corrupted and maybe then we'd understand how
> it got that way. Since you have the database tarball, you have the
> raw material to look into it --- you'd need to rebuild Postgres with
> debug symbols enabled and trace back from the failure points to learn
> more. Are you up to that, or could you grant access to your
> machine to someone who is?

I have something to report about Denis' problem, but I have had no time
so far.

> As for your immediate problem, I'd counsel reducing that elog(STOP) to
> elog(DEBUG) so that you can bring the database up, and then you can
> try to pg_dump your current data. You'll probably still want to
> re-initdb and restore once you get a consistent dump.
>
> Um, Vadim?  Still of the opinion that elog(STOP) is a good idea here?
> That's two people now for whom that decision has turned localized
> corruption into complete database failure. I don't think it's a good
> tradeoff.

One is able to use pg_resetxlog, so I don't see the point in removing
the elog(STOP) there. What do you think?

Vadim
On Mon, 22 Oct 2001, Tom Lane wrote:

> Alvaro Herrera <alvherre@atentus.com> writes:
> > FATAL 2: XLogFlush: request is not satisfied
>
> We had a previous report of this same failure message --- see
> the thread starting at
> http://fts.postgresql.org/db/mw/msg.html?mid=1033586

All right, I'll give it a try next week.

> Since you have the database tarball, you have the
> raw material to look into it --- you'd need to rebuild Postgres with
> debug symbols enabled and trace back from the failure points to learn
> more. Are you up to that, or could you grant access to your machine to
> someone who is?

I am. I'll rebuild and then maybe you'll see what gives.

> Um, Vadim?  Still of the opinion that elog(STOP) is a good idea here?
> That's two people now for whom that decision has turned localized
> corruption into complete database failure. I don't think it's a good
> tradeoff.

Well, if there's this reset-xlog business I don't think that would be
necessary: just point it out. Had I known... but now the problem is
solved.

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"When there is no humility, people degrade themselves" (A. Christie)
> >> Um, Vadim?  Still of the opinion that elog(STOP) is a good
> >> idea here? That's two people now for whom that decision has
> >> turned localized corruption into complete database failure.
> >> I don't think it's a good tradeoff.
>
> > One is able to use pg_resetxlog, so I don't see the point in
> > removing the elog(STOP) there. What do you think?
>
> Well, pg_resetxlog would get around the symptom, but at the cost of
> possibly losing updates that are further along in the xlog than the
> update for the corrupted page. (I'm assuming that the problem here
> is a page with a corrupt LSN.) I think it's better to treat flush
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On restart, the entire content of every page modified after the last
checkpoint should be restored from WAL. In Denis' case it looks like
the page newly allocated for the update was somehow corrupted before
heapam.c:2235 (7.1.2 sources), and so there was no XLOG_HEAP_INIT_PAGE
flag in the WAL record => the page content was not initialized on
restart. Denis reported a system crash - very likely due to a memory
problem.

> request past end of log as a DEBUG or NOTICE condition and keep going.
> Sure, it indicates badness somewhere, but we should try to have some
> robustness in the face of that badness. I do not see any reason why
> XLOG has to declare defeat and go home because of this condition.

Ok - what about setting a flag there during restart and aborting the
restart after all records from WAL have been applied? Then the DBA will
have the choice either to run pg_resetxlog afterwards and try to dump
the data, or to restore from an old backup. I still object to just a
NOTICE there - it's too easy to miss. And in normal processing mode I'd
leave the elog(STOP) there.

Vadim

P.S. Further discussion will be in the hackers list, sorry.
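A rough sketch of the compromise Vadim describes. The flag name and the
message texts are invented; InRecovery, StartupXLOG(), XLByteLT() and
LogwrtResult are real names from the 7.1-era xlog code, though the
surrounding details are from memory:

    static bool reachedBadFlushRequest = false;

    /* in XLogFlush(), instead of stopping immediately while recovering */
    if (InRecovery && XLByteLT(LogwrtResult.Flush, record))
    {
        elog(NOTICE, "XLogFlush: request is not satisfied; continuing redo");
        reachedBadFlushRequest = true;
        return;
    }

    /* at the end of StartupXLOG(), once every WAL record has been applied */
    if (reachedBadFlushRequest)
        elog(STOP, "redo found flush requests past end of log; "
             "run pg_resetxlog and dump the data, or restore from backup");

That way the DBA sees the full extent of the damage before deciding,
while normal (non-recovery) processing keeps the hard stop.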
On Mon, 22 Oct 2001, Tom Lane wrote:

> Alvaro Herrera <alvherre@atentus.com> writes:
> > The database has been running for months without trouble. I'm now trying
> > desperate measures, but I fear I will have to restore from backup (a week
> > old). I have taken a tarball of the complete location (pg_xlog included and
> > all that stuff) if anyone wants to see it (but it's 2 GB).
>
> As I said to Denis in the earlier thread, it would be good to try to
> track down which page is corrupted and maybe then we'd understand how
> it got that way. Since you have the database tarball, you have the
> raw material to look into it --- you'd need to rebuild Postgres with
> debug symbols enabled and trace back from the failure points to learn
> more. Are you up to that, or could you grant access to your machine to
> someone who is?

The problem ended up being bad RAM (it was good two weeks ago). I don't
think it's of any use to do anything else with the database. Sorry for
the noise.

If you think it's useful anyway, I can recompile and see what's up. Do
you think so?

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"None are more hopelessly enslaved than those who falsely believe they
are free" (Goethe)
Alvaro Herrera <alvherre@atentus.com> writes:
> The problem ended up being bad RAM (it was good two weeks ago). I don't
> think it's of any use to do anything else with the database.

Ah so. Thanks for following up.

It may be unthinkable hubris to say this, but ... I am starting to
notice that a larger and larger fraction of serious trouble reports
ultimately trace to hardware failures, not software bugs. Seems we've
done a good job getting data-corruption bugs out of Postgres.

Perhaps we should reconsider the notion of keeping CRC checksums on
data pages. Not sure what we could do to defend against bad RAM,
however.

			regards, tom lane
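PostgreSQL 7.1 stores no checksum on data pages, so the following is
only a sketch of the idea: a plain CRC-32 over each 8K page, skipping
the slot where the checksum itself would be stored. BLCKSZ matches the
default build; CHECKSUM_OFF, the typedef and the function names are
invented for the example:

    #include <stddef.h>

    typedef unsigned int uint32;        /* 32 bits on the platforms of interest */

    #define BLCKSZ        8192          /* default PostgreSQL block size */
    #define CHECKSUM_OFF  8             /* invented slot for the stored CRC */

    /* bitwise CRC-32 (reflected polynomial 0xEDB88320), resumable */
    static uint32
    crc32_update(uint32 crc, const unsigned char *buf, size_t len)
    {
        size_t  i;
        int     bit;

        for (i = 0; i < len; i++)
        {
            crc ^= buf[i];
            for (bit = 0; bit < 8; bit++)
                crc = (crc >> 1) ^ (0xEDB88320U & (0U - (crc & 1U)));
        }
        return crc;
    }

    /* checksum a page, skipping the 4 bytes where the CRC itself lives */
    static uint32
    page_checksum(const unsigned char *page)
    {
        uint32  crc = 0xFFFFFFFFU;

        crc = crc32_update(crc, page, CHECKSUM_OFF);
        crc = crc32_update(crc, page + CHECKSUM_OFF + 4,
                           BLCKSZ - CHECKSUM_OFF - 4);
        return crc ^ 0xFFFFFFFFU;
    }

Verification at read time would recompute the same value and compare it
with the stored one; that catches corruption introduced on the way to
or from disk, though, as Tom says, it is no defence against bits
flipping in RAM after the check.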
> It may be unthinkable hubris to say this, but ... I am starting to
> notice that a larger and larger fraction of serious trouble reports
> ultimately trace to hardware failures, not software bugs. Seems we've
> done a good job getting data-corruption bugs out of Postgres.

I can say from personal experience that since 7.0 I haven't had a
single instance of corruption... I had quite a few troubles with
pre-7.0 versions, but you guys have kicked some serious butt in getting
those bugs out of PG. As always, excellent work, gentlemen.

-Mitch
Tom Lane wrote:

> Alvaro Herrera <alvherre@atentus.com> writes:
>
>> The problem ended up being bad RAM (it was good two weeks ago). I don't
>> think it's of any use to do anything else with the database.
>
> Ah so. Thanks for following up.
>
> It may be unthinkable hubris to say this, but ... I am starting to
> notice that a larger and larger fraction of serious trouble reports
> ultimately trace to hardware failures, not software bugs. Seems we've
> done a good job getting data-corruption bugs out of Postgres.

I am certainly a member of the "hardware causes db corruption" family,
as it has happened to me more than once. RAM and power supplies seem to
be the usual suspects.

Has anyone else noticed that only a few years ago it seemed like
hardware was more robust? Or is that just a fanciful projection of the
"good ol' days"?

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
Jon Lapham
Extracta Moléculas Naturais, Rio de Janeiro, Brasil
email: lapham@extracta.com.br      web: http://www.extracta.com.br/
***-*--*----*-------*------------*--------------------*---------------
> It may be unthinkable hubris to say this, but ... I am starting to
> notice that a larger and larger fraction of serious trouble reports
> ultimately trace to hardware failures, not software bugs. Seems we've
> done a good job getting data-corruption bugs out of Postgres.
>
> Perhaps we should reconsider the notion of keeping CRC checksums on
> data pages. Not sure what we could do to defend against bad RAM,
> however.

Good idea.

I have been troubled by a really strange problem. Populating with huge
data (~7GB) causes random failures: for example, a mysterious unique
constraint violation, count(*) showing an incorrect number, or pg_temp*
suddenly disappearing (the table in question is a temporary table).
These are really hard to reproduce and happen on 7.0 through current,
virtually any PostgreSQL release. Even on an identical system, the
problems are sometimes gone after a re-initdb...

I now suspect that some hardware failure might be the source of the
trouble. The problem is, I see no sign of it so far in the standard
system logs, such as syslog or messages.

It would be really nice if PostgreSQL could be protected from such
hardware failures using CRC or whatever...
--
Tatsuo Ishii
On Wed, 31 Oct 2001, Tatsuo Ishii wrote:

> > It may be unthinkable hubris to say this, but ... I am starting to
> > notice that a larger and larger fraction of serious trouble reports
> > ultimately trace to hardware failures, not software bugs. Seems we've
> > done a good job getting data-corruption bugs out of Postgres.
> >
> > Perhaps we should reconsider the notion of keeping CRC checksums on
> > data pages. Not sure what we could do to defend against bad RAM,
> > however.

Maybe not defend against it, but at least you can detect it and warn
the user that something is likely to go wrong.

> I have been troubled by a really strange problem. Populating with huge
> data (~7GB) causes random failures: for example, a mysterious unique
> constraint violation, count(*) showing an incorrect number, or pg_temp*
> suddenly disappearing (the table in question is a temporary table).

Remember the guy who had to change relnatts by hand to get a table back
online? It was bad RAM. One may wonder just how big the coincidence was
to get exactly that bit changed... Well, a bad CRC checksum would have
warned him right away.

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"If you want to be creative, learn the art of wasting time"
> I have been troubled by a really strange problem. Populating with huge
> data (~7GB) causes random failures: for example, a mysterious unique
> constraint violation, count(*) showing an incorrect number, or pg_temp*
> suddenly disappearing (the table in question is a temporary table).
> These are really hard to reproduce and happen on 7.0 through current,
> virtually any PostgreSQL release. Even on an identical system, the
> problems are sometimes gone after a re-initdb...
>
> I now suspect that some hardware failure might be the source of the
> trouble. The problem is, I see no sign of it so far in the standard
> system logs, such as syslog or messages.
>
> It would be really nice if PostgreSQL could be protected from such
> hardware failures using CRC or whatever...

At a minimum, for cases where hardware problems are suspected, we
should have some CRC detection code we can turn on.

Tatsuo, does --enable-cassert help?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > Perhaps we should reconsider the notion of keeping CRC checksums on
> > data pages. Not sure what we could do to defend against bad RAM,
> > however.
>
> Good idea.

I third that. Machines are very fast today. Reliability is vastly more
important than speed in many cases.

Let's say that CRC introduces a 20% slowdown (which is probably an
overstatement). This means that, to get the same speed, we will have to
spend 20% more on the hardware. Let's say that this is a top-end
server, so we have to spend $12k instead of $10k. Is $2k worth it for
better data reliability? Absolutely! No question about it.
Error-correcting RAM modules for PCs are becoming very cheap these days.. I think anyone doing any serious work on a computer should be using them now to avoid problems like this. - Andrew
> At a minimum, for cases where hardware problems are suspected, we should
> have some CRC detection code we can turn on.
>
> Tatsuo, does --enable-cassert help?

I haven't tried it yet. Instead, we are running Purify to see if it
detects something going wrong. It will take a long time...
--
Tatsuo Ishii
> Error-correcting RAM modules for PCs are becoming very cheap these days..
> I think anyone doing any serious work on a computer should be using them
> now to avoid problems like this.

I think any error-detection code we add would be optional and default
to off. Also, remember it is error detection, usually not error
correction.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026