Thread: right sibling is not next child
Apologies if this is a duplicate, but my original post stalled and I noticed I had omitted the postgres version, which you will want. I'm reporting this as a PostgreSQL bug because it involves an index corruption. I can't see any other way our application should be able to corrupt an index. I will attach the tail of the log when the corruption was detected (and the postmaster shut itself down), as well as the subsequent attempt to start. Fortunately we run our web site off of a farm of four database servers, so we are taking one of the others out of the mix, stopping postmaster, and copying its data directory over to this machine for recovery, so we don't need advice on that aspect of things; but, we'd like to do what we can to help track down the cause to prevent a recurrence. We have renamed the data directory to make room for recovery at the normal location, but otherwise the failing data directory structure is unmodified. For context, this is running on Windows 2003 Server. Eight Xeon box, no HT, 6 GB RAM, 13 drive 15,000 RPM RAID5 array through battery backed controller for everything. This database is about 180 GB with about 300 tables. We are running 8.1.3 modified with a patch we have submitted (pending review last I saw) to implement the standard_conforming_strings TODO. We have autovacuum running every ten seconds because of a few very small tables with very high update rates, and we have a scheduled VACUUM ANALYZE VERBOSE every night. It appears that last night's vacuum found the problem, which the previous night's vacuum didn't. We had some event which started at 14:25 yesterday which persisted until we restarted the middle tier at 15:04. The symptom was that a fraction of the queries which normally run in a few ms were timing out on a 20 second limit. pg_locks showed no blocking. We've been getting episodes with these symptoms occassionally, but they have only lasted a minute or two; this duration was unusual. We haven't identified a cause. One odd thing is that with the number of queries per second that we run, the number of timeouts during an episode is too small to support the notion that _all_ similar queries are failing. How best to proceed? -Kevin
Attachment
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > [2006-04-06 02:19:57.460 ] 3848 <postgres bigbird 127.0.0.1(3944)> PANIC: right sibling is not next child in "Panel_pkey" This should be repeatable by re-attempting a VACUUM, right? Please find out which page exactly it's unhappy about (either gdb the crash or add a printout of the "parent" variable to the elog call in nbtpage.c), then pg_filedump the index and look to see what the index contains. regards, tom lane
Right now the postmaster refuses to start. What is the best way to get past that to try what you suggest? [2006-04-06 07:22:50.347 ] 3984 LOG: database system was interrupted while in recovery at 2006-04-06 02:19:59 Central Daylight Time [2006-04-06 07:22:50.347 ] 3984 HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. [2006-04-06 07:22:50.347 ] 3984 LOG: checkpoint record is at F6/50052F10 [2006-04-06 07:22:50.347 ] 3984 LOG: redo record is at F6/50052F10; undo record is at 0/0; shutdown FALSE [2006-04-06 07:22:50.347 ] 3984 LOG: next transaction ID: 313381941; next OID: 2321597 [2006-04-06 07:22:50.347 ] 3984 LOG: next MultiXactId: 1; next MultiXactOffset: 0 [2006-04-06 07:22:50.347 ] 3984 LOG: database system was not properly shut down; automatic recovery in progress [2006-04-06 07:22:50.347 ] 3984 LOG: redo starts at F6/50052F58 [2006-04-06 07:22:50.347 ] 4076 <bigbird bigbird 127.0.0.1(2236)> FATAL: the database system is starting up [2006-04-06 07:22:50.347 ] 4064 <bigbird bigbird 127.0.0.1(2235)> FATAL: the database system is starting up [2006-04-06 07:22:50.378 ] 3984 PANIC: heap_clean_redo: no block -Kevin >>> On Thu, Apr 6, 2006 at 12:40 pm, in message <25774.1144345220@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> [2006- 04- 06 02:19:57.460 ] 3848 <postgres bigbird 127.0.0.1(3944)> PANIC: right > sibling is not next child in "Panel_pkey" > > This should be repeatable by re- attempting a VACUUM, right? Please find > out which page exactly it's unhappy about (either gdb the crash or add a > printout of the "parent" variable to the elog call in nbtpage.c), then > pg_filedump the index and look to see what the index contains. > > regards, tom lane
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Right now the postmaster refuses to start. What is the best way to get > past that to try what you suggest? > [2006-04-06 07:22:50.378 ] 3984 PANIC: heap_clean_redo: no block Hm, did this start happening immediately after the other problem? That would suggest that you've got worse problems than just a corrupt index. You weren't by any chance running with full_page_writes = off were you? You could get past the startup failure with pg_resetxlog, but it's not clear whether you'd have a consistent database afterward. What I'd suggest first is saving a copy of the entire $PGDATA tree for forensic purposes (not to mention being able to go back to that state if you need to). Is there any chance of letting someone else have a look at the database contents? Otherwise you're going to have to do your own sleuthing to figure out what went wrong ... regards, tom lane
>>> On Thu, Apr 6, 2006 at 12:57 pm, in message <25913.1144346246@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Right now the postmaster refuses to start. What is the best way to get >> past that to try what you suggest? > >> [2006- 04- 06 07:22:50.378 ] 3984 PANIC: heap_clean_redo: no block > > Hm, did this start happening immediately after the other problem? This started happening on the first attempt to start the postmaster after the other error, which left the postmaster down, apparently after a failed restart attempt. > That would suggest that you've got worse problems than just a corrupt > index. You weren't by any chance running with full_page_writes = off > were you? Yes we were. Apparently I have misunderstood the implications of this. Somehow I had convinced myself that this setting was relatively safe in our environment, due to our battery-backed controllers. I'd convinced myself, after reading carefully through the documentation of this setting, that I would be OK as long as that functioned correctly, and have problems regardless of this setting if it didn't. If you show me where I went wrong, maybe I can suggest a patch to the docs to prevent others from going down the wrong path in this regard. (Of course, maybe it's all there and I just had a bad day when I thought this through.) > You could get past the startup failure with pg_resetxlog, but it's not > clear whether you'd have a consistent database afterward. What I'd > suggest first is saving a copy of the entire $PGDATA tree for forensic > purposes We already have this forensic copy and a replacement production copy on this box. I think we'll need to copy to another box to get a second forensic copy, to avoid risking an out-of-space condition. That can be done, but it'll take a few hours. > (not to mention being able to go back to that state if you need > to). That's not an issue for production purposes. > Is there any chance of letting someone else have a look at the database > contents? There is a lot of data in the database which is confidential by law. I'd have to jump through a lot of hoops to get anyone to even consider letting me ship it off site. If you're asking whether you could access in to our site, that might be arranged. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You weren't by any chance running with full_page_writes = off >> were you? > Yes we were. Apparently I have misunderstood the implications of this. So had we all :-(. It just plain doesn't work in 8.1.*, and will be disabled in 8.1.4 --- see discussion last week. It might or might not get resurrected in 8.2, depending on how messy it seems to be to fix. Anyway, that explains your "heap_clean_redo: no block" failure. I think you're stuck risking a pg_resetxlog to try to get back into the database. If that results in a hopelessly corrupt database, we can try modifying the WAL replay code to not consider this a fatal error, and see if that produces anything we can use for debugging. I'm glad this isn't your only copy of the database ... > There is a lot of data in the database which is confidential by law. > I'd have to jump through a lot of hoops to get anyone to even consider > letting me ship it off site. If you're asking whether you could access > in to our site, that might be arranged. It sounds like the DB is too big to consider shipping anywhere anyway. As long as you're comfortable doing stuff like pg_filedump and modifying the code to get more debug info, we can proceed without getting into the question of remote access. regards, tom lane
>>> On Thu, Apr 6, 2006 at 1:26 pm, in message <26154.1144348014@sss.pgh.pa.us>, > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> You weren't by any chance running with full_page_writes = off >>> were you? > >> Yes we were. Apparently I have misunderstood the implications of this. > > So had we all :- (. It just plain doesn't work in 8.1.*, and will be > disabled in 8.1.4 --- see discussion last week. Dang! I've not been able to keep up with the lists, and I missed that thread. I'll change that setting as soon as possible. > Anyway, that explains your "heap_clean_redo: no block" failure. I think > you're stuck risking a pg_resetxlog to try to get back into the > database. If that results in a hopelessly corrupt database, we can try > modifying the WAL replay code to not consider this a fatal error, and > see if that produces anything we can use for debugging. Will do. Before I do that, though, is it worth making a copy? (The down side is primarily the delay of copying it.) > I'm glad this isn't your only copy of the database ... We go beyond being a belt-and-suspenders shop. Think staples and glue-gun, too. ;-) This is one of four copies of what is redundant data to start with, and we have backups. So recovery is no sweat, but we got emails about timeouts from the public during this incident, so the underlying bug is important to us. > As long as you're comfortable doing stuff like pg_filedump and modifying > the code to get more debug info, we can proceed without getting into the > question of remote access. OK. If you later think we do need to go that direction, send me an email off-list. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Anyway, that explains your "heap_clean_redo: no block" failure. I think >> you're stuck risking a pg_resetxlog to try to get back into the >> database. > Will do. Before I do that, though, is it worth making a copy? (The > down side is primarily the delay of copying it.) Yeah, I think you'd better --- pg_resetxlog is destructive, and if we find the database unusable for debugging afterward, we'll need that copy. regards, tom lane