Re: Race-condition with failed block-write? - Mailing list pgsql-bugs
From | Arjen van der Meijden |
---|---|
Subject | Re: Race-condition with failed block-write? |
Date | |
Msg-id | 43270FAA.20301@tweakers.net |
In response to | Re: Race-condition with failed block-write? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
On 13-9-2005 16:25, Tom Lane wrote:
> Arjen van der Meijden <acm@tweakers.net> writes:
>
> It's highly unlikely that that query has anything to do with it, since
> it's not touching anything but system catalogs and not trying to write
> them either.

Indeed, other things trigger it as well.

> The first thing you ought to find out is which table
> 1663/2013826/9975789 is, and look to see if the corrupted LSN value is
> already present on disk in that block.

Well, it's an index, not a table. It was the index "pg_class_relname_nsp_index" on pg_class(relname, relnamespace).
Using pg_filedump I extracted the LSN for block 21 and indeed, that was already 67713428 instead of something below 2E73E53C.

It wasn't that block alone though; here are a few LSN lines from it:

LSN: logid 41 recoff 0x676f5174 Special 8176 (0x1ff0)
LSN: logid 25 recoff 0x3c6c5504 Special 8176 (0x1ff0)
LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
LSN: logid 41 recoff 0x2ea88190 Special 8176 (0x1ff0)
LSN: logid 1 recoff 0x68e2f660 Special 8176 (0x1ff0)
LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
LSN: logid 1 recoff 0x68e2f6a4 Special 8176 (0x1ff0)

I tried other files, and each one I tried only had LSNs of 0.

When trying (\d indexname in psql) to determine which table that index belonged to, I noticed I got the errors again, but for another file (pg_index this time). On another try after that (oid2name ...), it was yet another file (the pg_class table).

All those files were last changed somewhere around August 25, so there are no new changes. On that day I did some active query tuning, and a few times I issued immediate shutdowns when the selects took too long. There were no warnings about broken records in the log afterwards though, so I don't believe anything got damaged by that.
After that I loaded some fresh data from a production database using either pg_restore or psql < some-file-from-pg_dump.sql (I don't know which one anymore).
A few days later I shut down that postgres, installed 8.1-beta and used that (in another directory of course). This 8.0.3 only came back up because of a reboot and hasn't been used since that reboot.

I guess those system tables got mixed up during that reloading?

> If it is, then we've probably
> not got much chance of finding out how it got there. If it is *not* on
> disk, but you have a repeatable way of causing this to happen starting
> from a clean postmaster start, then that's pretty interesting --- but
> I don't know any way of figuring it out short of groveling through the
> code with a debugger. If you're not already pretty familiar with the PG
> code, coaching you remotely isn't going to work very well :-(. I'd be
> glad to look into it if you can get me access to the machine though.

Well, I can very probably give you that access. But as you say, finding out what went wrong is very hard to do.

Best regards,

Arjen van der Meijden
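
For anyone wanting to repeat the LSN check described above, here is a minimal sketch in Python, not the procedure actually used in this thread. It parses the "LSN: logid ... recoff ..." lines as quoted in this message and lists the page LSNs that lie beyond a given flush position. The helper names (parse_lsns, format_lsn, lsns_ahead_of) are made up for illustration. The logid of the flush position is an assumption: only the offset 2E73E53C appears in this message, and logid 41 is guessed from the page LSNs.

```python
import re

# Matches the LSN lines as they are quoted in this message.
LSN_RE = re.compile(r"LSN:\s+logid\s+(\d+)\s+recoff\s+0x([0-9a-fA-F]+)")

SAMPLE = """\
LSN: logid 41 recoff 0x676f5174 Special 8176 (0x1ff0)
LSN: logid 25 recoff 0x3c6c5504 Special 8176 (0x1ff0)
LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
LSN: logid 41 recoff 0x2ea88190 Special 8176 (0x1ff0)
LSN: logid 1 recoff 0x68e2f660 Special 8176 (0x1ff0)
LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
LSN: logid 1 recoff 0x68e2f6a4 Special 8176 (0x1ff0)
"""

# ASSUMPTION: the message only gives the offset 2E73E53C; logid 41 is a guess.
ASSUMED_FLUSHED = (41, 0x2E73E53C)


def parse_lsns(dump_text):
    """Return a (logid, recoff) tuple for every LSN line found in the text."""
    return [(int(m.group(1)), int(m.group(2), 16))
            for m in LSN_RE.finditer(dump_text)]


def format_lsn(lsn):
    """Render an LSN in the hexadecimal logid/recoff notation."""
    return "%X/%X" % lsn


def lsns_ahead_of(lsns, flushed):
    """Return the LSNs that are later than the flushed position.

    Tuples compare by logid first and by recoff second, which matches
    the ordering of positions in the write-ahead log.
    """
    return [lsn for lsn in lsns if lsn > flushed]


if __name__ == "__main__":
    for lsn in lsns_ahead_of(parse_lsns(SAMPLE), ASSUMED_FLUSHED):
        print(format_lsn(lsn), "is beyond", format_lsn(ASSUMED_FLUSHED))
```

A page whose LSN is later than the position up to which the log has been flushed points at WAL that does not exist yet, which is the kind of block the check in this thread is looking for.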