Thread: Btree or not btree? That is the question
We have an 8.3.18 system (yes, the same one from the previous thread, finally upgraded!) that gave us this error yesterday:

ERROR: index "pg_class_oid_index" is not a btree

It seems this happened from just a normal SELECT query, and it did not occur again (and this is a very busy system). The pg_class table does get vacuumed and reindexed often. All appeared fine when I did some manual investigation.

Since this only happened this one time, can I chalk it up to some random, rare race condition having to do with a reindex? Or is there anything else I should be looking for or can proactively do? No other system catalog problems have been seen before it or since.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
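A minimal sketch of the kind of manual check described above, assuming superuser access; note that REINDEX on a system index takes an exclusive lock on pg_class, so on a busy system it belongs in a quiet window:

    -- Force a lookup that should go through pg_class_oid_index.
    SELECT relname FROM pg_class WHERE oid = 'pg_class'::regclass;

    -- Rebuild the index if it looks suspect (exclusive lock on pg_class).
    REINDEX INDEX pg_class_oid_index;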
Greg Sabino Mullane <greg@endpoint.com> writes:
> We have an 8.3.18 system (yes, the same one from the previous
> thread, finally upgraded!) that gave us this error yesterday:

> ERROR: index "pg_class_oid_index" is not a btree

That means you got bogus data while reading the metapage.
I'm beginning to wonder about the hardware on this server ...

            regards, tom lane
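A minimal sketch of how one might look at that metapage directly, assuming the pageinspect contrib module (which ships with 8.3) is installed; a healthy btree metapage should report the expected magic number and a plausible version and root:

    -- Requires the pageinspect contrib module.
    -- A sane btree metapage reports magic = 340322 (0x053162).
    SELECT * FROM bt_metap('pg_class_oid_index');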
On Mon, Jun 04, 2012 at 02:09:44PM -0400, Tom Lane wrote:
> Greg Sabino Mullane <greg@endpoint.com> writes:
> > We have an 8.3.18 system (yes, the same one from the previous
> > thread, finally upgraded!) that gave us this error yesterday:
>
> > ERROR: index "pg_class_oid_index" is not a btree
>
> That means you got bogus data while reading the metapage.
> I'm beginning to wonder about the hardware on this server ...

Thanks for the reply. Me too. This is on a cluster[1], so we do have the
luxury of testing the hardware on each box and shuffling things around.
I'm also thinking we need to get away from the near-constant updating of
pg_class, just as a good measure.

[1] Bruce, yet another usage! :)

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
> > ERROR: index "pg_class_oid_index" is not a btree
>
> That means you got bogus data while reading the metapage.
> I'm beginning to wonder about the hardware on this server ...

This happened again, and this time I went back through the logs and found
that it is always the exact same query causing the issue. I also found it
occurring on different servers, which rules out RAM anyway (still shared
disk, so those are suspect).

This query also sometimes gives errors like this:

ERROR: could not read block 3 of relation 1663/1554846571/3925298284:
read only 0 of 8192 bytes

However, the final number changes: these are invariably temporary relations.
The query itself is a GROUP BY over a large view and the explain plan is 107
rows, with nothing esoteric about it. Most of the tables used are fairly
common ones. I'm trying to duplicate this on a non-production box, without
success so far, and I'm loath to run it on production as it sometimes causes
multiple backends to freeze up and requires a forceful restart.

Any ideas on how to carefully debug this? There are a couple of quicksorts
when I explain analyze on a non-prod system, which I am guessing is where
the temp tables come from (work_mem is 24MB). I'm not sure I understand
what could be causing both the 'read 0' and btree errors for the
same query - bad blocks on disk for one of the underlying tables?
I'll work next on checking each of the tables the view is using.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
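A minimal sketch of the checks hinted at above, using the relfilenode from the error message and a placeholder table name; the first query tries to map the filenode back to a relation, and the second forces a full read of a suspect table so any unreadable block surfaces immediately:

    -- Map the last component of "1663/1554846571/3925298284" back to a
    -- relation, if it still exists (temporary or since-dropped relations
    -- will not match anything here).
    SELECT oid, relname, relkind
    FROM   pg_class
    WHERE  relfilenode = 3925298284;

    -- Force a sequential read of every block of a suspect table
    -- (placeholder name); a bad block on disk should error out here.
    SELECT count(*) FROM some_underlying_table;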
Greg Sabino Mullane <greg@endpoint.com> writes:
>>> ERROR: index "pg_class_oid_index" is not a btree

>> That means you got bogus data while reading the metapage.
>> I'm beginning to wonder about the hardware on this server ...

> This happened again, and this time I went back through
> the logs and found that it is always the exact same query causing
> the issue. I also found it occurring on different servers,
> which rules out RAM anyway (still shared disk, so those are suspect).

> This query also sometimes gives errors like this:

> ERROR: could not read block 3 of relation 1663/1554846571/3925298284:
> read only 0 of 8192 bytes

> However, the final number changes: these are invariably temporary relations.

Oh really ... okay, maybe it is a software problem then.

> Any ideas on how to carefully debug this? There are a couple of quicksorts
> when I explain analyze on a non-prod system, which I am guessing is where
> the temp tables come from (work_mem is 24MB).

No, that error message is complaining about an attempt to read an actual,
named, relation (could be temp, no way to be sure from this info). A sort
might generate temp files but those don't have that kind of name.

> I'm not sure I understand
> what could be causing both the 'read 0' and btree errors for the
> same query - bad blocks on disk for one of the underlying tables?

The "could not read" thing looks more like an attempt to fetch an invalid
TID; you could get such an error for instance if you had an index that
referenced a tuple in block 3, but the table on disk isn't that long.

So one possible theory for what's happening here is that once in a while we
get confused about which shared buffer holds which disk block, and either
find the wrong block entirely when looking for pg_class_oid_index (the first
case) or grab a page of the wrong index in the second case. This theory
would be more plausible if you're wrong about the second-case tables being
temp, though, because if they are temp then their indexes would be kept in
local buffers not shared buffers, making it much harder to believe in a
single bug causing both symptoms.

One possible mechanism for confusion of that sort would be if the spinlock
code wasn't quite right, or the compiler was incorrectly moving loads/stores
into or out of locked sections. So it might be time to ask exactly what kind
of hardware this is, which compiler PG was built with, etc.

On the other hand, if the issue were of that sort then it ought to affect
all buffers more or less at random; so if you're consistently seeing exactly
these symptoms (in particular, if it's always pg_class_oid_index that's
complained of), then I'm not sure I believe this theory either.

Which PG version again? Are you in the habit of doing VACUUM FULLs on system
catalogs, and if so do these glitches correlate at all with such activities?

            regards, tom lane
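A minimal way to sanity-check the "index points past the end of the table" theory, assuming placeholder relation names and the default 8 kB block size; the two page counts should be in the same ballpark (relpages is only refreshed by VACUUM/ANALYZE, so it is approximate), and an index entry naming a block beyond pages_on_disk would explain the read-only-0-of-8192-bytes error:

    -- Placeholder relation names; compare the catalog's page count
    -- with the file's actual size on disk (default 8 kB blocks assumed).
    SELECT c.relname,
           c.relpages                      AS pages_per_catalog,
           pg_relation_size(c.oid) / 8192  AS pages_on_disk
    FROM   pg_class c
    WHERE  c.relname IN ('suspect_table', 'suspect_index');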
I dug through the logs and found some other occurrences of the "could not
read block" errors. Some on dirt-simple SELECT queries. Nothing else has
generated the btree error yet. About 35 found in the last month.

> This theory would be more plausible if you're wrong about the second-case
> tables being temp, though, because if they are temp then their indexes
> would be kept in local buffers not shared buffers, making it much harder
> to believe in a single bug causing both symptoms.

I grepped the last month of logs and found about 20 instances of that error:
none of the relfilenodes given shows up in pg_class, even for that
dirt-simple SELECT.

> One possible mechanism for confusion of that sort would be if the
> spinlock code wasn't quite right, or the compiler was incorrectly
> moving loads/stores into or out of locked sections. So it might be
> time to ask exactly what kind of hardware this is, which compiler
> PG was built with, etc.

Quad-core AMD Opteron. RHEL. Compiled with gcc with all the options
(basically the standard compilation, e.g. --build=x86_64-redhat-linux-gnu).
I can give you more details offlist if it will help.

> On the other hand, if the issue were of that sort then it ought to
> affect all buffers more or less at random; so if you're consistently
> seeing exactly these symptoms (in particular, if it's always
> pg_class_oid_index that's complained of), then I'm not sure I believe
> this theory either.

I've never seen any other index for the btree error, but it has only
happened a grand total of 3 times ever. The other error appears to be fairly
random, except that the one particular query that gives the btree error
always seems to give one version or the other.

> Which PG version again? Are you in the habit of doing VACUUM FULLs
> on system catalogs, and if so do these glitches correlate at all
> with such activities?

Heh. 8.3.18. Yes, very heavy vac fulls (and reindexes) of the system
catalogs. Cron-driven, and depends on the time of day and if any DDL is
running (if so, it does not run), but probably on average pg_class is
vacfulled and reindexed twice an hour during the times this happens (which
is, during normal business hours). There is a lot of churn in pg_class,
pg_attribute, and pg_depend in particular from all the temp stuff being
created and torn down all day, as well as some Bucardo pg_class updating.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
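For context, a minimal sketch of the kind of cron-driven catalog maintenance described above, assuming superuser access; on 8.3 both statements take exclusive locks on pg_class, so they compete directly with the temp-table churn:

    -- Compact and rebuild the heavily-updated catalog (exclusive locks).
    VACUUM FULL pg_class;
    REINDEX TABLE pg_class;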
I've expanded my searching a bit, to see if I can find any other
correlations. One thing that seems to happen about 10 times a day is an
error of this sort:

ERROR: could not open relation with OID 1554847326

In this case, the OID in question always exists, and corresponds to one of
a handful of particularly busy tables. Sometimes the query does not even
touch the OID mentioned directly: in the above example, the SQL was an
update to table A that had a FK to table B, and the OID above is for
table B. The queries themselves vary: I've not found any common factor yet.

These errors have been happening a long time, and obviously don't cause the
same database-hosed-must-restart issue the btree error does, but it is still
a little disconcerting. Although 10 times out of more than 20 million
transactions per day is at least an extremely rare event :)

It is definitely NOT correlated to system table reindexing, but does seem to
be roughly correlated to how busy things are in general. We've not been able
to duplicate it on a non-prod test system yet either, which points to either
hardware or (more likely) a failure to completely simulate the high activity
level of prod.

No idea if this is related to the relatively recent btree errors, but
figured I would get it out there. There is also an even rarer sprinkling of:

ERROR: relation with OID 3924107573 does not exist

but I figured that was probably a variant of the first error.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
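A minimal sketch of confirming that such an OID really does resolve, using the OID from the message above; the regclass cast and the direct catalog lookup should agree:

    -- Resolve the OID from the error message back to a relation name
    -- (an unknown OID is simply echoed back as a number).
    SELECT 1554847326::regclass;

    -- Or look it up directly in the catalog.
    SELECT relname, relkind
    FROM   pg_class
    WHERE  oid = 1554847326;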
Greg Sabino Mullane <greg@endpoint.com> writes:
> I've expanded my searching a bit, to see if I can find any other
> correlations. One thing that seems to happen about 10 times a day
> is an error of this sort:

> ERROR: could not open relation with OID 1554847326

Is that the *entire* message? No details? Could you crank up the
log verbosity so we can get file and line number, at least?

            regards, tom lane
> > ERROR: could not open relation with OID 1554847326
>
> Is that the *entire* message? No details? Could you crank up the
> log verbosity so we can get file and line number, at least?

Yes, that's the entire thing, other than the statement line after it.
In this particular case:

STATEMENT: UPDATE mytable SET foobar = 'T' WHERE id IN ($1,$2,$3,$4)

I'll start the wheels rolling for upping the verbosity.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
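For reference, a minimal sketch of the verbosity change being discussed; 'verbose' adds the SQLSTATE code plus source file, function, and line number to each logged error, and the setting can go in postgresql.conf or be set per-session by a superuser:

    -- postgresql.conf:  log_error_verbosity = verbose   (then reload)
    -- or, per session as a superuser:
    SET log_error_verbosity = verbose;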
> Could you crank up the log verbosity so we can get
> file and line number, at least?

First hit since the change:

ERROR: XX000: could not open relation with OID 1554847444
LOCATION: relation_open, heapam.c:879
STATEMENT: SELECT ...

Will leave the verbosity up and see if it occurs in the same place.

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
On Mon, Jul 09, 2012 at 04:02:13PM -0400, Tom Lane wrote:
> ... Could you crank up the log verbosity so we can get
> file and line number, at least?

Here is what the increased verbosity reveals in aggregate. This is about an
18-hour span, covering 12.5M transactions, on version 8.3.18:

(13 times) Various OIDs that do exist cannot be found:
ERROR: XX000: could not open relation with OID 1554847444
LOCATION: relation_open, heapam.c:879

(21 times) Relations that do exist cannot be found:
ERROR: 42P01: relation "foobar" does not exist
LOCATION: RangeVarGetRelid, namespace.c:273

(1 time) Qualified relation that exists cannot be found:
ERROR: 42P01: relation "public.foobar" does not exist
LOCATION: RangeVarGetRelid, namespace.c:268

(5 times) Failure to read a block:
XX001: could not read block 3 of relation 1663/1554846571/4184054438:
read only 0 of 8192 bytes
LOCATION: mdread, md.c:631

(5 times) Cache lookup failure:
XX000: cache lookup failed for relation 1554847255
LOCATION: has_subclass, plancat.c:921

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8