I've expanded my searching a bit, to see if I can find any other
correlations. One thing that seems to happen about 10 times a day
is an error of this sort:
ERROR: could not open relation with OID 1554847326
In this case, the OID in question always exists, and corresponds to
one of a handful of particularly busy tables. Sometimes the query
does not even directly reference the table with that OID: in the above
example, the SQL was an update to table A that had a FK to table B, and the
OID above is for table B. The queries themselves vary: I've not found any
common factor yet.
These errors have been happening for a long time, and obviously don't cause the
same database-hosed-must-restart issue the btree errors do, but it is still
a little disconcerting, although 10 times out of > 20 million transactions
per day is at least an extremely rare event :) It is definitely NOT correlated to
system table reindexing, but does seem to be roughly correlated to how busy
things are in general. We've not been able to duplicate it on a non-prod test
system yet either, which points to either hardware or (more likely) a failure
to completely simulate the high activity level of prod.
No idea if this is related to the relatively recent btree errors, but figured
I would get it out there. There is also an even rarer sprinkling of:
ERROR: relation with OID 3924107573 does not exist
but I figured that was probably a variant of the first error.
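
For anyone wanting to check the same correlations in their own logs, here
is a minimal sketch in Python that tallies both error variants per OID and
per hour. It assumes each log line starts with a timestamp of the form
YYYY-MM-DD HH:MM:SS (that depends on your log_line_prefix), and the
function name tally_oid_errors is made up for illustration.

```python
import re
from collections import Counter

# Matches both error variants quoted above. The leading timestamp layout
# (YYYY-MM-DD HH:MM:SS) is an assumption about log_line_prefix.
ERROR_RE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\b.*?"
    r"ERROR:\s+(?:could not open relation with OID (?P<open_oid>\d+)"
    r"|relation with OID (?P<missing_oid>\d+) does not exist)"
)


def tally_oid_errors(lines):
    """Count matching errors per OID and per (day, hour) bucket."""
    by_oid = Counter()
    by_hour = Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m is None:
            continue  # not one of the two errors of interest
        oid = int(m.group("open_oid") or m.group("missing_oid"))
        by_oid[oid] += 1
        by_hour[(m.group("day"), m.group("hour"))] += 1
    return by_oid, by_hour
```

Feeding it an open log file (for example tally_oid_errors(open(path)))
gives two counters. The per-hour one can be compared against whatever
activity metric is at hand, and the per-OID one shows whether the same
handful of tables keeps turning up.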
--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8