I dug through the logs and found some other occurances of
the "could not read block" errors. Some on dirt simple
SELECT queries. Nothing else has generated the btree
error yet. About 35 found in the last month.
> This theory would be more plausible if you're wrong about the second-case
> tables being temp, though, because if they are temp then their indexes
> would be kept in local buffers not shared buffers, making it much harder
> to believe in a single bug causing both symptoms.
I grepped the last month of logs and found about 20 instances of
that error: none of the relfilenodes given shows up in pg_class,
even for that dirt simple SELECT.
> One possible mechanism for confusion of that sort would be if the
> spinlock code wasn't quite right, or the compiler was incorrectly
> moving loads/stores into or out of locked sections. So it might be
> time to ask exactly what kind of hardware this is, which compiler
> PG was built with, etc.
Quad core AMD Opteron. RHEL. Compiled with gcc with all the options
(basically the standard compilation e.g. --build=x86_64-redhat-linux-gnu)
I can give you more details offlist if it will help.
> On the other hand, if the issue were of that sort then it ought to
> affect all buffers more or less at random; so if you're consistently
> seeing exactly these symptoms (in particular, if it's always
> pg_class_oid_index that's complained of), then I'm not sure I believe
> this theory either.
I've never seen any other index for the btree error, but it has only
happened a grand total of 3 times ever. The other error appears to
be fairly random, except that the one particular query that gives
the btree error always seems to give one version or the other.
> Which PG version again? Are you in the habit of doing VACUUM FULLs
> on system catalogs, and if so do these glitches correlate at all
> with such activities?
Heh. 8.3.18. Yes, very heavy vac fulls (and reindexes) of the system
catalogs. Cron-driven, and depends on the time of day and if any
DDL is running (if so, it does not run), but probably on average
pg_class is vacfulled and reindexed twice an hour during the times
this happens (which is, during normal business hours). There is a lot
in churn in pg_class, pg_attribute, and pg_depend in particular from
all the temp stuff being created and torn down all day, as well as
some Bucardo pg_class updating.
--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8