Thread: Cache lookup failed?
What might be the source of this error? Cache lookup failed for relation 188485009 We've been getting these at odd intervals, and they are not reproducible. Our setup: PostgreSQL 7.3.3 Red Hat 7.3 kernel.shmall = 1352914698 kernel.shmmax = 1352914698 shared_buffers = 131072 max_fsm_pages = 350000 max_fsm_relations = 200 wal_buffers = 32 sort_mem = 65536 vacuum_mem = 65536 effective_cache_size = 196608 -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise
"Jeff Boes" <jboes@nexcerpt.com> writes: > What might be the source of this error? > Cache lookup failed for relation 188485009 > We've been getting these at odd intervals, and they are not reproducible. Always the same OID, or different ones? Does that OID actually exist in pg_class? Can you tell us exactly what SQL command(s) are producing the error? (If not, better turn on query logging so you can find out.) > Our setup: > PostgreSQL 7.3.3 BTW, I'd urge updating to 7.3.4 ASAP. Better to do it in a controlled fashion than to find yourself looking at a forced update if 7.3.3 fails to restart after a crash... regards, tom lane
On Mon, 2003-08-18 at 09:53, Tom Lane wrote: > Always the same OID, or different ones? Does that OID actually exist in > pg_class? Can you tell us exactly what SQL command(s) are producing the > error? (If not, better turn on query logging so you can find out.) Different OIDs, and they do not exist in pg_class (it's the OID of that table's row, right? So for Cache lookup failed for relation 172465102 I would do select * from pg_class where oid = 172465102 right? I'm not 100% familiar yet with the ins and outs of pg_class. Too many OID-type fields in there, I can't keep them straight ... 8-} I'd turn on query logging, but since we're getting these about every 3-7 days, I'm not sure that would be the most effective use of all that disk ... maybe I can find a way to localize it to the point where the pg_class query is happening. -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise
It usually refers to some cached plan referring to a table or object that does not exist anymore. Do you have stored procedures that refer to tables that are deleted? This includes temporary tables. So maybe it's only happening when a certain stored procedure is executed twice in the same session? Hope this helps, On Mon, Aug 18, 2003 at 01:01:53PM +0000, Jeff Boes wrote: > What might be the source of this error? > > Cache lookup failed for relation 188485009 > > We've been getting these at odd intervals, and they are not reproducible. > > Our setup: > > PostgreSQL 7.3.3 > Red Hat 7.3 > > kernel.shmall = 1352914698 > kernel.shmmax = 1352914698 > > shared_buffers = 131072 > max_fsm_pages = 350000 > max_fsm_relations = 200 > wal_buffers = 32 > sort_mem = 65536 > vacuum_mem = 65536 > effective_cache_size = 196608 > > -- > Jeff Boes vox 269.226.9550 ext 24 > Database Engineer fax 269.349.9076 > Nexcerpt, Inc. http://www.nexcerpt.com > ...Nexcerpt... Extend your Expertise > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "All that is needed for the forces of evil to triumph is for enough good > men to do nothing." - Edmond Burke > "The penalty good people pay for not being interested in politics is to be > governed by people worse than themselves." - Plato
Attachment
On Mon, 2003-08-18 at 09:44, Martijn van Oosterhout wrote: > It usually refers to some cached plan referring to a table or object that > does not exist anymore. Do you have stored procedures that refer to tables > that are deleted? This includes temporary tables. > > So maybe it's only happening when a certain stored procedure is executed > twice in the same session? > No, I don't think so. It seems to be pretty consistently happening in our homegrown database connection class, as we are executing a query against the PG metadata tables. Something like -- select a.attname, t.typname, a.attlen, a.atttypmod, a.attnotnull, a.atthasdef, a.attnum from pg_attribute a, pg_class c, pg_type t where c.relname = ? and a.attrelid = c.oid and a.attnum >= 0 and t.oid = a.atttypid order by 1 This is happening inside DBD::Pg, the "table_attributes" method. The relname being selected is NOT a temp table. We do make extensive use of temp tables in the code, however. -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise
Jeff Boes <jboes@nexcerpt.com> writes: > On Mon, 2003-08-18 at 09:53, Tom Lane wrote: >> Always the same OID, or different ones? Does that OID actually exist in >> pg_class? Can you tell us exactly what SQL command(s) are producing the >> error? (If not, better turn on query logging so you can find out.) > Different OIDs, and they do not exist in pg_class (it's the OID of that > table's row, right? Right. My best guess is that you are seeing some weird failure in temp table creation ... do you use lots of temp tables? > I'd turn on query logging, but since we're getting these about every 3-7 > days, I'm not sure that would be the most effective use of all that disk Perhaps you can recycle the logs every few hours? BTW, the symptom sounds the same as the one that led up to the discovery of this bug: 2003-07-29 18:18 tgl * src/backend/access/nbtree/: nbtsearch.c (REL7_3_STABLE), nbtsearch.c (REL7_2_STABLE), nbtsearch.c: Fix longstanding error in _bt_search(): should moveright at top of loop not bottom. Otherwise we fail to moveright when the root page was split while we were "in flight" to it. This is not a significant problem when the root is above the leaf level, but if the root was also a leaf (ie, a single-page index just got split) we may return the wrong leaf page to the caller, resulting in failure to find a key that is in fact present. Bug has existed at least since 7.1, probably forever. However, I doubt that that is your problem. The moveright bug could only lead to a pg_class lookup failure if a lookup occurred while pg_class' OID index was being split from one page to two, which is an event that happens at most once in the lifetime of an index (before 7.4 anyway). Unless you frequently create new databases, or frequently reindex pg_class, I don't see how you would see that bug with any reproducibility. (We were only able to track down the bug because the regression tests evolved to a point where they caused it with nontrivial probability.) regards, tom lane
Jeff Boes <jboes@nexcerpt.com> writes: > No, I don't think so. It seems to be pretty consistently happening in > our homegrown database connection class, as we are executing a query > against the PG metadata tables. Something like -- > select a.attname, t.typname, a.attlen, a.atttypmod, a.attnotnull, > a.atthasdef, a.attnum > from pg_attribute a, > pg_class c, > pg_type t > where c.relname = ? > and a.attrelid = c.oid > and a.attnum >= 0 > and t.oid = a.atttypid > order by 1 I really do not believe that this query could cause "cache lookup failed for relation-with-a-large-OID". The only relations being directly touched in this query are pg_attribute, pg_class, pg_type, and their indexes, which will all have small OIDs (less than 20000). The fact that you are pulling data that refers to other relations is not relevant --- it's just data. The "cache lookup" message could only occur when the system is trying to access a relation as such, and I see nothing here that would make it do that for any user relations. It's too bad you're not running 7.4 ... it'd be really useful to know exactly which "cache lookup" is failing, but there's no way to get that info out of 7.3 without a debugger stack trace ... regards, tom lane