Thread: Cache lookup failed?

Cache lookup failed?

From

"Jeff Boes"

Date:

18 August 2003, 10:20:49

What might be the source of this error?

 Cache lookup failed for relation 188485009

We've been getting these at odd intervals, and they are not reproducible.

Our setup:

 PostgreSQL 7.3.3
 Red Hat 7.3

 kernel.shmall = 1352914698
 kernel.shmmax = 1352914698

 shared_buffers = 131072
 max_fsm_pages = 350000
 max_fsm_relations = 200
 wal_buffers = 32
 sort_mem = 65536
 vacuum_mem = 65536
 effective_cache_size = 196608

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise

Re: Cache lookup failed?

From

Tom Lane

Date:

18 August 2003, 11:18:35

"Jeff Boes" <jboes@nexcerpt.com> writes:
> What might be the source of this error?
>  Cache lookup failed for relation 188485009
> We've been getting these at odd intervals, and they are not reproducible.

Always the same OID, or different ones?  Does that OID actually exist in
pg_class?  Can you tell us exactly what SQL command(s) are producing the
error?  (If not, better turn on query logging so you can find out.)

> Our setup:
>  PostgreSQL 7.3.3

BTW, I'd urge updating to 7.3.4 ASAP.  Better to do it in a controlled
fashion than to find yourself looking at a forced update if 7.3.3 fails
to restart after a crash...

            regards, tom lane

Re: Cache lookup failed?

From

Jeff Boes

Date:

18 August 2003, 11:26:34

On Mon, 2003-08-18 at 09:53, Tom Lane wrote:

> Always the same OID, or different ones?  Does that OID actually exist in
> pg_class?  Can you tell us exactly what SQL command(s) are producing the
> error?  (If not, better turn on query logging so you can find out.)

Different OIDs, and they do not exist in pg_class (it's the OID of that
table's row, right? So for

Cache lookup failed for relation 172465102

I would do

 select * from pg_class where oid = 172465102

right? I'm not 100% familiar yet with the ins and outs of pg_class. Too
many OID-type fields in there, I can't keep them straight ... 8-}

I'd turn on query logging, but since we're getting these about every 3-7
days, I'm not sure that would be the most effective use of all that disk
... maybe I can find a way to localize it to the point where the
pg_class query is happening.

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise

Re: Cache lookup failed?

From

Martijn van Oosterhout

Date:

18 August 2003, 11:48:05

It usually refers to some cached plan referring to a table or object that
does not exist anymore. Do you have stored procedures that refer to tables
that are deleted? This includes temporary tables.

So maybe it's only happening when a certain stored procedure is executed
twice in the same session?

Hope this helps,

On Mon, Aug 18, 2003 at 01:01:53PM +0000, Jeff Boes wrote:
> What might be the source of this error?
>
>  Cache lookup failed for relation 188485009
>
> We've been getting these at odd intervals, and they are not reproducible.
>
> Our setup:
>
>  PostgreSQL 7.3.3
>  Red Hat 7.3
>
>  kernel.shmall = 1352914698
>  kernel.shmmax = 1352914698
>
>  shared_buffers = 131072
>  max_fsm_pages = 350000
>  max_fsm_relations = 200
>  wal_buffers = 32
>  sort_mem = 65536
>  vacuum_mem = 65536
>  effective_cache_size = 196608
>
> --
> Jeff Boes                                      vox 269.226.9550 ext 24
> Database Engineer                                     fax 269.349.9076
> Nexcerpt, Inc.                                 http://www.nexcerpt.com
>            ...Nexcerpt... Extend your Expertise
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato

Attachment

msg-32091-65139.dat

Re: Cache lookup failed?

From

Jeff Boes

Date:

18 August 2003, 12:09:45

On Mon, 2003-08-18 at 09:44, Martijn van Oosterhout wrote:
> It usually refers to some cached plan referring to a table or object that
> does not exist anymore. Do you have stored procedures that refer to tables
> that are deleted? This includes temporary tables.
>
> So maybe it's only happening when a certain stored procedure is executed
> twice in the same session?
>

No, I don't think so. It seems to be pretty consistently happening in
our homegrown database connection class, as we are executing a query
against the PG metadata tables. Something like --

select a.attname, t.typname, a.attlen, a.atttypmod, a.attnotnull,
a.atthasdef, a.attnum
              from pg_attribute a,
                   pg_class     c,
                   pg_type      t
              where c.relname  = ?
                and a.attrelid = c.oid
                and a.attnum  >= 0
                and t.oid      = a.atttypid
                order by 1

This is happening inside DBD::Pg, the "table_attributes" method. The
relname being selected is NOT a temp table. We do make extensive use of
temp tables in the code, however.

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise

Re: Cache lookup failed?

From

Tom Lane

Date:

18 August 2003, 14:53:49

Jeff Boes <jboes@nexcerpt.com> writes:
> On Mon, 2003-08-18 at 09:53, Tom Lane wrote:
>> Always the same OID, or different ones?  Does that OID actually exist in
>> pg_class?  Can you tell us exactly what SQL command(s) are producing the
>> error?  (If not, better turn on query logging so you can find out.)

> Different OIDs, and they do not exist in pg_class (it's the OID of that
> table's row, right?

Right.  My best guess is that you are seeing some weird failure in temp
table creation ... do you use lots of temp tables?

> I'd turn on query logging, but since we're getting these about every 3-7
> days, I'm not sure that would be the most effective use of all that disk

Perhaps you can recycle the logs every few hours?

BTW, the symptom sounds the same as the one that led up to the discovery
of this bug:

2003-07-29 18:18  tgl

    * src/backend/access/nbtree/: nbtsearch.c (REL7_3_STABLE),
    nbtsearch.c (REL7_2_STABLE), nbtsearch.c: Fix longstanding error in
    _bt_search(): should moveright at top of loop not bottom.
    Otherwise we fail to moveright when the root page was split while
    we were "in flight" to it.  This is not a significant problem when
    the root is above the leaf level, but if the root was also a leaf
    (ie, a single-page index just got split) we may return the wrong
    leaf page to the caller, resulting in failure to find a key that is
    in fact present.  Bug has existed at least since 7.1, probably
    forever.

However, I doubt that that is your problem.  The moveright bug could
only lead to a pg_class lookup failure if a lookup occurred while
pg_class' OID index was being split from one page to two, which is an
event that happens at most once in the lifetime of an index (before
7.4 anyway).  Unless you frequently create new databases, or frequently
reindex pg_class, I don't see how you would see that bug with any
reproducibility.  (We were only able to track down the bug because
the regression tests evolved to a point where they caused it with
nontrivial probability.)

            regards, tom lane

Re: Cache lookup failed?

From

Tom Lane

Date:

18 August 2003, 18:57:35

Jeff Boes <jboes@nexcerpt.com> writes:
> No, I don't think so. It seems to be pretty consistently happening in
> our homegrown database connection class, as we are executing a query
> against the PG metadata tables. Something like --

> select a.attname, t.typname, a.attlen, a.atttypmod, a.attnotnull,
> a.atthasdef, a.attnum
>               from pg_attribute a,
>                    pg_class     c,
>                    pg_type      t
>               where c.relname  = ?
>                 and a.attrelid = c.oid
>                 and a.attnum  >= 0
>                 and t.oid      = a.atttypid
>                 order by 1

I really do not believe that this query could cause "cache lookup failed
for relation-with-a-large-OID".  The only relations being directly
touched in this query are pg_attribute, pg_class, pg_type, and their
indexes, which will all have small OIDs (less than 20000).  The fact
that you are pulling data that refers to other relations is not relevant
--- it's just data.  The "cache lookup" message could only occur when
the system is trying to access a relation as such, and I see nothing
here that would make it do that for any user relations.

It's too bad you're not running 7.4 ... it'd be really useful to know
exactly which "cache lookup" is failing, but there's no way to get that
info out of 7.3 without a debugger stack trace ...

            regards, tom lane