Re: ERROR: cache lookup failed for relation 17442 (repost) - Mailing list pgsql-bugs

From Hans-Jürgen Schönig
Subject Re: ERROR: cache lookup failed for relation 17442 (repost)
Msg-id 4207E810.8030504@cybertec.at
In response to ERROR: cache lookup failed for relation 17442 (repost) (Michael Guerin <guerin@rentec.com>)
List pgsql-bugs
Michael Guerin wrote:
> Hi All,
>
>    I've been getting these errors ("ERROR:  cache lookup failed for
> relation 17442") in my logs for a while now. It originally seemed
> like a hardware problem; however, now we are getting them pretty
> consistently on a couple of servers. I've scaled down the schema to the
> one table and the function involved and included a code snippet that
> opens a bunch of connections and loops, calling the same function. It
> usually takes 100-2000 iterations before these messages start appearing
> in the log. I've also included the original function; with it, it takes
> 10,000 iterations for the error to start showing. I should note that
> we've been getting these errors since version 7; this is the first time
> they have been reproducible.
>
> With the original function, the log messages were slightly different, and
> the errors usually caused the server to reset, e.g.:
> ERROR:  type "t" already exists
> ERROR:  duplicate key violates unique constraint
> "pg_type_typname_nsp_index"
> ERROR:  duplicate key violates unique constraint
> "pg_type_typname_nsp_index"
> ERROR:  duplicate key violates unique constraint
> "pg_type_typname_nsp_index"
> CONTEXT:  SQL statement "create temp table tmp_children ( uniqid bigint,
> memberid bigint, membertype varchar(50), ownerid smallint, tag
> varchar(50), level int4 )"
>     PL/pgSQL function "fngetcompositeids2" line 14 at SQL statement
> ERROR:  duplicate key violates unique constraint
> "pg_type_typname_nsp_index"
> ERROR:  cache lookup failed for type 2449707570
> FATAL:  cache lookup failed for type 2449707570
>
> Environment info: Postgres v8, SuSE Linux with the latest kernel patches,
> filesystem: ReiserFS.
>
> Please let me know if you need any more information. No data is needed,
> just the schema included.
>
> Thanks
> Michael
>


Michael,

The interesting thing about this bug is that we had the same thing on a
customer's machine some time ago. It occurred after a certain script
(nothing big) had been run for roughly the 100,001st time on an empty
database. So this does not seem to be related to the schema; it is more
or less random ...
Another interesting thing: we copied the data directory from the customer
and were not able to reproduce the same behaviour on a different machine.
The strange thing is that after doing a checkpoint and restarting the
database, the problem still occurred. Running the same binaries on a
different machine did not show the error ...
We stepped through it with gdb but could not find anything strange ...
Can you reliably reproduce the problem after an arbitrary number of
iterations on a different machine? We couldn't ...
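To compare machines, it helps to record at which iteration the first error appears under the same concurrent load. The sketch below is a hypothetical harness, not the test program Michael attached: the names `stress` and `call` are illustrative assumptions. It runs `call(worker_id, iteration)` from several threads and returns the earliest failing iteration. In real use, `call` would be assumed to use a per-worker database connection (for example via a driver such as psycopg2) to execute the PL/pgSQL function under test.

```python
import threading
from typing import Callable, Optional


def stress(call: Callable[[int, int], None],
           connections: int = 8,
           iterations: int = 2000) -> Optional[int]:
    """Run call(worker_id, iteration) concurrently from `connections` threads.

    Returns the lowest iteration index at which any worker raised an
    exception, or None if every call succeeded. (Hypothetical harness.)
    """
    first_failure: Optional[int] = None
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        nonlocal first_failure
        for i in range(iterations):
            try:
                # Assumption: `call` uses its own connection per worker_id.
                call(worker_id, i)
            except Exception:
                # Record the earliest failing iteration across all workers.
                with lock:
                    if first_failure is None or i < first_failure:
                        first_failure = i
                return  # stop this worker after its first error

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(connections)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return first_failure
```

Because thread scheduling varies, a single run says little. Repeating it several times per machine gives a fairer comparison of "after how many iterations" the error shows up.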

Looking at the code, this error is raised when a catalog cache lookup
returns nothing, i.e. a NULL result caught by the system ...
Something seems to be corrupting memory ...
Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/660/816 40 77
www.cybertec.at, www.postgresql.at


