Re: Large Database \d: ERROR: cache lookup failed for relation ... - Mailing list pgsql-general

From Jim Nasby
Subject Re: Large Database \d: ERROR: cache lookup failed for relation ...
Date
Msg-id EE2BA3FF-5847-42A7-BD51-857778B85019@decibel.org
Whole thread Raw
In response to Re: Large Database \d: ERROR: cache lookup failed for relation ...  (Erik Jones <erik@myemma.com>)
List pgsql-general
I'm working with these guys to resolve the immediate issue, but I
suspect there's a race condition somewhere in the code.

What's happened is that OIDs have been changed in the system. There's
not a lot of table DDL that happens, but there is a substantial
amount of view DDL that can take place. In a nutshell, tables will
sometimes have fields added to them, and when that happens a whole
set of views needs to be re-created to take the new fields into account.

The files for corrupted tables do exist; this seems to be mostly a
catalog corruption issue. I'm seeing both what appear to be
inconsistencies between relcache and the catalog tables as well as
corruption between tables themselves:

emma2=# select * from userdata_8464_campaigns;
ERROR:  could not open relation with OID 138807643
emma2=# \d userdata_8464_campaigns
                                         Table
"public.userdata_8464_campaigns"
       Column             |            Type
|                            Modifiers
-------------------------------+-----------------------------
+------------------------------------------------------------------
campaign_id              | bigint                      | not null
default nextval(('emma_campaigns_seq'::text)::regclass)
account_id               | bigint                      | not null
cep_object_id                 | bigint                      | not
null default nextval(('cep_object_seq'::text)::regclass)
campaign_name            | character varying(255)      | not null
campaign_subject         | character varying(255)      | not null
layout_page_id           | bigint                      | not null
layout_content_id        | bigint                      | not null
campaign_create_date     | timestamp without time zone | not null
default now()
campaign_last_mod_date   | timestamp without time zone | not null
default now()
campaign_status          | character varying(50)       | not null
campaign_parent_id       | bigint                      |
published_campaign_id    | bigint                      |
campaign_plaintext       | text                        |
campaign_plaintext_ds    | timestamp without time zone |
delivery_old_score       | double precision            |
campaign_person_defaults | text                        |
Inherits: emma_campaigns

select oid from pg_class where relname='userdata_8464_campaigns';
   oid
--------
533438
(1 row)

And that file actually does exist on disk...

select * from pg_index where indexrelid=138807643;
indexrelid | indrelid | indnatts | indisunique | indisprimary |
indisclustered | indisvalid | indkey | indclass | indexprs | indpred
------------+----------+----------+-------------+--------------
+----------------+------------+--------+----------+----------+---------
   138807643 |   533438 |        1 | t           | t            |
f              | t          | 1      | 1980     |          |
(1 row)

select * from pg_class where oid=138807643;
relname | relnamespace | reltype | relowner | relam | relfilenode |
reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid
| relhasindex | relisshared | relkind | relnatts | relchecks |
reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey
| relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
---------+--------------+---------+----------+-------+-------------
+---------------+----------+-----------+---------------
+---------------+-------------+-------------+---------+----------
+-----------+-------------+----------+----------+---------
+------------+------------+-------------+----------------
+--------------+--------+------------
(0 rows)

On Jun 5, 2007, at 11:27 AM, Erik Jones wrote:

> I originally sent this message from my gmail account yesterday as
> we were having issues with our work mail servers yesterday, but
> seeing that it hasn't made it to the lists yet, I'm resending from
> my registered address.  You have my apologies if you receive this
> twice.
>
> "Thomas F. O'Connell" <tf ( at ) o ( dot ) ptimized ( dot ) com>
> writes:
> > I'm dealing with a database where there are ~150,000 rows in
>
> > information_schema.tables. I just tried to do a \d, and it came back
> > with this:
>
> > ERROR:  cache lookup failed for relation [oid]
>
> > Is this indicative of corruption, or is it possibly a resource
> issue?
>
> Greetings,
>
> This message is a follow-up to Thomas's message quoted above (we're
> working together on the same database). He received one response
> when he sent the above message which was from Tom Lane and can be
> easily summarized as him having said that that could happen tables
> were being created or dropped while running the \d in psql.
> Unfortunately, that wasn't the case, we have now determined that
> there is some corruption in our database and we are hoping some of
> you back-end gurus might have some suggestions.
>
> How we verified that there is corruption was simply to reindex all
> of our tables in addition to getting the same errors when running a
> dump this past weekend.  We so far have a list of five tables for
> which reindex fails with the error: "ERROR: could not open relation
> with OID xxxx" (sub xxxx with the five different #s) and one that
> fails reindexing with "ERROR: xxxxx is an index" where is an index
> on a completely different table. After dropping all of the indexes
> on these tables (a couple didn't have any to begin with), we still
> cannot run reindex on them. In addition, we can't drop the tables
> either (we get the same errors). We can however run alter table
> statements on them. So, we have scheduled a downtime for an evening
> later this week wherein we plan on bringing the database down for a
> REINDEX SYSTEM and before that we are going to run a dump excluding
> those tables, restore that on a separate machine and see if these
> errors crop up there anywhere. Is there anything else anyone can
> think of that we can do to narrow down where the actual corruption
> is or how to fix it?
>
> Erik Jones
>
> Software Developer | Emma®
> erik@myemma.com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



pgsql-general by date:

Previous
From: Joe Conway
Date:
Subject: Re: Encrypted column
Next
From: Vincenzo Romano
Date:
Subject: Re: Running v8.1 amd v8.2 at the same time for a transition