Thread: Unable to open DB: 2 Errors (unexpected zero page + could not open relation with OID)

Postgres 8.3.3 on 2.6.21

Some sort of corruption happened overnight and borked things on a server.  Normally I'd just restore and go, but the DB
wasmoved about 6 weeks ago and it looks like the backup task was not, shame on me.  I really just need 2 very small
tablesout of this and then I can re-init.  Any help is appreciated 

There have been other pg databases running on this hardweare for over a year now.  There is no indication of any other
issueand the other instances are still happily running. 

The first error greets me is:

$ psql -d v
psql: FATAL:  index "pg_amproc_fam_proc_index" contains unexpected zero page at block 0
HINT:  Please REINDEX it.

so I do:

template1=# reindex index pg_amproc_fam_proc_index;
REINDEX

and still get the same error.

thinking it's a system table, i try it in single mode:

$ postgres --single -P -D /postgres/v

PostgreSQL stand-alone backend 8.3.3
backend> reindex index pg_amproc_fam_proc_index;
backend>

Followed by back into the normal server, and then I get:

psql: FATAL:  could not open relation with OID 3606

I'm pretty sure the file is there for 3606, but it's another system index from what I can tell:

postgres=> select oid,* from pg_class where oid=3606;
 oid  |          relname           | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages
|reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers |
relukeys| relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl |
reloptions 

------+----------------------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
 3606 | pg_ts_parser_prsname_index |           11 |       0 |       10 |   403 |    33505478 |             0 |        2
|        1 |             0 |             0 | f           | f           | i       |        2 |         0 |           0 |
      0 |        0 |       0 | f          | f          | f           | f              |            0 |        |  
(1 row)

-rw------- 1 postgres postgres 16K Jan 26 09:19 ./base/11511/33505478


The logging is still hitting on the pg_amproc_fam_proc_index now, but psql gets stuck at "OID 3606"...

Thanks in advance.
dave


dave sale <postgres@slipt.net> writes:
> psql: FATAL:  could not open relation with OID 3606

> I'm pretty sure the file is there for 3606, but it's another system index from what I can tell:

> postgres=> select oid,* from pg_class where oid=3606;
>  oid  |          relname           | relnamespace | reltype | relowner | relam | relfilenode | reltablespace |
relpages| reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks |
reltriggers| relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid |
relacl| reloptions  
>
------+----------------------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
>  3606 | pg_ts_parser_prsname_index |           11 |       0 |       10 |   403 |    33505478 |             0 |
2|         1 |             0 |             0 | f           | f           | i       |        2 |         0 |           0
|       0 |        0 |       0 | f          | f          | f           | f              |            0 |        |  
> (1 row)

That's bizarre.  A look into the source code says you can only get that
particular error message when the relcache code is unable to find a
pg_class row for OID 3606 ... so how come there is one?  My best guess
at the moment is that the row looks valid to an MVCC snapshot but not
under SnapshotNow rules.  What do the xmin and xmax fields of that row
contain?

Also, you might be able to get out of this problem (at least far enough
to dump the tables you need) by copying pg_internal.init from an
undamaged database into the damaged one.  Better keep the old copy of
the file (if any) in case this makes things worse, though.

            regards, tom lane

Thanks for your help;

template1=# select * from pg_class where oid=3606;
          relname           | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages |
reltuples| reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers |
relukeys| relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl |
reloptions 

----------------------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
 pg_ts_parser_prsname_index |           11 |       0 |       10 |   403 |        3606 |             0 |        2 |
  1 |             0 |             0 | f           | f           | i       |        2 |         0 |           0 |
0|        0 |       0 | f          | f          | f           | f              |            0 |        |  
(1 row)

copying the pg_internal.init from various other databases did nothing to help.  I'm in-process of the pg_internal.init
method,but the first pass resulted in the same error.  I'm not giving up yet... 

thanks
dave

Tom Lane wrote:
> dave sale <postgres@slipt.net> writes:
>> psql: FATAL:  could not open relation with OID 3606
>
>> I'm pretty sure the file is there for 3606, but it's another system index from what I can tell:
>
>> postgres=> select oid,* from pg_class where oid=3606;
>>  oid  |          relname           | relnamespace | reltype | relowner | relam | relfilenode | reltablespace |
relpages| reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks |
reltriggers| relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid |
relacl| reloptions  
>>
------+----------------------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
>>  3606 | pg_ts_parser_prsname_index |           11 |       0 |       10 |   403 |    33505478 |             0 |
2 |         1 |             0 |             0 | f           | f           | i       |        2 |         0 |
0|        0 |        0 |       0 | f          | f          | f           | f              |            0 |        |  
>> (1 row)
>
> That's bizarre.  A look into the source code says you can only get that
> particular error message when the relcache code is unable to find a
> pg_class row for OID 3606 ... so how come there is one?  My best guess
> at the moment is that the row looks valid to an MVCC snapshot but not
> under SnapshotNow rules.  What do the xmin and xmax fields of that row
> contain?
>
> Also, you might be able to get out of this problem (at least far enough
> to dump the tables you need) by copying  pg_internal.init from an
> undamaged database into the damaged one.  Better keep the old copy of
> the file (if any) in case this makes things worse, though.
>
>             regards, tom lane

--
Dave Crane
The Media Trust Company

Voice: 202.558.6438
Fax: 703.349.5468
6861 Elm Street
Suite 3C
McLean, VA  22101
USA