Hi all,
After setting up a warm standby
(pg_start_backup/rsync/pg_stop_backup) and promoting it to master, we
encountered an error in the middle of an ANALYZE of the new standby
db. (The standby server is a fresh server.)
Source db: PostgreSQL 8.4.2
Standby db: PostgreSQL 8.4.6
...
INFO: analyzing "public.offer2offer"
ERROR: could not open relation base/2757655/6930168: No such file or directory
That file does not exist on either the source or the standby server.
The offer2offer table exists in the source db (42MB), but is 0 bytes
on the standby.
-- on the standby
select * from pg_class where relfilenode = 6930168;
-[ RECORD 1 ]--+---------------------------------------------
relname | offer2offer
relnamespace | 2200
reltype | 2760224
relowner | 10
relam | 0
relfilenode | 6930168
reltablespace | 0
relpages | 5210
reltuples | 324102
reltoastrelid | 2760225
reltoastidxid | 0
relhasindex | f
relisshared | f
relistemp | f
relkind | r
relnatts | 12
relchecks | 0
relhasoids | f
relhaspkey | f
relhasrules | f
relhastriggers | f
relhassubclass | f
relfrozenxid | 1227738213
select * from offer2offer ;
ERROR: could not open relation base/2757655/6930168: No such file or directory
-- on the source db
select * from pg_class where relname='offer2offer';
-[ RECORD 1 ]--+----------------------------------------------------
relname | offer2offer
relnamespace | 2200
reltype | 2760224
relowner | 10
relam | 0
relfilenode | 6946955
reltablespace | 0
relpages | 5216
reltuples | 324642
reltoastrelid | 2760225
reltoastidxid | 0
relhasindex | f
relisshared | f
relistemp | f
relkind | r
relnatts | 12
relchecks | 0
relhasoids | f
relhaspkey | f
relhasrules | f
relhastriggers | f
relhassubclass | f
relfrozenxid | 1228781185
-- on the source server
ls -lh `locate 6946955`
-rw------- 1 postgres postgres 41M Dec 28 15:17 /var/lib/pgsql/data/base/2757655/6946955
-rw------- 1 postgres postgres 32K Dec 28 15:17 /var/lib/pgsql/data/base/2757655/6946955_fsm
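To see whether other relations on the standby are missing their files
too, a check along these lines might help. This is only a sketch:
missing_relfiles is a hypothetical helper name, "mydb" is a placeholder
database name, and it assumes the relations live in the database's
default tablespace under base/2757655 (reltablespace = 0, as in the
pg_class output above).

```shell
#!/bin/sh
# Sketch: report relfilenodes from pg_class that have no file on disk.
# missing_relfiles is a hypothetical helper, not part of PostgreSQL.

# Reads relfilenodes (one per line) on stdin and prints every one that
# has no matching file in the given database directory.
missing_relfiles() {
    dbdir=$1
    while read -r node; do
        [ -n "$node" ] || continue          # skip blank lines
        [ -e "$dbdir/$node" ] || echo "$node"
    done
}

# Possible use on the standby (placeholder database name, not run here).
# Only relkinds that have storage are selected: tables, indexes,
# TOAST tables and sequences.
#
# psql -At -d mydb -c "select relfilenode from pg_class
#                      where relkind in ('r','i','t','S')
#                        and reltablespace = 0" \
#   | missing_relfiles /var/lib/pgsql/data/base/2757655
```

Any relfilenode it prints is one that pg_class points at but that has
no file in that directory, which is the situation the ERROR above
describes for 6930168.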
We noticed after the initial rsync that the data dir on the standby
was around 3-4 GB smaller than on the source. I assumed that it was
simply because the pg_xlog dir on the standby did not have the WAL
files that existed on the source (they were stored in a different
partition).
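One way to test that assumption would be to compare file listings of
the two data dirs with pg_xlog left out. The following is a rough
sketch, assuming GNU find (for -printf) on both servers; list_datadir
and only_on_source are hypothetical helper names, and the listing file
names in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: compare two data directories while ignoring pg_xlog.
# Assumes GNU find; both helper names are made up for illustration.

# Print "relative-path size-in-bytes" for every regular file under the
# given data directory, skipping everything below pg_xlog.
list_datadir() {
    ( cd "$1" && find . -path ./pg_xlog -prune -o -type f -printf '%P %s\n' \
        | LC_ALL=C sort )
}

# Given a source listing ($1) and a standby listing ($2), print the
# paths that appear only in the source listing.
only_on_source() {
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
         !($1 in seen)       { print $1 }' "$2" "$1"
}

# Possible use (not run here):
#   on the source:   list_datadir /var/lib/pgsql/data > source.lst
#   on the standby:  list_datadir /var/lib/pgsql/data > standby.lst
#   then, with both listings on one machine:
#   only_on_source source.lst standby.lst
```

Sizes will differ for files that changed while the source was live, so
the files that are missing outright are the more interesting result.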
We are running badblocks right now, then we'll do some more disk
testing and hopefully memtest86.
Does this look like a hardware problem, and/or some catalog corruption?
Any suggestions on what steps we should take next?
Thanks!