Re: ERROR: could not open relation base/2757655/6930168: No such file or directory -- during warm standby setup - Mailing list pgsql-general

From bricklen
Subject Re: ERROR: could not open relation base/2757655/6930168: No such file or directory -- during warm standby setup
Msg-id AANLkTikeyUHaWW6Tc5_CWvxwSW5efQNyTH8P-XjmJLy8@mail.gmail.com
In response to Re: ERROR: could not open relation base/2757655/6930168: No such file or directory -- during warm standby setup  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: ERROR: could not open relation base/2757655/6930168: No such file or directory -- during warm standby setup  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Wed, Dec 29, 2010 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> bricklen <bricklen@gmail.com> writes:
>> After setting up a warm standby
>> (pg_start_backup/rsync/pg_stop_backup), and promoting to master, we
>> encountered an error in the middle of an analyze of the new standby
>> db. (the standby server is a fresh server)
>> [ relfilenode doesn't match on source and standby ]
>
> What can you tell us about what was happening on the source DB while
> the backup was being taken?  In particular I'm wondering if anything
> that would've given offer2offer a new relfilenode was in progress.
> Also, does the pg_class entry for offer2offer have the same xmin and
> ctid in both DBs?
>
>                        regards, tom lane

A couple of other notes:
- Two tables are affected, not just one. I found that by running an
individual ANALYZE on every table in the db.
- Our rsync command uses the -e "ssh -p 9001" option so that we can
specify the ssh port number:

rsync -av -e "ssh -p 9001" --progress --partial -z \
    /var/lib/pgsql/data postgres@standby-tunnel:/var/lib/pgsql/

The pg_start_backup/pg_stop_backup window was about 10 hours, because
the 480GB transfer took that long.
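A minimal Python sketch of the pg_start_backup / rsync / pg_stop_backup sequence, assuming psql can reach the source with default connection settings and sufficient privileges. The label "standby_base" and the helpers backup_plan and run_backup are hypothetical; the rsync arguments are the ones quoted above. pg_start_backup()/pg_stop_backup() are the function names in PostgreSQL releases of this era (they were renamed in PostgreSQL 15).

```python
from __future__ import annotations

import subprocess


def backup_plan(label: str = "standby_base") -> list[list[str]]:
    """Return the three steps, in order, as argv lists for subprocess.

    The default label is hypothetical; pg_start_backup() needs some label.
    """
    return [
        # 1. Put the source cluster into backup mode.
        ["psql", "-c", f"SELECT pg_start_backup('{label}');"],
        # 2. Copy the data directory to the standby over ssh on port 9001,
        #    using the same options as the rsync command quoted above.
        ["rsync", "-av", "-e", "ssh -p 9001", "--progress", "--partial", "-z",
         "/var/lib/pgsql/data", "postgres@standby-tunnel:/var/lib/pgsql/"],
        # 3. Leave backup mode. WAL written between steps 1 and 3 must reach
        #    the standby (via archiving) for the copied files to be usable.
        ["psql", "-c", "SELECT pg_stop_backup();"],
    ]


def run_backup(label: str = "standby_base") -> None:
    """Execute the plan, always calling pg_stop_backup() once backup mode is on."""
    start, copy, stop = backup_plan(label)
    subprocess.run(start, check=True)
    try:
        # rsync may exit 24 ("vanished source files") when copying a live
        # cluster; with check=True this sketch treats that as a failure.
        subprocess.run(copy, check=True)
    finally:
        subprocess.run(stop, check=True)
```

Calling backup_plan() on its own only builds the commands, which makes the sequence easy to inspect before anything runs; the try/finally ensures backup mode is not left on if the copy step fails.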

Sorry for my ignorance, I don't

The source db handles between 1000 and 3000 transactions/s, so it is
reasonably volatile. The two tables in question are not accessed very
heavily, though.
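For scale, a back-of-the-envelope count of the transactions that ran on the source while the backup was in progress, assuming the quoted 1000 to 3000 transactions/s held for the whole ~10-hour window:

```latex
$$
10\ \mathrm{h} = 36{,}000\ \mathrm{s},\qquad
1000\ \mathrm{tx/s} \times 36{,}000\ \mathrm{s} = 3.6\times10^{7}\ \mathrm{tx},\qquad
3000\ \mathrm{tx/s} \times 36{,}000\ \mathrm{s} = 1.08\times10^{8}\ \mathrm{tx}
$$
```

That is, tens of millions of transactions between pg_start_backup and pg_stop_backup.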

Comparing ctid and xmin between the two databases: no, they don't
match. Pardon my ignorance, but could those have changed due to
vacuum, analyze, or any other form of access?


Source offer2offer:
select ctid,xmin,* from pg_class where relname='offer2offer';
-[ RECORD 1 ]--+--------------------------------------------------------------------
ctid           | (142,2)
xmin           | 1228781192
relname        | offer2offer
relnamespace   | 2200
reltype        | 2760224
relowner       | 10
relam          | 0
relfilenode    | 6946955
reltablespace  | 0
relpages       | 5216
reltuples      | 324642
reltoastrelid  | 2760225
reltoastidxid  | 0
relhasindex    | f
relisshared    | f
relistemp      | f
relkind        | r
relnatts       | 12
relchecks      | 0
relhasoids     | f
relhaspkey     | f
relhasrules    | f
relhastriggers | f
relhassubclass | f
relfrozenxid   | 1228781185


Standby offer2offer:
select ctid,xmin,* from pg_class where relname='offer2offer';
-[ RECORD 1 ]--+---------------------------------------------------------------------
ctid           | (142,1)
xmin           | 1227738244
relname        | offer2offer
relnamespace   | 2200
reltype        | 2760224
relowner       | 10
relam          | 0
relfilenode    | 6930168
reltablespace  | 0
relpages       | 5210
reltuples      | 324102
reltoastrelid  | 2760225
reltoastidxid  | 0
relhasindex    | f
relisshared    | f
relistemp      | f
relkind        | r
relnatts       | 12
relchecks      | 0
relhasoids     | f
relhaspkey     | f
relhasrules    | f
relhastriggers | f
relhassubclass | f
relfrozenxid   | 1227738213
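A small Python sketch for listing which pg_class columns differ between the two records above. It parses psql expanded display output ("column | value" lines). The helper names parse_expanded_record and diff_records are hypothetical, and the embedded records are abridged copies of the source and standby output (every omitted column has the same value in both).

```python
from __future__ import annotations

# Abridged copies of the pg_class records above (source, then standby).
SOURCE = """\
ctid           | (142,2)
xmin           | 1228781192
relname        | offer2offer
relfilenode    | 6946955
relpages       | 5216
reltuples      | 324642
reltoastrelid  | 2760225
relfrozenxid   | 1228781185
"""

STANDBY = """\
ctid           | (142,1)
xmin           | 1227738244
relname        | offer2offer
relfilenode    | 6930168
relpages       | 5210
reltuples      | 324102
reltoastrelid  | 2760225
relfrozenxid   | 1227738213
"""


def parse_expanded_record(text: str) -> dict[str, str]:
    """Parse 'column | value' lines of psql expanded output into a dict.

    Lines without a '|' (the query text, the '-[ RECORD 1 ]-' ruler,
    blank lines) are skipped.
    """
    record: dict[str, str] = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        name, _, value = line.partition("|")
        record[name.strip()] = value.strip()
    return record


def diff_records(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Map each column whose value differs (or is missing on one side) to (a, b)."""
    keys = list(a) + [k for k in b if k not in a]
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    diffs = diff_records(parse_expanded_record(SOURCE), parse_expanded_record(STANDBY))
    for column, (src, stby) in diffs.items():
        print(f"{column:<14} source={src}  standby={stby}")
```

Run as a script, it reports ctid, xmin, relfilenode, relpages, reltuples and relfrozenxid as differing; relname and reltoastrelid match.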
