BUG #2285: Can not access database after successful PITR - file naming problems - Mailing list pgsql-bugs
From | Nicholas Vinen |
---|---|
Subject | BUG #2285: Can not access database after successful PITR - file naming problems |
Date | |
Msg-id | 20060227134626.E4695F0B03@svr2.postgresql.org Whole thread Raw |
Responses |
Re: BUG #2285: Can not access database after successful PITR - file naming problems
|
List | pgsql-bugs |
The following bug has been logged online: Bug reference: 2285 Logged by: Nicholas Vinen Email address: hb@pg.x256.org PostgreSQL version: 8.1.3 Operating system: Gentoo Linux x86 Description: Can not access database after successful PITR - file naming problems Details: For debugging/testing purposes, I have taken to performing a PITR on a test machine from the production database back-ups. I restore a snap-shot of the production server which was taken before the time I am interested in debugging, then use the PITR method to get the database to the point I want to test at. I can then "rewind" the database and test again by restoring again. This used to work (either before 8.0 or before 8.1, I'm not sure). I also have this back-up in case the production database server dies, but can't really test restoring it on the production machine except when data is lost, otherwise I will interrupt service. The machines are almost identical, the only real difference is that one is a Pentium 3 and one is a Pentium 4, so I don't see why that should matter. They are both running virtually identical Linux installations. Now, when I try to use PITR to restore the database on my test server, the PITR succeeds as normal, but I can not access the database with an error like: psql: FATAL: could not open relation 1663/16385/605464: No such file or directory This happens when I attempt to connect to my database. I can connect to some of the others (such as the template databases) fine. Interestingly, the 605464 file was in the back-up snapshot, but the process of performing the PITR seems to rename the file, but it's still looking under the old name. This is fully reproducible, so if I am not providing sufficient information here, just let me know what you need in order to fix this. Here is a log of what I have done up to the error: rt2 ~ # cd /var/lib/postgresql/data rt2 data # rm -rf * rt2 data # gzip -cd /backup/Helpdesk/Database/Snapshots/Weekly/2006-06\ \ Sun\ 12\ Feb.io.gz | cpio -i 3229320 blocks rt2 data # ls -al base/16385/605464 -rw------- 1 postgres postgres 40960 Feb 27 02:57 base/16385/605464 rt2 data # rm pg_xlog/* rm: cannot remove `pg_xlog/archive_status': Is a directory rt2 data # cp /data/postgresql/recovery.conf /data/postgresql/postgresql.conf . rt2 data # chown postgres * rt2 data # /etc/init.d/postgresql start * Starting PostgreSQL ... rt2 data # tail -f /var/log/postgres/current Feb 27 03:06:12 [postgres] [1-1] LOG: could not create IPv6 socket: Address family not supported by protocol Feb 27 03:06:12 [postgres] [2-1] LOG: database system was interrupted at 2006-02-12 01:30:02 PST Feb 27 03:06:12 [postgres] [3-1] LOG: starting archive recovery Feb 27 03:06:12 [postgres] [4-1] LOG: restore_command = "gzip -cd /backup/Helpdesk/Database/TransactionLog/"%f">"%p"" Feb 27 03:06:12 [postgres] [5-1] LOG: recovery_target_time = 2006-02-17 08:20:00-08 Feb 27 03:06:12 [postgres] [6-1] LOG: restored log file "000000010000000B00000002.00A9FA60.backup" from archive Feb 27 03:06:14 [postgres] [7-1] LOG: restored log file "000000010000000B00000002" from archive Feb 27 03:06:14 [postgres] [8-1] LOG: checkpoint record is at B/2A9FA60 Feb 27 03:06:14 [postgres] [9-1] LOG: redo record is at B/2A9FA60; undo record is at 0/0; shutdown FALSE Feb 27 03:06:14 [postgres] [10-1] LOG: next transaction ID: 34066581; next OID: 611376 Feb 27 03:06:14 [postgres] [11-1] LOG: next MultiXactId: 633; next MultiXactOffset: 1265 Feb 27 03:06:14 [postgres] [12-1] LOG: automatic recovery in progress Feb 27 03:06:14 [postgres] [13-1] LOG: redo starts at B/2A9FAA4 Feb 27 03:06:27 [postgres] [14-1] LOG: restored log file "000000010000000B00000003" from archive Feb 27 03:06:38 [postgres] [15-1] LOG: restored log file "000000010000000B00000004" from archive Feb 27 03:06:41 [postgres] [16-1] LOG: restored log file "000000010000000B00000005" from archive .... Feb 27 03:47:13 [postgres] [682-1] LOG: restored log file "000000010000000D000000A1" from archive Feb 27 03:47:17 [postgres] [683-1] LOG: restored log file "000000010000000D000000A2" from archive Feb 27 03:47:21 [postgres] [684-1] LOG: restored log file "000000010000000D000000A3" from archive Feb 27 03:47:21 [postgres] [685-1] LOG: recovery stopping before commit of transaction 42586328, time 2006-02-17 08:20:01 PST Feb 27 03:47:21 [postgres] [686-1] LOG: redo done at D/A35794EC Feb 27 03:47:21 [postgres] [687-1] LOG: selected new timeline ID: 2 Feb 27 03:47:22 [postgres] [688-1] LOG: archive recovery complete Feb 27 03:47:32 [postgres] [689-1] LOG: database system is ready Feb 27 03:47:32 [postgres] [690-1] LOG: transaction ID wrap limit is 1103292637, limited by database "postgres" Feb 27 05:39:54 [postgres] [2-1] LOG: invalid server process ID -1 (new error in 8.1.3) rt2 data # psql -U postgres rt3 psql: FATAL: could not open relation 1663/16385/605464: No such file or directory rt2 data # ls -al base/16385/605464 rt2 data # recovery.conf: restore_command = 'gzip -cd /backup/Helpdesk/Database/TransactionLog/"%f">"%p"' recovery_target_time = '2006-02-17 16:20:00+0'; I believe this should work. I checked the PITR limitations and I don't believe I've broken any of them. I also belive this should work between seperate machines of the same architecture. It certainly has in the past. Please let me know how to proceed. By the way, I can access *some* of my databases (the smaller ones), just not the main one. I suspect perhaps it's due to an ALTER TABLE or similar command I committed on the main database inbetween the snapshot and the restore point, which is not being handled properly during the transation log replay. Thanks, Nicholas
pgsql-bugs by date: