Thread: problem with WAL files
I'm currently testing PITR features of 8.0 release. I followed the steps given by the documentation and thus deleted the WAL files in pg_xlog after having restored the database. But when I put the recovery.conf file and try to start postgres, it fails. The log file says : 2005-06-30 17:57:50 LOG: database system was shut down at 2005-06-30 17:57:26 Paris, Madrid 2005-06-30 17:57:50 LOG: starting archive recovery 2005-06-30 17:57:50 LOG: restore_command = "copy E:\\temp\\archive\\%f "%p"" 2005-06-30 17:57:50 LOG: recovery_target_time = 2005-07-01 00:45:00+02 2005-06-30 17:57:50 LOG: could not open file "C:/Program Files/PostgreSQL/8.0/data/pg_xlog/00000001000000000000000B" (log file 0, segment 11): No such file or directory 2005-06-30 17:57:50 LOG: invalid primary checkpoint record 2005-06-30 17:57:50 LOG: could not open file "C:/Program Files/PostgreSQL/8.0/data/pg_xlog/00000001000000000000000B" (log file 0, segment 11): No such file or directory 2005-06-30 17:57:50 LOG: invalid secondary checkpoint record 2005-06-30 17:57:50 PANIC: could not locate a valid checkpoint record This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. 2005-06-30 17:57:50 LOG: startup process (PID 356) was terminated by signal 3 2005-06-30 17:57:50 LOG: aborting startup due to startup process failure 2005-06-30 17:57:51 LOG: logger shutting down What's the problem ? I have an idea, but can't solve it : When I want to restore the database, the "psql -U postgres essai < save.txt" command fails because it seems that psql read the password in save.txt (it don't let me type it) and fails to connect. So I first connected and then restored with the \i command. Maybe this way of proceeding alter WAL files in a bad way ? I tryed many ways of restoring and deleting/putting WAL files, but all failed. Can somebody help me ? Thanks. Romain Thouvenin
Romain, This is a problem I ran into during testing as well. It seems that your base backup must also include the in use wal file. IN your case: 00000001000000000000000B. So, I think the restore documentation is slightly misleading. In my experience, what you really need to do is remove all wal files from your pg_xlog directory, then put in the in use wal file from when you did your base backup (00000001000000000000000B in your case), put in the recovery.conf and then start postgresql. This procedure works well for me, but I didn't quite grasp that initially when reading the documentation on how to backup/restore with PITR. On Thu, 30 Jun 2005, Romain Thouvenin wrote: > But when I put the recovery.conf file and try to start postgres, it fails. > The log file says : > > 2005-06-30 17:57:50 LOG: could not open file "C:/Program > Files/PostgreSQL/8.0/data/pg_xlog/00000001000000000000000B" (log file 0, > segment 11): No such file or directory -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Romain Thouvenin <rthouvenin@sfwan.com> writes: > I'm currently testing PITR features of 8.0 release. I followed the steps > given by the documentation and thus deleted the WAL files in pg_xlog > after having restored the database. But you have the WAL files in your archive area, right? > 2005-06-30 17:57:50 LOG: restore_command = "copy E:\\temp\\archive\\%f > "%p"" > 2005-06-30 17:57:50 LOG: recovery_target_time = 2005-07-01 00:45:00+02 > 2005-06-30 17:57:50 LOG: could not open file "C:/Program > Files/PostgreSQL/8.0/data/pg_xlog/00000001000000000000000B" (log file 0, > segment 11): No such file or directory Apparently it wasn't able to get the 00000001000000000000000B segment from the archive either. The displayed restore_command looks a bit bogus to me: too many \'s. Maybe you need to correct that? regards, tom lane
On Thu, 30 Jun 2005, Tom Lane wrote: > Apparently it wasn't able to get the 00000001000000000000000B segment > from the archive either. The displayed restore_command looks a bit > bogus to me: too many \'s. Maybe you need to correct that? Tom, When I tested with our system, it would not attempt to retrieve that first WAL via the restore command. All the ones after that it was happy to do it for, but if that one did not exist, it told me to get lost. That was on 8.0.3. I did not test any other versions. Putting it in the pg_xlog dir made everything go smoothly though. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes: > When I tested with our system, it would not attempt to retrieve that first WAL > via the restore command. All the ones after that it was happy to do it for, > but if that one did not exist, it told me to get lost. Would you be more specific please? Looking at the code, I can't see a reason that the first one would be treated differently from others. What *exactly* did you get? regards, tom lane
On Thu, 30 Jun 2005, Tom Lane wrote: > Would you be more specific please? Looking at the code, I can't see a > reason that the first one would be treated differently from others. > What *exactly* did you get? Unfortunately, I do not have my output saved, but here's basically what happened: I made a base backup as such: * psql to start backup * rsync pgdata dir with --exclude pg_xlog * psql to stop backup Copy this to new server's pgdata dir, put in recovery.conf and start postgres. Postgres reports that it could not find wal archive (which would be the in-use wal during the base backup) and exits fatal. Copying the wal archive to the archive dir received the same warning; however, copying it to the pg_xlog dir allowed the restore to begin and the archived wal files after that happily started replaying. I'll have to retest that particular scenario and verify that something else was not causing the problem when I have a chance now that I have a fully functional and scripted out base backup and recover solution; however, it did not give me any of the errors I see when my walrestore.sh script cannot access an archived wal, so I would believe it did not even try to get it out of the archive. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes: > I'll have to retest that particular scenario and verify that something > else was not causing the problem when I have a chance now that I have > a fully functional and scripted out base backup and recover solution; > however, it did not give me any of the errors I see when my > walrestore.sh script cannot access an archived wal, so I would believe > it did not even try to get it out of the archive. As far as I can see from the code, it *should* try to get it out of the archive ... and I'm quite sure we tested that case during 8.0 development ... so I suspect there is some other contributing factor in your configuration. If you could put together a self-contained test case I'd be happy to dig into it, because it sounds like a bug to me. regards, tom lane
On Thu, 30 Jun 2005, Tom Lane wrote: > Jeff Frost <jeff@frostconsultingllc.com> writes: >> I'll have to retest that particular scenario and verify that something >> else was not causing the problem when I have a chance now that I have >> a fully functional and scripted out base backup and recover solution; >> however, it did not give me any of the errors I see when my >> walrestore.sh script cannot access an archived wal, so I would believe >> it did not even try to get it out of the archive. > Tom, Finally got a chance to give this another full test and discovered that the WAL files work fine in the archive dir, but I uncovered a different problem. What had happened was that when I rsynced the walarchive and walrestore scripts over to the server, rsync made them mode 644 as they were in my cvs sandbox and thus not executable. When postgres went to call the script, I guess it stat'd the walrestore, saw that it was not executable and quietly reported that it could not find the WAL file in the pg_xlog dir. Would probably be handy to throw a warning that the restore command was not executable, no? Sorry for the false alarm before. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote: > On Thu, 30 Jun 2005, Tom Lane wrote: > > > Jeff Frost <jeff@frostconsultingllc.com> writes: > >> I'll have to retest that particular scenario and verify that something > >> else was not causing the problem when I have a chance now that I have > >> a fully functional and scripted out base backup and recover solution; > >> however, it did not give me any of the errors I see when my > >> walrestore.sh script cannot access an archived wal, so I would believe > >> it did not even try to get it out of the archive. > > > > Tom, > > Finally got a chance to give this another full test and discovered that the > WAL files work fine in the archive dir, but I uncovered a different problem. > What had happened was that when I rsynced the walarchive and walrestore > scripts over to the server, rsync made them mode 644 as they were in my cvs > sandbox and thus not executable. When postgres went to call the script, I > guess it stat'd the walrestore, saw that it was not executable and quietly > reported that it could not find the WAL file in the pg_xlog dir. Would > probably be handy to throw a warning that the restore command was not > executable, no? Sorry for the false alarm before. We basically run the command you give us via system(), so it is hard for us to know exactly why it failed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Wed, 6 Jul 2005, Bruce Momjian wrote: > We basically run the command you give us via system(), so it is hard for > us to know exactly why it failed. Understandable..but it would at least be nice to know that it failed. It does not appear to log anything at this point. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote: > On Wed, 6 Jul 2005, Bruce Momjian wrote: > > > We basically run the command you give us via system(), so it is hard for > > us to know exactly why it failed. > > Understandable..but it would at least be nice to know that it failed. It does > not appear to log anything at this point. That is strange. You said it reported the file did not exist. When the command fails, we assume it doesn't exist. Not sure what else we can do. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Wed, 6 Jul 2005, Bruce Momjian wrote: > That is strange. You said it reported the file did not exist. When the > command fails, we assume it doesn't exist. Not sure what else we can > do. I'm sorry, I should have copied and pasted the logs. It did not report a failure to copy from the archive dir, it reported that the wal file did not exist in pg_xlog. Here's a snip from the log: Jul 5 22:14:06 db1 postgres[26299]: [1-1] LOG: database system was interrupted at 2005-07-05 23:13:54 PDT Jul 5 22:14:06 db1 postgres[26299]: [2-1] LOG: starting archive recovery Jul 5 22:14:06 db1 postgres[26299]: [3-1] LOG: restore_command = "/usr/local/b in/walrestore.sh %f %p" Jul 5 22:14:06 db1 postgres[26299]: [4-1] LOG: could not open file "/usr/local /pgsql/data/pg_xlog/00000001000000060000005F" (log file 6, segment 95): No such file or Jul 5 22:14:06 db1 postgres[26299]: [4-2] directory Jul 5 22:14:06 db1 postgres[26299]: [5-1] LOG: invalid checkpoint record Jul 5 22:14:06 db1 postgres[26299]: [6-1] PANIC: could not locate required checkpoint record -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954