Thread: Postgres base Backup fails to recover all logs fails from archive
Hi This is postgreSQL 9.2.1 By accident I was running the the following command «delete from org_oks.oks_kartobjekt_flate where id = 53» on the wrong database. - I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by using«24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file systembackup and then SELECT pg_stop_backup(); The row I deleted was added after the base backup was created. - Her is the cat from 0000000100000023000000F4.00000020.backup START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4) STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4) CHECKPOINT LOCATION: 23/F4000020 BACKUP METHOD: pg_start_backup BACKUP FROM: master START TIME: 2012-11-22 09:57:03 CET LABEL: /backup/db01/base_backup_2012_11_22 STOP TIME: 2012-11-22 10:38:44 CET - I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the samebehavior) restore_command ='cp /data01/new_archive_files/%f %p' recovery_target_time = '2013-01-04 04:00:00' - Here is the log from startup LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET LOG: starting archive recovery LOG: restored log file "0000000100000023000000F4" from archive LOG: redo starts at 23/F4000080 LOG: consistent recovery state reached at 23/F5000000 LOG: restored log file "0000000100000023000000F5" from archive LOG: restored log file "0000000100000023000000F6" from archive LOG: restored log file "0000000100000023000000F7" from archive LOG: contrecord is requested by 23/F7000020 LOG: redo done at 23/F6000080 LOG: restored log file "0000000100000023000000F6" from archive cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory LOG: selected new timeline ID: 2 cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory LOG: archive recovery complete LOG: database system is ready to accept connections LOG: autovacuum launcher started - Here is the list some the archive files. -rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008 -rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007 -rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006 -rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005 -rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004 -rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003 -rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002 -rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001 -rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000 -rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE -rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD -rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC -rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB -rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9 -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8 -rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7 -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6 -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5 -rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4 -rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup I have two questions 1. Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before without anyproblems.) 2. Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the row Ideleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files) Thanks in advance Lars
On Sat, Jan 5, 2013 at 10:30 PM, Lars Aksel Opsahl <lop@skogoglandskap.no> wrote: > Hi > > > This is postgreSQL 9.2.1 > > > By accident I was running the the following command > > «delete from org_oks.oks_kartobjekt_flate where id = 53» > > on the wrong database. > > > - I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by using«24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file systembackup and then SELECT pg_stop_backup(); > > > The row I deleted was added after the base backup was created. > > - Her is the cat from 0000000100000023000000F4.00000020.backup > > START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4) > > STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4) > > CHECKPOINT LOCATION: 23/F4000020 > > BACKUP METHOD: pg_start_backup > > BACKUP FROM: master > > START TIME: 2012-11-22 09:57:03 CET > > LABEL: /backup/db01/base_backup_2012_11_22 > > STOP TIME: 2012-11-22 10:38:44 CET > > - I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the samebehavior) > > restore_command ='cp /data01/new_archive_files/%f %p' > > recovery_target_time = '2013-01-04 04:00:00' > > - Here is the log from startup > > LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET > > LOG: starting archive recovery > > LOG: restored log file "0000000100000023000000F4" from archive > > LOG: redo starts at 23/F4000080 > > LOG: consistent recovery state reached at 23/F5000000 > > LOG: restored log file "0000000100000023000000F5" from archive > > LOG: restored log file "0000000100000023000000F6" from archive > > LOG: restored log file "0000000100000023000000F7" from archive > > LOG: contrecord is requested by 23/F7000020 > > LOG: redo done at 23/F6000080 > > LOG: restored log file "0000000100000023000000F6" from archive > > cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory > > LOG: selected new timeline ID: 2 > > cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory > > LOG: archive recovery complete > > LOG: database system is ready to accept connections > > LOG: autovacuum launcher started > > > - Here is the list some the archive files. > > -rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008 > > -rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007 > > -rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9 > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8 > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7 > > -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6 > > -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5 > > -rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4 > > -rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup > > > I have two questions > > 1. Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before withoutany problems.) What actually went wrong? Because the log looks like a perfectly normal restore? (Though you should really add timestamps to your log) > 2. Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the rowI deleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files) No, the archive files only contain the binary changes to the disk blocks, not the SQL statements or data. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Hi Thanks for your help. The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7. The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files. I don't under stand why it stops to read the rest of the log files. Lars ________________________________________ Fra: Magnus Hagander [magnus@hagander.net] Sendt: 5. januar 2013 22:45 Til: Lars Aksel Opsahl Kopi: pgsql-admin@postgresql.org Emne: Re: [ADMIN] Postgres base Backup fails to recover all logs fails from archive On Sat, Jan 5, 2013 at 10:30 PM, Lars Aksel Opsahl <lop@skogoglandskap.no> wrote: > Hi > > > This is postgreSQL 9.2.1 > > > By accident I was running the the following command > > «delete from org_oks.oks_kartobjekt_flate where id = 53» > > on the wrong database. > > > - I first picked up last basebackup from the backup server. The base backup was from 22/11 and It was made made by using«24.3.3. Making a Base Backup Using the Low Level API» with SELECT pg_start_backup('label'); and then a file systembackup and then SELECT pg_stop_backup(); > > > The row I deleted was added after the base backup was created. > > - Her is the cat from 0000000100000023000000F4.00000020.backup > > START WAL LOCATION: 23/F4000020 (file 0000000100000023000000F4) > > STOP WAL LOCATION: 23/F47303F0 (file 0000000100000023000000F4) > > CHECKPOINT LOCATION: 23/F4000020 > > BACKUP METHOD: pg_start_backup > > BACKUP FROM: master > > START TIME: 2012-11-22 09:57:03 CET > > LABEL: /backup/db01/base_backup_2012_11_22 > > STOP TIME: 2012-11-22 10:38:44 CET > > - I create the recovery.conf with the following content (I have also tested without recovery_target_time but it's the samebehavior) > > restore_command ='cp /data01/new_archive_files/%f %p' > > recovery_target_time = '2013-01-04 04:00:00' > > - Here is the log from startup > > LOG: database system was interrupted; last known up at 2012-11-22 10:22:36 CET > > LOG: starting archive recovery > > LOG: restored log file "0000000100000023000000F4" from archive > > LOG: redo starts at 23/F4000080 > > LOG: consistent recovery state reached at 23/F5000000 > > LOG: restored log file "0000000100000023000000F5" from archive > > LOG: restored log file "0000000100000023000000F6" from archive > > LOG: restored log file "0000000100000023000000F7" from archive > > LOG: contrecord is requested by 23/F7000020 > > LOG: redo done at 23/F6000080 > > LOG: restored log file "0000000100000023000000F6" from archive > > cp: cannot stat `/data01/new_archive_files/00000002.history': No such file or directory > > LOG: selected new timeline ID: 2 > > cp: cannot stat `/data01/new_archive_files/00000001.history': No such file or directory > > LOG: archive recovery complete > > LOG: database system is ready to accept connections > > LOG: autovacuum launcher started > > > - Here is the list some the archive files. > > -rwx------ 1 postgres postgres 16777216 Nov 24 07:11 000000010000002400000008 > > -rwx------ 1 postgres postgres 16777216 Nov 24 07:10 000000010000002400000007 > > -rwx------ 1 postgres postgres 16777216 Nov 24 07:00 000000010000002400000006 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:50 000000010000002400000005 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:45 000000010000002400000004 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:40 000000010000002400000003 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:32 000000010000002400000002 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:26 000000010000002400000001 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:16 000000010000002400000000 > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:08 0000000100000023000000FE > > -rwx------ 1 postgres postgres 16777216 Nov 24 06:02 0000000100000023000000FD > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:57 0000000100000023000000FC > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:39 0000000100000023000000FB > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:32 0000000100000023000000FA > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F9 > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:31 0000000100000023000000F8 > > -rwx------ 1 postgres postgres 16777216 Nov 24 05:27 0000000100000023000000F7 > > -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F6 > > -rwx------ 1 postgres postgres 16777216 Nov 22 12:37 0000000100000023000000F5 > > -rwx------ 1 postgres postgres 16777216 Nov 22 10:38 0000000100000023000000F4 > > -rwx------ 1 postgres postgres 325 Nov 22 10:38 0000000100000023000000F4.00000020.backup > > > I have two questions > > 1. Can anybody see what the problem is here and what I am doing wrong ? (I have done this many times before withoutany problems.) What actually went wrong? Because the log looks like a perfectly normal restore? (Though you should really add timestamps to your log) > 2. Is it possible to extract sql insert's from the archive files abouve and in that way find the content of the rowI deleted ? (The table name oks_kartobjekt_flate only occurs in about 10 archive files) No, the archive files only contain the binary changes to the disk blocks, not the SQL statements or data. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Sat, Jan 5, 2013 at 11:30 PM, Lars Aksel Opsahl <lop@skogoglandskap.no> wrote: > Hi > > Thanks for your help. > > The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7. > > The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files. > > I don't under stand why it stops to read the rest of the log files. Ah, I see that now. Well, it's claiming that the F7 logfile is corrupt. There's also an interesting jump in timestamps - the F7 logfile is from *much* later than F6. Almost two whole days. Is that actually normal, or did something possibly go wrong with your archiving? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Hi again I have tried to check different /var/log/messages on the servers but they only go back 1 month. I also had quick glance at Nagios and Munin logs but I could not find anything there. This database is mainly used for reading data as it is now, but it's a couple of applications which also store data. The2 days with out activity in the archive logs is strange when I checked back on the activity reports for applications.I will set the archive_timeout, but the problem is that we have small updates and the archive files has thesame size and is not depending on the amount of changed data. We use NFS to copy archive files to the backup server because that is simple, but maybe we should have used rsyn or somethingelse, but independent off how we do it, we need a method to check that the archive file is OK after it's copy. If it's not OK the files must be transfers once more from the database server. I assume the archive always will be OK whenit's created on the database server. Is the the way off doing this to make a script that copies the archive file to backup server cheeks that the files are equalbefore before it is deleted on the database server ? Is there any simple utility program that can check if the archive file is OK or do I have to do a restore to check if thefile is OK ? Thanks Lars ________________________________________ Fra: Magnus Hagander [magnus@hagander.net] Sendt: 7. januar 2013 21:47 Til: Lars Aksel Opsahl Kopi: pgsql-admin@postgresql.org Emne: Re: [ADMIN] Postgres base Backup fails to recover all logs fails from archive On Sat, Jan 5, 2013 at 11:30 PM, Lars Aksel Opsahl <lop@skogoglandskap.no> wrote: > Hi > > Thanks for your help. > > The problem is that it stops to restore the log files after reading log file 0000000100000023000000F7. > > The recovery does not continue to read log file 0000000100000023000000F8 and the rest of the log files. > > I don't under stand why it stops to read the rest of the log files. Ah, I see that now. Well, it's claiming that the F7 logfile is corrupt. There's also an interesting jump in timestamps - the F7 logfile is from *much* later than F6. Almost two whole days. Is that actually normal, or did something possibly go wrong with your archiving? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jan 8, 2013 at 11:00 AM, Lars Aksel Opsahl <lop@skogoglandskap.no> wrote: > Hi again > > I have tried to check different /var/log/messages on the servers but they only go back 1 month. I also had quick glanceat Nagios and Munin logs but I could not find anything there. > > This database is mainly used for reading data as it is now, but it's a couple of applications which also store data. The2 days with out activity in the archive logs is strange when I checked back on the activity reports for applications.I will set the archive_timeout, but the problem is that we have small updates and the archive files has thesame size and is not depending on the amount of changed data. Yeah, that is indeed a problem. pg_receivexlog can help you arund that issue, but then your receiving end needs to be a "real box" and not just an NFS share. > We use NFS to copy archive files to the backup server because that is simple, but maybe we should have used rsyn or somethingelse, but independent off how we do it, we need a method to check that the archive file is OK after it's copy. I would generally recommend using somehting that writes the file atomically. For example, rsync. But you can use rsync with your target being the NFS share - you don't need an rsync server. Or just a manual copy+rename. As long as your NFS isn't mounted asynchronously, that should be safe. > If it's not OK the files must be transfers once more from the database server. I assume the archive always will be OK whenit's created on the database server. > > Is the the way off doing this to make a script that copies the archive file to backup server cheeks that the files areequal before before it is deleted on the database server ? Copy+rename should take care of it for you - I doubt it's an actual failure in contents wrong, it's more that something crashed in the middle of a file and therefor it ended up being corrupt. If that was written to a temp file and then renamed into place, that should not happen. > Is there any simple utility program that can check if the archive file is OK or do I have to do a restore to check if thefile is OK ? I don't believe there is such a tool. You could make some very most basic verification manually, but to verify the full contents you really need to fully parse them - and the tool to do that is "postgres". -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/