Thread: Recovering a deleted database problem
Earlier this evening I made the usual mistake someone makes at some point in their lives - and dropped a database thinking I didn't need it, then realised later I did. So, because I have DDL statement logging turned on, I could find the exact time/date it happened, and attempted to restore from my file-system level backup taken at 2am this morning, and rolled forward all my WAL logs archived throughout the day (98 files.) In the recovery.conf, I specified the date/time from the log file that the database was dropped and set recovery_target_inclusive so it would not include this transaction. However the restore has finished, and PostgreSQL thinks the database is there, but the relevant data directory in "base" is missing - so it's removed the file-system objects but not the system database entry. I've checked the base backup, and this directory is in the backup, hence it has been removed at some point during the restore. What I'm going to do now is to set the recovery target to about a minute earlier to make sure the transaction has not started when the recovery finishes - but I'm just asking if I'm missing something obvious, as this is the first time I've done a restore from WAL logs. (Note, after writing this, I tried restoring to a minute earlier (ie. 18:57:40) and still have the same problem. As a quick fix, I copied the base/35290 directory from the backup before I had run the recovery - does anyone know any caveats to doing this, as the DB seems to be working OK?) My recovery.conf is: # PostgreSQL database recovery config file restore_command = 'cp /path/to/wal/archive/%f "%p"' recovery_target_time = '"2007-01-04 18:58:40 -00:00"' recovery_target_inclusive = 'false' The log entry where I discovered the date/time is: 2007-01-04 18:58:40 GMT 84.45.66.158 postgres postgresql - LOG: statement: DROP DATABASE [dbname]; The error I get when I try to connect to [dbname] after the restore is: FATAL: database "[dbname]" does not exist DETAIL: The database subdirectory "base/35290" is missing. But the [dbname] database is still in the system catalogues: /usr/local/pgsql/bin/psql -U postgresql -d postgres -c "select datname from pg_database;" datname ---------------- postgres [db1] template1 template0 [dbname] [db2] [db3] [db4] (8 rows) This is PostgreSQL 8.1.5 on FreeBSD 6.1. Many thanks, -- Andy Shellam NetServe Support Team the Mail Network "an alternative in a standardised world"
"Andy Shellam (Mailing Lists)" <andy.shellam-lists@mailnetwork.co.uk> writes: > (Note, after writing this, I tried restoring to a minute earlier (ie. > 18:57:40) and still have the same problem. The PITR recovery process in effect rolls forward until it finds a transaction-commit record >= the specified time. Now for normal database operations, stopping just short of the commit of the transaction is enough to ensure that the transaction has no effect. But for the XLOG_DBASE_DROP record, not so --- replaying that means "rm -rf base/whatever". So you've got to make sure the replay stops before it reaches that record, and that means you need a stop time <= the commit time of some *prior* transaction. I suppose this was a slow time of day and you didn't have any other commits in the prior minute :-( ... so take another look in the log and see what was the last commit before that, and use that time. regards, tom lane
Thanks for the info Tom, too much data will have been entered into the other databases in the cluster by now so I cannot give it another shot on that server, plus all of yesterday's WAL logs will have been purged by now by the daily backup routine. Is it enough to simply have re-copied in the base/xxx directory from the base backup, after the PITR recovery had completed (obviously any changes made to that database since the base backup won't have been restored but thankfully it's backed up nightly and doesn't change too often :-) ) All CRUD operations seem to be working on that database OK and the app that (I now know) uses it hasn't complained. What I'll probably do is try to simulate the same process again on a different machine to get myself a bit more familiar. Is there any other situations you can think of where this may also be relevant, or is it just when dropping a complete database? Many thanks, Andy. Tom Lane wrote: > "Andy Shellam (Mailing Lists)" <andy.shellam-lists@mailnetwork.co.uk> writes: > >> (Note, after writing this, I tried restoring to a minute earlier (ie. >> 18:57:40) and still have the same problem. >> > > The PITR recovery process in effect rolls forward until it finds > a transaction-commit record >= the specified time. Now for normal > database operations, stopping just short of the commit of the > transaction is enough to ensure that the transaction has no effect. > But for the XLOG_DBASE_DROP record, not so --- replaying that means > "rm -rf base/whatever". So you've got to make sure the replay stops > before it reaches that record, and that means you need a stop time > <= the commit time of some *prior* transaction. I suppose this was > a slow time of day and you didn't have any other commits in the prior > minute :-( ... so take another look in the log and see what was the > last commit before that, and use that time. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > > !DSPAM:37,459e6a32137101648020742! > > > -- Andy Shellam NetServe Support Team the Mail Network "an alternative in a standardised world"
"Andy Shellam (Mailing Lists)" <andy.shellam-lists@mailnetwork.co.uk> writes: > Is it enough to simply have re-copied in the base/xxx directory from the > base backup, after the PITR recovery had completed (obviously any > changes made to that database since the base backup won't have been > restored but thankfully it's backed up nightly and doesn't change too > often :-) ) Well, I'd be a little worried about whether the base backup was self-consistent, but if it was taken at a time where the DB was idle then you can probably get away with this. > What I'll probably do is try to simulate the same process again on a > different machine to get myself a bit more familiar. Is there any other > situations you can think of where this may also be relevant, or is it > just when dropping a complete database? AFAIK the only operations that have non-rollbackable side effects are CREATE/DROP DATABASE and CREATE/DROP TABLESPACE. For any of these, you'd end up with inconsistent state if you try to stop replay just before the commit record. regards, tom lane
> Well, I'd be a little worried about whether the base backup was > self-consistent, but if it was taken at a time where the DB was idle > then you can probably get away with this. > It gets backed up at 2am in the morning and AFAIK there'd be very few (if any) transactions going through until about 5am, so it should be OK. > AFAIK the only operations that have non-rollbackable side effects are > CREATE/DROP DATABASE and CREATE/DROP TABLESPACE. For any of these, > you'd end up with inconsistent state if you try to stop replay just > before the commit record. > OK thanks for that Tom, after glancing through the online documentation, I've enabled logging of checkpoints etc. by setting "log_min_messages" to "log" - is this enough to log the times that transactions are committed, so I can find this info should I need it again in the future? Thanks, -- Andy Shellam NetServe Support Team the Mail Network "an alternative in a standardised world"