Hi folks,
When performing database recovery tests, after restoring from backup is complete, what SOPs and tools do you use to sample your database contents and verify the data looks correct? Do you have a list of queries to run? What metadata do you capture and where do you save the test report? Do you use automation? Is it built in-house, off-the-shelf, or open-source?
Thanks, I'm not a DBA but no one else works at my company so any pointers would be appreciated.
If using pg_backup/pg_restore, then something like this is perfectly adequate:
pg_backup ... $DB 2> backup_$(date +"%F_%T").log || mail -s "ERROR: backup failed at $(date +\"%F %T\")" dolan@example.com pg_restore --exit-on-error ... $DB 2> restore_$(date +"%F_%T").log || mail -s "ERROR: restore failed at $(date +\"%F %T\")" dolan@example.com
Then you know to check the log file to see what happened.
My business users don't trust that, so I created a simple, fast, imperfect script which I run at the same time as the backup:
BEGIN;
SELECT COUNT(*) FROM table_1;
SELECT COUNT(*) FROM table_2;
...SELECT COUNT(*) FROM table_N;
COMMIT;
Run the same script on the restored database. The two log files have always been identical.
-- Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.