Hello,
I've encountered a bug of PITR that corrupts the database. I'm willing to
submit the patch to fix it, but I'm wondering what approach is appropriate.
Could you give me your opinions?
[Problem]
I cannot connect to the database after performing the following steps:
1. CREATE DATABASE mydb;
2. Take a base backup with pg_basebackup.
3. DROP DATABASE mydb;
4. Shutdown the database server with "pg_ctl stop".
5. Recover the database cluster to the point where the base backup
completed, i.e., before dropping mydb. The contents of recovery.conf is:
restore_command = 'cp /arc/dir/%f %p'
recovery_target_timeline = 'latest'
recovery_target_time = 'STOP TIME recorded in the backup history file which
was created during base backup'
I expected to be able to connect to mydb because I recovered to the point
before dropping mydb. However, I cannot connect to mydb because the
directory for mydb does not exist. The entry for mydb exists in
pg_database.
[Cause]
DROP DATABASE emits the below WAL records:
1. System catalog changes including deletion of a tuple for mydb in
pg_database
2. Deletion of directories for the database
3. Transaction commit
During recovery, postgres replays 1 and 2. It ends the recovery when it
notices that the time recorded in commit record (3 above) is later than the
recovery target time. The commit record is not replayed, thus the system
catalog changes are virtually undone.
The problem is that 2 is replayed. This deletes the directory for the
database although the transaction is not committed.
[How to fix]
There are two approaches. Which do you think is the way to go?
<Approach 1>
During recovery, when the WAL record for directory deletion is found, just
record that fact for later replay (in a hash table keyed by xid). When the
corresponding transaction commit record is found, replay the directory
deletion record.
<Approach 2>
Like the DROP TABLE/INDEX case, piggyback the directory deletion record on
the transaction commit record, and eliminate the directory deletion record
altogether.
I think we need to take approach 1 even when we also does 2, because 1 is
necessary when the backup and archive WAL are already taken with the current
PostgreSQL anyway.
Regards
MauMau