[bug fix] PITR corrupts the database cluster - Mailing list pgsql-hackers

From MauMau
Subject [bug fix] PITR corrupts the database cluster
Date
Msg-id F93E42280A9A4A5EB74FC7350C801A20@maumau
Whole thread Raw
Responses Re: [bug fix] PITR corrupts the database cluster
List pgsql-hackers
Hello,

I've encountered a bug of PITR that corrupts the database.  I'm willing to 
submit the patch to fix it, but I'm wondering what approach is appropriate. 
Could you give me your opinions?

[Problem]
I cannot connect to the database after performing the following steps:

1. CREATE DATABASE mydb;
2. Take a base backup with pg_basebackup.
3. DROP DATABASE mydb;
4. Shutdown the database server with "pg_ctl stop".
5. Recover the database cluster to the point where the base backup 
completed, i.e., before dropping mydb.  The contents of recovery.conf is:
restore_command = 'cp /arc/dir/%f %p'
recovery_target_timeline = 'latest'
recovery_target_time = 'STOP TIME recorded in the backup history file which 
was created during base backup'

I expected to be able to connect to mydb because I recovered to the point 
before dropping mydb.  However, I cannot connect to mydb because the 
directory for mydb does not exist.  The entry for mydb exists in 
pg_database.


[Cause]
DROP DATABASE emits the below WAL records:

1. System catalog changes including deletion of a tuple for mydb in 
pg_database
2. Deletion of directories for the database
3. Transaction commit

During recovery, postgres replays 1 and 2.  It ends the recovery when it 
notices that the time recorded in commit record (3 above) is later than the 
recovery target time.  The commit record is not replayed, thus the system 
catalog changes are virtually undone.

The problem is that 2 is replayed.  This deletes the directory for the 
database although the transaction is not committed.


[How to fix]
There are two approaches.  Which do you think is the way to go?

<Approach 1>
During recovery, when the WAL record for directory deletion is found, just 
record that fact for later replay (in a hash table keyed by xid).  When the 
corresponding transaction commit record is found, replay the directory 
deletion record.

<Approach 2>
Like the DROP TABLE/INDEX case, piggyback the directory deletion record on 
the transaction commit record, and eliminate the directory deletion record 
altogether.

I think we need to take approach 1 even when we also does 2, because 1 is 
necessary when the backup and archive WAL are already taken with the current 
PostgreSQL anyway.


Regards
MauMau




pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Performance problem in PLPgSQL
Next
From: Andres Freund
Date:
Subject: Re: [bug fix] PITR corrupts the database cluster