Re: Bug in recovery of drop database? - Mailing list pgsql-bugs

From David Steele
Subject Re: Bug in recovery of drop database?
Date
Msg-id c1c2c818-888a-45d8-8994-4ed591cc11ed@pgbackrest.org
Whole thread Raw
In response to Bug in recovery of drop database?  (David Steele <david@pgbackrest.org>)
Responses Re: Bug in recovery of drop database?
List pgsql-bugs
On 2/6/25 13:05, Andres Freund wrote:
> On 2025-02-06 15:55:21 +0000, David Steele wrote:
>>
>> Before c66a7d75 it looked like this:
>>
>> FATAL:  database "mytest" does not exist
>> DETAIL:  The database subdirectory "base/16384" is missing.
>>
>> Which seems reasonable when just looking at the error and not the detail.
> 
> Yea, I don't think my commit fundamentally changed things, it just made the
> issue a bit more apparent.  The biggest change is that it actually allows to
> "fix" the leaked space from SQL.
> 
> To actually fix this problem we'd need to move the dropping of database files
> to happen as part of transaction commit, similar to how relation files aren't
> actually truncated the moment you truncate a table.  That's not a trivial
> change...

Understatement.

> WRT your repro:
>> test/pg/bin/psql mytest -t -c 'select now()'
>>   2025-02-05 19:18:26.245167+00
>>
>> test/pg/bin/dropdb mytest
>> test/pg/bin/psql mytest
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
>> FATAL:  database "mytest" does not exist
>> ...
>> echo "recovery_target_time = '2025-02-05 19:18:26.245167+00'" >>
>> test/data/postgresql.auto.conf
>> ...
>> test/pg/bin/psql mytest -c 'select count(*) from pgbench_accounts'
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
>> FATAL:  cannot connect to invalid database "mytest"
>> HINT:  Use DROP DATABASE to drop invalid databases.
> 
> I'd guess this is due to using the default recovery_target_inclusive=true.

I got the same result with recovery_target_inclusive=false. I think you 
could get it to work with a more precise timestamp or better, xid or lsn.

> The reason:
> 
>> However, if I insert this command before the drop database:
>>
>> test/pg/bin/psql mytest -c 'create table test (id int)'
>>
>> Then recovery works fine since it is able to stop before the the drop
>> database. This is an issue on HEAD as well as all current back branches.
> 
> works, presumably is that now there's another transaction to stop at
> *after*. Without the additional transaction the first complete transaction
> with the recovery_target_time

Given I'm grabbing the timestamp before the create table I'd expect it 
to stop before the table is created whether recovery_target_inclusive is 
true or false. I just gave it a try and sure enough either way I get:

$ test/pg/bin/psql mytest -c 'select count(*) from test'
ERROR:  relation "test" does not exist

> IOW, while the two-step-drop-database is an issue, even without it, your
> reproducer doesn't seem like it actually would work, even if it were a single
> step, to implement the below scenario:

The repro works almost every time for me on all six active branches. It 
hasn't worked a few times, but in those cases I suspect I accidentally 
skipped a step. I'm running it manually each time.

>> We have seen this several times now because a favorite test for users is to
>> drop a database and see if they can recover it with PITR. PITR in a test
>> database is always a bit dicey since little/no WAL activity means recovery
>> doesn't have much to go on RE recovery targets, but at the least the
>> database should be fully there or fully not there.
> 
> Because the repro actually sets the recovery target to *after* the drop
> database.
> 
> Am I misunderstanding something?

Without looking at the code, my guess is the dropping of files WAL 
record does not have a transaction timestamp that recovery can use so 
recovery blows by the dropping of files and stops before the entry is 
removed from pg_database.

It sounds like this is more or less a known issue. If we decide not to 
fix it how about I at least document it as a caveat for drop database?

Regards,
-David



pgsql-bugs by date:

Previous
From: David Steele
Date:
Subject: Re: Bug in recovery of drop database?
Next
From: David Steele
Date:
Subject: Re: Bug in recovery of drop database?