Re: Bug in recovery of drop database? - Mailing list pgsql-bugs
From | David Steele |
---|---|
Subject | Re: Bug in recovery of drop database? |
Date | |
Msg-id | c1c2c818-888a-45d8-8994-4ed591cc11ed@pgbackrest.org Whole thread Raw |
In response to | Bug in recovery of drop database? (David Steele <david@pgbackrest.org>) |
Responses |
Re: Bug in recovery of drop database?
|
List | pgsql-bugs |
On 2/6/25 13:05, Andres Freund wrote: > On 2025-02-06 15:55:21 +0000, David Steele wrote: >> >> Before c66a7d75 it looked like this: >> >> FATAL: database "mytest" does not exist >> DETAIL: The database subdirectory "base/16384" is missing. >> >> Which seems reasonable when just looking at the error and not the detail. > > Yea, I don't think my commit fundamentally changed things, it just made the > issue a bit more apparent. The biggest change is that it actually allows to > "fix" the leaked space from SQL. > > To actually fix this problem we'd need to move the dropping of database files > to happen as part of transaction commit, similar to how relation files aren't > actually truncated the moment you truncate a table. That's not a trivial > change... Understatement. > WRT your repro: >> test/pg/bin/psql mytest -t -c 'select now()' >> 2025-02-05 19:18:26.245167+00 >> >> test/pg/bin/dropdb mytest >> test/pg/bin/psql mytest >> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: >> FATAL: database "mytest" does not exist >> ... >> echo "recovery_target_time = '2025-02-05 19:18:26.245167+00'" >> >> test/data/postgresql.auto.conf >> ... >> test/pg/bin/psql mytest -c 'select count(*) from pgbench_accounts' >> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: >> FATAL: cannot connect to invalid database "mytest" >> HINT: Use DROP DATABASE to drop invalid databases. > > I'd guess this is due to using the default recovery_target_inclusive=true. I got the same result with recovery_target_inclusive=false. I think you could get it to work with a more precise timestamp or better, xid or lsn. > The reason: > >> However, if I insert this command before the drop database: >> >> test/pg/bin/psql mytest -c 'create table test (id int)' >> >> Then recovery works fine since it is able to stop before the the drop >> database. This is an issue on HEAD as well as all current back branches. > > works, presumably is that now there's another transaction to stop at > *after*. Without the additional transaction the first complete transaction > with the recovery_target_time Given I'm grabbing the timestamp before the create table I'd expect it to stop before the table is created whether recovery_target_inclusive is true or false. I just gave it a try and sure enough either way I get: $ test/pg/bin/psql mytest -c 'select count(*) from test' ERROR: relation "test" does not exist > IOW, while the two-step-drop-database is an issue, even without it, your > reproducer doesn't seem like it actually would work, even if it were a single > step, to implement the below scenario: The repro works almost every time for me on all six active branches. It hasn't worked a few times, but in those cases I suspect I accidentally skipped a step. I'm running it manually each time. >> We have seen this several times now because a favorite test for users is to >> drop a database and see if they can recover it with PITR. PITR in a test >> database is always a bit dicey since little/no WAL activity means recovery >> doesn't have much to go on RE recovery targets, but at the least the >> database should be fully there or fully not there. > > Because the repro actually sets the recovery target to *after* the drop > database. > > Am I misunderstanding something? Without looking at the code, my guess is the dropping of files WAL record does not have a transaction timestamp that recovery can use so recovery blows by the dropping of files and stops before the entry is removed from pg_database. It sounds like this is more or less a known issue. If we decide not to fix it how about I at least document it as a caveat for drop database? Regards, -David
pgsql-bugs by date: