Re: Bug in recovery of drop database? - Mailing list pgsql-bugs
From | Andres Freund |
---|---|
Subject | Re: Bug in recovery of drop database? |
Date | |
Msg-id | cv3jqr2n6o4jgp6f6mvscecno3i4ps5hgbs57h72djw47vjzu7@uvmkcshcspyv Whole thread Raw |
In response to | Re: Bug in recovery of drop database? (David Steele <david@pgbackrest.org>) |
List | pgsql-bugs |
Hi, On 2025-02-06 15:55:21 +0000, David Steele wrote: > On 2/5/25 15:24, David Steele wrote: > > > > psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: > > FATAL: cannot connect to invalid database "mytest" > > HINT: Use DROP DATABASE to drop invalid databases. > > Since this error message started cropping up fairly recently I had a look at > related commits and found this: > > c66a7d75 Handle DROP DATABASE getting interrupted > > This commit changed the error message so it is more obvious that something > has gone wrong, but the underlying issue appears the same for recovery. > > Before c66a7d75 it looked like this: > > FATAL: database "mytest" does not exist > DETAIL: The database subdirectory "base/16384" is missing. > > Which seems reasonable when just looking at the error and not the detail. Yea, I don't think my commit fundamentally changed things, it just made the issue a bit more apparent. The biggest change is that it actually allows to "fix" the leaked space from SQL. To actually fix this problem we'd need to move the dropping of database files to happen as part of transaction commit, similar to how relation files aren't actually truncated the moment you truncate a table. That's not a trivial change... WRT your repro: > test/pg/bin/psql mytest -t -c 'select now()' > 2025-02-05 19:18:26.245167+00 > > test/pg/bin/dropdb mytest > test/pg/bin/psql mytest > psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: > FATAL: database "mytest" does not exist > ... > echo "recovery_target_time = '2025-02-05 19:18:26.245167+00'" >> > test/data/postgresql.auto.conf > ... > test/pg/bin/psql mytest -c 'select count(*) from pgbench_accounts' > psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: > FATAL: cannot connect to invalid database "mytest" > HINT: Use DROP DATABASE to drop invalid databases. I'd guess this is due to using the default recovery_target_inclusive=true. The reason: > However, if I insert this command before the drop database: > > test/pg/bin/psql mytest -c 'create table test (id int)' > > Then recovery works fine since it is able to stop before the the drop > database. This is an issue on HEAD as well as all current back branches. works, presumably is that now there's another transaction to stop at *after*. Without the additional transaction the first complete transaction with the recovery_target_time IOW, while the two-step-drop-database is an issue, even without it, your reproducer doesn't seem like it actually would work, even if it were a single step, to implement the below scenario: > We have seen this several times now because a favorite test for users is to > drop a database and see if they can recover it with PITR. PITR in a test > database is always a bit dicey since little/no WAL activity means recovery > doesn't have much to go on RE recovery targets, but at the least the > database should be fully there or fully not there. Because the repro actually sets the recovery target to *after* the drop database. Am I misunderstanding something? Greetings, Andres Freund
pgsql-bugs by date: