On 8/05/2023 9:47 pm, Andres Freund wrote:
> Did you have any occasions where CREATE or DROP DATABASE was interrupted?
> Either due to the connection being terminated or a crash?
I've uploaded an edited extract of the PG log for that period at
https://objective.realityexists.net/temp/log-extract-2023-05-02.txt
(test_behavior_638186279733138190 and test_behavior_638186280406544656
are the two DBs that got corrupted).
I cannot see any crash in the test logs or the PG logs, but whether it
was interrupted is less clear. I don't know whether the tests ran
successfully up to the point where they tried to drop the DBs (I've
since added logging to show that next time), but DROP DATABASE did not
return after 30 seconds and the client library (Npgsql) then tried to
cancel the requests. We then tried to drop the DB again, with the same
results in both cases. After the second attempts timed out we closed the
connections anyway - so maybe that was the interruption?
> As described in
> https://postgr.es/m/20230314174521.74jl6ffqsee5mtug%40awork3.anarazel.de
> we don't handle that correctly for DROP DATABASE.
>
> I think that might actually fit the symptoms - the DropDatabaseBuffers() will
> throw away the dirty buffer contents from the WAL strategy CREATE DATABASE,
> but if you then get cancelled at a point before all the files are removed, the
> on-disk files with all-zeroes would remain.
Oooh, that does seem to fit! Thank you for digging that up.
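
To check that I understand the sequence you describe, here is a toy Python
model of it. This is only a sketch, not PostgreSQL code: the names
(simulate_drop, cancel_before_file, the "rel_a"/"rel_b" files) are made up
for illustration, and I'm assuming the real contents exist only in dirty
buffers while the on-disk files are still zero-filled.

```python
class Cancelled(Exception):
    """Stands in for a query cancel arriving in the middle of DROP DATABASE."""


def simulate_drop(cancel_before_file=None):
    # Assumed state after a WAL-strategy CREATE DATABASE with no checkpoint yet:
    # files exist on disk but are zero-filled; real contents are in dirty buffers.
    disk = {"rel_a": b"\0" * 8, "rel_b": b"\0" * 8}
    buffers = {"rel_a": b"catalog-a", "rel_b": b"catalog-b"}

    try:
        # Step 1: discard the dirty buffers without writing them out
        # (the role DropDatabaseBuffers() plays in the description above).
        buffers.clear()

        # Step 2: remove the files one by one; a cancel may arrive in between.
        for index, name in enumerate(sorted(disk)):
            if index == cancel_before_file:
                raise Cancelled()
            del disk[name]
    except Cancelled:
        pass  # the database still exists, but its buffer contents are gone

    return disk, buffers


# Cancel arrives after the first file is removed but before the second.
leftover_disk, leftover_buffers = simulate_drop(cancel_before_file=1)
```

If the cancel lands between the two steps, or partway through step 2, what
is left is exactly a database whose remaining files are all zeroes, which
matches what we saw.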