deferred writing of two-phase state files adds fragility - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | deferred writing of two-phase state files adds fragility |
Date | |
Msg-id | CA+Tgmob2e542abFO-RspquqVYzpt7X4JeOKMDVXwDEowqzmcOg@mail.gmail.com |
Responses | Re: deferred writing of two-phase state files adds fragility; Re: deferred writing of two-phase state files adds fragility |
List | pgsql-hackers |
Let's suppose that you execute PREPARE TRANSACTION and, before the next CHECKPOINT, the WAL record for the PREPARE TRANSACTION gets corrupted on disk. This might seem like an unlikely scenario, and it is, but we saw a case at EDB not too long ago. To a first approximation, the world ends. You can't execute COMMIT PREPARED or ROLLBACK PREPARED, so there's now no way to resolve the prepared transaction. You also can't checkpoint, because that requires writing a twophase state file for the prepared transaction, and that's not possible because the WAL can't be read. What you have is a mostly working system, except that it's going to bloat over time because the prepared transaction is going to hold back the VACUUM horizon. And you basically have no way out of that problem, because there's no tool that says "I understand that my database is going to be corrupted, that's ok, just forget about that twophase transaction".

If you shut down the database, then things become truly awful. You can't get a clean shutdown because you can't checkpoint, so you're going to resume recovery from the last checkpoint before the problem happened, find the corrupted WAL, and fail. As long as your database was up, you at least had the possibility of getting all of your data out of it by running pg_dump, as long as you could survive the amount of time that was going to take. And, if you did do that, you wouldn't even have corruption. But once your database has gone down, you can't get it back up again without running pg_resetwal. Running pg_resetwal is not very appealing here -- first because now you do have corruption whereas before the shutdown you didn't, and second because the last checkpoint could already be a long time in the past, depending on how quickly you realized you have this problem.

Before 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71, things would not have been quite so bad.
Checkpoints wouldn't fail, so you might never even realize you had a problem, or you might just need to rebuild your standbys. If you had corruption in a different place, like the twophase file itself, you could simply shut down cleanly, remove the twophase file, and start back up. I'm not quite sure whether that's equivalent to a forced abort of the twophase transaction or whether it might leave you with some latent corruption, but I suspect the problems you'll have will be pretty tame compared to what happens in the scenario described above.

Just to be clear, I am not suggesting that we should revert that commit. I'm actually not sure whether we should change anything at all, but I'm not very comfortable with the status quo, either. It's unavoidable that the database will sometimes end up in a bad state -- Murphy's law, entropy, or whatever you want to call it guarantees that. But I like it a lot better when there's something that I can reasonably do to get the database OUT of that bad state, and in this situation nothing works -- or at least, nothing that I could think of works. It would be nice to improve on that somehow, if anybody has a good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com