Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery - Mailing list pgsql-bugs
From: Thomas Crayford
Subject: Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery
On Fri, Sep 28, 2018 at 11:59 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Sep 28, 2018 at 01:02:42PM +0100, Thomas Crayford wrote:
> Ok, thanks for the pointer. It seems like the race condition I talked about
> is still accurate, does that seem right?
KeepFileRestoredFromArchive() looks like a good candidate on the matter, as it removes a WAL segment before replacing it with another one of the same name. I have a hard time understanding why the checkpointer would try to recycle a segment that was just recovered, though, as the startup process would immediately try to use it. I have not spent more than one hour looking at potential spots, which is not much for this kind of race condition.
This is also why I am curious about what kind of restore_command you are using.
--
Michael
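To make the suspected window concrete, here is a minimal sketch in Python of a simplified model, not PostgreSQL's actual code. One function stands in for a replacement that removes the existing segment and then renames the restored file into place, the remove-before-replace pattern described for KeepFileRestoredFromArchive() above. A reader, standing in for whichever process opens the segment, is injected between the two steps. The helper names, the `restored.tmp` file name and the segment name are made up for illustration. In this model the reader hits ENOENT, the same "No such file or directory" as in the report. The alternative shown renames over the existing file in a single step with `os.replace()`, which is atomic on POSIX filesystems, so the name never goes missing.

```python
import os
import tempfile


def replace_remove_then_rename(src, dst, between=None):
    """Hypothetical non-atomic replacement: unlink the old file, then rename."""
    if os.path.exists(dst):
        os.remove(dst)
    if between is not None:
        between()  # window in which dst does not exist at all
    os.rename(src, dst)


def replace_atomic(src, dst, between=None):
    """Hypothetical alternative: rename over the existing file in one step.

    On POSIX filesystems os.replace() swaps the name atomically, so a
    concurrent open() sees either the old file or the new one.
    """
    if between is not None:
        between()  # dst still holds the old contents here
    os.replace(src, dst)


def try_open(path, log):
    """Simulated concurrent reader: record the contents, or ENOENT."""
    try:
        with open(path, "rb") as f:
            log.append(f.read())
    except FileNotFoundError:
        log.append("ENOENT")


def demo(replace_fn):
    with tempfile.TemporaryDirectory() as d:
        seg = os.path.join(d, "000000010000000000000001")  # made-up segment name
        restored = os.path.join(d, "restored.tmp")  # made-up temp name
        with open(seg, "wb") as f:
            f.write(b"old")
        with open(restored, "wb") as f:
            f.write(b"new")
        log = []
        # The reader runs at a fixed point inside the replacement, then again after it.
        replace_fn(restored, seg, between=lambda: try_open(seg, log))
        try_open(seg, log)
        return log


print(demo(replace_remove_then_rename))  # ['ENOENT', b'new']
print(demo(replace_atomic))  # [b'old', b'new']
```

The reader is called at a fixed point so the outcome is deterministic. In a real system the window is only hit when the timing happens to line up, which is part of what makes this kind of race hard to reproduce.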