On Sat, Nov 30, 2019 at 03:09:39PM +0300, Grigory Smolkin wrote:
> I`ve digged a bit into this problem, and it`s turned out that in
> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT |
> O_EXCL' flags, which makes this routine as not very reentrant.
What did you see as I/O problem before facing the actual error
reported here? Was it just ENOSPC, a fsync failure, or just a failure
in closing the fd? The first pattern is mostly what I guess happened,
still a fsync failure would not trigger a PANIC here (actually we
really should do that!), but I am raising a different thread about
that issue.
> Since an exclusive lock is taken before temp file creation, I think it
> should be safe to replace O_EXCL with O_TRUNC.
> Script to reproduce and patch are attached.
Agreed. I prefer the O_TRUNC option because that's less code churn.
Also, as it can still be useful to have a look at the temporary state
file after a crash or a failure, doing unlink() in the error code
paths is no good option IMO.
Have others thoughts or objections to share?
--
Michael