Thanks for your feedback.
On 1/20/22, 11:47 AM, "Andres Freund" <andres@anarazel.de> wrote:
> It seems odd to change a bunch of things to not be errors anymore, but then
> add new sources of errors. If we try to deal with concurrent deletions or
> permission issues - otherwise what's the point of making unlink() not an error
> anymore - why do we expect to be able to lstat()?
My reasoning for making lstat() an ERROR was that there's a chance we
need to fsync() the file, and if we can't fsync() a file for whatever
reason, we definitely want to ERROR. I suppose we could conditionally
ERROR based on the file name, but I don't know if that's really worth
the complexity.
> I'd be more on board accepting some selective errors. E.g. not erroring on
> ENOENT, but continuing to error on others (most likely ENOACCESS). I think we
> should *not* change the
I think this approach would work for the use-case Bharath mentioned
upthread. In any case, if deleting a file fails because the file was
already deleted, there's no point in ERROR-ing. I think filtering
errors is a bit trickier for lstat(). If we would've fsync'd the file
but lstat() gives us ENOENT, we may have a problem. (However, there's
also a good chance we wouldn't notice such problems if the race didn't
occur.) I'll play around with it.
>> + /*
>> + * We just log a message if a file doesn't fit the pattern, it's
>> + * probably some editor's lock/state file or similar...
>> + */
>
> An editor's lock file that starts with map- would presumably be the whole
> filename plus an additional file-ending. But this check won't catch those.
Right, it will either fsync() or unlink() those. I'll work on the
comment. Or do you think it's worth validating that there are no
trailing characters? I looked into that a bit earlier, and the code
felt excessive to me, but I don't have a strong opinion here.
>> + * Unlike failures to unlink() old logical rewrite files, we must
>> + * ERROR if we're unable to sync transient state down to disk. This
>> + * allows replay to assume that everything written out before
>> + * checkpoint start is persisted.
>> */
>
> Hm, not quite happy with the second bit. Checkpointed state being durable
> isn't about replay directly, it's about the basic property of a checkpoint
> being fulfilled?
I'll work on this. FWIW I modeled this based on the comment above the
function. Do you think that should be adjusted as well?
>> + /* persist directory entries to disk */
>> + fsync_fname("pg_logical/mappings", true);
>
> This is probably worth backpatching, if you split it out I'll do so.
Sure thing, will do shortly.
Nathan