On Sat, Oct 15, 2022 at 10:19:05AM +0530, Bharath Rupireddy wrote:
> Can you please help me understand how name collisions can happen with
> temp file names including WAL file name, timestamp to millisecond
> scale, and PID? Having the timestamp is enough to provide a non-unique
> temp file name when PID wraparound occurs, right? Am I missing
> something here?
Outside of contrived cases involving multiple servers, inaccurate clocks,
PID reuse, etc., it seems unlikely.
> Hm, we cannot remove the temp file for all sorts of crashes, but
> having on_shmem_exit() or before_shmem_exit() or atexit() or any such
> callback removing it would help us cover some crash scenarios (that
> exit with proc_exit() or exit()) at least. I think the basic_archive
> module currently leaves temp files around even when the server is
> restarted legitimately while copying to or renaming the temp file, no?
I think the right way to do this would be to add handling for leftover
files in the sigsetjmp() block and a shutdown callback (which just sets up
a before_shmem_exit callback). While this should ensure those files are
cleaned up after an ERROR or FATAL, crashes and unlink() failures could
still leave files behind. We'd probably also need to avoid cleaning up the
temp file if copy_file() fails because it already exists, as we won't know
if it's actually ours. Overall, I suspect it'd be more trouble than it's
worth.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com