Re: subscriptionCheck failures on nightjar - Mailing list pgsql-hackers

From Andres Freund
Subject Re: subscriptionCheck failures on nightjar
Date
Msg-id 20190213171101.6wpz7tardp3t3uvk@alap3.anarazel.de
Whole thread Raw
In response to Re: subscriptionCheck failures on nightjar  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: subscriptionCheck failures on nightjar  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> I've managed to reproduce this locally, and obtained this PANIC:

Cool. How exactly?

Nice catch.

> Anyway, I think we might be able to fix this along the lines of
> 
>     CloseTransientFile(fd);
> 
> +   /* ensure snapshot file is down to stable storage */
> +   fsync_fname(tmppath, false);
>     fsync_fname("pg_logical/snapshots", true);
> 
>     /*
>      * We may overwrite the work from some other backend, but that's ok, our
>      * snapshot is valid as well, we'll just have done some superfluous work.
>      */
>     if (rename(tmppath, path) != 0)
>     {
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not rename file \"%s\" to \"%s\": %m",
>                         tmppath, path)));
>     }
> 
> -   /* make sure we persist */
> -   fsync_fname(path, false);
> +   /* ensure we persist the file rename */
>     fsync_fname("pg_logical/snapshots", true);

Hm, but that's not the same? On some filesystems one needs the directory
fsync, on some the file fsync, and I think both in some cases. ISTM we
should just open the file before the rename, and then use fsync() on the
filehandle rather than the filename.  Then a concurrent rename ought not
to hurt us?


> The existing code here seems simply wacky/unsafe to me regardless
> of this race condition: couldn't it potentially result in a corrupt
> snapshot file appearing with a valid name, if the system crashes
> after persisting the rename but before it's pushed the data out?

What do you mean precisely with "before it's pushed the data out"?


> I also wonder why bother with the directory sync just before the
> rename.

Because on some FS/OS combinations the size of the renamed-into-place
file isn't guaranteed to be durable unless the directory was
fsynced. Otherwise there's the possibility to have incomplete data if we
crash after renaming, but before fsyncing.

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Reaping Temp tables to avoid XID wraparound
Next
From: Tom Lane
Date:
Subject: Re: subscriptionCheck failures on nightjar