Re: avoid multiple hard links to same WAL file after a crash - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: avoid multiple hard links to same WAL file after a crash
Date
Msg-id 20220502230613.GA3398932@nathanxps13
In response to Re: avoid multiple hard links to same WAL file after a crash  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: avoid multiple hard links to same WAL file after a crash
List pgsql-hackers
On Mon, May 02, 2022 at 10:39:07AM -0700, Nathan Bossart wrote:
> On Mon, May 02, 2022 at 07:48:18PM +0900, Michael Paquier wrote:
>> The WAL receiver upgrades the ERROR to a FATAL, and restarts
>> streaming shortly after.  Using durable_rename() would not be an issue
>> here.
> 
> Thanks for investigating this one.  I think I agree that we should simply
> switch to durable_rename() (without a file existence check beforehand).

Here is a new patch set.  For now, I've only removed the file existence
check in writeTimeLineHistoryFile().  I'm not totally convinced that there
isn't a problem here (e.g., due to concurrent .ready file creation), but
since some platforms have been using rename() for some time, I'm not sure
how worried we should be.  I thought about adding some kind of
locking between the WAL receiver and startup processes, but that seems
excessive.  Alternatively, we could just fix xlog.c as proposed earlier
[0].  AFAICT that is the only caller that can experience problems due to
the multiple-hard-link issue.  All other callers are simply renaming a
temporary file into place, and the temporary file can be discarded if left
behind after a crash.

[0] https://postgr.es/m/20220407182954.GA1231544%40nathanxps13
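
To make the multiple-hard-link issue concrete, here is a minimal sketch in
plain POSIX C.  It is not PostgreSQL code: the helper names are hypothetical,
and it assumes the problematic pattern is a link() followed by an unlink(),
compared against a single rename().  If a crash lands between the two steps
of the link-based move, both names still point at the same inode, which is
harmless for a discardable temporary file but not for a file whose old name
is still meaningful.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the hard-link count of path, or -1 if stat() fails. */
static long
link_count(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long) st.st_nlink;
}

/* Create an empty file; returns 0 on success, -1 on failure. */
static int
create_file(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    return close(fd);
}

/*
 * Step 1 of a link()+unlink() move (hypothetical helper).  If the process
 * crashes right after this call, oldpath and newpath both remain and refer
 * to the same inode.
 */
static int
move_by_link_step1(const char *oldpath, const char *newpath)
{
    return link(oldpath, newpath);
}

/* Step 2: drop the old name, completing the move. */
static int
move_by_link_step2(const char *oldpath)
{
    return unlink(oldpath);
}

/*
 * rename() switches the name in one step, so there is no window in which
 * both names exist.  Unlike link(), it silently replaces an existing
 * newpath.
 */
static int
move_by_rename(const char *oldpath, const char *newpath)
{
    return rename(oldpath, newpath);
}
```

Calling link_count() on the new name between step 1 and step 2 reports two
links, which is the state a crash at that point would leave behind.  The
sketch deliberately omits the fsync() calls that a durable rename would also
need.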

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
