avoid multiple hard links to same WAL file after a crash - Mailing list pgsql-hackers

From Nathan Bossart
Subject avoid multiple hard links to same WAL file after a crash
Date
Msg-id 20220407182954.GA1231544@nathanxps13
Whole thread Raw
Responses Re: avoid multiple hard links to same WAL file after a crash
Re: avoid multiple hard links to same WAL file after a crash
List pgsql-hackers
Hi hackers,

I am splitting this off of a previous thread aimed at reducing archiving
overhead [0], as I believe this fix might deserve back-patching.

Presently, WAL recycling uses durable_rename_excl(), which notes that a
crash at an unfortunate moment can result in two links to the same file.
My testing [1] demonstrated that it was possible to end up with two links
to the same file in pg_wal after a crash just before unlink() during WAL
recycling.  Specifically, the test produced links to the same file for the
current WAL file and the next one because the half-recycled WAL file was
re-recycled upon restarting.  This seems likely to lead to WAL corruption.

The attached patch prevents this problem by using durable_rename() instead
of durable_rename_excl() for WAL recycling.  This removes the protection
against accidentally overwriting an existing WAL file, but there shouldn't
be one.

This patch also sets the stage for reducing archiving overhead (as
discussed in the other thread [0]).  The proposed change to reduce
archiving overhead will make it more likely that the server will attempt to
re-archive segments after a crash.  This might lead to archive corruption
if the server concurrently writes to the same file via the aforementioned
bug.

[0] https://www.postgresql.org/message-id/20220222011948.GA3850532%40nathanxps13
[1] https://www.postgresql.org/message-id/20220222173711.GA3852671%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: LogwrtResult contended spinlock
Next
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences