Re: Back-patch of: avoid multiple hard links to same WAL file after a crash - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Back-patch of: avoid multiple hard links to same WAL file after a crash
Date
Msg-id Z2L6e1w-xABVTBRR@nathan
In response to Back-patch of: avoid multiple hard links to same WAL file after a crash  (Robert Pang <robertpang@google.com>)
Responses Re: Back-patch of: avoid multiple hard links to same WAL file after a crash
List pgsql-hackers
On Tue, Dec 17, 2024 at 04:50:16PM -0800, Robert Pang wrote:
> We recently observed a few cases where Postgres running on Linux
> encountered an issue with WAL segment files. Specifically, two WAL
> segments were linked to the same physical file after Postgres ran out
> of memory and the OOM killer terminated one of its processes. This
> resulted in the WAL segments overwriting each other and Postgres
> failing a later recovery.

Yikes!

> We found this fix [1] that has been applied to Postgres 16, but the
> cases we observed were running Postgres 15. Given that older major
> versions will be supported for a good number of years, and the
> potential for irrecoverability exists (even if rare), we would like to
> discuss the possibility of back-patching this fix.

IMHO this is a good time to reevaluate.  It looks like we originally didn't
back-patch out of an abundance of caution, but now that this one has had
time to bake, I think it's worth seriously considering, especially now that
we have a report from the field.
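
For anyone who wants to check an existing cluster for the symptom described in the report (two WAL segment names pointing at the same physical file), here is a minimal sketch. It is not part of the fix under discussion, and it assumes a POSIX filesystem where hard-linked names share a device and inode number. The function name find_shared_inodes is made up for illustration.

```python
import os
from collections import defaultdict


def find_shared_inodes(directory):
    """Return sorted groups of file names in `directory` that share one inode.

    Hypothetical helper for illustration only; it is not a PostgreSQL tool.
    Two names with the same (device, inode) pair are hard links to the
    same physical file.
    """
    by_inode = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only regular files matter here.
            if not entry.is_file(follow_symlinks=False):
                continue
            st = os.lstat(entry.path)
            by_inode[(st.st_dev, st.st_ino)].append(entry.name)
    # Keep only inodes that are reachable under more than one name.
    return sorted(sorted(names) for names in by_inode.values() if len(names) > 1)
```

Calling find_shared_inodes on a cluster's pg_wal directory (the path depends on the installation) should return an empty list on a healthy system; any group it does return names segments that would overwrite each other.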

-- 
nathan


