Re: Corruption during WAL replay - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Corruption during WAL replay
Date
Msg-id 20200411004905.GA12834@alvherre.pgsql
In response to Re: Corruption during WAL replay  (Andres Freund <andres@anarazel.de>)
Responses Re: Corruption during WAL replay  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2020-Mar-30, Andres Freund wrote:

> If we are really concerned with truncation failing - I don't know why we
> would be, we accept that we have to be able to modify files etc to stay
> up - we can add a pre-check ensuring that permissions are set up
> appropriately to allow us to truncate.

I remember I saw a case where the datadir was NFS or some other network
filesystem thingy, and it lost connection just before autovacuum
truncation, or something like that -- so there was no permission
failure, but the truncate failed and yet PG soldiered on.  I think the
connection was re-established soon thereafter and things went back to
"normal", with nobody realizing that a truncate had been lost.
Corruption was discovered a long time afterwards IIRC (weeks or months,
I don't remember).

I didn't review Teja's patch carefully, but the idea of panicking on
failure (causing WAL replay) seems better than the current behavior.
I'd rather make the server wait until storage is really back.
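
To make the idea concrete, here is a minimal sketch in plain POSIX C of
"treat a failed truncate as fatal instead of carrying on".  It is not
taken from Teja's patch or from the PostgreSQL source; the names
try_truncate and truncate_or_panic are made up for illustration, and
abort() merely stands in for a PANIC-level error that would force a
restart and WAL replay.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not PostgreSQL names. */
typedef enum
{
    TRUNC_OK,
    TRUNC_FAILED
} trunc_result;

/*
 * Try to truncate the file behind fd to new_len bytes.
 * Returns TRUNC_FAILED if ftruncate() reports an error; errno is left
 * set by ftruncate() so the caller can report the cause.
 */
static trunc_result
try_truncate(int fd, off_t new_len)
{
    if (ftruncate(fd, new_len) != 0)
        return TRUNC_FAILED;
    return TRUNC_OK;
}

/*
 * Truncate the file or stop the process.  The point is that a lost
 * truncate is never silently ignored: either the file has the new
 * length when this returns, or the process does not continue.
 */
static void
truncate_or_panic(int fd, off_t new_len)
{
    if (try_truncate(fd, new_len) == TRUNC_FAILED)
    {
        fprintf(stderr, "PANIC: could not truncate file to %lld bytes: %s\n",
                (long long) new_len, strerror(errno));
        abort();                /* stand-in for a PANIC-level error */
    }
}
```

A caller that can tolerate the failure would use try_truncate and decide
for itself; anything whose later behavior assumes the truncate happened
would go through truncate_or_panic.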

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


