Re: Sketch of a fix for that truncation data corruption issue - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Sketch of a fix for that truncation data corruption issue
Msg-id 20181212015415.5pphghl3buuz2hob@alap3.anarazel.de
In response to Re: Sketch of a fix for that truncation data corruption issue  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2018-12-12 10:49:59 +0900, Robert Haas wrote:
> Just thinking about this a bit, the problem with truncating first and
> then writing the WAL record is that if the WAL record never makes it
> to disk, any physical standbys will end up out of sync with the
> master, leading to disaster. But the problem with writing the WAL
> record first is that the actual operation might fail, and then
> standbys will end up out of sync with the master, leading to disaster.
> The obvious way to finesse that latter problem is just PANIC if
> ftruncate() fails -- then we'll crash restart and retry, and if we
> still can't do it, well, the DBA will have to fix that before the
> system can come online.  I'm not sure that's really all that bad --
> if we can't truncate, we're kinda hosed.  How, other than a
> permissions problem, does that even happen?

I think it's correct to panic in that situation. As you say, it's really
unlikely for that to happen in normal circumstances (as long as we
handle obvious stuff like EINTR), and any added complexity to avoid it
would be very unlikely to ever get tested.
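A minimal C sketch of the ordering discussed here: write and flush the WAL record first, then truncate, retrying only on EINTR and escalating any other failure to a crash. It assumes POSIX ftruncate(); log_truncate_record() and panic_on_error() are hypothetical stand-ins, not PostgreSQL functions (in PostgreSQL itself this would involve WAL insertion and flushing plus an ereport at PANIC level).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping so the WAL-first ordering can be observed. */
static int wal_records_flushed = 0;
static off_t last_logged_size = -1;

/*
 * Hypothetical stand-in for writing the truncation WAL record and flushing
 * it to disk. A real implementation would insert a WAL record and wait
 * until it is durable before returning.
 */
static void log_truncate_record(off_t new_size)
{
    last_logged_size = new_size;
    wal_records_flushed++;
}

/*
 * Hypothetical stand-in for a PANIC: abort the process so that crash
 * recovery replays the already-durable WAL record and retries the truncation.
 */
static void panic_on_error(const char *what)
{
    fprintf(stderr, "PANIC: %s: %s\n", what, strerror(errno));
    abort();
}

/* Retry ftruncate() for as long as it is interrupted by a signal (EINTR). */
static int ftruncate_retry_eintr(int fd, off_t length)
{
    int rc;

    do
        rc = ftruncate(fd, length);
    while (rc != 0 && errno == EINTR);

    return rc;
}

/* WAL first, then the truncation; any remaining failure is fatal. */
static void truncate_relation_file(int fd, off_t new_size)
{
    assert(fd >= 0 && new_size >= 0);

    log_truncate_record(new_size);   /* standbys will replay this record */
    if (ftruncate_retry_eintr(fd, new_size) != 0)
        panic_on_error("could not truncate relation file");
}
```

Calling truncate_relation_file(fd, 10) on a 100-byte file leaves it 10 bytes long, with exactly one WAL record logged before the truncation. The EINTR loop covers the "obvious stuff"; every other error (e.g. EIO) crashes rather than leaving the primary and its standbys disagreeing about the file's length.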

Greetings,

Andres Freund

