Re: silent data loss with ext4 / all current versions - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: silent data loss with ext4 / all current versions
Date
Msg-id CAB7nPqQ+wHynKARTXP63SBv9oqbGxOz0Godx99YL=QGHY4FPbA@mail.gmail.com
In response to Re: silent data loss with ext4 / all current versions  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Jan 23, 2016 at 11:39 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 01/23/2016 02:35 AM, Michael Paquier wrote:
>>
>> On Fri, Jan 22, 2016 at 9:41 PM, Greg Stark <stark@mit.edu> wrote:
>>> On Fri, Jan 22, 2016 at 8:26 AM, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>> LVM snapshots would have the advantage that you can keep running the
>>> database and you can take lots of snapshots with relatively little
>>> overhead. Having dozens or hundreds of snapshots would be an unacceptable
>>> performance drain in production but for testing it should be practical
>>> and they take relatively little space -- just the blocks changed since
>>> the snapshot was taken.
>>
>>
>> Another idea: hardcode a PANIC just after rename() with
>> restart_after_crash = off (this needs IsBootstrapProcess() checks).
>> Once the server crashes, kill -9 the VM. Then restart the VM and the
>> Postgres instance with a new binary that does not have the PANIC, and
>> see how things are moving on. There is a window of up to several
>> seconds after the rename() call, so I guess that this would work.
>
>
> I don't see how that would improve anything, as the PANIC has no impact on
> the I/O requests already issued to the system. What you need is some sort of
> coordination between the database and the script that kills the VM (or takes
> a LVM snapshot).

Well, what we want to check is how the system reacts when an unwanted
file is left in place. To emulate the leftovers of a rename() that
never reached disk, we could simply comment out the rename() call and
then forcibly crash the instance, or PANIC the instance just before
the rename(). That would emulate what we are looking for, no?
For example, take the rename() call in InstallXLogFileSegment():
crashing with a non-effective rename() will leave an annoying xlogtemp
file behind. Making the rename persistent would make the server
complain about an invalid magic number in a segment that has just been
created.
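
To make the idea concrete, here is a minimal standalone sketch of that
kind of fault injection. It is not PostgreSQL code: write_file(),
install_segment(), file_exists() and the lose_rename flag are all
hypothetical names made up for illustration, and the file only stands
in for a WAL segment.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write payload to path and fsync it. 0 on success. */
int
write_file(const char *path, const char *payload)
{
    size_t  len = strlen(payload);
    int     fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, payload, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Hypothetical stand-in for the install step: create the temporary file,
 * then rename it into place. With lose_rename set, the rename() is skipped,
 * which emulates a rename() whose effect never reached disk.
 */
int
install_segment(const char *tmp, const char *final, bool lose_rename)
{
    assert(tmp != NULL && final != NULL);

    if (write_file(tmp, "segment") != 0)
        return -1;
    if (lose_rename)
        return 0;               /* fault injection: rename() "commented out" */
    return rename(tmp, final);
}

/* Hypothetical helper: true if path exists. */
bool
file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}
```

Calling install_segment("xlogtemp.test", "segment.test", true) leaves
only the temporary file behind, which is the state the startup code
would have to cope with after the crash; passing false gives the state
where the rename did take effect. This only reproduces the on-disk
layout, it says nothing about what a real filesystem does on power
loss.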
-- 
Michael


