On Sat, Jan 23, 2016 at 11:39 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 01/23/2016 02:35 AM, Michael Paquier wrote:
>>
>> On Fri, Jan 22, 2016 at 9:41 PM, Greg Stark <stark@mit.edu> wrote:
>>> On Fri, Jan 22, 2016 at 8:26 AM, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>> LVM snapshots would have the advantage that you can keep running the
>>> database and you can take lots of snapshots with relatively little
>>> overhead. Having dozens or hundreds of snapshots would be an unacceptable
>>> performance drain in production but for testing it should be practical
>>> and they take relatively little space -- just the blocks changed since
>>> the snapshot was taken.
>>
>>
>> Another idea: hardcode a PANIC just after rename() with
>> restart_after_crash = off (this needs IsBootstrapProcess() checks).
>> Once the server crashes, kill -9 the VM. Then restart the VM and the
>> Postgres instance with a new binary that does not have the PANIC, and
>> see how things are moving on. There is a window of up to several
>> seconds after the rename() call, so I guess that this would work.
>
>
> I don't see how that would improve anything, as the PANIC has no impact on
> the I/O requests already issued to the system. What you need is some sort of
> coordination between the database and the script that kills the VM (or takes
> a LVM snapshot).

Well, to emulate the noise that non-renamed files leave on the system,
we could simply emulate the loss of rename() by commenting it out and
then forcibly crashing the instance, or by triggering a PANIC just
before rename(). This would emulate what we are looking for, no? What
we want to check is how the system reacts should an unwanted file be
in place.
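
To make that concrete, here is a toy sketch in Python (not PostgreSQL
code) of an install step whose rename() can be switched off. The names
install_file(), lose_rename and demo(), as well as the file names, are
made up for illustration; the real test would comment out the rename()
in the C source and crash the server instead.

```python
import os
import tempfile


def install_file(tmp_path, final_path, lose_rename=False):
    """Move tmp_path into place, or skip rename() to mimic its loss."""
    if not lose_rename:
        os.rename(tmp_path, final_path)
    # With lose_rename=True nothing happens: the temporary file stays
    # where it was, as if the crash had hit before the rename took effect.


def demo(lose_rename):
    """Create a temp file, try to install it, and list what is left."""
    with tempfile.TemporaryDirectory() as d:
        tmp = os.path.join(d, "xlogtemp.1234")            # illustrative name
        final = os.path.join(d, "000000010000000000000001")  # illustrative name
        with open(tmp, "wb") as f:
            f.write(b"\0" * 16)
        install_file(tmp, final, lose_rename=lose_rename)
        return sorted(os.listdir(d))
```

Calling demo(True) leaves only the temporary file in the directory,
which is the state the server would have to cope with after restart.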

For example, take the rename() call in InstallXLogFileSegment():
crashing with a non-effective rename() leaves an annoying xlogtemp
file behind. Making the rename persistent would make the server
complain about an invalid magic number in a segment that has just
been created.
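
The second case can be sketched the same way. This is only a toy model
in Python, assuming the freshly created segment is still zero-filled
when it is read back; PAGE_MAGIC is a made-up value and
check_segment_header() is a hypothetical helper, not the server's
actual validation code.

```python
import struct

PAGE_MAGIC = 0xABCD  # made-up value, not the real WAL page magic


def check_segment_header(data):
    """Compare the first two bytes of a segment with the expected magic."""
    if len(data) < 2:
        return "short read"
    (magic,) = struct.unpack_from("<H", data, 0)
    if magic != PAGE_MAGIC:
        return "invalid magic number %04X" % magic
    return "ok"


# A segment that was renamed into place but never got a valid header.
zero_filled = bytes(8192)
# A segment whose first page starts with the expected magic.
valid = struct.pack("<H", PAGE_MAGIC) + bytes(8190)
```

Here check_segment_header(zero_filled) reports an invalid magic number,
while check_segment_header(valid) returns "ok".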
--
Michael