Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Recovery inconsistencies, standby much larger than primary
Date
Msg-id 2959.1391730125@sss.pgh.pa.us
In response to Re: Recovery inconsistencies, standby much larger than primary  (Greg Stark <stark@mit.edu>)
Responses Re: Recovery inconsistencies, standby much larger than primary  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Greg Stark <stark@mit.edu> writes:
> On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> That's not necessarily true. If e.g. the buffer mapping would change
>> racily, the resulting write from the bgwriter could very well end up
>> increasing the file size, leaving a hole in between its write and the
>> original size.

> a) the segment isn't sparse and b) there were whole segments full of
> nuls between the end of the tables and the final blocks.

> So the file was definitely extended by Postgres, not the OS, and the
> bgwriter passes EXTENSION_FAIL, which means it wouldn't create those
> intervening segments.

But ... when InRecovery, md.c will create such segments too.  We had
dismissed that on the grounds that the files would be sparse because
of the way md.c creates them.  However, it is real damn hard to see
how the loop in XLogReadBufferExtended could've accessed a bogus block,
other than hardware misfeasance which I don't believe any more than
you do.  The blkno that's passed to that function came directly out
of a WAL record that's in the private memory of the startup process
and recently passed a CRC check.  You'd have to believe some sort
of asynchronous memory clobber inside the startup process.
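To make the "whole segments full of nuls" observation concrete: a relation is stored as a series of segment files of RELSEG_SIZE blocks each, so a single block number far past the end of the relation implies a run of intermediate segment files. The following is a minimal sketch of that arithmetic only, not md.c's code. It assumes the stock defaults of 8 kB BLCKSZ and 131072-block (1 GB) RELSEG_SIZE, and the helper names and sample block numbers are hypothetical.

```python
# Assumed defaults for a stock PostgreSQL build: 8 kB pages and 1 GB
# segment files (131072 blocks each). Builds configured differently vary.
BLCKSZ = 8192
RELSEG_SIZE = 131072


def segment_for_block(blkno: int) -> int:
    """Index of the segment file holding blkno (0 is the base file, 1 is '.1', ...)."""
    return blkno // RELSEG_SIZE


def offset_in_segment(blkno: int) -> int:
    """Byte offset of blkno within its segment file."""
    return (blkno % RELSEG_SIZE) * BLCKSZ


def intervening_segments(nblocks: int, blkno: int) -> list[int]:
    """Hypothetical helper: segment files lying strictly between the segment
    holding the relation's last block and the segment that holds blkno."""
    last_existing = segment_for_block(nblocks - 1) if nblocks > 0 else 0
    return list(range(last_existing + 1, segment_for_block(blkno)))


if __name__ == "__main__":
    nblocks = 1000        # relation currently ends inside segment 0
    bogus_blkno = 500000  # hypothetical out-of-range block number
    print(segment_for_block(bogus_blkno),
          offset_in_segment(bogus_blkno),
          intervening_segments(nblocks, bogus_blkno))
    # -> 3 874774528 [1, 2]
```

With these assumed sizes, one bogus block number lands in segment 3 and leaves segments 1 and 2 entirely between the real data and the final block, which matches the shape of the damage described upthread.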

On the other hand, if _mdfd_getseg did the deed, there's a whole lot
more space for something funny to have happened, because now we're
talking about a buffer being written in preparation for eviction
from shared buffers, long after WAL replay filled it.

So I'm wondering if there's something wrong with our deduction from
file non-sparseness.  In this connection, google quickly found me
a report of XFS "losing" the sparse state of a file across multiple
writes:
http://oss.sgi.com/archives/xfs/2011-06/msg00225.html
I wonder whether that bug or a similar one exists in your production
kernel.
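The non-sparseness deduction rests on two POSIX behaviors: writing past end-of-file leaves a hole that reads back as zeros, and st_blocks reports how much space the file actually has allocated (in 512-byte units). The following is a minimal sketch for checking that by hand on a POSIX system. The helper names are hypothetical, and because some filesystems preallocate or may not preserve holes (as in the XFS report above), a non-sparse result does not by itself prove the zeros were explicitly written.

```python
import os
import tempfile


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk; POSIX st_blocks counts 512-byte units."""
    return os.stat(path).st_blocks * 512


def looks_sparse(path: str) -> bool:
    """Heuristic: less space allocated than the apparent size suggests holes.
    Preallocation or lost holes can make this False even for hole-y writes."""
    return allocated_bytes(path) < os.stat(path).st_size


def write_past_eof(path: str, offset: int, data: bytes) -> None:
    """Write data at offset, leaving every byte before it unwritten."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "seg")
        write_past_eof(path, 8 * 1024 * 1024, b"x" * 8192)
        with open(path, "rb") as f:
            head = f.read(8192)
        # Apparent size, whether the hole reads as zeros, and the sparseness
        # heuristic (the last value depends on the filesystem).
        print(os.stat(path).st_size, head == bytes(8192), looks_sparse(path))
```

The size and the all-zero read are guaranteed by POSIX; only the looks_sparse() result varies with the filesystem and kernel, which is exactly the part of the deduction in question.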
        regards, tom lane