Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Recovery inconsistencies, standby much larger than primary
Date	February 7, 2014 02:42:14
Msg-id	2959.1391730125@sss.pgh.pa.us
In response to	Re: Recovery inconsistencies, standby much larger than primary (Greg Stark <stark@mit.edu>)
Responses	Re: Recovery inconsistencies, standby much larger than primary (Andres Freund <andres@2ndquadrant.com>)
List	pgsql-hackers

Greg Stark <stark@mit.edu> writes:
> On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> That's not necessarily true. If e.g. the buffer mapping would change
>> racily, the result write from the bgwriter could very well end up
>> increasing the file size, leaving a hole in between its write and the
>> original size.

> a) the segment isn't sparse and b) there were whole segments full of
> nuls between the end of the tables and the final blocks.

> So the file was definitely extended by Postgres, not the OS and the
> bgwriter passes EXTENSION_FAIL which means it wouldn't create those
> intervening segments.

But ... when InRecovery, md.c will create such segments too.  We had
dismissed that on the grounds that the files would be sparse because
of the way md.c creates them.  However, it is real damn hard to see
how the loop in XLogReadBufferExtended could've accessed a bogus block,
other than hardware misfeasance which I don't believe any more than
you do.  The blkno that's passed to that function came directly out
of a WAL record that's in the private memory of the startup process
and recently passed a CRC check.  You'd have to believe some sort
of asynchronous memory clobber inside the startup process.

On the other hand, if _mdfd_getseg did the deed, there's a whole lot
more space for something funny to have happened, because now we're
talking about a buffer being written in preparation for eviction
from shared buffers, long after WAL replay filled it.
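
To make the segment arithmetic in this argument concrete, here is a minimal sketch of how a block number maps to a segment file, and how many whole segments would have to be created in between if a write targeted a block far past the current end of the relation. It assumes a default build (8 kB blocks, 1 GB segments); the function names are hypothetical and are not taken from md.c.

```python
# Assumed defaults for a stock PostgreSQL build; both are compile-time options.
BLCKSZ = 8192          # bytes per block
RELSEG_SIZE = 131072   # blocks per segment file (1 GB / 8 kB)


def segment_for_block(blkno):
    """Return (segment number, byte offset within that segment) for a block."""
    return blkno // RELSEG_SIZE, (blkno % RELSEG_SIZE) * BLCKSZ


def intervening_segments(current_nblocks, blkno):
    """Count whole segments lying strictly between the relation's current
    last segment and the segment that holds blkno."""
    last_seg = (current_nblocks - 1) // RELSEG_SIZE if current_nblocks > 0 else 0
    target_seg = blkno // RELSEG_SIZE
    return max(0, target_seg - last_seg - 1)


if __name__ == "__main__":
    # A relation of 1000 blocks, and a write aimed at a block in segment 5.
    bogus_blkno = 5 * RELSEG_SIZE + 3
    print(segment_for_block(bogus_blkno))
    print(intervening_segments(1000, bogus_blkno))
```

The point of the sketch is only that a single far-off block number, whichever code path acts on it, implies a run of intervening segment files rather than one oversized file.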

So I'm wondering if there's something wrong with our deduction from
file non-sparseness.  In this connection, Google quickly found me
a report of XFS "losing" the sparse state of a file across multiple
writes:
http://oss.sgi.com/archives/xfs/2011-06/msg00225.html
I wonder whether that bug or a similar one exists in your production
kernel.
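
For reference, the usual way to judge whether a file is sparse is to compare its apparent size with the space actually allocated to it. The following is a minimal sketch of that check, assuming a Unix-like system where st_blocks is reported in 512-byte units; the function names are hypothetical. Note that it can only report what the filesystem currently says, so it cannot tell a file that was never sparse from one whose holes were filled in later.

```python
import os

# Assumption: st_blocks counts 512-byte units, as on Linux. It is not
# available on every platform (for example, not on Windows).
ST_BLOCK_UNIT = 512


def is_sparse(apparent_bytes, allocated_blocks):
    """True if fewer bytes are allocated on disk than the file's length."""
    return allocated_blocks * ST_BLOCK_UNIT < apparent_bytes


def check_file(path):
    """Apply is_sparse() to the current stat() data for path."""
    st = os.stat(path)
    return is_sparse(st.st_size, st.st_blocks)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, "sparse" if check_file(path) else "not sparse")
```

Run against the suspect segment files, this shows the same information as comparing "ls -l" with "du", which is presumably the evidence behind the non-sparseness deduction.
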
        regards, tom lane
