Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Recovery inconsistencies, standby much larger than primary
Date
Msg-id CAM-w4HMJ8qHKfjEBGw6SUjgLV-+zh8ege0cA1PJ5wG=Ys=nAyA@mail.gmail.com
In response to Re: Recovery inconsistencies, standby much larger than primary  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Recovery inconsistencies, standby much larger than primary  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Recovery inconsistencies, standby much larger than primary  (Andrea Suisani <sickpig@opinioni.net>)
List pgsql-hackers
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> How about the attached instead?
>
>> This does possibly allocate an extra block past the target block. I'm
>> not sure how surprising that would be for the rest of the code.
>
> Should be fine; we could end up with an extra block after a failed
> extension operation in any case.

I know it's fine on the active database, but I'm not so sure whether it's
compatible with the xlog records from the primary. I suppose replay will
just see an Initialize Page record, find the zero-filled block, and
happily initialize it. It's still a bit scary.

Of course I think my code is vastly clearer than the existing code, but
yes, I see the "unnecessary churn" argument. I think that's the
fundamental problem with lacking tests: the code gets more and more
complex because we're reluctant to simplify it out of fear.

>> For what it's worth I've confirmed the bug in wal-e caused the initial
>> problem.
>
> Huh?  Bug in wal-e?  What bug?

WAL-E actually failed to restore a whole 1GB segment file due to a
transient S3 problem; in fact, a bunch of them. It's remarkable that
Postgres kept going with that much data missing. But the arithmetic
worked out in the case I checked, which was the last one, the one whose
xlog record I sent last night. In that case there was precisely one
segment missing, and the relation was extended by the number of
segments you would expect if it filled in that missing segment and
then jumped to the end of the relation.
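
To make that arithmetic concrete, here is a rough sketch of the bookkeeping, not the actual recovery code. It assumes the default 8kB block size and 1GB segment files, and it assumes the relation's apparent length stops at the first missing or short segment. The function names and the list-of-segments representation are made up for illustration.

```python
BLCKSZ = 8192                      # default PostgreSQL block size, in bytes
RELSEG_SIZE = (1 << 30) // BLCKSZ  # blocks per 1GB segment file: 131072


def locate(blkno):
    """Return (segment number, block offset within that segment)."""
    return divmod(blkno, RELSEG_SIZE)


def apparent_nblocks(seg_blocks):
    """Count blocks up to the first segment that is missing or not full.

    seg_blocks[i] is the number of blocks in segment i, or None if the
    file is missing (a hypothetical representation for this sketch).
    """
    total = 0
    for nblocks in seg_blocks:
        if nblocks is None:
            break
        total += nblocks
        if nblocks < RELSEG_SIZE:
            break
    return total


def blocks_added_to_reach(seg_blocks, target_blkno):
    """Blocks that would have to be appended so target_blkno exists."""
    return max(0, target_blkno + 1 - apparent_nblocks(seg_blocks))


# Five-segment relation whose third segment (index 2) was never restored.
segments = [RELSEG_SIZE, RELSEG_SIZE, None, RELSEG_SIZE, 1000]
target = 4 * RELSEG_SIZE + 999     # last block the primary knows about
added = blocks_added_to_reach(segments, target)
```

In this made-up example the relation appears to end after two segments, so reaching the target means appending enough blocks to cover the missing segment plus everything between it and the true end of the relation.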


-- 
greg


