Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Recovery inconsistencies, standby much larger than primary
Msg-id CAM-w4HMEs_Aj0_CptbZy--01YG4R0WfP3REeru+aQTSqcNsrOw@mail.gmail.com
In response to Re: Recovery inconsistencies, standby much larger than primary  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Sat, Feb 15, 2014 at 11:45 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I guess the theoretically correct thing would be to make all WAL records
> about truncation and unlinking contain the current size of the relation,
> but especially with deletions and forks that will probably turn out to
> be annoying to do.

Here's another alternative.

In md.c, when extending a file to RELSEG_SIZE, always check whether the next
segment is already there and truncate it if it is, to avoid magically
slurping in that data. That maintains the invariant that the first
short segment will mark the end of the relation. If you have a short
or missing segment then you'll ignore all the later segments.
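
To illustrate the invariant, here is a minimal sketch of how the relation
length falls out of the per-segment sizes. It is not the md.c code:
SKETCH_RELSEG_SIZE and sketch_relation_nblocks() are hypothetical names, and
the segment sizes are passed in as an array rather than read from disk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RELSEG_SIZE (blocks per segment file). */
#define SKETCH_RELSEG_SIZE 4

/*
 * seg_blocks[i] holds the number of blocks in segment i, or -1 if that
 * segment file is missing. Full segments are counted until the first short
 * or missing one; anything after that point is ignored.
 */
long
sketch_relation_nblocks(const long *seg_blocks, size_t nsegs)
{
    long        total = 0;

    for (size_t i = 0; i < nsegs; i++)
    {
        long        n = seg_blocks[i];

        if (n < 0)
            break;              /* missing segment ends the relation */

        assert(n <= SKETCH_RELSEG_SIZE);
        total += n;

        if (n < SKETCH_RELSEG_SIZE)
            break;              /* first short segment ends the relation */
    }
    return total;
}
```

With segment sizes {4, 2, 4} this counts 6 blocks: the trailing full segment
sits behind a short one and is never looked at.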

I think to make this work you would have to sync the newly truncated
segment first before extending the current segment though. And this
would cause every relation extension to do an extra filesystem lookup.
Perhaps only doing this in recovery (or with assertions enabled?)
would mitigate that cost.
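
The ordering I have in mind could be sketched roughly like this, using plain
POSIX calls rather than the md.c/fd.c routines. The function names and the
path argument are hypothetical, and ftruncate() stands in for the block write
that really extends the segment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * If a stale next segment exists, truncate it to zero length and fsync it.
 * Returns 0 on success (including when the file does not exist), -1 on error.
 */
int
sketch_truncate_next_segment(const char *next_seg_path)
{
    int         fd = open(next_seg_path, O_WRONLY);

    if (fd < 0)
        return (errno == ENOENT) ? 0 : -1;  /* no next segment: nothing to do */

    if (ftruncate(fd, 0) != 0 || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Extend the current segment to its full size, but only after any stale
 * next segment has been truncated and synced.
 */
int
sketch_extend_to_full(int cur_fd, off_t full_size, const char *next_seg_path)
{
    assert(full_size > 0);

    if (sketch_truncate_next_segment(next_seg_path) != 0)
        return -1;

    /* Simplification: grow the file directly instead of writing a block. */
    return ftruncate(cur_fd, full_size);
}
```

The open() of the next segment is the extra filesystem lookup mentioned
above; it is paid on every extension that fills a segment.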

This makes a mockery of the comment in xlogutils.c that we would
rather not lose data in the case of a lost inode. But I feel like the
data in the later segments was already lost before the earlier segment
was filled up; it hardly helps matters if it can sometimes be unlost
if the earlier data happens to get written to in a particular pattern.


-- 
greg