Re: backup manifests - Mailing list pgsql-hackers

From Andres Freund
Subject Re: backup manifests
Msg-id 20200402182346.6iffoadxu2hsbi2s@alap3.anarazel.de
In response to Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-04-02 14:16:27 -0400, Robert Haas wrote:
> On Thu, Apr 2, 2020 at 1:23 PM Andres Freund <andres@anarazel.de> wrote:
> > I suspect it's possible to control the timing by preventing the
> > checkpoint at the end of recovery from completing within a relevant
> > timeframe. I think configuring a large checkpoint_timeout and using a
> > non-fast base backup ought to do the trick. The state can be advanced by
> > separately triggering an immediate checkpoint? Or by changing the
> > checkpoint_timeout?
> 
> That might make the window fairly wide on normal systems, but I'm not
> sure about Raspberry Pi BF members or things running
> CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.

You can set checkpoint_timeout to be a day. If that's not enough, well,
then I think we have other problems.


> > FWIW, the only check I'd really like to see in this release is the
> > crosscheck between the file's length and the actually read data (to be
> > able to diagnose FS issues).
> 
> Not sure I understand this comment. Isn't that a subset of what the
> patch already does? Are you asking for something to be changed?

Yes, I am asking for something to be changed: I'd like the code that
read()s the file when computing the checksum to add up how many bytes
were read, and compare that to the size in the manifest. And if there's
a difference, report an error about that instead of a checksum failure.

I've repeatedly seen filesystem issues lead to earlier EOFs when
read()ing than what stat() returns. It'll be pretty annoying to have to
debug a general "checksum failure", rather than just knowing that
reading stopped after 100MB of 1GB.
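
A minimal sketch of the check requested above, written in Python for
illustration rather than in the patch's C. The function name verify_file,
its parameters, and the two exception classes are hypothetical and not
taken from the patch. The idea is only that the loop feeding the checksum
also counts the bytes it read, and a size mismatch is reported before (and
instead of) a checksum mismatch:

```python
import hashlib


class SizeMismatch(Exception):
    """Bytes actually read differ from the size recorded in the manifest."""


class ChecksumMismatch(Exception):
    """Byte count matched the manifest, but the digest did not."""


def verify_file(path, manifest_size, manifest_checksum,
                algorithm="sha256", chunk_size=128 * 1024):
    """Hypothetical helper: checksum a file while counting bytes read.

    manifest_size and manifest_checksum stand in for the values recorded
    in the backup manifest; manifest_checksum is a hex digest string.
    """
    digest = hashlib.new(algorithm)
    bytes_read = 0

    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # EOF as seen by read(), not by stat()
                break
            bytes_read += len(chunk)
            digest.update(chunk)

    # Check the size first: a short read would also change the checksum,
    # but "read N of M bytes" is far easier to debug.
    if bytes_read != manifest_size:
        raise SizeMismatch(
            f"{path}: read {bytes_read} bytes, manifest says {manifest_size}"
        )

    if digest.hexdigest() != manifest_checksum:
        raise ChecksumMismatch(f"{path}: checksum mismatch")

    return bytes_read
```

With this ordering, a read that stops early surfaces as a SizeMismatch
naming both byte counts, and ChecksumMismatch is left for files whose
length is right but whose contents are not.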

Greetings,

Andres Freund


