Re: [HACKERS] Checksums by default? - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] Checksums by default?
Msg-id CAB7nPqQi7poySncvhMyfNOB=_hDvMQosN8=Hi0nOAQ38fUgaJw@mail.gmail.com
In response to Re: [HACKERS] Checksums by default?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: [HACKERS] Checksums by default?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Thu, Jan 26, 2017 at 9:32 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> >>> As it is, there are backup solutions which *do* check the checksum when
>> >>> backing up PG.  This is no longer, thankfully, some hypothetical thing,
>> >>> but something which really exists and will hopefully keep users from
>> >>> losing data.
>> >>
>> >> Wouldn't that have issues with torn pages?
>> >
>> > Why? What do you foresee here? I would think such backup solutions are
>> > careful enough to ensure correctly the durability of pages so as they
>> > are not partially written.
>>
>> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup
>> tool from overlapping with a write(fd, buf, 8192) performed by the
>> backend.
>
> As Michael mentioned, that'd depend on if things are atomic from a
> user's perspective at certain sizes (perhaps 4k, which wouldn't be too
> surprising, but may also be system-dependent), in which case verifying
> that the page is in the WAL would be sufficient.

That would be enough. Such overlaps should also be rare, so there would
not be many pages to track when scanning the WAL records from the
backup start position to the minimum recovery point. It could also be
simpler, though more time-consuming, to just let a backup recover up
to the minimum recovery point (recovery_target = 'immediate'), and
then run the checksum sanity checks. Other checks are usually needed
on a backup anyway, for example making sure that index pages are in
good shape even when their checksum is correct.
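
To make the first approach concrete, here is a minimal sketch in Python
of how a backup tool might sort pages once it has the set of blocks
touched by WAL between the backup start position and the minimum
recovery point. It is not taken from any existing tool. The names
classify_pages, toy_checksum and wal_blocks are hypothetical, the
checksum is a toy additive sum rather than PostgreSQL's real page
checksum, and the page header is assumed to be little-endian with
pd_checksum stored right after the 8-byte pd_lsn.

```python
import struct

PAGE_SIZE = 8192      # default PostgreSQL block size
CHECKSUM_OFFSET = 8   # pd_checksum follows the 8-byte pd_lsn in the page header


def toy_checksum(page: bytes, blkno: int) -> int:
    """Stand-in checksum, NOT PostgreSQL's algorithm: byte sum plus block number."""
    # Zero the stored checksum field before summing, so the stored value
    # does not influence the computed one.
    body = page[:CHECKSUM_OFFSET] + b"\x00\x00" + page[CHECKSUM_OFFSET + 2:]
    return (sum(body) + blkno) & 0xFFFF


def stored_checksum(page: bytes) -> int:
    """Read the 16-bit checksum field (little-endian is an assumption)."""
    return struct.unpack_from("<H", page, CHECKSUM_OFFSET)[0]


def classify_pages(data: bytes, wal_blocks: set, checksum=toy_checksum) -> dict:
    """Sort the block numbers of one relation file into three groups.

    ok      - stored and computed checksums match
    in_wal  - mismatch, but the block appears in WAL (hypothetical input
              set), so replay is expected to overwrite a possibly torn copy
    suspect - mismatch with no WAL coverage, worth reporting
    """
    result = {"ok": [], "in_wal": [], "suspect": []}
    for blkno in range(len(data) // PAGE_SIZE):
        page = data[blkno * PAGE_SIZE:(blkno + 1) * PAGE_SIZE]
        if checksum(page, blkno) == stored_checksum(page):
            result["ok"].append(blkno)
        elif blkno in wal_blocks:
            result["in_wal"].append(blkno)
        else:
            result["suspect"].append(blkno)
    return result
```

Only the "suspect" group would need to be reported. A real verifier
would also have to treat all-zero (new) pages as valid and use the
server's actual checksum routine, both of which this sketch leaves out.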

But here I am really hijacking the thread, so I'll stop.
-- 
Michael


