Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supportinginfrastructure - Mailing list pgsql-hackers

From David Christensen
Subject Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supportinginfrastructure
Date
Msg-id 2732F6F2-B8E0-4DC7-A1CB-F883DE127C5C@endpoint.com
Whole thread Raw
In response to Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> On Mar 9, 2017, at 1:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david@endpoint.com> wrote:
>> Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the
necessarypieces out with the understanding that some people thought the disable code might be useful in its own merit.
Ican add documentation for the various states.  The CHECKSUM_REVALIDATE is the only one I feel is a little
wibbly-wobbly;the other states are required because of the fact that enabling checksums requires distinguishing between
“inprocess of enabling” and “everything is enabled”. 
>
> Ah, sorry, I had missed that patch.
>
>> My design notes for the patch were submitted to the list with little comment; see:
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>>
>> I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed
approaches(particularly the system catalog changes + the concept of a checksum cycle).] 
>
> I think the concept of a checksum cycle might be overkill.  It would
> be useful if we ever wanted to change the checksum algorithm, but I
> don't see any particular reason why we need to build infrastructure
> for that.  If we wanted to change the checksum algorithm, I think it
> likely that we'd do that in the course of, say, widening it to 4 bytes
> as part of a bigger change in the page format, and this infrastructure
> would be too narrow to help with that.

I hear what you are saying, however a boolean on pg_class/pg_database to show if the relation in question is
insufficientif we allow arbitrary enabling/disabling while enabling is in progress.  In particular, if we disable
checksumswhile the enabling was in progress and had only a boolean to indicate whether the checksums are complete for a
relationthere will have been a window when new pages in a relation will *not* be checksummed but the system table flag
willindicate that the checksum is up-to-date, which is false.  This would lead to checksum failures when those pages
areencountered.  Similar failures will occur as well when doing a pg_upgrade of an in-progress enabling.  Saying you
can’tdisable/cancel the checksum process while it’s running seems undesirable too, as what happens if you have a system
failuremid-process. 

We could certainly avoid this problem by just saying that we have to start over with the entire cluster and process
everypage from scratch rather than trying to preserve/track state that we know is good, perhaps only dirtying buffers
whichhave a non-matching checksum while we’re in the conversion state, but this seems undesirable to me, plus if we
madeit work in a single session we’d have a long-running snapshot open (presumably) which would wreak havoc while it
processesthe entire database (as someone had suggested by doing it in a single function that just hangs while it’s
running)

I think we still need a way to short-circuit the process/incrementally update and note which tables have been
checksummedand which ones haven’t.  (Perhaps I could make the code which currently checks DataChecksumsEnabled() a
littlesmarter, depending on the relation it’s looking at.) 

As per one of Jim’s suggestions, I’ve been looking at making the data_checkum_state localized per database at postinit
time,but really it probably doesn’t matter to anything but the buffer code whether it’s a global setting, a
per-databasesetting or what. 


> While I'm glad to see work finally going on to allow enabling and
> disabling checksums, I think it's too late to put this in v10.  We
> have a rough rule that large new patches shouldn't be appearing for
> the first time in the last CommitFest, and I think this falls into
> that category.  I also think it would be unwise to commit just the
> bits of the infrastructure that let us disable checksums without
> having time for a thorough review of the whole thing; to me, the
> chances that we'll make decisions that we later regret seem fairly
> high.  I would rather wait for v11 and have a little more time to try
> to get everything right.

Makes sense, but to that end I think we need to have at least some agreement on how this should work ahead of time.
Obviouslyit’s easier to critique existing code, but I don’t want to go out in left field of a particular approach is
goingto be obviously unworkable per some of the more experienced devs here. 
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171






pgsql-hackers by date:

Previous
From: Naytro Naytro
Date:
Subject: Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6
Next
From: Naytro Naytro
Date:
Subject: Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6