Thread: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

[HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

From
David Christensen
Date:
Extracted from a larger patch, this patch provides the basic infrastructure for turning data
checksums off in a cluster.  This also sets up the necessary pg_control fields to support the
necessary multiple states for handling the transitions.

We do a few things:

- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
  the potentially-required states for this feature (as well as enabling).

- Add pg_control support for parsing/displaying the new setting.

- Distinguish between the possible checksum states and be specific about whether we care about
  checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
  functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().

- Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
  cluster.

I have *not* changed the default in initdb to enable checksums, but this would be trivial.





--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

From
Magnus Hagander
Date:
On Thu, Feb 16, 2017 at 9:58 PM, David Christensen <david@endpoint.com> wrote:
Extracted from a larger patch, this patch provides the basic infrastructure for turning data
checksums off in a cluster.  This also sets up the necessary pg_control fields to support the
necessary multiple states for handling the transitions.

We do a few things:

- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
  the potentially-required states for this feature (as well as enabling).

- Add pg_control support for parsing/displaying the new setting.

- Distinguish between the possible checksum states and be specific about whether we care about
  checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
  functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().

- Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
  cluster.

I have *not* changed the default in initdb to enable checksums, but this would be trivial.


Per the point made by somebody else (I think Simon?) on the other thread, I think it also needs WAL support. Otherwise you turn it off on the master, but it remains on on a replica which will cause failures once datablocks without checksum starts replicating.

--

Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supportinginfrastructure

From
David Christensen
Date:
> On Feb 17, 2017, at 10:31 AM, Magnus Hagander <magnus@hagander.net> wrote:
>
> Per the point made by somebody else (I think Simon?) on the other thread, I think it also needs WAL support.
Otherwiseyou turn it off on the master, but it remains on on a replica which will cause failures once datablocks
withoutchecksum starts replicating. 

Good point; I’ll send a revision soon.
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171






Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supportinginfrastructure

From
David Christensen
Date:
On Feb 17, 2017, at 10:31 AM, Magnus Hagander <magnus@hagander.net> wrote:
>
> On Thu, Feb 16, 2017 at 9:58 PM, David Christensen <david@endpoint.com> wrote:
> Extracted from a larger patch, this patch provides the basic infrastructure for turning data
> checksums off in a cluster.  This also sets up the necessary pg_control fields to support the
> necessary multiple states for handling the transitions.
>
> We do a few things:
>
> - Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
>   the potentially-required states for this feature (as well as enabling).
>
> - Add pg_control support for parsing/displaying the new setting.
>
> - Distinguish between the possible checksum states and be specific about whether we care about
>   checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
>   functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().
>
> - Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
>   cluster.
>
> I have *not* changed the default in initdb to enable checksums, but this would be trivial.
>
>
> Per the point made by somebody else (I think Simon?) on the other thread, I think it also needs WAL support.
Otherwiseyou turn it off on the master, but it remains on on a replica which will cause failures once datablocks
withoutchecksum starts replicating. 

Enclosed is a version which supports WAL logging and proper application of the checksum state change.  I have verified
itworks with a replica as far as applying the updated data_checksum_state, though I had to manually call
pg_switch_wal()on the master to get it to show up on the replica.  (I don’t know if this is a flaw in my patch or just
anatural consequence of testing on a low-volume local cluster.) 

--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

From
Robert Haas
Date:
On Fri, Feb 17, 2017 at 2:28 AM, David Christensen <david@endpoint.com> wrote:
> - Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
>   the potentially-required states for this feature (as well as enabling).

Color me skeptical.  I don't know what CHECKSUMS_ENABLING,
CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
represent -- and there's no comments in the patch explaining it -- but
if we haven't yet written the code to enable checksums, how do we know
for sure which states it will require?

If we're going to accept a patch to disable checksums without also
having the capability to enable checksums, I think we should leave out
the speculative elements about what might be needed on the "enable"
side and just implement the minimal "disable" side.

However, FWIW, I don't accept that being able to disable checksums
online is a sufficient advance to justify enabling checksums by
default.  Tomas had some good points on another thread about what
might be needed to really make that a good choice, and I'm still
skeptical about whether checksums catch any meaningful number of
errors that wouldn't be caught otherwise, and about the degree to
which any complaints it issues are actionable.  I'm not really against
this patch on its own merits, but I think it's a small advance in an
area that needs a lot of work.  I think it would be a lot more useful
if we had a way to *enable* checksums online.  Then people who find
out that checksums exist and want them have an easier way of getting
them, and anyone who uses the functionality in this patch and then
regrets it has a way to get back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supportinginfrastructure

From
David Christensen
Date:
> On Feb 19, 2017, at 5:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Feb 17, 2017 at 2:28 AM, David Christensen <david@endpoint.com> wrote:
>> - Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
>>  the potentially-required states for this feature (as well as enabling).
>
> Color me skeptical.  I don't know what CHECKSUMS_ENABLING,
> CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
> represent -- and there's no comments in the patch explaining it -- but
> if we haven't yet written the code to enable checksums, how do we know
> for sure which states it will require?
>
> If we're going to accept a patch to disable checksums without also
> having the capability to enable checksums, I think we should leave out
> the speculative elements about what might be needed on the "enable"
> side and just implement the minimal "disable" side.
>
> However, FWIW, I don't accept that being able to disable checksums
> online is a sufficient advance to justify enabling checksums by
> default.  Tomas had some good points on another thread about what
> might be needed to really make that a good choice, and I'm still
> skeptical about whether checksums catch any meaningful number of
> errors that wouldn't be caught otherwise, and about the degree to
> which any complaints it issues are actionable.  I'm not really against
> this patch on its own merits, but I think it's a small advance in an
> area that needs a lot of work.  I think it would be a lot more useful
> if we had a way to *enable* checksums online.  Then people who find
> out that checksums exist and want them have an easier way of getting
> them, and anyone who uses the functionality in this patch and then
> regrets it has a way to get back.


Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary
piecesout with the understanding that some people thought the disable code might be useful in its own merit.  I can add
documentationfor the various states.  The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the
otherstates are required because of the fact that enabling checksums requires distinguishing between “in process of
enabling”and “everything is enabled”. 

My design notes for the patch were submitted to the list with little comment; see:
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com

I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed
approaches(particularly the system catalog changes + the concept of a checksum cycle).] 

Best,

David
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171






On 2/19/17 11:02 AM, David Christensen wrote:
> My design notes for the patch were submitted to the list with little comment; see:
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>
> I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed
approaches(particularly the system catalog changes + the concept of a checksum cycle).]
 

A couple notes:

- AFAIK unlogged tables get checksummed today if checksums are enabled; 
the same should hold true if someone enables checksums on the whole cluster.

- Shared relations should be handled as well; you don't mention them.

- If an entire cluster is going to be considered as checksummed, then 
even databases that don't allow connections would need to get enabled.

I like the idea of revalidation, but I'd suggest leaving that off of the 
first pass.

It might be easier on a first pass to look at supporting per-database 
checksums (in this case, essentially treating shared catalogs as their 
own database). All normal backends do per-database stuff (such as 
setting current_database) during startup anyway. That doesn't really 
help for things like recovery and replication though. :/ And there's 
still the question of SLRUs (or are those not checksum'd today??).

BTW, it occurs to me that this is related to the problem we have with 
trying to make changes that break page binary compatibility. If we had a 
method for handling that it would probably be useful for enabling 
checksums as well. You'd essentially treat an un-checksum'd page as if 
it was an "old page version". The biggest problem there is dealing with 
the potential that the new page needs to be larger than the old one was, 
but maybe there's some useful progress to be had in this area before 
tackling the "page too small" problem.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supportinginfrastructure

From
David Christensen
Date:
> On Feb 19, 2017, at 8:14 PM, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
>
> On 2/19/17 11:02 AM, David Christensen wrote:
>> My design notes for the patch were submitted to the list with little comment; see:
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>>
>> I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed
approaches(particularly the system catalog changes + the concept of a checksum cycle).] 
>
> A couple notes:
>
> - AFAIK unlogged tables get checksummed today if checksums are enabled; the same should hold true if someone enables
checksumson the whole cluster. 

Agreed; AFAIK this should already be working if it’s using the PageIsVerified() API, since I just effectively modified
thelogic there, depending on state. 

> - Shared relations should be handled as well; you don't mention them.

I agree that they should be handled as well; I thought I had mentioned it later in the design doc, though TBH I’m not
sureif there is more involved than just visiting the global relations in pg_class.  In addition we need to visit all
forksof each relation.  I’m not certain if loading those into shared_buffers would be sufficient; I think so, but I’d
beglad to be informed otherwise.  (As long as they’re already using the PageIsVerified() API call they get this logic
forfree. 

> - If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections
wouldneed to get enabled. 

Yeah, the workaround for now would be to require “datallowconn" to be set to ’t' for all databases before proceeding,
unlessthere’s a way to connect to those databases internally that bypasses that check.  Open to ideas, though for a
firstpass seems like the “datallowconn” approach is the least amount of work. 

> I like the idea of revalidation, but I'd suggest leaving that off of the first pass.

Yeah, agreed.

> It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating
sharedcatalogs as their own database). All normal backends do per-database stuff (such as setting current_database)
duringstartup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still
thequestion of SLRUs (or are those not checksum'd today??). 

So you’re suggesting that the data_checksums GUC get set per-database context, so once it’s fully enabled in the
specificdatabase it treats it as in enforcing state, even if the rest of the cluster hasn’t completed?  Hmm, might
thinkon that a bit, but it seems pretty straightforward. 

What issues do you see affecting replication and recovery specifically (other than the entire cluster not being
complete)? Since the checksum changes are WAL logged, seems you be no worse the wear on a standby if you had to change
things.

> BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary
compatibility.If we had a method for handling that it would probably be useful for enabling checksums as well. You'd
essentiallytreat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with
thepotential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be
hadin this area before tackling the "page too small" problem. 

I agree it’s very similar; my issue is I don’t want to have to postpone handling a specific case for some future
infrastructure. That being said, what I could see being a general case is the piece which basically walks all pages in
thecluster; as long as the page checksum/format validation happens at Page read time, we could do page version
checks/conversionsat the same time, and the only special code we’d need is to keep track of which files still need to
bevisited and how to minimize the impact on the cluster via some sort of throttling/leveling.  (It’s also similar to
vacuumin that way, however we have been going out of our way to make vacuum smart enough to *avoid* visiting every
page,so I think it is a different enough use case that we can’t tie the two systems together, nor do I feel like taking
thatproject on.) 

We could call the checksum_cycle something else (page_walk_cycle?  bikeshed time!) and basically have the
infrastructureto initiate online/gradual conversion/processing of all pages for free. 

Thoughts?

David
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171






On 2/20/17 11:22 AM, David Christensen wrote:
>> - If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections
wouldneed to get enabled.
 
> Yeah, the workaround for now would be to require “datallowconn" to be set to ’t' for all databases before proceeding,
unlessthere’s a way to connect to those databases internally that bypasses that check.  Open to ideas, though for a
firstpass seems like the “datallowconn” approach is the least amount of work.
 

The problem with ignoring datallowconn is any database where that's 
false is fair game for CREATE DATABASE. I think just enforcing that 
everything's connectable is good enough for now.

>> I like the idea of revalidation, but I'd suggest leaving that off of the first pass.
> Yeah, agreed.
>
>> It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating
sharedcatalogs as their own database). All normal backends do per-database stuff (such as setting current_database)
duringstartup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still
thequestion of SLRUs (or are those not checksum'd today??).
 
> So you’re suggesting that the data_checksums GUC get set per-database context, so once it’s fully enabled in the
specificdatabase it treats it as in enforcing state, even if the rest of the cluster hasn’t completed?  Hmm, might
thinkon that a bit, but it seems pretty straightforward.
 

Something like that, yeah.

> What issues do you see affecting replication and recovery specifically (other than the entire cluster not being
complete)? Since the checksum changes are WAL logged, seems you be no worse the wear on a standby if you had to change
things.

I'm specifically worried about the entire cluster not being complete. 
That makes it harder for replicas to know what blocks they can and can't 
verify the checksum on.

That *might* still be simpler than trying to handle converting the 
entire cluster in one shot. If it's not simpler I certainly wouldn't do 
it right now.

>> BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary
compatibility.If we had a method for handling that it would probably be useful for enabling checksums as well. You'd
essentiallytreat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with
thepotential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be
hadin this area before tackling the "page too small" problem.
 
> I agree it’s very similar; my issue is I don’t want to have to postpone handling a specific case for some future
infrastructure.

Yeah, I was just mentioning it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david@endpoint.com> wrote:
> Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the
necessarypieces out with the understanding that some people thought the disable code might be useful in its own merit.
Ican add documentation for the various states.  The CHECKSUM_REVALIDATE is the only one I feel is a little
wibbly-wobbly;the other states are required because of the fact that enabling checksums requires distinguishing between
“inprocess of enabling” and “everything is enabled”. 

Ah, sorry, I had missed that patch.

> My design notes for the patch were submitted to the list with little comment; see:
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>
> I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed
approaches(particularly the system catalog changes + the concept of a checksum cycle).] 

I think the concept of a checksum cycle might be overkill.  It would
be useful if we ever wanted to change the checksum algorithm, but I
don't see any particular reason why we need to build infrastructure
for that.  If we wanted to change the checksum algorithm, I think it
likely that we'd do that in the course of, say, widening it to 4 bytes
as part of a bigger change in the page format, and this infrastructure
would be too narrow to help with that.

While I'm glad to see work finally going on to allow enabling and
disabling checksums, I think it's too late to put this in v10.  We
have a rough rule that large new patches shouldn't be appearing for
the first time in the last CommitFest, and I think this falls into
that category.  I also think it would be unwise to commit just the
bits of the infrastructure that let us disable checksums without
having time for a thorough review of the whole thing; to me, the
chances that we'll make decisions that we later regret seem fairly
high.  I would rather wait for v11 and have a little more time to try
to get everything right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supportinginfrastructure

From
David Christensen
Date:
> On Mar 9, 2017, at 1:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david@endpoint.com> wrote:
>> Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the
necessarypieces out with the understanding that some people thought the disable code might be useful in its own merit.
Ican add documentation for the various states.  The CHECKSUM_REVALIDATE is the only one I feel is a little
wibbly-wobbly;the other states are required because of the fact that enabling checksums requires distinguishing between
“inprocess of enabling” and “everything is enabled”. 
>
> Ah, sorry, I had missed that patch.
>
>> My design notes for the patch were submitted to the list with little comment; see:
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>>
>> I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed
approaches(particularly the system catalog changes + the concept of a checksum cycle).] 
>
> I think the concept of a checksum cycle might be overkill.  It would
> be useful if we ever wanted to change the checksum algorithm, but I
> don't see any particular reason why we need to build infrastructure
> for that.  If we wanted to change the checksum algorithm, I think it
> likely that we'd do that in the course of, say, widening it to 4 bytes
> as part of a bigger change in the page format, and this infrastructure
> would be too narrow to help with that.

I hear what you are saying, however a boolean on pg_class/pg_database to show if the relation in question is
insufficientif we allow arbitrary enabling/disabling while enabling is in progress.  In particular, if we disable
checksumswhile the enabling was in progress and had only a boolean to indicate whether the checksums are complete for a
relationthere will have been a window when new pages in a relation will *not* be checksummed but the system table flag
willindicate that the checksum is up-to-date, which is false.  This would lead to checksum failures when those pages
areencountered.  Similar failures will occur as well when doing a pg_upgrade of an in-progress enabling.  Saying you
can’tdisable/cancel the checksum process while it’s running seems undesirable too, as what happens if you have a system
failuremid-process. 

We could certainly avoid this problem by just saying that we have to start over with the entire cluster and process
everypage from scratch rather than trying to preserve/track state that we know is good, perhaps only dirtying buffers
whichhave a non-matching checksum while we’re in the conversion state, but this seems undesirable to me, plus if we
madeit work in a single session we’d have a long-running snapshot open (presumably) which would wreak havoc while it
processesthe entire database (as someone had suggested by doing it in a single function that just hangs while it’s
running)

I think we still need a way to short-circuit the process/incrementally update and note which tables have been
checksummedand which ones haven’t.  (Perhaps I could make the code which currently checks DataChecksumsEnabled() a
littlesmarter, depending on the relation it’s looking at.) 

As per one of Jim’s suggestions, I’ve been looking at making the data_checkum_state localized per database at postinit
time,but really it probably doesn’t matter to anything but the buffer code whether it’s a global setting, a
per-databasesetting or what. 


> While I'm glad to see work finally going on to allow enabling and
> disabling checksums, I think it's too late to put this in v10.  We
> have a rough rule that large new patches shouldn't be appearing for
> the first time in the last CommitFest, and I think this falls into
> that category.  I also think it would be unwise to commit just the
> bits of the infrastructure that let us disable checksums without
> having time for a thorough review of the whole thing; to me, the
> chances that we'll make decisions that we later regret seem fairly
> high.  I would rather wait for v11 and have a little more time to try
> to get everything right.

Makes sense, but to that end I think we need to have at least some agreement on how this should work ahead of time.
Obviouslyit’s easier to critique existing code, but I don’t want to go out in left field of a particular approach is
goingto be obviously unworkable per some of the more experienced devs here. 
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171