Re: Enabling Checksums - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Enabling Checksums
Msg-id 514664C3.9080404@2ndQuadrant.com
In response to Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Enabling Checksums  (Daniel Farina <daniel@heroku.com>)
Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Enabling Checksums  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 3/17/13 1:41 PM, Simon Riggs wrote:
> So I'm now moving towards commit using a CRC algorithm. I'll put in a
> feature to allow the algorithm to be selected at initdb time, though that
> is mainly a convenience to allow us to more easily do further testing on
> speedups and whether there are any platform specific regressions
> there.

That sounds reasonable.  As I just posted, I'm hoping Ants can help make 
a pass over a CRC16 version, since his pass over the Fletcher one was 
very productive.  If you're spending time looking at this, I know I'd 
prefer to see you poking at the WAL related aspects instead.  More of us 
are capable of crunching CRC code than have practice at WAL changes the 
way you do.
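To make the Fletcher-versus-CRC comparison concrete, here is a minimal C sketch of both checksums over a page buffer, with the algorithm chosen once through a function pointer, roughly the kind of initdb-time selection Simon describes.  This is an illustration under assumptions, not the patch's code: it uses the textbook Fletcher-16 (two sums modulo 255) and the CRC-16-CCITT polynomial 0x1021 with initial value 0xFFFF, computed bit by bit.  The names `select_checksum` and `checksum_page` and the 8192-byte page size are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size; 8192 bytes is PostgreSQL's default block size. */
#define PAGE_SIZE 8192

/* Textbook Fletcher-16: two running sums, each kept modulo 255. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;

    for (size_t i = 0; i < len; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}

/* Bitwise CRC-16-CCITT: polynomial 0x1021, init 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t) (data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ 0x1021)
                                 : (uint16_t) (crc << 1);
    }
    return crc;
}

typedef uint16_t (*page_checksum_fn) (const uint8_t *data, size_t len);

/* Hypothetical: map an initdb-time setting to an implementation. */
page_checksum_fn select_checksum(const char *name)
{
    if (strcmp(name, "fletcher16") == 0)
        return fletcher16;
    if (strcmp(name, "crc16") == 0)
        return crc16_ccitt;
    return NULL;                /* unknown algorithm name */
}

/* Checksum one whole page with whichever algorithm was selected. */
uint16_t checksum_page(page_checksum_fn fn, const uint8_t *page)
{
    assert(fn != NULL);
    return fn(page, PAGE_SIZE);
}
```

With this sketch, `crc16_ccitt((const uint8_t *) "123456789", 9)` should return 0x29B1, the standard check value for this CRC variant.  Swapping algorithms only touches `select_checksum`; callers of `checksum_page` don't change.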

I see the situation with checksums right now as being similar to the 
commit/postpone situation for Hot Standby in 9.0.  The code is uglier 
and surely buggier than we'd like, but it has been getting beat on 
regularly for over a year now to knock problems out.  There are surely 
more bugs left to find.  The improved testing that comes only from 
something being committed is probably necessary to really advance the 
testing coverage, though.  And since adopting the feature is strictly 
opt-in, the bug exposure for non-adopters is limited.  The TLI 
rearrangements make up a lot of the patch, but that's pretty mechanical 
work that doesn't seem that risky.

There was one question that kept coming up in person this week (Simon, 
Jeff, Daniel, Josh Berkus, and myself were all in the same place for a 
few days) that I wanted to address with some thoughts on-list.  Given 
that the current overhead is right on the edge of being acceptable, the 
concern is whether committing this will lock the project into a 
permanent problem that can't be improved later.  I think it's 
manageable, though.  Here's how I interpret the data we have:

-The checksum has to change from Fletcher 16 to CRC-16.  The "hairy" 
parts of the feature don't change very much from that, though.  From a 
code correctness perspective, I see exactly which checksum is produced 
as a pretty small detail.  This won't restart the testing cycle from 
scratch.  The performance change should be quantified, though.

-Some common workloads will show no performance drop, such as those that 
fit into shared_buffers and don't write hint bits.

-Some common workloads that write data seem to hit about a 2% drop, 
presumably because they hit one of the slower situations around 10% of 
the time.

-There are a decent number of hard-to-handle workloads that have 
shared_buffers <-> OS cache thrashing, and any approach here will 
regularly hit them with around a 20% drop.  There's some hope that this 
will improve later, especially if a CRC is used and later versions can 
pick up the Intel i7 CRC32 hardware acceleration.  The magnitude of this 
overhead doesn't seem very negotiable, though.  We've heard enough 
comparisons with other people's implementations now to see that's near 
the best anyone does here.  If the weird slowdowns some people report 
with very large values of shared_buffers are fixed, that will make this 
situation better.  That's on my hit list of things I really want to see 
sorted in the next release.

-The worst of the worst case behavior is Jeff's "SELECTs now write a 
WAL-logged hint bit" test, which can easily exceed a 20% drop.  There 
have been lots of features submitted in the last two releases that try 
to improve hint bit operations.  Some of those didn't show enough of a 
win to be worth the trouble.  It may be the case, though, that in a 
checksummed environment those wins are suddenly big enough to matter. 
If any of those go in later, the worst case for checksums could then 
improve too.  Having to test both ways, with and without checksums, 
complicates the performance testing.  But the project has to start 
adopting a better approach to that in the next year regardless IMHO, and 
I'm scheduling time to help as much as I can with it.  (That's a whole 
other discussion.)

-Having COPY FREEZE available now is a useful tool to eliminate a lot of 
the bulk load followed by expensive hint bit write scenarios I know exist 
in the real world.  I think the docs for checksumming should even 
highlight that synergy.
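Reading the 2% and "about 10% of the time" figures together with the roughly 20% slow-case drop, the numbers hang together under a simple linear mix model: some fraction of the work pays the slow-path cost and everything else runs at full speed.  A back-of-envelope C sketch of that arithmetic follows; the model and the function names `blended_overhead` and `implied_slow_fraction` are assumptions for illustration, not something measured in the benchmarks.

```c
#include <assert.h>

/*
 * Linear mix model (an assumption): a fraction `slow_fraction` of the
 * work takes `slow_overhead` extra run time (0.20 means 20% extra) and
 * the rest sees no overhead, so the blended overhead is the weighted
 * average.  Treating a "drop" as extra run time is a fair approximation
 * for small percentages.
 */
double blended_overhead(double slow_fraction, double slow_overhead)
{
    assert(slow_fraction >= 0.0 && slow_fraction <= 1.0);
    return slow_fraction * slow_overhead;
}

/* Inverse: given an observed overall overhead, how often is the slow path hit? */
double implied_slow_fraction(double observed_overhead, double slow_overhead)
{
    assert(slow_overhead > 0.0);
    return observed_overhead / slow_overhead;
}
```

So `blended_overhead(0.10, 0.20)` is about 0.02, and `implied_slow_fraction(0.02, 0.20)` comes back as about 0.10.  If later work shrinks either term (fewer slow-path hits, or a cheaper slow path), the blended number falls proportionally.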

As long as the feature is off by default, so that people have to turn it 
on to hit the biggest changed code paths, the exposure to potential bugs 
doesn't seem too bad.  New WAL data is no fun, but it's not like this 
hasn't happened before.

For version <9.3+1>, there's a decent sized list of potential 
performance improvements that seem possible.  I don't see any reason to 
believe committing a CRC16 based version of this will lock the 
implementation into a bad form that can't be optimized later.  The 
comparison with Hot Standby seems apt again here.  There was a 
decent list of rough edges that were hit by early 9.0 adopters only when 
they turned the feature on.  Then many were improved in 9.1. 
Checksumming could follow the same path: committed for 9.3, 
improvements expected during <9.3+1> work, and generally considered well 
tested by the release of <9.3+1>.

On the testing front, we've seen on-list interest in this feature from 
companies like Heroku and Enova, who both have some resources and 
experience to help with testing too.  Heroku can spin up test instances with 
workloads any number of ways.  Enova can make a Londiste standby with 
checksums turned on to hit it with a logical replicated workload, while 
the master stays un-checksummed.

If this goes in, I fully intend to hold both companies to hitting the 
feature with as many workloads as they can help generate during (and 
beyond) beta.  I have my own stress tests I'll keep running too.  If the 
bug rate from the beta adopters is bad and doesn't improve, there is 
always the uncomfortable possibility of reverting it before the first RC.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


