Re: Enabling Checksums - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Enabling Checksums
Msg-id 5147C9E7.4090608@2ndQuadrant.com
In response to Re: Enabling Checksums  (Daniel Farina <daniel@heroku.com>)
Responses Re: Enabling Checksums  (Daniel Farina <daniel@heroku.com>)
List pgsql-hackers
On 3/18/13 5:36 PM, Daniel Farina wrote:
> Clarification, because I think this assessment as delivered feeds some
> unnecessary FUD about EBS:
>
> EBS is quite reliable.  Presuming that all noticed corruptions are
> strictly EBS's problem (that's quite a stretch), I'd say the defect
> rate falls somewhere in the range of volume-centuries.

I wasn't trying to flog EBS as any more or less reliable than other 
types of storage.  What I was trying to emphasize, similarly to your 
"quite a stretch" comment, was the uncertainty involved when such 
deployments fail.  Failures have many causes beyond EBS itself.  But 
people are now so far removed from the physical objects that fail that 
it's harder to point blame the right way when things go wrong.

A quick example will demonstrate what I mean.  Let's say my server at 
home dies.  There are some terrible log messages, it crashes, and when it 
comes back up it's broken.  Troubleshooting and possibly replacement 
parts follow.  I will normally expect an eventual resolution that 
includes data like "the drive showed X SMART errors" or "I swapped the 
memory with a similar system and the problem followed the RAM".  I'll 
learn something about what failed that I might use as feedback to adjust 
my practices.  But an EC2+EBS failure doesn't let you get to the root 
cause effectively most of the time, and that makes people nervous.

I can already see "how do checksums alone help narrow the blame?" as the 
next question.  I'll post something summarizing how I use them for that 
tomorrow; I'm just out of juice tonight.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


