Re: Enabling Checksums - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Enabling Checksums
Msg-id CAM-w4HN38U8c=taT10y=d9B_8DXuuYSy+prEe0z6+n0q-tCCuw@mail.gmail.com
In response to Re: Enabling Checksums  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Enabling Checksums  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Mar 8, 2013 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
> After some examination of the systems involved, we concluded that the
> issue was the FreeBSD drivers for the new storage, which were unstable
> and had custom source patches.  However, without PostgreSQL checksums,
> we couldn't *prove* it wasn't PostgreSQL at fault.  It ended up taking
> weeks of testing, most of which was useless, to prove to them they had a
> driver problem so it could be fixed.  If Postgres had had checksums, we
> could have avoided wasting a couple weeks looking for non-existent
> PostgreSQL bugs.

How would Postgres checksums have proven that?

A checksum failure just means *something* has gone wrong. It could
still be Postgres that did it. In fact, I would hazard that at some
point checksum failures will be the way most Postgres bugs get found.
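To make that concrete, here is a minimal Python sketch of what a page checksum can and cannot tell you. It uses CRC-32 from the standard `zlib` module as a stand-in (an assumption for illustration, not the algorithm the checksum patch uses), and `stamp` and `verify` are hypothetical helper names. The verifier learns only that the bytes changed after the checksum was stamped. Nothing in the mismatch says which layer changed them.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size


def stamp(page: bytes) -> int:
    """Checksum computed when the page is written out (stand-in: CRC-32)."""
    return zlib.crc32(page)


def verify(page: bytes, stored: int) -> bool:
    """Checked when the page is read back; says only *whether* bytes changed."""
    return zlib.crc32(page) == stored


page = bytearray(PAGE_SIZE)
page[100:105] = b"hello"
checksum = stamp(bytes(page))

# Simulate a single flipped bit somewhere between write and read. The
# verifier cannot tell whether a driver, the kernel, the disk, or Postgres
# itself flipped it; it only sees the mismatch.
page[4000] ^= 0x01
print(verify(bytes(page), checksum))  # False: CRC-32 catches any single-bit flip
```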

> Also, I'm kinda embarrassed that, at this point, InnoDB has checksums and
> we don't.  :-(

As silly as it sounds, I think this is a valid argument. Not just
InnoDB, but Oracle and other databases, and even other storage
software, have checksums.

I think even if the patch doesn't get accepted this time around, it'll
be in the next release. Either we'll think of solutions for some of the
performance bottlenecks, we'll iron out the transition so you can turn
it off and on freely, or we'll just realize that people are running
with the patch and life is OK even with these problems.

If I understand the performance issues correctly, the main problem is
the extra round trip to the WAL, which can require a sync. Is that
right? That seems like a deal breaker to me. I would think a 0-10% I/O
bandwidth or CPU bandwidth penalty would be acceptable, but extra
rotational latency, even on just some transactions, would be a real
killer.
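A rough back-of-the-envelope sketch shows why a percentage penalty and an extra sync are different kinds of cost. It assumes spinning disks with no battery-backed write cache and ignores seek time. Under those assumptions, each extra synchronous WAL flush waits on average about half a revolution of the platter. The RPM values are illustrative, and `avg_rotational_latency_ms` is a hypothetical helper.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one platter revolution."""
    ms_per_rev = 60_000.0 / rpm  # milliseconds per full revolution
    return ms_per_rev / 2


for rpm in (7200, 10000, 15000):
    # Seek time and write caching are deliberately ignored here.
    print(f"{rpm:>6} RPM: ~{avg_rotational_latency_ms(rpm):.2f} ms per extra sync")
```

At 7200 RPM that works out to roughly 4 ms added to every transaction that hits the extra sync. That cost is fixed and does not shrink with the size of the transaction, whereas a bandwidth penalty scales with the work done.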


-- 
greg


