Re: Substituting Checksum Algorithm (was: Enabling Checksums) - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Msg-id 517FF9DA.7060504@2ndQuadrant.com
In response to Re: Substituting Checksum Algorithm (was: Enabling Checksums)  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Substituting Checksum Algorithm (was: Enabling Checksums)
List pgsql-hackers
I re-ran the benchmark that's had me most worried against the committed
code and things look good so far.  I've been keeping quiet because my
tests recently have all agreed with what Ants already described.  This
is more a confirmation summary than new data.

The problem case has been Jeff's test 2 "worst-case overhead for
calculating checksum while reading data" from the OS cache.  I wrapped
that into a test harness and reported results similar to Jeff's at
http://www.postgresql.org/message-id/5133D732.4090801@2ndQuadrant.com
based on the originally proposed Fletcher-16 checksum.

I've made some system improvements since then, so the absolute runtime
improved for most of the tests I'm running.  But the percentage changes
didn't seem far enough off to bother re-running the Fletcher tests.
Details are in the attached spreadsheet; to summarize:

-The original Fletcher-16 code slowed this test case down 24 to 32%,
depending on whether you look at the average of 3 runs or the median.

-The initial checksum commit with the truncated WAL CRC was almost an
order of magnitude worse:  146% to 224% slowdown.  The test case that
took ~830ms was taking as much as 2652ms with that method.  I'm still
not sure why the first run of this test is always so much faster than
the second and third.  But since it happens so often I think it's fair
to consider that worst case really important.

-The committed FNV-1a implementation is now slightly better than
Fletcher-16 speed-wise:  19 to 27% slowdown.

-I didn't test slicing-by-8 CRC, because once I'd fully come around to
Ants's position it didn't seem likely to be useful.  I don't want to
lose track of that idea though; it might be the right path for a
future implementation with 32-bit checksums.
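
The slowdown percentages above are simple ratios of run times.  A
minimal Python sketch of that arithmetic follows, assuming "slowdown"
means the extra time relative to the no-checksum baseline.  The helper
names `slowdown_pct` and `summarize` are hypothetical, and the
three-run timings in the second example are made-up round numbers, not
values from the spreadsheet.  Only the ~830ms and 2652ms figures come
from the message, and since the baseline there is approximate, the
result need not match the quoted 224% exactly.

```python
from statistics import mean, median


def slowdown_pct(baseline_ms, measured_ms):
    """Extra run time relative to the baseline, as a percentage."""
    return (measured_ms - baseline_ms) / baseline_ms * 100.0


def summarize(baseline_runs, measured_runs):
    """Return (slowdown by average, slowdown by median) for two sets of runs."""
    by_avg = slowdown_pct(mean(baseline_runs), mean(measured_runs))
    by_med = slowdown_pct(median(baseline_runs), median(measured_runs))
    return by_avg, by_med


# Figures quoted in the message: ~830 ms baseline, 2652 ms worst case.
print(f"worst case: {slowdown_pct(830, 2652):.1f}% slowdown")

# Hypothetical timings (NOT from the spreadsheet) where the first run is
# faster than the second and third, so average and median disagree.
by_avg, by_med = summarize([1000, 1000, 1000], [1000, 1300, 1300])
print(f"average: {by_avg:.0f}%  median: {by_med:.0f}%")
```

When the first run is consistently the fastest of three, the median
tracks the slower runs while the average is pulled down by the first,
which is one way a single test can produce a range such as "24 to 32%".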

Since the >=25% slowdown on this test with Fletcher-16 turned into more
like a 2% drop on more mixed workloads, I'd expect the same to hold
again with the new FNV-1a.  I plan to step back and look at more of
those cases, but it will take at least a few days to start sorting
that out.
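
For readers unfamiliar with FNV-1a, the textbook 32-bit byte-wise form
can be sketched in a few lines of Python.  This is only an illustration
of the basic hash: the committed page checksum is an adaptation of the
idea rather than this exact loop, and it is not reproduced here.  The
function name `fnv1a_32` is hypothetical; the offset basis and prime
are the standard published FNV constants.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime (16777619)


def fnv1a_32(data: bytes) -> int:
    """Textbook 32-bit FNV-1a: XOR in each byte, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


# Hashing a zero-filled, page-sized buffer (8192 bytes is an assumed size).
page = bytes(8192)
print(hex(fnv1a_32(page)))
```

The inner loop is one XOR and one multiply per input unit, which is why
this family of hashes is cheap compared with a table-driven CRC.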

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment
