On Thu, May 3, 2018 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> On Thu, May 3, 2018 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> It strikes me also that, at least for debugging purposes, it's seriously
>>> awful that you can't tell from outside what result this function got.
>
>> I don't think *broken* CPUs are something we need to handle, are they?
>
> I'm not worried so much about broken hardware as about scenarios like
> "Munro got the magic constant wrong and nobody ever noticed", or more
> likely "somebody broke it later and we didn't notice". We absolutely
> do not expect the code path with function-returns-the-wrong-answer to be
> taken, and I think it would be appropriate to complain loudly if it is.

OK. Here is a patch that compares the hardware and software results and
calls elog(ERROR) if they don't match. It also reports its result via
elog(DEBUG1) just before returning.

Here's what I see at startup on my ARMv8 machine when I set
log_min_messages = debug1 in my .conf (it's the very first line
emitted):

2018-05-03 05:07:25.904 UTC [19677] DEBUG: using armv8 crc2 hardware = 1

Here's what I see if I hack the _armv8() function to do kill(getpid(), SIGILL):

2018-05-03 05:09:47.012 UTC [21079] DEBUG: using armv8 crc2 hardware = 0

Here's what I see if I hack the _armv8() function to add 1 to its result:

2018-05-03 05:11:07.366 UTC [22218] FATAL: crc32 hardware and software results disagree
2018-05-03 05:11:07.367 UTC [22218] LOG: database system is shut down
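
For anyone who wants to play with the idea outside the tree, the three cases
above can be reproduced with a small standalone sketch. This is not the patch
itself: probe_crc32c() and the candidate_*() functions are hypothetical names,
the "hardware" routine is a software stand-in rather than real ARMv8 CRC
instructions, the test input 42 is arbitrary, and raise() stands in for
kill(getpid(), SIGILL). The probe returns 1 when the candidate matches the
software result, 0 when it disagrees (the case that should be reported
loudly), and -1 when the candidate traps with SIGILL.

```c
#define _POSIX_C_SOURCE 200809L		/* for sigsetjmp/siglongjmp */

#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by the software routine and any candidate routine. */
typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/*
 * Bit-at-a-time CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Raw register update only: the caller applies any initial/final inversion.
 */
uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

static sigjmp_buf illegal_instruction_jump;

static void
illegal_instruction_handler(int signo)
{
	(void) signo;
	siglongjmp(illegal_instruction_jump, 1);
}

/*
 * Hypothetical probe: 1 = candidate agrees with software, 0 = candidate
 * ran but disagrees, -1 = candidate raised SIGILL.
 */
int
probe_crc32c(crc32c_fn candidate)
{
	uint64_t	data = 42;		/* arbitrary test input */
	void		(*old_handler) (int);
	volatile int result;

	old_handler = signal(SIGILL, illegal_instruction_handler);
	if (sigsetjmp(illegal_instruction_jump, 1) == 0)
		result = (candidate(0, &data, sizeof(data)) ==
				  crc32c_sw(0, &data, sizeof(data)));
	else
		result = -1;			/* we got the SIGILL trap */
	signal(SIGILL, old_handler);

	return result;
}

/* Stand-ins for the three experiments; none of these touch real hardware. */
uint32_t
candidate_ok(uint32_t crc, const void *data, size_t len)
{
	return crc32c_sw(crc, data, len);
}

uint32_t
candidate_sigill(uint32_t crc, const void *data, size_t len)
{
	(void) crc;
	(void) data;
	(void) len;
	raise(SIGILL);				/* simulate an unsupported instruction */
	return 0;
}

uint32_t
candidate_off_by_one(uint32_t crc, const void *data, size_t len)
{
	return crc32c_sw(crc, data, len) + 1;
}
```

A caller would treat a 0 from probe_crc32c() as the "should never happen"
case and complain, and treat -1 as a plain "not available" that falls back to
the software routine.
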
--
Thomas Munro
http://www.enterprisedb.com