always use runtime checks for CRC-32C instructions - Mailing list pgsql-hackers

From Nathan Bossart
Subject always use runtime checks for CRC-32C instructions
Msg-id 20231030161706.GA3011@nathanxps13
List pgsql-hackers
This is an offshoot of the "CRC32C Parallel Computation Optimization on
ARM" thread [0].  I intend for this to be a prerequisite patch set.

Presently, for the SSE 4.2 and ARMv8 CRC instructions used in the CRC32C
code for WAL records, etc., we first check if the intrinsics are available
with the default compiler flags.  If so, we only bother compiling the
implementation that uses those intrinsics.  If not, we also check whether
the intrinsics are available with some extra CFLAGS, and if they are, we
compile both the implementation that uses the intrinsics as well as a
fallback implementation that doesn't require any special instructions.
Then, at runtime, we check what's available in the hardware and choose the
appropriate CRC32C implementation.
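
For illustration, the three build outcomes described above can be sketched with preprocessor switches.  This is only a sketch under stated assumptions, not the actual PostgreSQL source: the DEMO_* macros and demo_* functions are hypothetical names, the "hardware" version is a stand-in that simply delegates to the portable one so the example builds anywhere, and the probe is a placeholder.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical build-time switches.  A configure step would define at most
 * one of them; with neither defined, only the fallback is compiled.
 */
/* #define DEMO_HW_CRC32C */
/* #define DEMO_HW_CRC32C_WITH_RUNTIME_CHECK */

/* Portable bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t
demo_crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#if defined(DEMO_HW_CRC32C) || defined(DEMO_HW_CRC32C_WITH_RUNTIME_CHECK)
/* Stand-in for the intrinsics-based version (assumption: delegates to SW). */
static uint32_t
demo_crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    return demo_crc32c_sw(crc, data, len);
}
#endif

#if defined(DEMO_HW_CRC32C_WITH_RUNTIME_CHECK)
/* Placeholder probe; a real one would ask the CPU or the OS. */
static int
demo_hw_crc_available(void)
{
    return 0;
}
#endif

static uint32_t
demo_crc32c(uint32_t crc, const void *data, size_t len)
{
#if defined(DEMO_HW_CRC32C)
    /* Intrinsics work with default flags: no fallback, no runtime check. */
    return demo_crc32c_hw(crc, data, len);
#elif defined(DEMO_HW_CRC32C_WITH_RUNTIME_CHECK)
    /* Intrinsics need extra CFLAGS: both versions built, choose at runtime. */
    return demo_hw_crc_available()
        ? demo_crc32c_hw(crc, data, len)
        : demo_crc32c_sw(crc, data, len);
#else
    /* No intrinsics at all: fallback only. */
    return demo_crc32c_sw(crc, data, len);
#endif
}
```

The caller is assumed to seed the CRC with 0xFFFFFFFF and XOR the result with 0xFFFFFFFF, which is the usual CRC-32C convention.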

The aforementioned other thread [0] aims to further optimize this code by
using another instruction that requires additional configure and/or runtime
checks.  $SUBJECT has been in the back of my mind for a while, but given
proposals to add further complexity to this code, I figured it might be a
good time to propose this simplification.  Specifically, I think we
shouldn't worry about trying to compile only the special intrinsics
versions, and instead always try to build both and choose the appropriate
one at runtime.

AFAICT the trade-offs aren't too bad.  With some simple testing, I see that
the runtime check occurs once at startup, so I don't anticipate any
noticeable performance impact.  I suppose each process might need to do the
check in EXEC_BACKEND builds, but even so, I suspect the difference is
negligible.
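
One common way to keep such a check to a single execution per process is a function pointer that initially points at a "choose" routine and overwrites itself on first use.  The following is a minimal sketch of that pattern, assuming hypothetical demo_* names, a placeholder probe that always reports "not available", and a stand-in hardware version; it is not taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*demo_crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Counts probe executions so the "once per process" claim can be observed. */
static int demo_probe_calls = 0;

/* Portable bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t
demo_crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Stand-in for the intrinsics-based version (assumption: delegates to SW). */
static uint32_t
demo_crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    return demo_crc32c_sw(crc, data, len);
}

/* Placeholder probe; a real one would query CPUID or the OS. */
static int
demo_hw_crc_available(void)
{
    demo_probe_calls++;
    return 0;
}

static uint32_t demo_crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Every caller goes through this pointer; it starts at the chooser. */
static demo_crc32c_fn demo_crc32c = demo_crc32c_choose;

/* Runs on the first call only: pick an implementation, then forward. */
static uint32_t
demo_crc32c_choose(uint32_t crc, const void *data, size_t len)
{
    demo_crc32c = demo_hw_crc_available() ? demo_crc32c_hw : demo_crc32c_sw;
    return demo_crc32c(crc, data, len);
}
```

After the first call, later calls cost one indirect call and no probe.  A process that starts with freshly initialized static data would begin at the chooser again and pay for one probe of its own.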

I also see that the SSE 4.2 runtime check requires the CPUID instruction,
so we wouldn't use the intrinsics for hardware that supports SSE 4.2 but
not CPUID.  However, I'm not sure such hardware even exists.  Wikipedia
says that CPUID was introduced in 1993 [1], and meson.build appears to omit
the CPUID check when determining which CRC32C implementation to use.
Furthermore, meson.build alludes to problems with some of the CPUID-related
checks:

    # XXX: The configure.ac check for __cpuid() is broken, we don't copy that
    # here. To prevent problems due to two detection methods working, stop
    # checking after one.
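
For reference, an SSE 4.2 probe built on CPUID could look roughly like the sketch below.  It assumes a GCC- or Clang-compatible compiler that ships <cpuid.h> with __get_cpuid(); the function name demo_have_sse42 is hypothetical, and on non-x86 targets the sketch simply reports "not available".

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper for the CPUID instruction */
#endif

/* Returns true if the CPU reports SSE 4.2 (CPUID leaf 1, ECX bit 20). */
static bool
demo_have_sse42(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is not supported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return false;

    return (ecx & (1u << 20)) != 0;
#else
    /* Not an x86 build: the SSE 4.2 question does not apply. */
    return false;
#endif
}
```

A CPU for which the probe cannot obtain leaf 1 ends up on the fallback path, which is the "SSE 4.2 but not CPUID" case discussed above.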

Are there any other reasons that we should try to avoid the runtime check
when possible?

I've attached two patches.  0001 adds a debug message to the SSE 4.2
runtime check that matches the one already present for the ARMv8 check.
This message just notes whether the runtime check found that the special
CRC instructions are available.  0002 is a first attempt at $SUBJECT.  I've
tested it on both x86 and ARM, and it seems to work as intended.  You'll
notice that I'm still checking for the intrinsics with the default compiler
flags first.  I didn't see any strong reason to change this, and doing so
allows us to avoid sending extra CFLAGS when possible.

Thoughts?

[0] https://postgr.es/m/DB9PR08MB6991329A73923BF8ED4B3422F5DBA%40DB9PR08MB6991.eurprd08.prod.outlook.com
[1] https://en.wikipedia.org/wiki/CPUID

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
