On Mon, Oct 30, 2023 at 12:39:23PM -0400, Tom Lane wrote:
> On the one hand, I agree that we need to keep the complexity from
> getting out of hand. On the other hand, I wonder if this approach
> isn't optimizing for the wrong case. How many machines that PG 17
> will ever be run on in production will lack SSE 4.2 (for Intel)
> or ARMv8 instructions (on the ARM side)?

For the CRC instructions in use today, I wouldn't be surprised if that
number is pretty small, but for newer or optional instructions (like ARM's
PMULL), I don't think we'll be so lucky. Even if we do feel comfortable
assuming the presence of SSE 4.2, etc., we'll likely still need to add
runtime checks for future optimizations.

> It seems like a shame
> to be burdening these instructions with a subroutine call for the
> benefit of long-obsolete hardware versions. Maybe that overhead
> is negligible, but it doesn't sound like you tried to measure it.

When I went to measure this, I noticed that my relatively new x86 machine
with a relatively new version of gcc uses the runtime check. I then
skimmed through a few dozen buildfarm machines and found that, of all x86
and ARM machines that supported the specialized CRC instructions, only one
ARM machine did not use the runtime check. Of course, this is far from
scientific, but it seems to indicate that the runtime check is the norm.

(I still need to measure the per-call overhead itself.)

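For anyone following along, here is a rough sketch of the kind of runtime
dispatch being discussed. It is not the actual PostgreSQL code: all of the
names (crc32c_impl, crc32c_choose, crc32c_sw, crc32c_hw, HAVE_SSE42_SKETCH)
are made up for illustration, and it assumes gcc or clang on x86 for the
hardware path (it falls back to the portable loop everywhere else). The
function pointer starts out aimed at a chooser, which probes the CPU once
and then repoints it at the selected implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: only gcc/clang on x86 get the hardware path in this sketch. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h>
#define HAVE_SSE42_SKETCH 1
#endif

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/* Portable bit-at-a-time fallback (reflected polynomial 0x82F63B78). */
static uint32_t
crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#ifdef HAVE_SSE42_SKETCH
/* Built with SSE 4.2 enabled for this one function only. */
static __attribute__((target("sse4.2"))) uint32_t
crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}
#endif

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* Initially points at the chooser, so the probe runs on first use. */
static crc32c_fn crc32c_impl = crc32c_choose;

/* One-time probe: pick an implementation, then forward this first call. */
static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
#ifdef HAVE_SSE42_SKETCH
    if (__builtin_cpu_supports("sse4.2"))
        crc32c_impl = crc32c_hw;
    else
#endif
        crc32c_impl = crc32c_sw;

    return crc32c_impl(crc, data, len);
}

/* Every use pays for an indirect call through crc32c_impl. */
static uint32_t
crc32c(const void *data, size_t len)
{
    return crc32c_impl(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

The probe itself only runs on the first call. What remains on every later
call is the indirect call through crc32c_impl, which is the per-use cost
in question. A build that can assume the instructions at compile time could
call crc32c_hw directly and skip the pointer.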
> Anyway, I agree that the cost of a one-time-per-process probe should
> be negligible; it's the per-use cost that I worry about. If you can
> do some measurements proving that that worry is ill-founded, then
> I'm good with test-first.

Will do.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com