use ARM intrinsics in pg_lfind32() where available - Mailing list pgsql-hackers

From Nathan Bossart
Subject use ARM intrinsics in pg_lfind32() where available
Date
Msg-id 20220819200829.GA395728@nathanxps13
Whole thread Raw
Responses Re: use ARM intrinsics in pg_lfind32() where available
List pgsql-hackers
Hi hackers,

This is a follow-up for recent changes that optimized [sub]xip lookups in
XidInMVCCSnapshot() on Intel hardware [0] [1].  I've attached a patch that
uses ARM Advanced SIMD (Neon) intrinsic functions where available to speed
up the search.  The approach is nearly identical to the SSE2 version, and
the usual benchmark [2] shows similar improvements.

  writers  head  simd
  8        866   836
  16       849   833
  32       782   822
  64       846   833
  128      805   821
  256      722   739
  512      529   674
  768      374   608
  1024     268   522

I've tested the patch on a recent macOS (M1 Pro) and Amazon Linux
(Graviton2), and I've confirmed that the instructions aren't used on a
Linux/Intel machine.  I did add a new configure check to see if the
relevant intrinsics are available, but I didn't add a runtime check like
there is for the CRC instructions since the compilers I used support these
intrinsics by default.  (I don't think a runtime check would work very well
with the inline function, anyway.)  AFAICT these intrinsics are pretty
standard on aarch64, although IIUC the spec indicates that they are
technically optional.  I suspect that a simple check for "aarch64" would be
sufficient, but I haven't investigated the level of compiler support yet.

Thoughts?

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6ef167
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37a6e5d
[2] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e@2ndquadrant.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: Use array as object (src/fe_utils/parallel_slot.c)
Next
From: Nathan Bossart
Date:
Subject: Re: [PATCH] Optimize json_lex_string by batching character copying