Hi hackers,

This is a follow-up for recent changes that optimized [sub]xip lookups in
XidInMVCCSnapshot() on Intel hardware [0] [1]. I've attached a patch that
uses ARM Advanced SIMD (Neon) intrinsic functions where available to speed
up the search. The approach is nearly identical to the SSE2 version, and
the usual benchmark [2] shows similar improvements.

    writers    head    simd
          8     866     836
         16     849     833
         32     782     822
         64     846     833
        128     805     821
        256     722     739
        512     529     674
        768     374     608
       1024     268     522

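For anyone who hasn't looked at the SSE2 version, here is a rough sketch of the general idea. This is not the attached patch: sketch_lfind32 and SKETCH_USE_NEON are hypothetical names made up for illustration, and the loop handles only one 4-element vector per iteration to keep it short. Elements that don't fill a whole vector, and every element on a non-Neon build, go through the plain scalar loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: compile-time detection only; both macro names come from the
 * compiler, SKETCH_USE_NEON is a hypothetical name for this sketch. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1
#endif

/* Hypothetical helper: true if key occurs in base[0 .. nelem - 1]. */
static inline bool
sketch_lfind32(uint32_t key, const uint32_t *base, uint32_t nelem)
{
    uint32_t    i = 0;

    assert(base != NULL || nelem == 0);

#ifdef SKETCH_USE_NEON
    {
        /* Broadcast the key into all four 32-bit lanes. */
        const uint32x4_t keys = vdupq_n_u32(key);
        /* Largest multiple of 4 that is <= nelem. */
        const uint32_t vec_end = nelem & ~(uint32_t) 3;

        for (; i < vec_end; i += 4)
        {
            /* Each matching lane becomes all-ones, the rest all-zeros. */
            uint32x4_t  vals = vld1q_u32(&base[i]);
            uint32x4_t  eq = vceqq_u32(keys, vals);

            /* Horizontal max is nonzero iff at least one lane matched. */
            if (vmaxvq_u32(eq) != 0)
                return true;
        }
    }
#endif

    /* Scalar tail, and the whole search when Neon is unavailable. */
    for (; i < nelem; i++)
    {
        if (base[i] == key)
            return true;
    }
    return false;
}
```
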
I've tested the patch on a recent macOS (M1 Pro) and Amazon Linux
(Graviton2), and I've confirmed that the instructions aren't used on a
Linux/Intel machine. I did add a new configure check to see if the
relevant intrinsics are available, but I didn't add a runtime check like
there is for the CRC instructions since the compilers I used support these
intrinsics by default. (I don't think a runtime check would work very well
with the inline function, anyway.) AFAICT these intrinsics are pretty
standard on aarch64, although IIUC the spec indicates that they are
technically optional. I suspect that a simple check for "aarch64" would be
sufficient, but I haven't investigated the level of compiler support yet.
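
To make the "simple check" idea concrete, here is a minimal sketch of what pure compile-time selection could look like, with no configure probe and no runtime check. It is only an illustration under assumptions: the SKETCH_* names and sketch_simd_impl() are hypothetical, and it relies on the compiler defining __ARM_NEON (the ACLE feature macro for Advanced SIMD) and on SSE2 being part of the x86-64 baseline.

```c
/* Hypothetical identifiers for the code path chosen at compile time. */
typedef enum
{
    SKETCH_SIMD_NONE,
    SKETCH_SIMD_SSE2,
    SKETCH_SIMD_NEON
} SketchSimdImpl;

#if defined(__aarch64__) && defined(__ARM_NEON)
/* aarch64 compiler that advertises Advanced SIMD support. */
#define SKETCH_SIMD_IMPL SKETCH_SIMD_NEON
#elif defined(__x86_64__) || defined(_M_AMD64)
/* SSE2 is always present on x86-64. */
#define SKETCH_SIMD_IMPL SKETCH_SIMD_SSE2
#else
/* Anything else falls back to the scalar loop. */
#define SKETCH_SIMD_IMPL SKETCH_SIMD_NONE
#endif

/* Reports which path this translation unit was built with. */
static inline SketchSimdImpl
sketch_simd_impl(void)
{
    return SKETCH_SIMD_IMPL;
}
```

Because the choice is fixed when the file is compiled, an inline search function pays no per-call dispatch cost, which is the main reason a runtime check fits poorly here.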
Thoughts?
[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6ef167
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37a6e5d
[2] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e@2ndquadrant.com
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com