Re: define pg_structiszero(addr, s, r) - Mailing list pgsql-hackers

From David Rowley
Subject Re: define pg_structiszero(addr, s, r)
Msg-id CAApHDvq7P-JgFhgtxUPqhavG-qSDVUhyWaEX9M8_MNorFEijZA@mail.gmail.com
In response to Re: define pg_structiszero(addr, s, r)  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Wed, 6 Nov 2024 at 04:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Another option could be to use SIMD instructions to check multiple bytes
> is zero in a single operation. Maybe just an idea to keep in mind and experiment
> if we feel the need later on.

Could do. I just wrote it that way to give the compiler flexibility to
do SIMD implicitly. That seemed easier than messing around with SIMD
intrinsics. I guess the compiler won't use SIMD with the single
size_t-at-a-time version as it can't be certain it's ok to access the
memory beyond the first non-zero word. Because I wrote the "if" condition
using bitwise-OR, there's no boolean short-circuiting, so the compiler
sees it must be safe to access all the memory for the loop iteration.
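
As a rough illustration of the short-circuiting point, a four-words-at-a-time
check might look something like the sketch below. The name
words_all_zero_x4 and its (words, nwords) signature are made up for this
example and are not taken from allzeros.c or the patch. It also assumes the
caller already has the memory as an array of size_t.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: returns true if all
 * "nwords" size_t words starting at "words" are zero.
 */
static inline bool
words_all_zero_x4(const size_t *words, size_t nwords)
{
	size_t		i = 0;

	/* Main loop: look at four words per iteration. */
	for (; nwords - i >= 4; i += 4)
	{
		/*
		 * Bitwise OR has no short-circuiting, so all four words are read
		 * unconditionally on every iteration.  That tells the compiler
		 * the four loads are always safe, so it may merge them into
		 * wider (SIMD) loads if the target allows it.
		 */
		if ((words[i] | words[i + 1] | words[i + 2] | words[i + 3]) != 0)
			return false;
	}

	/* Tail: fewer than four words remain, check them one at a time. */
	for (; i < nwords; i++)
	{
		if (words[i] != 0)
			return false;
	}

	return true;
}
```

Had the condition been four separate "!= 0" tests joined with ||, the
later words would not be read once an earlier test was true, and the
compiler could not assume those extra loads are safe.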

If I use -march=native or -march=znver2 on my Zen2 machine, gcc does
use SIMD instructions.  Clang uses some 128-bit registers even without
specifying -march:

drowley@amd3990x:~$ gcc -O2 allzeros.c -march=native -o allzeros &&
for i in {1..3}; do ./allzeros; done
char: done in 1940539 nanoseconds
size_t: done in 261731 nanoseconds (7.41425 times faster than char)
size_t * 4: done in 130415 nanoseconds (14.8797 times faster than char)
size_t * 8: done in 70031 nanoseconds (27.7097 times faster than char)
char: done in 3030132 nanoseconds
size_t: done in 477044 nanoseconds (6.35189 times faster than char)
size_t * 4: done in 123551 nanoseconds (24.5254 times faster than char)
size_t * 8: done in 68549 nanoseconds (44.2039 times faster than char)
char: done in 3214037 nanoseconds
size_t: done in 256901 nanoseconds (12.5108 times faster than char)
size_t * 4: done in 126017 nanoseconds (25.5048 times faster than char)
size_t * 8: done in 73167 nanoseconds (43.9274 times faster than char)

David


