Re: define pg_structiszero(addr, s, r) - Mailing list pgsql-hackers

From David Rowley
Subject Re: define pg_structiszero(addr, s, r)
Date
Msg-id CAApHDvp2jx_=pFbgj-O1_ZmzP9WOZKfwLzZrS_=ZmbsqMQQ59g@mail.gmail.com
Whole thread Raw
In response to Re: define pg_structiszero(addr, s, r)  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Fri, 8 Nov 2024 at 18:34, Michael Paquier <michael@paquier.xyz> wrote:
> I've done a round of comment and term cleanup for the whole patch,
> while on it.

I don't think "intrinsics" is the correct word to use here:

+ * - 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers
+ *   to use SIMD intrinsics if available, up to the last aligned location

and

+ * All comparisons are combined with a single OR operation, making it a
+ * good candidate for SIMD intrinsics, if available.

an intrinsic function is a function built into the compiler that
provides some lower-level functionality. e.g. __builtin_popcount().

I'm slightly worried due to the current rate we're receiving cleanup
suggestions that someone might come along and think they'd be doing us
a favour by submitting a patch to "fixup the inefficient bitwise-ORs
and use boolean-OR". Maybe a comment like the following might prevent
that from happening.

+ * For performance reasons, we manually unroll this loop and purposefully
+ * use bitwise-ORs to combine each comparison.  This prevents boolean
+ * short-circuiting and lets the compiler know that it's safe to access all 8
+ * elements regardless of the result of the other comparisons.  This seems
+ * to be enough to coax a few compilers into using SIMD instructions.

> Btw, gcc seems a bit smarter than clang when it comes to optimizing
> the code depending on the size of the structures.  gcc gives up on
> SIMD if it's sure that the structure on which we are going to use the
> allzero check won't need it at all, and clang keeps it even if it does
> not need it.  That was interesting to see, while going through the
> review..

Can you share your test case for this?  I tried with [1] and the
latest gcc does not seem to be smart enough to figure this out.  I
tried adding some additional len checks that the compiler can use as a
cue and won't need to emit code for the checks providing the function
does get inlined. That was enough to get the compiler to not emit the
loops when they'll not be used. See the -DCHECK_LEN flag I'm passing
in the 2nd compiler window. I just don't know if putting something
like that into the code is a good idea as if the function wasn't
inlined for some reason, the extra len checks would have to be
compiled into the function.

David

[1] https://godbolt.org/z/xa81ro8GK



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Commit Timestamp and LSN Inversion issue
Next
From: Amit Kapila
Date:
Subject: Re: Fix small typo, use InvalidRelFileNumber instead of InvalidOid