Re: use ARM intrinsics in pg_lfind32() where available - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: use ARM intrinsics in pg_lfind32() where available
Date
Msg-id 20220827221234.GA15951@nathanxps13
In response to Re: use ARM intrinsics in pg_lfind32() where available  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: use ARM intrinsics in pg_lfind32() where available
List pgsql-hackers
Thanks for taking a look.

On Sat, Aug 27, 2022 at 01:59:06PM +0700, John Naylor wrote:
> I don't foresee any use of emulating vector registers with uint64 if
> they only hold two ints. I wonder if it'd be better if all vector32
> functions were guarded with #ifndef USE_NO_SIMD. (I wonder if
> declarations without definitions cause warnings...)

Yeah.  I was a bit worried about the readability of this file with so many
#ifndefs, but after trying it out, I suppose it doesn't look _too_ bad.

> + * NB: This function assumes that each lane in the given vector either has all
> + * bits set or all bits zeroed, as it is mainly intended for use with
> + * operations that produce such vectors (e.g., vector32_eq()).  If this
> + * assumption is not true, this function's behavior is undefined.
> + */
> 
> Hmm?

Yup.  The problem is that AFAICT there's no equivalent to
_mm_movemask_epi8() on aarch64, so you end up with something like

    vmaxvq_u8(vandq_u8(v, vector8_broadcast(0x80))) != 0

But for pg_lfind32(), we really just want to know if any lane is set, which
only requires a call to vmaxvq_u32().  I haven't had a chance to look too
closely, but my guess is that the generic high-bit check ultimately costs
just one extra AND operation in the aarch64 path, so maybe it doesn't impact
performance too much.  The other option would be to open-code the intrinsic
function calls
into pg_lfind.h.  I'm trying to avoid the latter, but maybe it's the right
thing to do for now...  What do you think?
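
To make the trade-off concrete, here is a rough sketch (not the patch
itself) of the "is any lane set" test on a single four-element chunk.  The
name sketch_contains4() and the SKETCH_* macros are made up for
illustration.  The aarch64 path relies on vceqq_u32() producing lanes that
are either all ones or all zeros, so vmaxvq_u32() alone is enough there; a
scalar fallback is included so the snippet builds anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_USE_NEON
#elif defined(__SSE2__)
#include <emmintrin.h>
#define SKETCH_USE_SSE2
#endif

/*
 * Hypothetical helper: returns true if any of the four uint32 values at
 * "base" equals "key".  This is only an illustration of the lane test,
 * not the code from the patch.
 */
static inline bool
sketch_contains4(uint32_t key, const uint32_t *base)
{
#if defined(SKETCH_USE_NEON)
	/* each result lane is 0xFFFFFFFF on a match and 0 otherwise */
	uint32x4_t	eq = vceqq_u32(vdupq_n_u32(key), vld1q_u32(base));

	/* the horizontal max is nonzero iff at least one lane matched */
	return vmaxvq_u32(eq) != 0;
#elif defined(SKETCH_USE_SSE2)
	__m128i		eq = _mm_cmpeq_epi32(_mm_set1_epi32((int) key),
									 _mm_loadu_si128((const __m128i *) base));

	/* movemask gathers the high bit of every byte into an int */
	return _mm_movemask_epi8(eq) != 0;
#else
	/* plain C fallback when neither instruction set is available */
	for (int i = 0; i < 4; i++)
	{
		if (base[i] == key)
			return true;
	}
	return false;
#endif
}
```

Note that the NEON branch would also report true for a lane holding an
arbitrary nonzero value, while the SSE2 branch only looks at the high bit
of each byte.  The two agree only when every lane is all ones or all
zeros, which is the assumption the comment in the patch spells out.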

> -#elif defined(USE_SSE2)
> +#elif defined(USE_SSE2) || defined(USE_NEON)
> 
> I think we can just say #else.

Yes.

> -#if defined(USE_SSE2)
> - __m128i sub;
> +#ifndef USE_NO_SIMD
> + Vector8 sub;
> 
> +#elif defined(USE_NEON)
> +
> + /* use the same approach as the USE_SSE2 block above */
> + sub = vqsubq_u8(v, vector8_broadcast(c));
> + result = vector8_has_zero(sub);
> 
> I think we should invent a helper that does saturating subtraction and
> call that, inlining the sub var so we don't need to mess with it
> further.

Good idea, will do.
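
For reference, one possible shape for such a helper is sketched below.
All of the sketch_* names are hypothetical and not taken from the patch.
The idea is the one from the quoted diff: subtract c from every byte with
unsigned saturation, and any byte that was <= c ends up as zero.  A plain
C fallback is included so the snippet compiles without SSE2 or NEON.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAVE_SIMD
typedef uint8x16_t SketchVec8;

static inline SketchVec8 sketch_load8(const uint8_t *p) { return vld1q_u8(p); }
static inline SketchVec8 sketch_broadcast8(uint8_t c) { return vdupq_n_u8(c); }

/* unsigned saturating subtraction: lanes clamp at 0 instead of wrapping */
static inline SketchVec8 sketch_ssub8(SketchVec8 a, SketchVec8 b) { return vqsubq_u8(a, b); }

/* the horizontal min is zero iff some byte lane is zero */
static inline bool sketch_has_zero8(SketchVec8 v) { return vminvq_u8(v) == 0; }

#elif defined(__SSE2__)
#include <emmintrin.h>
#define SKETCH_HAVE_SIMD
typedef __m128i SketchVec8;

static inline SketchVec8 sketch_load8(const uint8_t *p) { return _mm_loadu_si128((const __m128i *) p); }
static inline SketchVec8 sketch_broadcast8(uint8_t c) { return _mm_set1_epi8((char) c); }

/* unsigned saturating subtraction: lanes clamp at 0 instead of wrapping */
static inline SketchVec8 sketch_ssub8(SketchVec8 a, SketchVec8 b) { return _mm_subs_epu8(a, b); }

/* compare every byte with zero, then check whether any comparison hit */
static inline bool
sketch_has_zero8(SketchVec8 v)
{
	return _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128())) != 0;
}
#endif

/*
 * Hypothetical caller: true if any of the 16 bytes at "bytes" is <= c.
 * With the helper in place there is no need for a separate "sub" variable.
 */
static inline bool
sketch_has_le16(const uint8_t *bytes, uint8_t c)
{
#ifdef SKETCH_HAVE_SIMD
	return sketch_has_zero8(sketch_ssub8(sketch_load8(bytes),
										 sketch_broadcast8(c)));
#else
	/* plain C fallback when neither instruction set is available */
	for (int i = 0; i < 16; i++)
	{
		if (bytes[i] <= c)
			return true;
	}
	return false;
#endif
}
```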

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
