Re: [PATCH] Hex-coding optimizations using SVE on ARM. - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Date
Msg-id CANWCAZYXsJ54Qr8cwt+Yt15kku4C4-owu+KXPKyFpweN3L36jQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Hex-coding optimizations using SVE on ARM.  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Wed, Sep 24, 2025 at 2:02 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Sep 22, 2025 at 03:05:44PM -0500, Nathan Bossart wrote:
> > I was able to improve the hex_decode() implementation a bit.
>
> I took a closer look at how hex_decode() performs with smaller inputs.
> There are some small regressions, so I tried fixing them by adding the
> following to the beginning of the function:
>
>     if (likely(tail_idx == 0))
>         return hex_decode_safe_scalar(src, len, dst, escontext);
>
> This helped a little, but it mostly just slowed things down for larger
> inputs on AArch64:

> I didn't do this test for hex_encode(), but I'd expect it to follow a
> similar pattern.  I'm tempted to suggest that these regressions are within
> tolerable levels and to forge on with v10.

My first thought is, I'd hazard a guess that short byteas are much
less common than short strings.

My second thought is, the decode case is not that critical. From the
end-to-end tests above, the speed of the decode case had a relatively
small global effect compared to the encode case (Perhaps because reads
are cheaper than writes).

+ if (unlikely(!hex_decode_simd_helper(srcv, &dstv1)))
+ break;

But if you really want to do something here, sprinkling "(un)likely"'s
here seems like solving the wrong problem (even if they make any
difference), since the early return is optimizing for exceptional
conditions. In other places (cf. the UTF8 string verifier), we
accumulate errors, and only if we have them at the end do we restart
from the beginning with the slow error-checking path that can show the
user the offending input.

> In any case, IMHO this patch is
> approaching committable quality, so I'd be grateful for any feedback.

+vector8_sssub(const Vector8 v1, const Vector8 v2)

It's hard to parse "sss", so maybe we can borrow an Intel-ism and use
"iss" for the signed case?

+/* vector manipulation */
+#ifndef USE_NO_SIMD
+static inline Vector8 vector8_interleave_low(const Vector8 v1, const
Vector8 v2);
+static inline Vector8 vector8_interleave_high(const Vector8 v1, const
Vector8 v2);
+static inline Vector8 vector8_pack_16(const Vector8 v1, const Vector8 v2);
+static inline Vector32 vector32_shift_left_nibble(const Vector32 v1);
+static inline Vector32 vector32_shift_right_nibble(const Vector32 v1);
+static inline Vector32 vector32_shift_right_byte(const Vector32 v1);

Do we need declarations for these? I recall that the existing
declarations are there for functions that are also used internally.

The nibble/byte things are rather specific. Wouldn't it be more
logical to expose the already-generic shift operations and let the
caller say by how much? Or does the compiler refuse because the
intrinsic doesn't get an immediate value? Some are like that, but I'm
not sure about these. If so, that's annoying and I wonder if there's a
workaround.

+vector8_has_ge(const Vector8 v, const uint8 c)
+{
+#ifdef USE_SSE2
+ Vector8 umax = _mm_max_epu8(v, vector8_broadcast(c));
+ Vector8 cmpe = _mm_cmpeq_epi8(umax, v);
+
+ return vector8_is_highbit_set(cmpe);

We take pains to avoid signed comparison on unsigned input for the
"le" case, and I don't see why it's okay here.

Do the regression tests have long enough cases that test exceptional
paths, like invalid bytes and embedded whitespace? If not, we need
some.

--
John Naylor
Amazon Web Services



pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: ALTER DOMAIN ADD NOT NULL NOT VALID
Next
From: shveta malik
Date:
Subject: Re: Clear logical slot's 'synced' flag on promotion of standby