Re: [PATCH] Hex-coding optimizations using SVE on ARM. - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Date
Msg-id aNRlYQoxTe8lOIEc@nathan
Whole thread Raw
In response to Re: [PATCH] Hex-coding optimizations using SVE on ARM.  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: [PATCH] Hex-coding optimizations using SVE on ARM.
List pgsql-hackers
On Wed, Sep 24, 2025 at 10:59:38AM +0700, John Naylor wrote:
> + if (unlikely(!hex_decode_simd_helper(srcv, &dstv1)))
> + break;
> 
> But if you really want to do something here, sprinkling "(un)likely"'s
> here seems like solving the wrong problem (even if they make any
> difference), since the early return is optimizing for exceptional
> conditions. In other places (cf. the UTF8 string verifier), we
> accumulate errors, and only if we have them at the end do we restart
> from the beginning with the slow error-checking path that can show the
> user the offending input.

I switched to an accumulator approach in v11.

> +vector8_sssub(const Vector8 v1, const Vector8 v2)
> 
> It's hard to parse "sss", so maybe we can borrow an Intel-ism and use
> "iss" for the signed case?

Done.

> +/* vector manipulation */
> +#ifndef USE_NO_SIMD
> +static inline Vector8 vector8_interleave_low(const Vector8 v1, const
> Vector8 v2);
> +static inline Vector8 vector8_interleave_high(const Vector8 v1, const
> Vector8 v2);
> +static inline Vector8 vector8_pack_16(const Vector8 v1, const Vector8 v2);
> +static inline Vector32 vector32_shift_left_nibble(const Vector32 v1);
> +static inline Vector32 vector32_shift_right_nibble(const Vector32 v1);
> +static inline Vector32 vector32_shift_right_byte(const Vector32 v1);
> 
> Do we need declarations for these? I recall that the existing
> declarations are there for functions that are also used internally.

Removed.

> The nibble/byte things are rather specific. Wouldn't it be more
> logical to expose the already-generic shift operations and let the
> caller say by how much? Or does the compiler refuse because the
> intrinsic doesn't get an immediate value? Some are like that, but I'm
> not sure about these. If so, that's annoying and I wonder if there's a
> workaround.

Yeah, the compiler refuses unless the value is an integer literal.  I
thought of using a switch statement to cover all the values used in-tree,
but I didn't like that, either.

> +vector8_has_ge(const Vector8 v, const uint8 c)
> +{
> +#ifdef USE_SSE2
> + Vector8 umax = _mm_max_epu8(v, vector8_broadcast(c));
> + Vector8 cmpe = _mm_cmpeq_epi8(umax, v);
> +
> + return vector8_is_highbit_set(cmpe);
> 
> We take pains to avoid signed comparison on unsigned input for the
> "le" case, and I don't see why it's okay here.

_mm_max_epu8() does unsigned comparisons, I think...

> Do the regression tests have long enough cases that test exceptional
> paths, like invalid bytes and embedded whitespace? If not, we need
> some.

Added.

I've also fixed builds on gcc/arm64, as discussed elsewhere [0].  Here are
the current numbers on my laptop:

                arm
    buf  | HEAD  | patch | % diff
  -------+-------+-------+--------
       2 |     4 |     4 |      0
       4 |     6 |     6 |      0
       8 |     8 |     8 |      0
      16 |    11 |    12 |     -9
      32 |    18 |     5 |     72
      64 |    38 |     6 |     84
     256 |   134 |    17 |     87
    1024 |   513 |    63 |     88
    4096 |  2081 |   262 |     87
   16384 |  8524 |  1058 |     88
   65536 | 34731 |  4224 |     88

[0] https://postgr.es/m/aNQtN89VW8z-yo3B%40nathan

-- 
nathan

Attachment

pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Incorrect logic in XLogNeedsFlush()
Next
From: Andrew Kim
Date:
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations