On Mon, Sep 29, 2025 at 03:45:27PM +0700, John Naylor wrote:
> On Fri, Sep 26, 2025 at 1:50 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> On Thu, Sep 25, 2025 at 09:16:35PM +0700, John Naylor wrote:
>> > (Separately, now I'm wondering if we can do the same for
>> > vector8_has_le since _mm_min_epu8 and vminvq_u8 both exist, and that
>> > would allow getting rid of )
>>
>> I think so. I doubt there's any performance advantage, but it could be
>> nice for code cleanup. (I'm assuming you meant to say vector8_ssub
>> (renamed to vector8_ussub() in the patch) after "getting rid of.")
>
> Yes right, sorry. And it seems good to do such cleanup first, since it
> doesn't make sense to rename something that is about to be deleted.
Will do. I'll plan on committing the other patch [0] soon.
> Hmm, for this case, we can sidestep the maintainability questions
> entirely by instead using the new interleave functions to build the
> masks:
>
> vector8_interleave_low(vector8_zero(), vector8_broadcast(0x0f))
> vector8_interleave_low(vector8_broadcast(0x0f), vector8_zero())
>
> This generates identical code as v12 on Arm and is not bad on x86.
> What do you think of the attached?
WFM
> While looking around again, it looks like the "msk" variable isn't a
> mask like the implies to me. Not sure of a better name because I'm not
> sure what it represents aside from a temp variable.
Renamed to "tmp".
> +#elif defined(USE_NEON)
> + switch (i)
> + {
> + case 4:
> + return (Vector8) vshrq_n_u32((Vector32) v1, 4);
> + case 8:
> + return (Vector8) vshrq_n_u32((Vector32) v1, 8);
> + default:
> + pg_unreachable();
> + return vector8_broadcast(0);
> + }
>
> This is just a compiler hint, so if the input is not handled I think
> it will return the wrong answer rather than alerting the developer, so
> we probabaly want "Assert(false)" here.
Fixed.
> Other than that, the pack/unpack functions could use some
> documentation about which parameter is low/high.
Added.
[0] https://postgr.es/m/attachment/182185/v3-0001-Optimize-vector8_has_le-on-AArch64.patch
--
nathan