Re: [POC] verifying UTF-8 using SIMD instructions - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [POC] verifying UTF-8 using SIMD instructions
Date
Msg-id CA+hUKGKbH3TSE9LiXqsOyYjvqBo838e=9PM2BR4cuPajYfCvMQ@mail.gmail.com
In response to [POC] verifying UTF-8 using SIMD instructions  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [POC] verifying UTF-8 using SIMD instructions  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Sat, Mar 13, 2021 at 4:37 AM John Naylor
<john.naylor@enterprisedb.com> wrote:
> On Fri, Mar 12, 2021 at 9:14 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> > I was not thinking about auto-vectorizing the code in
> > pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization
> > inside the individual helper functions that you wrote, such as
> > _mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(),
>
> If the PhD holders who came up with this algorithm thought it possible to do it that way, I'm sure they would have. In
> reality, simdjson has different files for SSE4, AVX, AVX512, NEON, and Altivec. We can incorporate any of those as
> needed. That's a PG15 project, though, and I'm not volunteering.

Just for fun/experimentation, here's a quick (and probably too naive)
translation of those helper functions to NEON, on top of the v15
patch.
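
The attached patch is not reproduced here. Purely as an illustration of what such a translation might look like, the following is a minimal sketch of NEON versions of a few of the helpers named above. The signatures of splat(), bitwise_and(), shift_right() and prev1() are assumptions (they are not taken from the v15 patch), and vec16, load16() and store16() are hypothetical names introduced only for this example. A scalar fallback is included so the sketch also compiles on non-ARM machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1
typedef uint8x16_t vec16;       /* hypothetical name for a 16-byte vector */
#else
/* Scalar stand-in so the sketch also builds where NEON is unavailable. */
typedef struct { uint8_t b[16]; } vec16;
#endif

/* Load 16 bytes from memory into a vector (hypothetical helper). */
static inline vec16 load16(const uint8_t *p)
{
#ifdef SKETCH_USE_NEON
    return vld1q_u8(p);
#else
    vec16 v;
    memcpy(v.b, p, 16);
    return v;
#endif
}

/* Store a vector to 16 bytes of memory (hypothetical helper). */
static inline void store16(uint8_t *p, vec16 v)
{
#ifdef SKETCH_USE_NEON
    vst1q_u8(p, v);
#else
    memcpy(p, v.b, 16);
#endif
}

/* Replicate one byte into all 16 lanes (SSE: _mm_set1_epi8). */
static inline vec16 splat(uint8_t byte)
{
#ifdef SKETCH_USE_NEON
    return vdupq_n_u8(byte);
#else
    vec16 v;
    memset(v.b, byte, 16);
    return v;
#endif
}

/* Lane-wise AND (SSE: _mm_and_si128). */
static inline vec16 bitwise_and(vec16 a, vec16 b)
{
#ifdef SKETCH_USE_NEON
    return vandq_u8(a, b);
#else
    vec16 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = a.b[i] & b.b[i];
    return r;
#endif
}

/* Shift each byte right by n bits. NEON has a true 8-bit shift;
 * a negative count passed to vshlq_u8 shifts right. */
static inline vec16 shift_right(vec16 v, int n)
{
    assert(n >= 0 && n < 8);
#ifdef SKETCH_USE_NEON
    return vshlq_u8(v, vdupq_n_s8((int8_t) -n));
#else
    vec16 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t) (v.b[i] >> n);
    return r;
#endif
}

/* Shift the input up by one byte, pulling in the last byte of the
 * previous chunk (SSE: _mm_alignr_epi8(input, prev, 15)). */
static inline vec16 prev1(vec16 input, vec16 prev)
{
#ifdef SKETCH_USE_NEON
    return vextq_u8(prev, input, 15);
#else
    vec16 r;
    r.b[0] = prev.b[15];
    for (int i = 1; i < 16; i++)
        r.b[i] = input.b[i - 1];
    return r;
#endif
}
```

One difference from SSE worth noting in this sketch: NEON provides per-byte shifts directly, so shift_right() needs no 16-bit shift followed by a mask.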

Attachment
