I've decided I'm not quite comfortable with the additional build system complexity introduced by the SIMD portion of the previous patches. That complexity would be easier to justify if the pure C portion were unchanged, but with the shift-based DFA plus the bitwise ASCII check, we have a portable implementation that's still a substantial improvement over the current validator. In v24, I've included only that much, and the diff is only about 1/3 as many lines. If future improvements to COPY FROM put additional pressure on this path, we can always add SIMD support later.
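For anyone skimming the thread, here is a rough sketch of how the two pieces fit together. It is not the code in the patch. All names (next_state, init_table, is_ascii8, utf8_valid, dfa_table) are made up for illustration, and I assume a simple 64-bit layout with 6 bits per state, built at run time, rather than the precomputed 32-bit table mentioned below. It also does not treat zero bytes specially, which a real validator may need to do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each state is the bit offset of its slot in a 64-bit transition word. */
enum { ERR = 0, END = 6, CS1 = 12, CS2 = 18, CS3 = 24,
       P3A = 30, P3B = 36, P4A = 42, P4B = 48 };

static uint64_t dfa_table[256];

/* Ordinary transition function, used only to build the table. */
static int next_state(int s, unsigned b)
{
    bool cont = (b & 0xC0) == 0x80;     /* continuation byte 10xxxxxx */

    switch (s) {
    case END:
        if (b < 0x80) return END;                    /* ASCII */
        if (b >= 0xC2 && b <= 0xDF) return CS1;      /* 2-byte lead */
        if (b == 0xE0) return P3A;                   /* reject overlong */
        if (b == 0xED) return P3B;                   /* reject surrogates */
        if (b >= 0xE1 && b <= 0xEF) return CS2;      /* other 3-byte leads */
        if (b == 0xF0) return P4A;                   /* reject overlong */
        if (b >= 0xF1 && b <= 0xF3) return CS3;      /* 4-byte lead */
        if (b == 0xF4) return P4B;                   /* cap at U+10FFFF */
        return ERR;
    case CS1: return cont ? END : ERR;
    case CS2: return cont ? CS1 : ERR;
    case CS3: return cont ? CS2 : ERR;
    case P3A: return (b >= 0xA0 && b <= 0xBF) ? CS1 : ERR;
    case P3B: return (b >= 0x80 && b <= 0x9F) ? CS1 : ERR;
    case P4A: return (b >= 0x90 && b <= 0xBF) ? CS2 : ERR;
    case P4B: return (b >= 0x80 && b <= 0x8F) ? CS2 : ERR;
    default:  return ERR;
    }
}

/* Pack, for every byte value, the next state of each state into its slot. */
static void init_table(void)
{
    static const int states[] = { END, CS1, CS2, CS3, P3A, P3B, P4A, P4B };

    for (unsigned b = 0; b < 256; b++) {
        dfa_table[b] = 0;   /* slot 0 (ERR) stays 0, so errors are sticky */
        for (size_t i = 0; i < sizeof states / sizeof states[0]; i++)
            dfa_table[b] |= (uint64_t) next_state(states[i], b) << states[i];
    }
}

/* Bitwise ASCII check: true if none of the 8 bytes has its high bit set. */
static bool is_ascii8(const unsigned char *p)
{
    uint64_t chunk;

    memcpy(&chunk, p, sizeof chunk);
    return (chunk & UINT64_C(0x8080808080808080)) == 0;
}

static bool utf8_valid(const unsigned char *s, size_t len)
{
    uint64_t state = END;
    size_t i = 0;

    while (i < len) {
        /* Fast path: skip 8 ASCII bytes when between characters. */
        if (state == END && len - i >= 8 && is_ascii8(s + i)) {
            i += 8;
            continue;
        }
        /* One table load, one shift, one mask per byte; no branches. */
        state = (dfa_table[s[i]] >> state) & 63;
        i++;
    }
    return state == END;
}
```

A caller would run init_table() once and then call utf8_valid(buf, len). The point of encoding states as shift counts is that the per-byte step has no data-dependent branch, while the ASCII check lets mostly-ASCII input skip the DFA eight bytes at a time.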
One thing not in this patch is a possible improvement to pg_utf8_verifychar() that Heikki and I worked on upthread as part of earlier attempts to rewrite pg_utf8_verifystr(). That's worth looking into separately.
On Thu, Aug 26, 2021 at 12:09 PM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>
> >Attached is v23 incorporating the 32-bit transition table, with the necessary comment adjustments
>
> 32bit table is nice.
Thanks for taking a look!
> Would you please replace
> https://github.com/BobSteagall/utf_utils/blob/master/src/utf_utils.cpp
> URL with
> https://github.com/BobSteagall/utf_utils/blob/6b7a465265de2f5fa6133d653df0c9bdd73bbcf8/src/utf_utils.cpp
> in the header of src/port/pg_utf8_fallback.c?
>
> It would make the URL more stable in case the file gets renamed.
>
> Vladimir
>
Makes sense, so done that way.
--
John Naylor
EDB: http://www.enterprisedb.com