Re: speed up verifying UTF-8 - Mailing list pgsql-hackers

From John Naylor
Subject Re: speed up verifying UTF-8
Date
Msg-id CAFBsxsHUgNeytyF6TyoUBgf8whqRxvStbWtok9qcDJzDZ78FLw@mail.gmail.com
In response to Re: speed up verifying UTF-8  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: speed up verifying UTF-8
Re: speed up verifying UTF-8
List pgsql-hackers
I've decided I'm not quite comfortable with the additional build-system complexity introduced by the SIMD portion of the previous patches. That complexity would be easier to justify if the pure C portion were unchanged, but with the shift-based DFA plus the bitwise ASCII check, we have a portable implementation that's still a substantial improvement over the current validator. In v24, I've included only that much, and the diff is only about a third as many lines. If future improvements to COPY FROM put additional pressure on this path, we can always add SIMD support later.
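To illustrate what a "bitwise ASCII check" can look like, here is a minimal sketch in C, assuming 8-byte chunks. The helper names `chunk_is_valid_ascii()` and `ascii_prefix_len()` are hypothetical and not the functions in the patch. The check also rejects zero bytes, since PostgreSQL text values cannot contain NUL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if the 8 bytes at s are all non-zero ASCII. */
static bool
chunk_is_valid_ascii(const unsigned char *s)
{
    const uint64_t highbit_mask = UINT64_C(0x8080808080808080);
    uint64_t    chunk;
    uint64_t    zero_cum;

    /* memcpy avoids undefined behavior from unaligned loads */
    memcpy(&chunk, s, sizeof(chunk));

    /* Any byte with its high bit set is not ASCII. */
    if (chunk & highbit_mask)
        return false;

    /*
     * With all high bits clear, adding 0x7F to each byte sets that byte's
     * high bit iff the byte was non-zero.  No carry crosses a byte boundary
     * because each byte sum is at most 0x7F + 0x7F = 0xFE.
     */
    zero_cum = chunk + UINT64_C(0x7f7f7f7f7f7f7f7f);
    return (zero_cum & highbit_mask) == highbit_mask;
}

/* Hypothetical: length of the longest whole-chunk prefix that is pure ASCII. */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
    size_t      i = 0;

    while (i + 8 <= len && chunk_is_valid_ascii(s + i))
        i += 8;
    return i;
}
```

A validator built this way would skip the ASCII prefix eight bytes at a time and hand the remainder to a byte-at-a-time state machine. Because every byte is treated identically, the result does not depend on the machine's endianness.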

One thing not in this patch is a possible improvement to pg_utf8_verifychar() that Heikki and I worked on upthread as part of earlier attempts to rewrite pg_utf8_verifystr(). That's worth looking into separately.

On Thu, Aug 26, 2021 at 12:09 PM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>
> >Attached is v23 incorporating the 32-bit transition table, with the necessary comment adjustments
>
> 32bit table is nice.

Thanks for taking a look!
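For context on the transition table discussed here: in a shift-based DFA, each state is stored as a bit offset, and each input byte indexes a table row that packs the next state for every current state into one integer, so each step is a lookup, a shift, and a mask. The sketch below is illustrative only. It uses a simpler 64-bit row with 6-bit fields rather than the patch's 32-bit layout, the state names and functions are hypothetical, and the table is built at startup instead of being written out as constants.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states, each a bit offset into a 64-bit row (6-bit fields). */
enum
{
    ERR = 0,                    /* sticky: unset fields are 0, i.e. ERR */
    BGN = 6,                    /* at a character boundary (accepting) */
    CS1 = 12,                   /* expecting 1 more continuation byte */
    CS2 = 18,                   /* expecting 2 more continuation bytes */
    CS3 = 24,                   /* expecting 3 more continuation bytes */
    P3A = 30,                   /* after E0: next byte must be A0..BF */
    P3B = 36,                   /* after ED: next byte must be 80..9F */
    P4A = 42,                   /* after F0: next byte must be 90..BF */
    P4B = 48                    /* after F4: next byte must be 80..8F */
};

static uint64_t transition[256];

/* In state 'from', any byte in [lo, hi] moves to state 'to'. */
static void
set_range(int lo, int hi, int from, int to)
{
    for (int b = lo; b <= hi; b++)
        transition[b] |= (uint64_t) to << from;
}

static void
init_transition_table(void)
{
    set_range(0x01, 0x7F, BGN, BGN);    /* ASCII; NUL stays ERR */
    set_range(0xC2, 0xDF, BGN, CS1);    /* 2-byte lead */
    set_range(0xE0, 0xE0, BGN, P3A);    /* 3-byte lead, overlong guard */
    set_range(0xE1, 0xEC, BGN, CS2);
    set_range(0xED, 0xED, BGN, P3B);    /* 3-byte lead, surrogate guard */
    set_range(0xEE, 0xEF, BGN, CS2);
    set_range(0xF0, 0xF0, BGN, P4A);    /* 4-byte lead, overlong guard */
    set_range(0xF1, 0xF3, BGN, CS3);
    set_range(0xF4, 0xF4, BGN, P4B);    /* 4-byte lead, max U+10FFFF */

    set_range(0x80, 0xBF, CS1, BGN);    /* continuation bytes */
    set_range(0x80, 0xBF, CS2, CS1);
    set_range(0x80, 0xBF, CS3, CS2);
    set_range(0xA0, 0xBF, P3A, CS1);
    set_range(0x80, 0x9F, P3B, CS1);
    set_range(0x90, 0xBF, P4A, CS2);
    set_range(0x80, 0x8F, P4B, CS2);
}

static bool
utf8_dfa_valid(const unsigned char *s, size_t len)
{
    uint64_t    state = BGN;

    /* One lookup, shift, and mask per byte; no per-byte branches. */
    for (size_t i = 0; i < len; i++)
        state = (transition[s[i]] >> state) & 63;

    return state == BGN;
}
```

Call `init_transition_table()` once, then `utf8_dfa_valid(buf, len)`. Since ERR is offset 0 and no transition out of it is ever set, an invalid byte leaves the state stuck at ERR for the rest of the input, so the check only needs to look at the final state.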

> Would you please replace https://github.com/BobSteagall/utf_utils/blob/master/src/utf_utils.cpp URL with
> https://github.com/BobSteagall/utf_utils/blob/6b7a465265de2f5fa6133d653df0c9bdd73bbcf8/src/utf_utils.cpp
> in the header of src/port/pg_utf8_fallback.c?
>
> It would make the URL more stable in case the file gets renamed.
>
> Vladimir
>

Makes sense, so done that way.

--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
