Re: [POC] verifying UTF-8 using SIMD instructions - Mailing list pgsql-hackers

From John Naylor
Subject Re: [POC] verifying UTF-8 using SIMD instructions
Date
Msg-id CAFBsxsFyDfp2d6=9gvPZSEmCDQyTLeZvkbqQTSvGGT3X+Fa0GQ@mail.gmail.com
In response to Re: [POC] verifying UTF-8 using SIMD instructions  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Mon, Feb 15, 2021 at 9:32 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Mon, Feb 15, 2021 at 9:18 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > I'm guessing that's because the unaligned access in check_ascii() is
> > expensive on this platform.

> Some possible remedies:

> 3) #ifdef out the ascii check for 32-bit platforms.

> 4) Same as the non-UTF8 case -- only check for ascii 8 bytes at a time. I'll probably try this first.

I've attached a couple of patches to try on top of v4; maybe they'll help the Arm32 regression. 01 reduces the stride to 8 bytes, and 02 applies on top of 01 to disable the fallback fast path entirely on 32-bit platforms. A bit of a heavy hammer, but it'll confirm (or not) your theory about unaligned loads.
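For readers following along without the attachments, here is a minimal sketch of what an 8-byte-stride ascii check with a compile-time off switch could look like. It is not the code from the patches: chunk_is_ascii8(), skip_ascii_prefix(), and the DISABLE_ASCII_FAST_PATH macro are hypothetical names made up for illustration, and the sketch only tests the high bit of each byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: true if the 8 bytes starting at s all have the high
 * bit clear. memcpy() is used so the load is valid for any alignment; the
 * compiler decides how to implement it on the target platform.
 */
static inline bool
chunk_is_ascii8(const unsigned char *s)
{
	uint64_t	chunk;

	memcpy(&chunk, s, sizeof(chunk));
	return (chunk & UINT64_C(0x8080808080808080)) == 0;
}

/*
 * Hypothetical helper: returns how many leading bytes were accepted by the
 * fast path, always a multiple of 8. Defining DISABLE_ASCII_FAST_PATH (an
 * assumed macro, standing in for a 32-bit platform test) compiles the fast
 * path out, so the caller falls back to per-character verification.
 */
static size_t
skip_ascii_prefix(const unsigned char *s, size_t len)
{
	size_t		done = 0;

#ifndef DISABLE_ASCII_FAST_PATH
	while (len - done >= 8 && chunk_is_ascii8(s + done))
		done += 8;
#endif
	return done;
}
```

The caller would continue with the ordinary verifier from the returned offset, so disabling the fast path changes only speed, not results.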

Also, I've included patches to explain more fully how I modeled non-UTF-8 performance while still using the UTF-8 tests. I think it was a useful thing to do, and I have a theory that might predict how a non-UTF8 encoding will perform with the fast path.

03A and 03B are independent of each other and conflict, but both apply on top of v4 (don't need 02). Both replace the v4 fallback with the ascii fastpath + pg_utf8_verifychar() in the loop, similar to utf-8 on master. 03A has a local static copy of pg_utf8_islegal(), and 03B uses the existing global function. (On x86, you can disable SSE4 by passing USE_FALLBACK_UTF8=1 to configure.)
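To make the shape of that loop concrete, here is a rough sketch of an ascii fast path wrapped around a per-character verifier. It is an illustration under stated assumptions, not the 03A/03B code: toy_verifychar() and toy_verifystr() are hypothetical names, and the per-character check is deliberately simplified (lead byte and continuation bytes only, no overlong or surrogate checks), so it is not a substitute for pg_utf8_verifychar().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-in for a per-character verifier. Returns the length of
 * the sequence at s, or -1 if it is truncated or malformed. It is static
 * here, so the compiler is free to inline it into the loop below.
 */
static int
toy_verifychar(const unsigned char *s, size_t len)
{
	int			l;

	if (s[0] < 0x80)
		l = 1;
	else if ((s[0] & 0xE0) == 0xC0)
		l = 2;
	else if ((s[0] & 0xF0) == 0xE0)
		l = 3;
	else if ((s[0] & 0xF8) == 0xF0)
		l = 4;
	else
		return -1;

	if ((size_t) l > len)
		return -1;
	for (int i = 1; i < l; i++)
	{
		if ((s[i] & 0xC0) != 0x80)
			return -1;
	}
	return l;
}

/* Returns the number of leading bytes that verified successfully. */
static size_t
toy_verifystr(const unsigned char *s, size_t len)
{
	const unsigned char *start = s;

	while (len > 0)
	{
		int			l;

		/* Fast path: skip 8 bytes at once if none has the high bit set. */
		if (len >= 8)
		{
			uint64_t	chunk;

			memcpy(&chunk, s, sizeof(chunk));
			if ((chunk & UINT64_C(0x8080808080808080)) == 0)
			{
				s += 8;
				len -= 8;
				continue;
			}
		}

		/* Slow path: verify one character, then retry the fast path. */
		l = toy_verifychar(s, len);
		if (l < 0)
			break;
		s += l;
		len -= l;
	}
	return (size_t) (s - start);
}
```

The difference between 03A and 03B corresponds to whether the helper called from the slow path is a static function in the same file, as in this sketch, or a global one.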

While Clang 10 regressed for me on pure multibyte in a similar test upthread, on Linux gcc 8.4 there isn't a regression relative to master at all. IIRC, gcc wasn't as good as Clang when the API changed a few weeks ago, so even though these addenda are slower than v4, they're still faster than master. Clang only regressed with my changes because it somehow handled master much better to begin with.

x86-64 Linux gcc 8.4

master

 chinese | mixed | ascii
---------+-------+-------
    1453 |   857 |   428

v4 (fallback verifier written as a single function)

 chinese | mixed | ascii
---------+-------+-------
     815 |   514 |    82

v4 plus addendum 03A -- emulate non-utf-8 using a copy of pg_utf8_islegal() as a static function

 chinese | mixed | ascii
---------+-------+-------
    1115 |   547 |    87

v4 plus addendum 03B -- emulate non-utf-8 using pg_utf8_islegal() as a global function

 chinese | mixed | ascii
---------+-------+-------
    1279 |   604 |    82

(I also tried the same on ppc64le Linux, gcc 4.8.5, and while not great, it never got worse than master on pure multibyte either.)

This is supposed to model the performance of a non-utf8 encoding, where we don't have a bespoke function written from scratch. Here's my theory: If an encoding has pg_*_mblen(), a global function, inside pg_*_verifychar(), it seems it won't benefit as much from an ascii fast path as one whose pg_*_verifychar() has no function calls. I'm not sure whether a compiler can inline a global function's body into call sites in the unit where it's defined. (I haven't looked at the assembly.) But recall that you didn't commit 0002 from the earlier encoding change, because it wasn't performing. I looked at that patch again, and while it inlined the pg_utf8_verifychar() call, it still called the global function pg_utf8_islegal().

If the above is anything to go by, on gcc at least, I don't think we need to worry about a regression when adding an ascii fast path to non-utf-8 multibyte encodings.

Regarding SSE, I've added an ascii fast path in my local branch, but it's not going to be as big a difference because 1) the check is more expensive in terms of branches than the C case, and 2) because the general case is so fast already, it's hard to improve upon. I just need to do some testing and cleanup on the whole thing, and that'll be ready to share.

--
John Naylor
EDB: http://www.enterprisedb.com
