Re: speed up verifying UTF-8 - Mailing list pgsql-hackers

From John Naylor
Subject Re: speed up verifying UTF-8
Msg-id CAFBsxsHDXCROQe-UC1nZOdcdaCO90rihiYhBYrLHrf_sLKUY=g@mail.gmail.com
In response to Re: speed up verifying UTF-8  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: speed up verifying UTF-8  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers

On Mon, Jul 26, 2021 at 8:56 AM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> >
> > Does that (and "len >= 32" condition) mean the patch does not improve validation of the shorter strings (the ones less than 32 bytes)?
>
> Right. Also, the 32-byte threshold was just a temporary need for testing a 32-byte stride -- testing different thresholds wouldn't hurt. I'm not terribly concerned about short strings, though, as long as we don't regress.

I put together the attached quick test to try to rationalize the fast-path threshold. (In case it isn't obvious, the threshold must be at least 16 on all builds, since wchar.c doesn't know which implementation it's calling, and the SSE register width sets the lower bound.) I changed the threshold first to 16, and then to 100000, which forces the byte-at-a-time code.

If we have only 16 bytes in the input, it still seems to be faster to use SSE, even though it's called through a function pointer on x86. I didn't test the DFA path, but I don't think the conclusion would be different. I'll use a threshold of 16 the next time I need to update the patch.
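
To make the role of the threshold concrete, here is a minimal sketch of the dispatch shape being discussed. It is not the patch itself: all names (FAST_PATH_THRESHOLD, ascii_prefix_len, and so on) are hypothetical, the check is reduced to "find the length of the leading ASCII run" rather than full UTF-8 validation, and the 16-byte chunk routine uses plain 64-bit loads as a stand-in for the SSE implementation. It only illustrates that inputs shorter than the threshold never reach the function pointer, and that whatever the fast path leaves over is finished byte-at-a-time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; must be >= CHUNK because the fast path reads whole chunks. */
#define FAST_PATH_THRESHOLD 16
/* Assumed chunk size, matching the 16-byte SSE register width. */
#define CHUNK 16

/* Byte-at-a-time fallback: length of the leading run of ASCII bytes. */
static size_t
ascii_prefix_bytewise(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len && s[i] < 0x80)
        i++;
    return i;
}

/*
 * Stand-in for the SIMD routine: consumes whole 16-byte chunks while no
 * byte has its high bit set, and returns the number of bytes consumed.
 */
static size_t
ascii_prefix_chunked(const unsigned char *s, size_t len)
{
    size_t done = 0;

    while (len - done >= CHUNK)
    {
        uint64_t lo, hi;

        memcpy(&lo, s + done, sizeof(lo));
        memcpy(&hi, s + done + 8, sizeof(hi));
        if ((lo | hi) & UINT64_C(0x8080808080808080))
            break;              /* let the fallback find the exact byte */
        done += CHUNK;
    }
    return done;
}

/* The caller does not know which implementation is behind the pointer. */
typedef size_t (*chunk_fn) (const unsigned char *s, size_t len);
static chunk_fn fast_path = ascii_prefix_chunked;

static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
    size_t done = 0;

    /* Short inputs skip the indirect call entirely. */
    if (len >= FAST_PATH_THRESHOLD)
        done = fast_path(s, len);

    /* Finish the tail, or the chunk in which a non-ASCII byte was seen. */
    return done + ascii_prefix_bytewise(s + done, len - done);
}
```

In this sketch, setting FAST_PATH_THRESHOLD to a very large value has the same effect as the 100000 setting in my test: every input goes through the byte-at-a-time loop.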

Macbook x86, clang 12:

master + use 16:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+------
   270 |   279 |   282 |  291 |  296 |  304

force byte-at-a-time:
 asc16 | asc32 | asc64 | mb16 | mb32 | mb64
-------+-------+-------+------+------+------
   277 |   292 |   310 |  296 |  317 |  362

--
John Naylor
EDB: http://www.enterprisedb.com
