I wrote:
> To simplify the constants, I do shift down to uint32, and I didn't bother working around that. v16alpha regressed on worst-case input, so for v16beta I went back to earlier coding for the one-byte ascii check. That helped, but it's still slower than v14.
It occurred to me that I could rewrite the switch test into simple comparisons, like I already had for the 2- and 4-byte lead cases. While at it, I folded the leading byte and continuation tests into a single operation, like this:
    /* 3-byte lead with two continuation bytes */
    else if ((chunk & 0xF0C0C00000000000) == 0xE080800000000000)
...and also tried using 64-bit constants to avoid shifting. Still didn't quite beat v14, but got pretty close:
> The numbers on Power8 / gcc 4.8 (little endian):
>
> HEAD:
>
>  chinese | mixed | ascii | mixed16 | mixed8
> ---------+-------+-------+---------+--------
>     2951 |  1521 |   871 |    1474 |   1508
>
> v14:
>
>  chinese | mixed | ascii | mixed16 | mixed8
> ---------+-------+-------+---------+--------
>      885 |   607 |   179 |     774 |   1325
v16gamma:

 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     952 |   632 |   180 |     800 |   1333
A big-endian 64-bit platform just might shave enough cycles to beat v14 this way... or not.
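
To make the folded 3-byte test concrete, here is a minimal standalone sketch. It is not the patch itself. The function names and the load_chunk_be() helper are hypothetical, and the sketch assumes the chunk holds the first input byte in its most significant byte. It checks only the structural shape (one 1110xxxx lead byte followed by two 10xxxxxx continuation bytes); overlong encodings and surrogates would still need their own checks. The shifted variant is only a guess at what "shift down to uint32" could look like.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a 64-bit chunk with s[0] in the most
 * significant byte, independent of host byte order.
 */
static uint64_t
load_chunk_be(const unsigned char *s)
{
	uint64_t	chunk = 0;

	for (int i = 0; i < 8; i++)
		chunk = (chunk << 8) | s[i];
	return chunk;
}

/* Lead byte and both continuation bytes tested in one masked compare. */
static bool
is_3byte_shape_folded(uint64_t chunk)
{
	return (chunk & UINT64_C(0xF0C0C00000000000)) ==
		UINT64_C(0xE080800000000000);
}

/* Same test, but shifting the top half down so the constants fit in 32 bits. */
static bool
is_3byte_shape_shifted(uint64_t chunk)
{
	uint32_t	half = (uint32_t) (chunk >> 32);

	return (half & UINT32_C(0xF0C0C000)) == UINT32_C(0xE0808000);
}

/* Reference version: three separate byte comparisons. */
static bool
is_3byte_shape_separate(const unsigned char *s)
{
	return (s[0] & 0xF0) == 0xE0 &&
		(s[1] & 0xC0) == 0x80 &&
		(s[2] & 0xC0) == 0x80;
}
```

All three functions accept and reject the same inputs. The folded form trades three byte tests for one AND and one compare on the whole chunk. On a little-endian host, a plain 8-byte load puts s[0] in the least significant byte instead, so it would need either a byte swap or mirrored constants, which is presumably where a big-endian platform could save cycles.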
--
John Naylor
EDB: http://www.enterprisedb.com