
Re: speed up verifying UTF-8 - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: speed up verifying UTF-8
Date	July 15, 2021 22:00:05
Msg-id	CAFBsxsEzzTR=Zd=HnT2TZcQ8So1AzWbD1xXUvRsos8w-0C_nPg@mail.gmail.com
In response to	Re: speed up verifying UTF-8 (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: speed up verifying UTF-8
List	pgsql-hackers


I wrote:

> To simplify the constants, I do shift down to uint32, and I didn't bother working around that. v16alpha regressed on worst-case input, so for v16beta I went back to the earlier coding for the one-byte ASCII check. That helped, but it's still slower than v14.

It occurred to me that I could rewrite the switch statement as simple comparisons, as I already had for the 2- and 4-byte lead cases. While at it, I folded the lead-byte test and the continuation-byte tests into a single masked comparison, like this:

/* 3-byte lead with two continuation bytes */
else if ((chunk & 0xF0C0C00000000000) == 0xE080800000000000)

I also tried using 64-bit constants to avoid shifting. It still doesn't quite beat v14, but it gets pretty close:

> The numbers on Power8 / gcc 4.8 (little endian):
>
> HEAD:
>
> chinese | mixed | ascii | mixed16 | mixed8
> ---------+-------+-------+---------+--------
> 2951 | 1521 | 871 | 1474 | 1508
>
> v14:
>
> chinese | mixed | ascii | mixed16 | mixed8
> ---------+-------+-------+---------+--------
> 885 | 607 | 179 | 774 | 1325

v16gamma:

chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
952 | 632 | 180 | 800 | 1333

A big-endian 64-bit platform just might shave enough cycles to beat v14 this way... or not.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment

v16gamma-Rewrite-pg_utf8_verifystr-for-speed.txt

