Re: speed up verifying UTF-8 - Mailing list pgsql-hackers

From	Amit Khandekar
Subject	Re: speed up verifying UTF-8
Date	July 19, 2021 08:23:22
Msg-id	CAJ3gD9c=dhu3D2tYkPyZ-vEwt5RqUUgGJWfcQ7jFrVOiQW3SqQ@mail.gmail.com
In response to	Re: speed up verifying UTF-8 (John Naylor <john.naylor@enterprisedb.com>)
List	pgsql-hackers

On Sat, 17 Jul 2021 at 04:48, John Naylor <john.naylor@enterprisedb.com> wrote:
> v17-0001 is the same as v14. 0002 is a stripped-down implementation of Amit's
> chunk idea for multibyte, and it's pretty good on x86. On Power8, not so
> much. 0003 and 0004 are shot-in-the-dark guesses to improve it on Power8,
> with some success, but end up making x86 weirdly slow, so I'm afraid that
> could happen on other platforms as well.

Thanks for trying the chunk approach. I tested your v17 versions on
Arm64. For the Chinese characters, v17-0002 gave some improvement over
v14, but for all the other character sets there was a degradation of
roughly 5-15% relative to v14. I thought the hhton64 call and memcpy()
for each multibyte character might be the culprit, so I tried iterating
over all the characters in the chunk within the same pg_utf8_verify_one()
function by left-shifting the bits. That made the figures worse, so I
gave up on that idea.

Here are the numbers on Arm64:

HEAD:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    1781 |  1095 |   628 |     944 |   1151

v14:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     852 |   484 |   144 |     584 |    971

v17-0001+2:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     731 |   520 |   152 |     645 |   1118

Haven't looked at your v18 patch set yet.
