Re: speed up verifying UTF-8 - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: speed up verifying UTF-8
Date	July 16, 2021 23:18:33
Msg-id	CAFBsxsEJbV=C28b4Q4rZyMP=wavvYZfzK4t8msqeHcYWq9tB+A@mail.gmail.com
In response to	Re: speed up verifying UTF-8 (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: speed up verifying UTF-8 Re: speed up verifying UTF-8
List	pgsql-hackers

My v16 experimental patches were a bit messy, so I've organized an experimental series that applies cumulatively, to try to trace the effects of various things.

v17-0001 is the same as v14. 0002 is a stripped-down implementation of Amit's chunk idea for multibyte, and it's pretty good on x86. On Power8, not so much. 0003 and 0004 are shot-in-the-dark guesses to improve it on Power8, with some success, but they end up making x86 weirdly slow, so I'm afraid the same could happen on other platforms as well.

v14 still looks like the safe bet for now. It also has the advantage of using the same function both in and out of the fastpath, which will come in handy when moving it to src/port as the fallback for SSE.
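To illustrate what "the same function both in and out of the fastpath" can look like, here is a minimal sketch. It is not the code from v14 or from any v17 patch. The names `utf8_verify_char` and `utf8_verify_str`, the 8-byte chunk size, and the return convention are all assumptions made for illustration, and the sketch does not special-case zero bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
#define IS_CONT(b) (((b) & 0xC0) == 0x80)

/*
 * Hypothetical per-character verifier: returns the byte length of the
 * well-formed UTF-8 sequence starting at s, or -1 if it is invalid or
 * truncated. Requires len >= 1.
 */
static int
utf8_verify_char(const unsigned char *s, size_t len)
{
    unsigned char c = s[0];

    if (c < 0x80)
        return 1;                       /* ASCII */
    if (c >= 0xC2 && c <= 0xDF)         /* 2-byte; 0xC0 and 0xC1 are overlong */
        return (len >= 2 && IS_CONT(s[1])) ? 2 : -1;
    if (c >= 0xE0 && c <= 0xEF)         /* 3-byte */
    {
        if (len < 3 || !IS_CONT(s[1]) || !IS_CONT(s[2]))
            return -1;
        if (c == 0xE0 && s[1] < 0xA0)
            return -1;                  /* overlong encoding */
        if (c == 0xED && s[1] > 0x9F)
            return -1;                  /* UTF-16 surrogate range */
        return 3;
    }
    if (c >= 0xF0 && c <= 0xF4)         /* 4-byte */
    {
        if (len < 4 || !IS_CONT(s[1]) || !IS_CONT(s[2]) || !IS_CONT(s[3]))
            return -1;
        if (c == 0xF0 && s[1] < 0x90)
            return -1;                  /* overlong encoding */
        if (c == 0xF4 && s[1] > 0x8F)
            return -1;                  /* above U+10FFFF */
        return 4;
    }
    return -1;                          /* stray continuation or invalid lead byte */
}

/*
 * Hypothetical string verifier: returns the number of leading bytes of s
 * that form valid UTF-8. The fast path skips 8 bytes at a time when none
 * of them has the high bit set; everything else goes through
 * utf8_verify_char(), the same function that handles the tail.
 */
static size_t
utf8_verify_str(const unsigned char *s, size_t len)
{
    const unsigned char *start = s;

    while (len > 0)
    {
        int l;

        if (len >= 8)
        {
            uint64_t chunk;

            memcpy(&chunk, s, sizeof(chunk));   /* avoids unaligned loads */
            if ((chunk & UINT64_C(0x8080808080808080)) == 0)
            {
                s += 8;
                len -= 8;
                continue;
            }
        }

        l = utf8_verify_char(s, len);
        if (l < 0)
            break;
        s += l;
        len -= (size_t) l;
    }
    return (size_t) (s - start);
}
```

Because the slow path is an ordinary per-character function, the same routine could plausibly serve as the fallback behind a SIMD implementation, which is the property the paragraph above is pointing at.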

Power8, gcc 4.8:

HEAD:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    2944 |  1523 |   871 |    1473 |   1509

v17-0001:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     888 |   607 |   179 |     777 |   1328

v17-0002:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    1017 |   718 |   156 |    1213 |   2138

v17-0003:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    1205 |   662 |   180 |     767 |   1256

v17-0004:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    1085 |   660 |   224 |     868 |   1369

Macbook x86, clang 12:

HEAD:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     974 |   691 |   370 |     456 |    526

v17-0001:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     674 |   346 |    78 |     309 |    504

v17-0002:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     516 |   324 |    78 |     331 |    544

v17-0003:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     621 |   537 |   323 |     413 |    602

v17-0004:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     576 |   439 |   154 |     557 |    915
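
The raw numbers are easier to compare when normalized against HEAD. The following is a minimal sketch that does this for part of the Power8 results, assuming lower numbers are better (they read as timings). The figures are copied from the tables above; the array and function names are illustrative.

```c
#include <stdio.h>

#define NCOLS 5

static const char *const col_names[NCOLS] =
    {"chinese", "mixed", "ascii", "mixed16", "mixed8"};

/* Power8, gcc 4.8: values copied from the tables above */
static const int power8_head[NCOLS]     = {2944, 1523, 871, 1473, 1509};
static const int power8_v17_0001[NCOLS] = {888, 607, 179, 777, 1328};
static const int power8_v17_0002[NCOLS] = {1017, 718, 156, 1213, 2138};

/* Ratio of baseline to patched; above 1.0 means the patch is faster. */
static double
speedup(int baseline, int patched)
{
    return (double) baseline / (double) patched;
}

/* Print one line of per-column speedups for a patch against a baseline. */
static void
print_speedups(const char *label, const int *baseline, const int *patched)
{
    printf("%s:", label);
    for (int i = 0; i < NCOLS; i++)
        printf(" %s=%.2fx", col_names[i], speedup(baseline[i], patched[i]));
    printf("\n");
}
```

Calling `print_speedups("v17-0002", power8_head, power8_v17_0002)` from a `main` function gives a ratio below 1.0 for mixed8, since 2138 is larger than HEAD's 1509, while v17-0001 stays above 1.0 in every column.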

--

John Naylor

EDB: http://www.enterprisedb.com
