On 29/06/2021 14:20, John Naylor wrote:
> I still wasn't quite happy with the churn in the regression tests, so
> for v13 I gave up on using both the existing utf8 table and my new one
> for the "padded input" tests, and instead just copied the NUL byte test
> into the new table. Also added a primary key to make sure the padded
> test won't give weird results if a new entry has a duplicate description.
>
> I came up with "highbit_carry" as a more descriptive variable name than
> "x", but that doesn't matter a whole lot.
>
> It also occurred to me that if we're going to check one 8-byte chunk at
> a time (like v12 does), maybe it's only worth it to load 8 bytes at a
> time. An earlier version did this, but without the recent tweaks. The
> worst-case scenario now might be different from the one with 16-bytes,
> but for now just tested the previous worst case (mixed2).
I tested the new worst case scenario on my laptop:
gcc master:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    1311 |   758 |   405 |     583 |    725
gcc v13:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     956 |   472 |   160 |     572 |    939
mixed16 is the same as "mixed2" in the previous rounds, with
'123456789012345ä' as the repeating string, and mixed8 uses '1234567ä',
which I believe is the worst case for patch v13. So v13 is somewhat
slower than master in the worst case (939 vs. 725 on mixed8).
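To illustrate why '1234567ä' is a plausible worst case for an 8-byte
chunked check, here is a small sketch. It is not part of the patch or of
the benchmark; the function name count_ascii_windows is made up for this
example, and it assumes 'ä' is encoded as the two UTF-8 bytes 0xC3 0xA4.
It counts how many chunk-sized windows of the repeated string consist of
ASCII bytes only:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: count the windows of
 * "chunk" consecutive bytes that contain only ASCII, over "reps"
 * repetitions of "pat". Every start offset is tried, not just aligned ones.
 */
static size_t
count_ascii_windows(const char *pat, size_t reps, size_t chunk)
{
	size_t		patlen = strlen(pat);
	size_t		total = patlen * reps;
	size_t		count = 0;

	if (patlen == 0)
		return 0;

	for (size_t start = 0; start + chunk <= total; start++)
	{
		bool		all_ascii = true;

		for (size_t i = 0; i < chunk; i++)
		{
			/* the repeated input is indexed modulo the pattern length */
			if ((unsigned char) pat[(start + i) % patlen] & 0x80)
			{
				all_ascii = false;
				break;
			}
		}
		if (all_ascii)
			count++;
	}
	return count;
}
```

With the mixed8 pattern "1234567\xc3\xa4" and chunk = 8 this returns 0:
the longest ASCII run is 7 bytes, so no 8-byte load can ever pass an
all-ASCII test, wherever it starts. With the mixed16 pattern and chunk = 8
some windows do pass, while chunk = 16 again gives 0.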
Hmm, there's one more simple trick we can do: We can have a separate
fast-path version of the loop when there are at least 8 bytes of input
left, skipping all the length checks. With that:
gcc v14:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     737 |   412 |    94 |     476 |    725
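For readers without the patch at hand, the shape of that fast path is
roughly as follows. This is only a sketch under my own assumptions, not
the code in the attached patch: the names lead_len, check_seq and
utf8_valid_prefix are invented here, and the NUL-byte rejection that the
real verifier needs is left out for brevity. The point is that a UTF-8
sequence is at most 4 bytes long, so while at least 8 bytes remain no
per-sequence length check is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IS_CONT(b) (((b) & 0xC0) == 0x80)

/* Sequence length implied by a lead byte, or 0 if it cannot start one. */
static int
lead_len(unsigned char b)
{
	if (b < 0x80) return 1;
	if (b >= 0xC2 && b <= 0xDF) return 2;
	if (b >= 0xE0 && b <= 0xEF) return 3;
	if (b >= 0xF0 && b <= 0xF4) return 4;
	return 0;
}

/* Validate an n-byte sequence at s; the caller guarantees n readable bytes. */
static int
check_seq(const unsigned char *s, int n)
{
	switch (n)
	{
		case 1:
			return 1;
		case 2:
			return IS_CONT(s[1]) ? 2 : 0;
		case 3:
			if (!IS_CONT(s[1]) || !IS_CONT(s[2])) return 0;
			if (s[0] == 0xE0 && s[1] < 0xA0) return 0;	/* overlong */
			if (s[0] == 0xED && s[1] > 0x9F) return 0;	/* surrogate */
			return 3;
		case 4:
			if (!IS_CONT(s[1]) || !IS_CONT(s[2]) || !IS_CONT(s[3])) return 0;
			if (s[0] == 0xF0 && s[1] < 0x90) return 0;	/* overlong */
			if (s[0] == 0xF4 && s[1] > 0x8F) return 0;	/* above U+10FFFF */
			return 4;
		default:
			return 0;
	}
}

/* Return the number of leading bytes of s that form valid UTF-8. */
static size_t
utf8_valid_prefix(const unsigned char *s, size_t len)
{
	const unsigned char *start = s;

	assert(s != NULL || len == 0);

	/* Fast path: >= 8 bytes left, so no length checks are needed. */
	while (len >= 8)
	{
		uint64_t	chunk;
		int			n;

		memcpy(&chunk, s, sizeof(chunk));
		if ((chunk & UINT64_C(0x8080808080808080)) == 0)
		{
			/* all eight bytes are ASCII */
			s += 8;
			len -= 8;
			continue;
		}
		n = check_seq(s, lead_len(s[0]));
		if (n == 0)
			return (size_t) (s - start);
		s += n;
		len -= (size_t) n;
	}

	/* Slow path: fewer than 8 bytes left, check the length every time. */
	while (len > 0)
	{
		int			n = lead_len(s[0]);

		if (n == 0 || (size_t) n > len || check_seq(s, n) == 0)
			break;
		s += n;
		len -= (size_t) n;
	}
	return (size_t) (s - start);
}
```

A string is valid when the returned prefix length equals its length. A
truncated sequence at the very end, such as "abc\xc3", is only ever seen
by the second loop, which is the one place that still compares the
sequence length against the bytes remaining.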
All the above numbers were with gcc 10.2.1. For completeness, with clang
11.0.1-2 I got:
clang master:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
    1044 |   724 |   403 |     930 |    603
(1 row)
clang v13:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     596 |   445 |    79 |     417 |    715
(1 row)
clang v14:
 chinese | mixed | ascii | mixed16 | mixed8
---------+-------+-------+---------+--------
     600 |   337 |    93 |     318 |    511
Attached is patch v14 with that optimization. It needs some cleanup; I
just hacked it up quickly for performance testing.
- Heikki