On 14/03/2025 05:43, Jeff Davis wrote:
> On Wed, 2025-03-12 at 23:39 +0300, Alexander Borisov wrote:
>> v5 attached.
>
> Attached v6j.
>
> * marked arrays as "static const" rather than just "static"
> * ran pgindent
> * changed data types where appropriate (uint32->pg_wchar)
> * modified perl code so that it produces code that's already pgindented
> * cleanup of perl code, removing unnecessary subroutines and variables
> * added a few comments
> * ran pgperltidy
>
> Some of the perl code working with ranges still needs further cleanup
> and explanation, though.
>
> Also, I ran some of my own simple tests (mostly ASCII) and it showed
> over 10% speedup. That combined with the smaller table sizes makes this
> well worth it.
Looks good overall.
> static const pg_wchar case_map_lower[1677] =
> {
> 0x000000, /* U+000000 */
> 0x000000, /* U+000000 */
> 0x000001, /* U+000001 */
> 0x000002, /* U+000002 */
The duplicated 0x000000 looks wrong. I understand that the 0'th entry is
reserved, and the actual codepoints start at index 1, but the /*
U+000000 */ comment on the 0'th entry is misleading.
> static const uint8 case_map_special[1677] =
> {
> 0x000000, /* U+000000 */
> 0x000000, /* U+000000 */
> ...
0x000000 implies a 24-bit integer, but these are uint8s. Let's use
plain base-10 decimals here rather than hex, like in 'case_map'.
Attached are fixes for those and some other minor things.
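To illustrate the two comments above, here is a minimal sketch of the
layout I have in mind. It is not the code from the attached patch: the
names (sketch_map_lower, sketch_map_special, sketch_lookup_lower,
SKETCH_LEN) and the typedef standing in for pg_wchar are made up for
this example, and the array contents are placeholders, not real table
data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's pg_wchar, so the sketch compiles alone. */
typedef uint32_t sketch_wchar;

#define SKETCH_LEN 4

/*
 * Index 0 is reserved to mean "no entry", so its comment says so instead
 * of claiming to describe U+000000. Real entries start at index 1.
 */
static const sketch_wchar sketch_map_lower[SKETCH_LEN] =
{
	0,						/* reserved; not a codepoint entry */
	0x000000,				/* U+000000 */
	0x000001,				/* U+000001 */
	0x000002,				/* U+000002 */
};

/*
 * uint8 values written as plain decimals, so the literal does not suggest
 * a wider integer. The zeros here are placeholders only.
 */
static const uint8_t sketch_map_special[SKETCH_LEN] =
{
	0,						/* reserved */
	0,
	0,
	0,
};

/* Return the mapped value, or the caller's fallback for the reserved index. */
static sketch_wchar
sketch_lookup_lower(size_t idx, sketch_wchar fallback)
{
	assert(idx < SKETCH_LEN);
	return idx == 0 ? fallback : sketch_map_lower[idx];
}
```

The only point is that the reserved slot is labelled as reserved, and
that the uint8 array uses literals matching its element width.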
--
Heikki Linnakangas
Neon (https://neon.tech)