Thread: Re: Optimization for lower(), upper(), casefold() functions.
On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote: > I tried adding a loop to create tables, and everything looks fine > (v7). > Also removed unnecessary (hanging) global variables. Changed. I used a loop more similar to your first one (hash of arrays), and I left case_map_special outside of the loop. It a different type of array: rather than mapping to a codepoint, it maps to another index, and I think it's a different case (and needs a different comment). > > It's all about not getting too carried away. In my vision these > subroutines (user-defined functions) will have to be moved to perl > module (.pm) in the future, after the patch is committed, so that the > code can be used for Normalization Forms. I prefer to generalize when we have the other code in place. As it was, it was a bit confusing why the extra arguments and subroutines were there. Also, make_ranges() doesn't seem to do what is described in the comment: it produces a single range. I changed "> $limit" to ">= $limit", but then it generates 3 ranges, not two. I rewrote it to be more clear what's going on: * I used new looping logic that, at least to me, seems a bit simpler. * I changed the ranges from an array of 4 elements to a hash with 3 keys, which is more descriptive when using it. * I changed the terminology of a range so that it's Start and End, because $from and $to are used by branches() to mean something else. In branch(): * I got rid of the third element "VALUE" from the range. It was used to be the start of the next range, but there was already code in the caller to look ahead to the next range, so it didn't serve any purpose. * If we rely on some compiler optimizations, it might be possible to simplify branch() a bit, but I'm fine with the way it's done. When looking around, I didn't find a lot of material discussing this generated-branches approach. While it's mentioned a few places, I couldn't even find a consistent name for it. If you know of something where I can read some more analysis, please send a link. Committed. Thank you! Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Committed. Thank you! crake doesn't like your perl style: ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not lexical at line 638, column 2. See page 108 ofPBP. ([Variables::RequireLexicalLoopIterators] Severity: 5) regards, tom lane
On Sat, Mar 15, 2025 at 1:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Davis <pgsql@j-davis.com> writes:
> Committed. Thank you!
crake doesn't like your perl style:
./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not lexical at line 638, column 2. See page 108 of PBP.
I suppose pgperltidy didn't catch that. I will fix it shortly.
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Sat, Mar 15, 2025 at 1:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> crake doesn't like your perl style: >> ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not >> lexical at line 638, column 2. See page 108 of PBP. > I suppose pgperltidy didn't catch that. I will fix it shortly. crake is running perlcritic, which is a whole different animal. regards, tom lane
15.03.2025 23:07, Jeff Davis wrote: > On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote: >> I tried adding a loop to create tables, and everything looks fine >> (v7). [...] > I prefer to generalize when we have the other code in place. As it was, > it was a bit confusing why the extra arguments and subroutines were > there. Thanks Jeff for the review of my patch! > Also, make_ranges() doesn't seem to do what is described in the > comment: it produces a single range. I changed "> $limit" to ">= > $limit", but then it generates 3 ranges, not two. I rewrote it to be > more clear what's going on: [...] I had some comments, but since the patch is already committed, it's fine. > When looking around, I didn't find a lot of material discussing this > generated-branches approach. While it's mentioned a few places, I > couldn't even find a consistent name for it. If you know of something > where I can read some more analysis, please send a link. Unfortunately, I can't send any links. I came up with this myself, for my project https://github.com/lexbor/lexbor. I saw that Postgres could improve the approach to Normalization Form, then I noticed that case is not all good either. That's why I took it up. In other words, this approach is purely my idea, and it is showing well. I really like Postgres and want to help you improve it, and I'm very good at Unicode. I feel like my knowledge will be useful for Postgres. Thank you Jeff again for your time. Regards, Alexander Borisov
One more thing: I observe that headerscheck is now unhappy: $ src/tools/pginclude/headerscheck In file included from /tmp/headerscheck.yOpahZ/test.c:2: ./src/include/common/unicode_case_table.h:8598:24: warning: 'casekind_map' defined but not used [-Wunused-variable] static const pg_wchar *casekind_map[NCaseKind] = ^~~~~~~~~~~~ It's not apparent to me why that table needs to be in a header file and not in the sole user .c file? Also, probably better to make it const: -static const pg_wchar *casekind_map[NCaseKind] = +static const pg_wchar * const casekind_map[NCaseKind] = regards, tom lane
On Tue, 2025-03-18 at 11:11 -0400, Tom Lane wrote: > It's not apparent to me why that table needs to be in a header > file and not in the sole user .c file? Thank you, fixed. > Also, probably better to make it const: > > -static const pg_wchar *casekind_map[NCaseKind] = > +static const pg_wchar * const casekind_map[NCaseKind] = Fixed also (except pgindent had a slightly different opinion about spaces). Was this a general suggestion, or did you see something in particular that would make it more optimizable this way? Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Tue, 2025-03-18 at 11:11 -0400, Tom Lane wrote: >> Also, probably better to make it const: >> >> -static const pg_wchar *casekind_map[NCaseKind] = >> +static const pg_wchar * const casekind_map[NCaseKind] = > Was this a general suggestion, or did you see something in particular > that would make it more optimizable this way? No, just a general style position that tables that aren't supposed to change should be const. Cases like this are a tad insidious because it looks like you did make the table const, only you didn't. regards, tom lane