On Sat, 2025-04-12 at 05:34 -0700, Noah Misch wrote:
> I think the code for (2) and for "I/i in Turkish" haven't returned.
> Given commit e3fa2b0 restored the v17 "I/i in Turkish" treatment for
> plain lower(), the regex code likely needs a similar restoration. If
> not, the regex comments would need to change to match the code.
Great find, thank you! I'm curious how you came across this difference.
Was it through testing or code inspection?
Patch attached. I also updated the top of the comment so that it's
clear that it's referring to the libc provider specifically, and that
ICU still has an issue with non-UTF8 encodings.
Also, the force-to-ASCII-behavior special case is different for
pg_wc_tolower()/pg_wc_toupper() than for LOWER()/UPPER(): the former
depends only on whether it's the default locale, whereas the latter
depends on whether it's the default locale and the encoding is
single-byte. Therefore, with the libc provider and a tr_TR.UTF-8 default
locale (UTF-8 is multibyte, so LOWER()/UPPER() do not force ASCII
behavior while the regex code does), the results are inconsistent:
=> select 'i' ~* 'I', 'I' ~* 'i', lower('I') = 'i', upper('i') = 'I';
 ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
 t        | t        | f        | f
That behavior goes back a long way, so I'm not suggesting that we
change it.
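To make the two rules concrete, here is a hypothetical Python model; it is not PostgreSQL code, and every function name in it is invented for illustration. It assumes the Turkish libc mappings (I lowers to dotless ı, i uppers to dotted İ) as hard-coded stand-ins for towlower()/towupper() under tr_TR.UTF-8, and it simplifies single-character ~* to comparing lower-case foldings.

```python
# Hypothetical model of the two ASCII-forcing rules described above.
# None of these names exist in PostgreSQL; the "turkish_*" functions are
# hard-coded stand-ins for libc towlower()/towupper() under tr_TR.UTF-8.

DOTLESS_SMALL_I = "\u0131"   # what Turkish tolower('I') yields
DOTTED_CAPITAL_I = "\u0130"  # what Turkish toupper('i') yields


def ascii_tolower(c: str) -> str:
    return chr(ord(c) + 32) if "A" <= c <= "Z" else c


def ascii_toupper(c: str) -> str:
    return chr(ord(c) - 32) if "a" <= c <= "z" else c


def turkish_tolower(c: str) -> str:
    # Stand-in for locale-aware lower-casing of a single character.
    return DOTLESS_SMALL_I if c == "I" else c.lower()


def turkish_toupper(c: str) -> str:
    # Stand-in for locale-aware upper-casing of a single character.
    return DOTTED_CAPITAL_I if c == "i" else c.upper()


def regex_tolower(c: str, default_locale: bool) -> str:
    # Regex rule: force ASCII behavior whenever the default locale is used,
    # regardless of the encoding.
    if default_locale and ord(c) < 128:
        return ascii_tolower(c)
    return turkish_tolower(c)


def sql_tolower(c: str, default_locale: bool, single_byte_encoding: bool) -> str:
    # LOWER() rule: force ASCII behavior only for the default locale
    # combined with a single-byte encoding.
    if default_locale and single_byte_encoding and ord(c) < 128:
        return ascii_tolower(c)
    return turkish_tolower(c)


def sql_toupper(c: str, default_locale: bool, single_byte_encoding: bool) -> str:
    # UPPER() rule: same condition as sql_tolower().
    if default_locale and single_byte_encoding and ord(c) < 128:
        return ascii_toupper(c)
    return turkish_toupper(c)


def regex_icase_eq(a: str, b: str, default_locale: bool) -> bool:
    # Simplified single-character ~*: compare lower-case foldings.
    return regex_tolower(a, default_locale) == regex_tolower(b, default_locale)


def psql_row(default_locale: bool, single_byte_encoding: bool) -> tuple:
    # The four columns of the query above, in the same order.
    return (
        regex_icase_eq("i", "I", default_locale),
        regex_icase_eq("I", "i", default_locale),
        sql_tolower("I", default_locale, single_byte_encoding) == "i",
        sql_toupper("i", default_locale, single_byte_encoding) == "I",
    )


if __name__ == "__main__":
    # Default locale tr_TR.UTF-8: UTF-8 is multibyte, so only the regex
    # path forces ASCII behavior.
    print(psql_row(default_locale=True, single_byte_encoding=False))
    # prints (True, True, False, False), i.e. t | t | f | f
```

In this model, passing single_byte_encoding=True (the case where LOWER()/UPPER() also force ASCII behavior) makes all four columns true, so the mismatch only shows up when the default locale uses a multibyte encoding.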
Regards,
Jeff Davis