Re: [18] Unintentional behavior change in commit e9931bfb75 - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: [18] Unintentional behavior change in commit e9931bfb75
Msg-id 667d3b3f730a97f71ebecb74f917167d8ffba427.camel@j-davis.com
In response to Re: [18] Unintentional behavior change in commit e9931bfb75  (Noah Misch <noah@leadboat.com>)
Responses Re: [18] Unintentional behavior change in commit e9931bfb75
List pgsql-hackers
On Sat, 2025-04-12 at 05:34 -0700, Noah Misch wrote:
> I think the code for (2) and for "I/i in Turkish" haven't returned.
> Given commit e3fa2b0 restored the v17 "I/i in Turkish" treatment for
> plain lower(), the regex code likely needs a similar restoration.  If
> not, the regex comments would need to change to match the code.

Great find, thank you! I'm curious how you came across this difference:
was it through testing or code inspection?

Patch attached. I also updated the top of the comment so that it's
clear that it's referring to the libc provider specifically, and that
ICU still has an issue with non-UTF8 encodings.

Also, the force-to-ASCII-behavior special case is different for
pg_wc_tolower()/pg_wc_toupper() than for LOWER()/UPPER(): the former
depends only on whether it's the default locale, whereas the latter
depends on whether it's the default locale and the encoding is
single-byte. Therefore, with the libc provider, the results in the
tr_TR.UTF-8 locale are inconsistent:

  => select 'i' ~* 'I', 'I' ~* 'i', lower('I') = 'i', upper('i') = 'I';
   ?column? | ?column? | ?column? | ?column?
  ----------+----------+----------+----------
   t        | t        | f        | f

That behavior goes back a long way, so I'm not suggesting that we
change it.
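To make the asymmetry concrete, here is a simplified C model of the two decision rules described above. It is a sketch, not PostgreSQL source: all function names are hypothetical, and it assumes a Turkish libc locale in which lowercasing 'I' yields dotless 'ı' (U+0131) whenever ASCII behavior is not forced.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the regex path (pg_wc_tolower/pg_wc_toupper):
 * ASCII behavior is forced whenever the collation is the default locale. */
bool regex_forces_ascii(bool is_default_locale)
{
    return is_default_locale;
}

/* Hypothetical model of the LOWER()/UPPER() path: ASCII behavior is forced
 * only for the default locale AND a single-byte encoding. */
bool lower_upper_forces_ascii(bool is_default_locale, bool single_byte_encoding)
{
    return is_default_locale && single_byte_encoding;
}

/* Lowercase of ASCII 'I'. When ASCII is not forced, an assumed Turkish
 * libc locale returns dotless i, shown here as its UTF-8 bytes. */
const char *lower_of_capital_I(bool forces_ascii)
{
    return forces_ascii ? "i" : "\xC4\xB1";
}

/* True when the regex path and LOWER() fold 'I' to the same result. */
bool paths_agree_on_I(bool is_default_locale, bool single_byte_encoding)
{
    const char *regex_result =
        lower_of_capital_I(regex_forces_ascii(is_default_locale));
    const char *lower_result =
        lower_of_capital_I(lower_upper_forces_ascii(is_default_locale,
                                                    single_byte_encoding));
    return strcmp(regex_result, lower_result) == 0;
}
```

In this model, tr_TR.UTF-8 as the default locale corresponds to paths_agree_on_I(true, false), which returns false: the regex path folds 'I' to ASCII 'i' while LOWER() does not, matching the t/t/f/f output above. With a single-byte encoding, or with a non-default locale, both rules make the same choice and the paths agree.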

Regards,
    Jeff Davis


