Re: Remaining dependency on setlocale() - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Remaining dependency on setlocale()
Msg-id 0e186a9a92634f0c5675a618ff5685d00cd8f836.camel@j-davis.com
In response to Re: Remaining dependency on setlocale()  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: Remaining dependency on setlocale()
List pgsql-hackers
On Fri, 2025-12-05 at 16:01 +0100, Peter Eisentraut wrote:
> v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch
>
> The function comment reads "Check if b has a prefix of a." -- Is that
> the same as "Check if a is a prefix of b."?  The latter might be
> clearer.

Yes, fixed.

Note: I separated this into two patches. 0003 fixes the multibyte
mishandling issue, and 0004 consistently performs case folding. 0003 is
backpatchable, I believe.
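
For illustration only (not code from the patch), the "a is a prefix of b" semantics could be sketched as below. The names fold_ascii() and is_prefix_ci() are hypothetical, and the ASCII-only fold is just a stand-in for real case folding. With multibyte folding the byte lengths can change, so the simple length check here would not be sufficient.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only fold; a stand-in for real locale-aware case folding. */
static char
fold_ascii(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

/*
 * Hypothetical helper: return true if a (a_len bytes) is a prefix of
 * b (b_len bytes), ignoring ASCII case.
 */
static bool
is_prefix_ci(const char *a, size_t a_len, const char *b, size_t b_len)
{
    /* A longer string can never be a prefix of a shorter one. */
    if (a_len > b_len)
        return false;

    for (size_t i = 0; i < a_len; i++)
    {
        if (fold_ascii(a[i]) != fold_ascii(b[i]))
            return false;
    }
    return true;
}
```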

> but the patch removes SB_lower_char().

Fixed and committed.

> v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch
>
> Might have a small typo in the commit message:
>
> ; and preserve and char-at-a-time logic for bytea.

Fixed.

I also changed it into two functions: like_fixed_prefix(), which is
almost unchanged from the original; and like_fixed_prefix_ci(), which
is multibyte and locale-aware. It was too confusing to have single-byte
and multi-byte logic in the same function, and they didn't share much
code anyway.
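
As a rough sketch of the byte-at-a-time case (this is not the actual like_fixed_prefix(); the name like_prefix_bytes() and its signature are made up for illustration), extracting the literal prefix of a LIKE pattern might look like this, assuming backslash is the escape character:

```c
#include <stddef.h>

/*
 * Hypothetical sketch: copy the literal prefix of a LIKE pattern into
 * out, stopping at the first unescaped wildcard ('%' or '_').  Works one
 * byte at a time, so it is only appropriate for bytea-like input.
 * Returns the prefix length in bytes; out is NUL-terminated when
 * out_size > 0.
 */
static size_t
like_prefix_bytes(const char *pat, size_t pat_len, char *out, size_t out_size)
{
    size_t      pos = 0;
    size_t      n = 0;

    while (pos < pat_len && n + 1 < out_size)
    {
        char        c = pat[pos];

        /* An unescaped wildcard ends the fixed prefix. */
        if (c == '%' || c == '_')
            break;

        /* A backslash makes the next byte literal. */
        if (c == '\\')
        {
            pos++;
            if (pos >= pat_len)
                break;          /* trailing backslash: stop here */
            c = pat[pos];
        }

        out[n++] = c;
        pos++;
    }

    if (out_size > 0)
        out[n] = '\0';
    return n;
}
```

A case-insensitive variant cannot reuse this loop as-is, because it has to step over whole multibyte characters and consult the locale to decide whether a character has case at all.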

> case '\xc7':        /* C with cedilla */
>
> so the premise that "fuzzystrmatch is designed for ASCII" does not
> appear to be correct.  Needs more analysis.
>
> (But apparently it's not multibyte aware at all, so I don't know
> what to do about that.)

I didn't notice that, thank you. Agreed, we need a bit more discussion
around this case as well as soundex().

> v11-0008-downcase_identifier-use-method-table-from-locale.patch
>
> I'm confused here about the name of the function pg_strfold_ident().
> In general, case "folding" results in an opaque string that is really
> only useful for comparing against other case-folded strings.  But for
> identifiers we are actually interested in lower-casing.  I think this
> should be corrected in the API naming.

Agreed and fixed.

Also, I added 0006, which saves a locale_t object for ICU in this one
case where it's required. Surely that's not what we want in the long
term, but we don't have the infrastructure for decoding pg_wchar into
code points yet, and 0006 avoids the dependency on the global LC_CTYPE
setting.
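
To illustrate the general idea (this is not the code in 0006, and downcase_with_locale() is a hypothetical name), lowercasing through an explicit locale_t rather than the global LC_CTYPE could be sketched as follows, assuming a platform that provides POSIX.1-2008 newlocale() and tolower_l():

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical sketch: lowercase a buffer one byte at a time using an
 * explicit locale_t.  The result does not depend on whatever
 * setlocale(LC_CTYPE, ...) was last called with.  Single-byte only.
 */
static void
downcase_with_locale(char *s, size_t len, locale_t loc)
{
    for (size_t i = 0; i < len; i++)
    {
        /* Cast to unsigned char first: negative values are undefined. */
        unsigned char ch = (unsigned char) s[i];

        s[i] = (char) tolower_l(ch, loc);
    }
}
```

The caller would create the object once, for example with newlocale(LC_CTYPE_MASK, "C", (locale_t) 0), keep it around, and release it with freelocale() when it is no longer needed.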

> v11-0009-Control-LC_COLLATE-with-GUC.patch
>
> I know there were some complaints about compatibility with
> extensions, but I don't think anything concrete was presented.  I
> would like to see more evidence that we need this.
>
> Also, recall that we used to have a lc_collate GUC, and in the end
> people got confused that it didn't actually show a meaningful value
> when you used ICU.  So we removed that.  It seems adding this back in
> would create a similar kind of confusion.  So to avoid that, maybe
> this should be called fallback_lc_collate or something like that.

Yes, this is a POC patch and needs more discussion.

What are your thoughts about a similar lc_ctype GUC, though? That has
slightly different trade-offs.


I believe v12 0001-0005 are about ready for commit, and 0003 should be
backported.

Regards,
    Jeff Davis

