Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: Built-in CTYPE provider
Msg-id 6f3e94c0-f174-4380-9b69-072f8a838881@manitou-mail.org
In response to Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
    Jeff Davis wrote:

> Attached a more complete version that fixes a few bugs

[v15 patch]

When selecting the builtin provider with initdb, I'm getting the
following setup:

$ bin/initdb --locale=C.UTF-8 --locale-provider=builtin -D/tmp/pgdata

  The database cluster will be initialized with this locale configuration:
    default collation provider:  builtin
    default collation locale:     C.UTF-8
    LC_COLLATE:  C.UTF-8
    LC_CTYPE:     C.UTF-8
    LC_MESSAGES: C.UTF-8
    LC_MONETARY: C.UTF-8
    LC_NUMERIC:  C.UTF-8
    LC_TIME:     C.UTF-8
  The default database encoding has accordingly been set to "UTF8".
  The default text search configuration will be set to "english".

This is from an environment where LANG=fr_FR.UTF-8.

I would expect all LC_* variables to be fr_FR.UTF-8, and the default
text search configuration to be "french". That is what happens
when selecting ICU as the provider in the same environment:

$ bin/initdb --icu-locale=en --locale-provider=icu -D/tmp/pgdata

  Using language tag "en" for ICU locale "en".
  The database cluster will be initialized with this locale configuration:
    default collation provider:  icu
    default collation locale:     en
    LC_COLLATE:  fr_FR.UTF-8
    LC_CTYPE:     fr_FR.UTF-8
    LC_MESSAGES: fr_FR.UTF-8
    LC_MONETARY: fr_FR.UTF-8
    LC_NUMERIC:  fr_FR.UTF-8
    LC_TIME:     fr_FR.UTF-8
  The default database encoding has accordingly been set to "UTF8".
  The default text search configuration will be set to "french".

There, the collation setup does not influence the rest of the localization.
The problem AFAIU is that --locale has two distinct
meanings in the v15 patch:
--locale-provider=X --locale=Y means both "use X as the provider
with Y as datlocale" and "use Y as the locale for all
localized libc functionality".

I wonder what would happen when invoking
 bin/initdb --locale=C.UTF-8 --locale-provider=builtin -D/tmp/pgdata
on a system where C.UTF-8 does not exist as a libc locale.
Would it fail? (I don't have such an OS to test on at the moment; I will try later.)
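Until that can be tried, one way to see whether a given name exists as a
libc locale on a particular system is to look for it in the output of
`locale -a`. Note that recent glibc may list C.UTF-8 as "C.utf8". This is
a minimal sketch assuming a POSIX shell and the `locale` utility; the
helper names are hypothetical, and it says nothing about what initdb
itself would do with a missing locale.

```shell
#!/bin/sh
# Hypothetical helper (not part of initdb): fold a locale name to the
# spelling glibc tends to use in `locale -a`, e.g. C.UTF-8 -> c.utf8.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Hypothetical helper: succeed if NAME is listed by `locale -a`,
# comparing both sides in normalized form.
has_libc_locale() {
  want=$(normalize_locale "$1")
  locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g' \
    | grep -qxF "$want"
}

for name in C.UTF-8 fr_FR.UTF-8; do
  if has_libc_locale "$name"; then
    echo "$name: available as a libc locale"
  else
    echo "$name: not available as a libc locale"
  fi
done
```

On a system without C.UTF-8 in libc, the first line would report it as
not available, which is the situation where the behavior of the command
above is in question.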

A related comment concerns naming the builtin locale C.UTF-8, the same
name as in libc. On one hand this is semantically sound, but on the
other hand it's likely to confuse people. What about using completely
different names, like "pg_unicode" or something else prefixed by "pg_",
both for the locale name and the collation name (currently
C.UTF-8/c_utf8)?



Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


