Thread: Re: pgsql: Use ICU by default at initdb time.

From: Peter Eisentraut
On 10.03.23 03:26, Jeff Davis wrote:
> That's because ICU always uses UTF-8 by default. ICU works just fine
> with many other encodings; is there a reason it doesn't take it from
> the environment just like for provider=libc?

I think originally the locale forced the encoding.  With ICU, we have a 
choice.  We could either stick to the encoding suggested by the OS, or 
pick our own.

Arguably, if we are going to nudge toward ICU, maybe we should nudge 
toward UTF-8 as well.




Re: pgsql: Use ICU by default at initdb time.

From: Jeff Davis
On Fri, 2023-03-10 at 10:59 +0100, Peter Eisentraut wrote:
> I think originally the locale forced the encoding.  With ICU, we have a
> choice.  We could either stick to the encoding suggested by the OS, or
> pick our own.

We still need LC_COLLATE and LC_CTYPE to match the database encoding
though. If we get those from the environment (which are connected to an
encoding), then I think we need to get the encoding from the
environment, too, right?
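
To make the constraint concrete, here is a minimal Python sketch of the kind of compatibility check being described. The function name `locale_matches_encoding` and the plain-string encoding names are illustrative assumptions, not initdb's actual code. The sketch also assumes that a C/POSIX locale (modeled here as "SQL_ASCII") is compatible with any database encoding.

```python
def locale_matches_encoding(locale_encoding: str, db_encoding: str) -> bool:
    """Return True if a libc locale's encoding is usable with db_encoding.

    Hypothetical helper, for illustration only.
    """
    # Assumption: C/POSIX locales imply no specific encoding (modeled as
    # "SQL_ASCII"), so they can be combined with any database encoding.
    if locale_encoding == "SQL_ASCII":
        return True
    # Otherwise LC_COLLATE/LC_CTYPE must agree with the database encoding.
    return locale_encoding == db_encoding
```

Under these assumptions, taking LC_COLLATE and LC_CTYPE from the environment while picking a different encoding would fail this check, which is why the encoding has to come from the environment as well.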

> Arguably, if we are going to nudge toward ICU, maybe we should nudge
> toward UTF-8 as well.

The OSes are already doing a pretty good job of that. Regardless, we
first need to remove the dependence on LC_CTYPE and LC_COLLATE when the
provider is ICU (we're close to that point but not quite there).

Regards,
    Jeff Davis




Re: pgsql: Use ICU by default at initdb time.

From: Peter Eisentraut
On 10.03.23 15:38, Jeff Davis wrote:
> On Fri, 2023-03-10 at 10:59 +0100, Peter Eisentraut wrote:
>> I think originally the locale forced the encoding.  With ICU, we have a
>> choice.  We could either stick to the encoding suggested by the OS, or
>> pick our own.
> 
> We still need LC_COLLATE and LC_CTYPE to match the database encoding
> though. If we get those from the environment (which are connected to an
> encoding), then I think we need to get the encoding from the
> environment, too, right?

Yes, of course.  So we can't really do what I was thinking of.




Re: pgsql: Use ICU by default at initdb time.

From: Jeff Davis
On Fri, 2023-03-10 at 15:48 +0100, Peter Eisentraut wrote:
> Yes, of course.  So we can't really do what I was thinking of.

OK, I plan to commit something like the patch in this thread soon. I
just need to add an explanatory comment.

It passes CI, but it's possible that there could be more buildfarm
failures that I'll need to look at afterward, so I'll count this as a
"trial fix".

Regards,
    Jeff Davis





Re: pgsql: Use ICU by default at initdb time.

From: Jeff Davis
On Fri, 2023-03-10 at 07:48 -0800, Jeff Davis wrote:
> On Fri, 2023-03-10 at 15:48 +0100, Peter Eisentraut wrote:
> > Yes, of course.  So we can't really do what I was thinking of.
>
> OK, I plan to commit something like the patch in this thread soon. I
> just need to add an explanatory comment.

Committed a slightly narrower fix that derives the default encoding the
same way for both libc and ICU, except that ICU still uses UTF-8 for
C/POSIX/--no-locale (because ICU doesn't work with SQL_ASCII).

That seemed more consistent with the comments around
pg_get_encoding_from_locale() and it was also easier to document the -E
switch in initdb.
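
A minimal Python sketch of the rule as described in this message, for readers who want it spelled out. The function name `default_encoding`, the provider strings, and the encoding names are assumptions made for illustration. The real logic lives in initdb's C code and may differ in detail.

```python
def default_encoding(provider: str, ctype_encoding: str) -> str:
    """Pick the default database encoding when -E is not given.

    Hypothetical helper: ctype_encoding stands for the encoding derived
    from the LC_CTYPE locale, with C/POSIX/--no-locale modeled as
    "SQL_ASCII".
    """
    # Both providers start from the encoding implied by the locale.
    # ICU does not work with SQL_ASCII, so fall back to UTF-8 there.
    if provider == "icu" and ctype_encoding == "SQL_ASCII":
        return "UTF8"
    return ctype_encoding
```

With this rule, `default_encoding("icu", "LATIN1")` and `default_encoding("libc", "LATIN1")` agree, and the two providers differ only in the C/POSIX case.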

I'll keep an eye on the buildfarm to see if this fixes the problem or
causes other issues. But it seems like the right change.

Regards,
    Jeff Davis