Re: TM format can mix encodings in to_char() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: TM format can mix encodings in to_char()
Date
Msg-id 24472.1555775401@sss.pgh.pa.us
In response to Re: TM format can mix encodings in to_char()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: TM format can mix encodings in to_char()
List pgsql-hackers
I wrote:
> Hmm.  I'd always imagined that the way that libc works is that LC_CTYPE
> determines the encoding (codeset) it's using across the board, so that
> functions like strftime would deliver data in that encoding.
> [ and much more based on that ]

After further study of the code, the situation seems less dire than
I feared yesterday.  In the first place, we disallow settings of
LC_COLLATE and LC_CTYPE that don't match the database encoding; see the
tests in dbcommands.c's check_encoding_locale_matches() and in initdb.
So that core functionality will be consistent in any case.

Also, I see that PGLC_localeconv() is effectively doing exactly what
you suggested for strings that are encoded according to LC_MONETARY
and LC_NUMERIC:

        encoding = pg_get_encoding_from_locale(locale_monetary, true);

        db_encoding_convert(encoding, &worklconv.int_curr_symbol);
        db_encoding_convert(encoding, &worklconv.currency_symbol);
        ...

This is a little bit off, now that I look at it, because it's
failing to account for the possibility of getting -1 from
pg_get_encoding_from_locale.  It should probably do what
pg_bind_textdomain_codeset does:

    if (encoding < 0)
        encoding = PG_SQL_ASCII;

since passing PG_SQL_ASCII to the conversion will have the effect of
validating the data without any actual conversion.
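
For illustration, here is a minimal, self-contained sketch of that
fallback pattern.  Everything prefixed demo_ or DEMO_ is a hypothetical
stand-in invented for this example (including the lookup function and
the encoding IDs); it is not PostgreSQL code and only mirrors the shape
of the "if (encoding < 0)" test above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for encoding IDs; not PostgreSQL's real enum. */
enum demo_encoding
{
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8 = 1,
    DEMO_LATIN1 = 2
};

/*
 * Hypothetical stand-in for an encoding lookup: recognizes two codeset
 * suffixes and returns -1 when the encoding cannot be determined.
 */
int
demo_encoding_from_locale(const char *locale)
{
    const char *dot = (locale != NULL) ? strchr(locale, '.') : NULL;

    if (dot == NULL)
        return -1;
    if (strcmp(dot + 1, "UTF-8") == 0)
        return DEMO_UTF8;
    if (strcmp(dot + 1, "ISO-8859-1") == 0)
        return DEMO_LATIN1;
    return -1;
}

/*
 * Pick the source encoding to hand to a conversion routine.  An unknown
 * encoding falls back to the SQL_ASCII stand-in, so the caller validates
 * the data instead of indexing with a negative encoding ID.
 */
int
demo_conversion_source_encoding(const char *locale)
{
    int encoding = demo_encoding_from_locale(locale);

    if (encoding < 0)
        encoding = DEMO_SQL_ASCII;

    assert(encoding >= 0);      /* never pass -1 on to the conversion */
    return encoding;
}
```

With these stand-ins, "de_DE.UTF-8" maps to DEMO_UTF8, while a locale
name with no recognizable codeset (or a NULL name) maps to
DEMO_SQL_ASCII rather than -1.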

I remain wary of this idea because it's depending on something that's
undefined per POSIX, but apparently it's working well enough for
LC_MONETARY and LC_NUMERIC, so we can probably get away with it for
LC_TIME as well.  Anyway, the current code clearly does not work on
glibc, and I also verified that there's a problem on FreeBSD, so
this patch should make things better.
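
As a rough sketch of what "the encoding implied by a locale name" means
in libc terms, one can temporarily select that locale for LC_CTYPE and
ask nl_langinfo(CODESET).  The function name demo_codeset_for_locale is
hypothetical and this is not the PostgreSQL implementation; it assumes
a POSIX system providing <langinfo.h>, and whether the answer really
describes the strings produced under LC_TIME is exactly the part that
POSIX leaves undefined.

```c
#define _POSIX_C_SOURCE 200809L

#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'd copy of the codeset name implied by locale_name, or
 * NULL if the locale cannot be selected or memory runs out.  The caller
 * frees the result.  The process's LC_CTYPE setting is restored before
 * returning.
 */
char *
demo_codeset_for_locale(const char *locale_name)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char       *saved;
    char       *result = NULL;

    if (current == NULL)
        return NULL;

    /* setlocale's return buffer may be overwritten by later calls */
    saved = strdup(current);
    if (saved == NULL)
        return NULL;

    if (setlocale(LC_CTYPE, locale_name) != NULL)
    {
        const char *codeset = nl_langinfo(CODESET);

        if (codeset != NULL && codeset[0] != '\0')
            result = strdup(codeset);

        /* put the previous LC_CTYPE back */
        setlocale(LC_CTYPE, saved);
    }

    free(saved);
    return result;
}
```

A caller would pass the LC_TIME locale name here and, if the result is
NULL or unrecognized, fall back to validation only, as with the
"encoding < 0" case discussed earlier.  The codeset name returned for a
given locale is platform-specific.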

Also, experimentation suggests that LC_MESSAGES actually does work
the way I thought this stuff works, i.e., its implied codeset isn't
really used.  (I think this only matters for strerror(), since we
force the issue for gettext, but glibc's strerror() is clearly not
paying attention to that.)  Sigh, who needs consistency?

            regards, tom lane
