Thread: Re: [GENERAL] trouble with to_char('L')

Re: [GENERAL] trouble with to_char('L')

From

Tom Lane

Date:

29 May 2009, 15:16:31

Hiroshi Inoue <inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> I think what this suggests is that there probably needs to be some
>> encoding conversion logic near the places we examine localeconv()
>> output.

> Attached is a patch to the current CVS.
> It uses a similar way like LC_TIME stuff does.

I'm not really in a position to test/commit this, since I don't have a
Windows machine.  However, since no one else is stepping up to deal with
it, here's a quick review:

* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same.  What if they're different?

* What if the selected locale corresponds to Unicode (ie UTF16)
encoding?

* #define'ing strdup() to do something rather different from strdup
seems pretty horrid from the standpoint of code readability and
maintainability, especially with nary a comment explaining it.

* Code will dump core on malloc failure.

* Since this code is surely not performance critical, I wouldn't bother
with trying to optimize it; hence drop the special case for all-ASCII.

* Surely we already have a symbol somewhere that can be used in
place of this: #define    MAX_BYTES_PER_CHARACTER    4

        regards, tom lane

Re: [GENERAL] trouble with to_char('L')

From

Hiroshi Inoue

Date:

02 June 2009, 00:22:24

Tom Lane wrote:
> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>> I think what this suggests is that there probably needs to be some
>>> encoding conversion logic near the places we examine localeconv()
>>> output.
> 
>> Attached is a patch to the current CVS.
>> It uses a similar way like LC_TIME stuff does.
> 
> I'm not really in a position to test/commit this, since I don't have a
> Windows machine.  However, since no one else is stepping up to deal with
> it, here's a quick review:

Thanks for the review.
I've forgotten the patch because Japanese doesn't have trouble with
this issue (the currency symbol is ascii \). If this is really
expected to be fixed, I would update the patch according to your
suggestion.

> * This seems to be assuming that the user has set LC_MONETARY and
> LC_NUMERIC the same.  What if they're different?

Strictky speaking they should be handled individually.

> * What if the selected locale corresponds to Unicode (ie UTF16)
> encoding?

As far as I tested set_locale(LC_MONETARY, xxx.65001) causes an error.

> * #define'ing strdup() to do something rather different from strdup
> seems pretty horrid from the standpoint of code readability and
> maintainability, especially with nary a comment explaining it.

Maybe using a function instead of strdup() which calls dbstr_win32()
in case of Windows would be better.
BTW grouping and money_grouping seem to be out of encoding conversion.
Are they guaranteed to be null terminated?

> * Code will dump core on malloc failure.

I can take care of it.

> * Since this code is surely not performance critical, I wouldn't bother
> with trying to optimize it; hence drop the special case for all-ASCII.

I can take care of it.
> 
> * Surely we already have a symbol somewhere that can be used in
> place of this:
>      #define    MAX_BYTES_PER_CHARACTER    4

I can't find it.
max(pg_encoding_max_length(encoding), pg_encoding_max_length(PG_UTF8))
may be better.

regards,
Hiroshi Inoue

Re: [GENERAL] trouble with to_char('L')

From

Tom Lane

Date:

03 June 2009, 15:06:24

Hiroshi Inoue <inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> * This seems to be assuming that the user has set LC_MONETARY and
>> LC_NUMERIC the same.  What if they're different?

> Strictky speaking they should be handled individually.

I thought about this some more, and I wonder why you did it like this at
all.  The patch claimed to be copying the LC_TIME code, but the LC_TIME
code isn't trying to temporarily change any locale settings.  What we
are doing in that code is assuming that the system will give us back
the localized strings in the encoding identified by CP_ACP; so all we
have to do is convert CP_ACP to wide chars and then to UTF8.  Can't we
use a similar approach for the output of localeconv?
        regards, tom lane

Re: [GENERAL] trouble with to_char('L')

From

Hiroshi Inoue

Date:

03 June 2009, 20:41:57

Tom Lane wrote:
> Hiroshi Inoue <inoue@tpf.co.jp> writes:
>> Tom Lane wrote:
>>> * This seems to be assuming that the user has set LC_MONETARY and
>>> LC_NUMERIC the same.  What if they're different?
> 
>> Strictky speaking they should be handled individually.
> 
> I thought about this some more, and I wonder why you did it like this at
> all.  The patch claimed to be copying the LC_TIME code, but the LC_TIME
> code isn't trying to temporarily change any locale settings. 

LC_TIME and LC_CTYPE (on Windows) settings are changed temporarily
in cache_locale_time() in pg_locale.c.

> What we
> are doing in that code is assuming that the system will give us back
> the localized strings in the encoding identified by CP_ACP; 

AFAIK it's not right. LC_TIME, LC_MONETARY or LC_NUMERIC related output
is encoded using LC_CTYPE setting.

> so all we
> have to do is convert CP_ACP to wide chars and then to UTF8.  Can't we
> use a similar approach for the output of localeconv?

What LC_CTIME code and my patch intend is setting LC_CTYPE to an
appropriate value so that related output is converted correctly.
If we can set LC_CTYPE to xxx_xxx.65001(UTF8), we can eliminate
two steps but it causes an error on Windows.

regards,
HIroshi Inoue