Thread: BUG #6277: Money datatype conversion wrong with Russian locale

BUG #6277: Money datatype conversion wrong with Russian locale

From

"Alexander LAW"

Date:

29 October 2011, 09:10:19

The following bug has been logged online:

Bug reference:      6277
Logged by:          Alexander LAW
Email address:      exclusion@gmail.com
PostgreSQL version: 9.1.1
Operating system:   Windows
Description:        Money datatype conversion wrong with Russian locale
Details:

When lc_monetary is set to Russian, money values having 4+ digits formatted
incorrectly.
E.g. following select:

set lc_monetary='Russian';
select '1000'::money;

returns a string that can't be displayed.
It's caused by wrong mon_thousands_sep processing in
backend/utils/adt/cash.c, cash_out function.
The code assumes that the thousands separator fits in one character. But in
Russian locale we have non-breakable space as the thousands separator (0xC2
0xA0 in UTF-8).
The cash_out function uses only the first char (0xC2) of it and thus returns
an invalid string: 1\xC2000.

Re: BUG #6277: Money datatype conversion wrong with Russian locale

From

Tom Lane

Date:

29 October 2011, 13:17:27

"Alexander LAW" <exclusion@gmail.com> writes:
> It's caused by wrong mon_thousands_sep processing in
> backend/utils/adt/cash.c, cash_out function.
> The code assumes that the thousands separator fits in one character. But in
> Russian locale we have non-breakable space as the thousands separator (0xC2
> 0xA0 in UTF-8).

Hmm ... looks like cash_out really needs a significant rewrite to make
that work nicely.  It's combining counting of digit positions with
counting of output bytes, which was messy enough already, but gets
worse fast if the thousands separator isn't a single byte.

Does anyone know of locales where the decimal point isn't a single byte?
I'm wondering if that assumption needs to be got rid of too.

            regards, tom lane

Re: BUG #6277: Money datatype conversion wrong with Russian locale

From

Alexander Lakhin

Date:

30 October 2011, 00:47:05

I think there is no need to leave such assumptions. I would propose the
following fix: http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too)
I can send it as a patch if you wish. Please notice a comments regarding
regression tests. IMHO at least currency symbol separator should be
processed as specified in lconv. And maybe mon_decimal_point,
currency_symbol and negative_sign should be allowed to be empty too if
it's defined by a locale.

Best regards,
Alexander

29.10.2011 20:17, Tom Lane writes:
> "Alexander LAW"<exclusion@gmail.com>  writes:
>> It's caused by wrong mon_thousands_sep processing in
>> backend/utils/adt/cash.c, cash_out function.
>> The code assumes that the thousands separator fits in one character. But in
>> Russian locale we have non-breakable space as the thousands separator (0xC2
>> 0xA0 in UTF-8).
> Hmm ... looks like cash_out really needs a significant rewrite to make
> that work nicely.  It's combining counting of digit positions with
> counting of output bytes, which was messy enough already, but gets
> worse fast if the thousands separator isn't a single byte.
>
> Does anyone know of locales where the decimal point isn't a single byte?
> I'm wondering if that assumption needs to be got rid of too.
>
>             regards, tom lane

Re: BUG #6277: Money datatype conversion wrong with Russian locale

From

Tom Lane

Date:

30 October 2011, 16:09:43

Alexander Lakhin <exclusion@gmail.com> writes:
> I think there is no need to leave such assumptions. I would propose the
> following fix: http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too)
> I can send it as a patch if you wish. Please notice a comments regarding
> regression tests. IMHO at least currency symbol separator should be
> processed as specified in lconv. And maybe mon_decimal_point,
> currency_symbol and negative_sign should be allowed to be empty too if
> it's defined by a locale.

I've committed a patch that improves the handling of cs_precedes,
sign_posn et al.  I don't agree with your proposal to slavishly follow
the locale definition no matter how brain-dead it is, because that would
destroy the ability to dump and reload data without data loss --- for
example, if we obeyed an empty negative_sign setting, we'd lose the
information that a value had been negative.  AFAICT there are no such
locales anyway, other than POSIX for which we definitely don't want to
believe the empty-string setting.

            regards, tom lane

Re: BUG #6277: Money datatype conversion wrong with Russian locale

From

Alexander LAW

Date:

31 October 2011, 07:45:54

Thank you, Tom.
I tried patched file and it works fine now (with Russian locale both in
Linux and Windows).
I've noticed that you strict mon_decimal_point to one char, but at least
two locales (fa_IR and ps_AF) (I looked at glibc-2.14) define
mon_decimal_point as "<U066B>" so it takes two bytes.
And IMHO the locale sanity check better move to PGLC_localeconv and
don't perform these checks with each number conversion.

Best regards,
Alexander.


30.10.2011 23:09, Tom Lane ïèøåò:
> Alexander Lakhin<exclusion@gmail.com>  writes:
>> I think there is no need to leave such assumptions. I would propose the
>> following fix:http://pastebin.com/EBw5YB65  (it corrects a BUG #6144 too)
>> I can send it as a patch if you wish. Please notice a comments regarding
>> regression tests. IMHO at least currency symbol separator should be
>> processed as specified in lconv. And maybe mon_decimal_point,
>> currency_symbol and negative_sign should be allowed to be empty too if
>> it's defined by a locale.
> I've committed a patch that improves the handling of cs_precedes,
> sign_posn et al.  I don't agree with your proposal to slavishly follow
> the locale definition no matter how brain-dead it is, because that would
> destroy the ability to dump and reload data without data loss --- for
> example, if we obeyed an empty negative_sign setting, we'd lose the
> information that a value had been negative.  AFAICT there are no such
> locales anyway, other than POSIX for which we definitely don't want to
> believe the empty-string setting.
>
>             regards, tom lane