Thread: BUG #6277: Money datatype conversion wrong with Russian locale
The following bug has been logged online: Bug reference: 6277 Logged by: Alexander LAW Email address: exclusion@gmail.com PostgreSQL version: 9.1.1 Operating system: Windows Description: Money datatype conversion wrong with Russian locale Details: When lc_monetary is set to Russian, money values having 4+ digits formatted incorrectly. E.g. following select: set lc_monetary='Russian'; select '1000'::money; returns a string that can't be displayed. It's caused by wrong mon_thousands_sep processing in backend/utils/adt/cash.c, cash_out function. The code assumes that the thousands separator fits in one character. But in Russian locale we have non-breakable space as the thousands separator (0xC2 0xA0 in UTF-8). The cash_out function uses only the first char (0xC2) of it and thus returns an invalid string: 1\xC2000.
"Alexander LAW" <exclusion@gmail.com> writes: > It's caused by wrong mon_thousands_sep processing in > backend/utils/adt/cash.c, cash_out function. > The code assumes that the thousands separator fits in one character. But in > Russian locale we have non-breakable space as the thousands separator (0xC2 > 0xA0 in UTF-8). Hmm ... looks like cash_out really needs a significant rewrite to make that work nicely. It's combining counting of digit positions with counting of output bytes, which was messy enough already, but gets worse fast if the thousands separator isn't a single byte. Does anyone know of locales where the decimal point isn't a single byte? I'm wondering if that assumption needs to be got rid of too. regards, tom lane
I think there is no need to leave such assumptions. I would propose the following fix: http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too) I can send it as a patch if you wish. Please notice a comments regarding regression tests. IMHO at least currency symbol separator should be processed as specified in lconv. And maybe mon_decimal_point, currency_symbol and negative_sign should be allowed to be empty too if it's defined by a locale. Best regards, Alexander 29.10.2011 20:17, Tom Lane writes: > "Alexander LAW"<exclusion@gmail.com> writes: >> It's caused by wrong mon_thousands_sep processing in >> backend/utils/adt/cash.c, cash_out function. >> The code assumes that the thousands separator fits in one character. But in >> Russian locale we have non-breakable space as the thousands separator (0xC2 >> 0xA0 in UTF-8). > Hmm ... looks like cash_out really needs a significant rewrite to make > that work nicely. It's combining counting of digit positions with > counting of output bytes, which was messy enough already, but gets > worse fast if the thousands separator isn't a single byte. > > Does anyone know of locales where the decimal point isn't a single byte? > I'm wondering if that assumption needs to be got rid of too. > > regards, tom lane
Alexander Lakhin <exclusion@gmail.com> writes: > I think there is no need to leave such assumptions. I would propose the > following fix: http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too) > I can send it as a patch if you wish. Please notice a comments regarding > regression tests. IMHO at least currency symbol separator should be > processed as specified in lconv. And maybe mon_decimal_point, > currency_symbol and negative_sign should be allowed to be empty too if > it's defined by a locale. I've committed a patch that improves the handling of cs_precedes, sign_posn et al. I don't agree with your proposal to slavishly follow the locale definition no matter how brain-dead it is, because that would destroy the ability to dump and reload data without data loss --- for example, if we obeyed an empty negative_sign setting, we'd lose the information that a value had been negative. AFAICT there are no such locales anyway, other than POSIX for which we definitely don't want to believe the empty-string setting. regards, tom lane
Thank you, Tom. I tried patched file and it works fine now (with Russian locale both in Linux and Windows). I've noticed that you strict mon_decimal_point to one char, but at least two locales (fa_IR and ps_AF) (I looked at glibc-2.14) define mon_decimal_point as "<U066B>" so it takes two bytes. And IMHO the locale sanity check better move to PGLC_localeconv and don't perform these checks with each number conversion. Best regards, Alexander. 30.10.2011 23:09, Tom Lane ïèøåò: > Alexander Lakhin<exclusion@gmail.com> writes: >> I think there is no need to leave such assumptions. I would propose the >> following fix:http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too) >> I can send it as a patch if you wish. Please notice a comments regarding >> regression tests. IMHO at least currency symbol separator should be >> processed as specified in lconv. And maybe mon_decimal_point, >> currency_symbol and negative_sign should be allowed to be empty too if >> it's defined by a locale. > I've committed a patch that improves the handling of cs_precedes, > sign_posn et al. I don't agree with your proposal to slavishly follow > the locale definition no matter how brain-dead it is, because that would > destroy the ability to dump and reload data without data loss --- for > example, if we obeyed an empty negative_sign setting, we'd lose the > information that a value had been negative. AFAICT there are no such > locales anyway, other than POSIX for which we definitely don't want to > believe the empty-string setting. > > regards, tom lane