Thread: Re: [HACKERS] wrong behavior using to_char() again

Re: [HACKERS] wrong behavior using to_char() again

From
Bruce Momjian
Date:
Euler Taveira de Oliveira wrote:
> Hi,
>
> Looking again at bug report [1], I agree that's a glibc bug. Numbers in
> pt_BR has its format 1.234.567,89; sometimes the format 1234567,89 is
> acceptable too, ie, the thousand separator is optional. I guess that
> some locales use the 'optional' thousand separator too (yep, they are
> all broken too).
>
> euler@harman:/a/pgsql$ ./a.out pt_BR
> decimal_point: ,
> thousands_sep:
> euler@harman:/a/pgsql$ ./a.out fr_FR
> decimal_point: ,
> thousands_sep:
> euler@harman:/a/pgsql$ ./a.out es_ES
> decimal_point: ,
> thousands_sep:
> euler@harman:/a/pgsql$ ./a.out de_DE
> decimal_point: ,
> thousands_sep: .
> euler@harman:/a/pgsql$ ./a.out C
> decimal_point: .
> thousands_sep:
>
> The actual behavior is set: (i) "," if the thousand separator is "" (ii)
> "." if the decimal point is "". It is not what glibc says (even in the C
> locale). I expect that PostgreSQL agrees with glibc (even it's the wrong
> behavior). Given this assumption, i propose the attached patch (it needs
> to adjust the regression tests).

OK, I researched this and realized it should have been obvious to me
when I added this code in 2006 that making the thousands separator
always "," for a locale of "" was going to cause a problem.

What I have done is to duplicate the logic we already had in
psql/print.c where the default thousands separator is either "," or "."
to make it different from the decimal separator.

I stated in the thread I would look at it for 8.3 but forgot:

    http://archives.postgresql.org/pgsql-bugs/2006-09/msg00079.php

I don't think there is any change needed for the C locale.  That part
seems fine, as Alvaro already pointed out.

Please test, patch attached and applied to CVS.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/utils/adt/formatting.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/formatting.c,v
retrieving revision 1.133
diff -c -c -r1.133 formatting.c
*** src/backend/utils/adt/formatting.c    21 Nov 2007 21:49:22 -0000    1.133
--- src/backend/utils/adt/formatting.c    21 Nov 2007 22:24:52 -0000
***************
*** 3915,3921 ****
           */
          if (lconv->decimal_point && *lconv->decimal_point)
              Np->decimal = lconv->decimal_point;
-
          else
              Np->decimal = ".";

--- 3915,3920 ----
***************
*** 3926,3938 ****
           * Number thousands separator
           *
           * Some locales (e.g. broken glibc pt_BR), have a comma for decimal,
!          * but "" for thousands_sep, so we make the thousands_sep comma
!          * too.  2007-02-12
           */
          if (lconv->thousands_sep && *lconv->thousands_sep)
              Np->L_thousands_sep = lconv->thousands_sep;
!         else
              Np->L_thousands_sep = ",";

          /*
           * Currency symbol
--- 3925,3938 ----
           * Number thousands separator
           *
           * Some locales (e.g. broken glibc pt_BR), have a comma for decimal,
!          * but "" for thousands_sep, so we set the thousands_sep too. 2007-02-12
           */
          if (lconv->thousands_sep && *lconv->thousands_sep)
              Np->L_thousands_sep = lconv->thousands_sep;
!         else if (strcmp(Np->decimal, ",") != 0)
              Np->L_thousands_sep = ",";
+         else
+             Np->L_thousands_sep = ".";

          /*
           * Currency symbol
Index: src/bin/psql/print.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/print.c,v
retrieving revision 1.91
diff -c -c -r1.91 print.c
*** src/bin/psql/print.c    5 Jan 2007 22:19:49 -0000    1.91
--- src/bin/psql/print.c    21 Nov 2007 22:24:53 -0000
***************
*** 2039,2045 ****
          grouping = "3";            /* most common */
      if (*extlconv->thousands_sep)
          thousands_sep = strdup(extlconv->thousands_sep);
!     else if (*decimal_point != ',')
          thousands_sep = ",";
      else
          thousands_sep = ".";
--- 2039,2045 ----
          grouping = "3";            /* most common */
      if (*extlconv->thousands_sep)
          thousands_sep = strdup(extlconv->thousands_sep);
!     else if (strcmp(decimal_point, ",") != 0)
          thousands_sep = ",";
      else
          thousands_sep = ".";

Re: [HACKERS] wrong behavior using to_char() again

From
Euler Taveira de Oliveira
Date:
Bruce Momjian wrote:

> OK, I researched this and realized it should have been obvious to me
> when I added this code in 2006 that making the thousands separator
> always "," for a locale of "" was going to cause a problem.
>
I tested your patch and IMHO it breaks the glibc behavior. I'm providing
a SQL script [1] and a diff [2] showing the differences between before
and after applying it. In [2], I see a lot of common used (pt_*, es_*,
and fr_*) locales that we'll be changed. Is it the behavior we want to
support? I think we shouldn't try to fix glibc bug inside PostgreSQL (in
this case, use should accept "" as a possible value for thousands_sep).


> I don't think there is any change needed for the C locale.  That part
> seems fine, as Alvaro already pointed out.
>
I don't know about C locale, but it's broken too. In PostgreSQL, it's
following the en_US behavior. Comments?

euler@harman:/a/pgsql$ ./a.out C
decimal_point: "."
thousands_sep: ""
euler@harman:/a/pgsql$ ./a.out en_US
decimal_point: "."
thousands_sep: ","

[1] http://timbira.com/tmp/lcn3.sql
[2] http://timbira.com/tmp/lcnumeric.diff


--
  Euler Taveira de Oliveira
  http://www.timbira.com/

Re: [HACKERS] wrong behavior using to_char() again

From
Alvaro Herrera
Date:
Euler Taveira de Oliveira wrote:
> Bruce Momjian wrote:
>
> > OK, I researched this and realized it should have been obvious to me
> > when I added this code in 2006 that making the thousands separator
> > always "," for a locale of "" was going to cause a problem.
> >
> I tested your patch and IMHO it breaks the glibc behavior. I'm providing
> a SQL script [1] and a diff [2] showing the differences between before
> and after applying it. In [2], I see a lot of common used (pt_*, es_*,
> and fr_*) locales that we'll be changed. Is it the behavior we want to
> support?

Well, what I can say is that the behavior you show for es_* that we were
historically doing is quite wrong, and the corrected output looks
better.

   lc_numeric |        to_char
  ------------+------------------------
!  es_CL      |      123,456,789,01230
  (1 registro)

--- 379,397 ----

  SET
   lc_numeric |        to_char
  ------------+------------------------
!  es_CL      |      123.456.789,01230
  (1 registro)


The first output makes no sense whereas the second is correct (ISTM
we've been doing it wrong for a lot of locales and it has just been
fixed).

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)