Re: Inconsistency between TO_CHAR() and TO_NUMBER() - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Inconsistency between TO_CHAR() and TO_NUMBER()
Date
Msg-id 8415.1368200958@sss.pgh.pa.us
Whole thread Raw
In response to Re: Inconsistency between TO_CHAR() and TO_NUMBER()  (Patryk Kordylewski <pk@fooby.de>)
Responses Re: Inconsistency between TO_CHAR() and TO_NUMBER()  (Thomas Kellerer <spam_eater@gmx.net>)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()  (Bruce Momjian <bruce@momjian.us>)
List pgsql-bugs
Patryk Kordylewski <pk@fooby.de> writes:
> SET lc_numeric TO 'de_DE.UTF-8';
> SET

> SELECT
>    TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
>    TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
> 'FM99G999G999G999G999G999G999D000');
>       to_char     | to_number
> -----------------+-----------
>   123.456.789,123 |   123.456
> (1 row)

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*.  So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch.  I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale.  Perhaps we should only apply this to HEAD and not back-patch?

            regards, tom lane

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index db5dfca51d477d3e9b33b8d2c264495b3b2ec433..81e3329ef60ce4f835fedba50208b8d0f4b19d63 100644
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
*************** NUM_numpart_from_char(NUMProc *Np, int i
*** 4131,4137 ****
  #endif

      /*
!      * read digit
       */
      if (isdigit((unsigned char) *Np->inout_p))
      {
--- 4131,4137 ----
  #endif

      /*
!      * read digit or decimal point
       */
      if (isdigit((unsigned char) *Np->inout_p))
      {
*************** NUM_numpart_from_char(NUMProc *Np, int i
*** 4151,4190 ****
  #ifdef DEBUG_TO_FROM_CHAR
          elog(DEBUG_elog_output, "Read digit (%c)", *Np->inout_p);
  #endif
-
-         /*
-          * read decimal point
-          */
      }
      else if (IS_DECIMAL(Np->Num) && Np->read_dec == FALSE)
      {
  #ifdef DEBUG_TO_FROM_CHAR
!         elog(DEBUG_elog_output, "Try read decimal point (%c)", *Np->inout_p);
  #endif
!         if (*Np->inout_p == '.')
          {
              *Np->number_p = '.';
              Np->number_p++;
              Np->read_dec = TRUE;
              isread = TRUE;
          }
-         else
-         {
-             int            x = strlen(Np->decimal);
-
- #ifdef DEBUG_TO_FROM_CHAR
-             elog(DEBUG_elog_output, "Try read locale point (%c)",
-                  *Np->inout_p);
- #endif
-             if (x && AMOUNT_TEST(x) && strncmp(Np->inout_p, Np->decimal, x) == 0)
-             {
-                 Np->inout_p += x - 1;
-                 *Np->number_p = '.';
-                 Np->number_p++;
-                 Np->read_dec = TRUE;
-                 isread = TRUE;
-             }
-         }
      }

      if (OVERLOAD_TEST)
--- 4151,4178 ----
  #ifdef DEBUG_TO_FROM_CHAR
          elog(DEBUG_elog_output, "Read digit (%c)", *Np->inout_p);
  #endif
      }
      else if (IS_DECIMAL(Np->Num) && Np->read_dec == FALSE)
      {
+         /*
+          * We need not test IS_LDECIMAL(Np->Num) explicitly here, because
+          * Np->decimal is always just "." if we don't have a D format token.
+          * So we just unconditionally match to Np->decimal.
+          */
+         int            x = strlen(Np->decimal);
+
  #ifdef DEBUG_TO_FROM_CHAR
!         elog(DEBUG_elog_output, "Try read decimal point (%c)",
!              *Np->inout_p);
  #endif
!         if (x && AMOUNT_TEST(x) && strncmp(Np->inout_p, Np->decimal, x) == 0)
          {
+             Np->inout_p += x - 1;
              *Np->number_p = '.';
              Np->number_p++;
              Np->read_dec = TRUE;
              isread = TRUE;
          }
      }

      if (OVERLOAD_TEST)

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Strange time zone +00:53:28
Next
From: Thomas Kellerer
Date:
Subject: Re: Inconsistency between TO_CHAR() and TO_NUMBER()