Re: More message encoding woes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: More message encoding woes
Msg-id 49DB2666.1050800@enterprisedb.com
In response to Re: More message encoding woes  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: More message encoding woes  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Peter Eisentraut wrote:
> On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote:
>> Using the name for the latin1 encoding in the currently Windows-only
>> mapping table, "LATIN1", you get no translation because that name is not
>> recognized by the system. Using the other name "ISO-8859-1", it works.
>> "LATIN1" is not listed in the output of locale -m either.
>
> You are looking in the wrong place.  What we need is for iconv to recognize
> the encoding name used by PostgreSQL.  iconv --list is the primary hint for
> that.
>
> The locale names provided by the operating system are arbitrary and unrelated.

Oh, ok. I guess we can do the simple fix you proposed then.
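
For anyone who wants to check a name programmatically rather than by reading the output of iconv --list, a minimal sketch follows. It assumes a POSIX iconv(3); the helper name iconv_knows_encoding is hypothetical and not part of PostgreSQL. Note that gettext may be linked against a different iconv implementation than the one this probe uses, so treat the result as a hint only.

```c
#include <iconv.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if this system's iconv accepts
 * "name" as a source encoding for conversion to UTF-8.
 */
static bool
iconv_knows_encoding(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);

    /* iconv_open() returns (iconv_t) -1 when the conversion is unsupported */
    if (cd == (iconv_t) -1)
        return false;

    iconv_close(cd);
    return true;
}
```

Calling it with "LATIN1" and with "ISO-8859-1" on a given platform shows which of the two spellings that platform's iconv recognizes.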

Patch attached. Instead of checking for LC_CTYPE == C, I'm checking
"pg_get_encoding_from_locale(NULL) == encoding", which is closer to
what we actually want. The downside is that
pg_get_encoding_from_locale(NULL) isn't exactly free, but the upside is
that we don't need to keep this in sync with the rules we have in CREATE
DATABASE that enforce that the locale matches the encoding.

This doesn't include the cleanup to make the mapping table easier to
maintain that Magnus was going to have a look at before I started this
thread.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 890,896 **** cliplen(const char *str, int len, int limit)
      return l;
  }

! #if defined(ENABLE_NLS) && defined(WIN32)
  static const struct codeset_map {
      int    encoding;
      const char *codeset;
--- 890,896 ----
      return l;
  }

! #if defined(ENABLE_NLS)
  static const struct codeset_map {
      int    encoding;
      const char *codeset;
***************
*** 929,935 **** static const struct codeset_map {
      {PG_EUC_TW, "EUC-TW"},
      {PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* WIN32 */

  void
  SetDatabaseEncoding(int encoding)
--- 929,935 ----
      {PG_EUC_TW, "EUC-TW"},
      {PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* ENABLE_NLS */

  void
  SetDatabaseEncoding(int encoding)
***************
*** 946,960 **** SetDatabaseEncoding(int encoding)
  }

  /*
!  * On Windows, we need to explicitly bind gettext to the correct
!  * encoding, because gettext() tends to get confused.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS) && defined(WIN32)
      int     i;

      for (i = 0; i < lengthof(codeset_map_array); i++)
      {
          if (codeset_map_array[i].encoding == encoding)
--- 946,975 ----
  }

  /*
!  * Bind gettext to the correct encoding.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS)
      int     i;

+     /*
+      * gettext() uses the encoding specified by LC_CTYPE by default,
+      * so if that matches the database encoding, we don't need to do
+      * anything. This is not for performance, but because if
+      * bind_textdomain_codeset() doesn't recognize the codeset name we
+      * pass it, it will fall back to English and we don't want that to
+      * happen unnecessarily.
+      *
+      * On Windows, though, gettext() tends to get confused so we always
+      * bind it.
+      */
+ #ifndef WIN32
+     if (pg_get_encoding_from_locale(NULL) == encoding)
+         return;
+ #endif
+
      for (i = 0; i < lengthof(codeset_map_array); i++)
      {
          if (codeset_map_array[i].encoding == encoding)
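
The decision the patched function makes on non-Windows platforms can be modelled in isolation. The sketch below is only an illustration: the DEMO_* encoding IDs, the table demo_codeset_map, and the function demo_codeset_to_bind are hypothetical stand-ins, and the table holds just the three entries mentioned in this message rather than the real codeset_map_array. The locale's encoding is passed in as a parameter instead of being obtained from pg_get_encoding_from_locale(NULL).

```c
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's encoding IDs. */
enum demo_encoding
{
    DEMO_LATIN1,
    DEMO_EUC_TW,
    DEMO_EUC_JIS_2004,
    DEMO_UNMAPPED               /* an encoding with no table entry */
};

/* Reduced copy of the mapping idea: encoding ID -> codeset name. */
static const struct
{
    int         encoding;
    const char *codeset;
} demo_codeset_map[] = {
    {DEMO_LATIN1, "LATIN1"},
    {DEMO_EUC_TW, "EUC-TW"},
    {DEMO_EUC_JIS_2004, "EUC-JP"}
};

/*
 * Returns the codeset name that would be handed to
 * bind_textdomain_codeset(), or NULL when nothing should be bound:
 * either the locale already implies the wanted encoding, or the
 * encoding has no entry in the table.
 */
static const char *
demo_codeset_to_bind(int locale_encoding, int encoding)
{
    size_t      i;

    /* Locale already matches: leave gettext's default alone. */
    if (locale_encoding == encoding)
        return NULL;

    for (i = 0; i < sizeof(demo_codeset_map) / sizeof(demo_codeset_map[0]); i++)
    {
        if (demo_codeset_map[i].encoding == encoding)
            return demo_codeset_map[i].codeset;
    }
    return NULL;
}
```

In this model the Windows case corresponds to skipping the first check, so that a table lookup always happens.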
