Thread: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

mha@postgresql.org (Magnus Hagander) writes:
> Explicitly bind gettext to the correct encoding on Windows.

I have a couple of objections to this patch.  First, what happens if
it fails to find a matching table entry?  (The existing answer is
"nothing", but that doesn't seem right.)  Second and more critical,
it adds still another data structure that has to be maintained when
the list of encodings changes, and it doesn't even live in the same
file as any existing encoding-information table.

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
        regards, tom lane


Tom Lane wrote:
> mha@postgresql.org (Magnus Hagander) writes:
> > Explicitly bind gettext to the correct encoding on Windows.
> 
> I have a couple of objections to this patch.  First, what happens if
> it fails to find a matching table entry?  (The existing answer is
> "nothing", but that doesn't seem right.)  Second and more critical,
> it adds still another data structure that has to be maintained when
> the list of encodings changes, and it doesn't even live in the same
> file as any existing encoding-information table.
> 
> What makes more sense to me is to add a table to encnames.c that
> provides the gettext name of every encoding that we support.

Would someone please comment on Tom's questions above?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
    EnterpriseDB                             http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


From: Magnus Hagander
Tom Lane wrote:
> mha@postgresql.org (Magnus Hagander) writes:
>> Explicitly bind gettext to the correct encoding on Windows.
> 
> I have a couple of objections to this patch.  First, what happens if
> it fails to find a matching table entry?  (The existing answer is
> "nothing", but that doesn't seem right.)  Second and more critical,
> it adds still another data structure that has to be maintained when
> the list of encodings changes, and it doesn't even live in the same
> file as any existing encoding-information table.
> 
> What makes more sense to me is to add a table to encnames.c that
> provides the gettext name of every encoding that we support.

Do you mean a separate table there, or should we add a new column to one
of the existing tables?

//Magnus



Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> What makes more sense to me is to add a table to encnames.c that
>> provides the gettext name of every encoding that we support.

> Do you mean a separate table there, or should we add a new column to one
> of the existing tables?

Whichever seems to make more sense is fine with me.  I just don't want
add-an-encoding maintenance requirements spread across N different
source files.
        regards, tom lane
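
Editorial sketch of the "new column" alternative raised above, assuming C11 and an invented four-value encoding enum. Every name here (demo_enc, demo_enc_info, demo_enc_tbl, demo_gettext_name) is hypothetical and is not PostgreSQL's actual API. The idea is that a single table indexed by encoding id keeps the add-an-encoding work in one place, and a compile-time check flags an encoding appended to the enum without a matching row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding ids; PostgreSQL's real pg_enc enum is much longer. */
typedef enum
{
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_KOI8R,
    DEMO_ENCODING_COUNT         /* must stay last */
} demo_enc;

typedef struct
{
    const char *pg_name;        /* name used inside the database */
    const char *gettext_name;   /* NULL: no codeset to bind */
} demo_enc_info;

/* One table, one file: each row is placed by its encoding id. */
static const demo_enc_info demo_enc_tbl[] = {
    [DEMO_SQL_ASCII] = {"SQL_ASCII", NULL},
    [DEMO_UTF8] = {"UTF8", "UTF-8"},
    [DEMO_LATIN1] = {"LATIN1", "LATIN1"},
    [DEMO_KOI8R] = {"KOI8R", "KOI8-R"},
};

/* Fails to compile if an id is appended to the enum without a row here. */
static_assert(sizeof(demo_enc_tbl) / sizeof(demo_enc_tbl[0]) ==
              (size_t) DEMO_ENCODING_COUNT,
              "demo_enc_tbl needs one row per demo_enc value");

/* Return the gettext codeset name, or NULL if there is none to bind. */
const char *
demo_gettext_name(int encoding)
{
    if (encoding < 0 || encoding >= DEMO_ENCODING_COUNT)
        return NULL;
    return demo_enc_tbl[encoding].gettext_name;
}
```

The size check only catches an id added at the end of the enum; an id inserted in the middle would leave a zero-filled row, which this lookup reports as NULL. A separate two-column table, as in the patch later in this thread, trades that check for a smaller change to the existing structures.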


Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Tom Lane wrote:
>>> What makes more sense to me is to add a table to encnames.c that
>>> provides the gettext name of every encoding that we support.
> 
>> Do you mean a separate table there, or should we add a new column to one
>> of the existing tables?
> 
> Whichever seems to make more sense is fine with me.  I just don't want
> add-an-encoding maintenance requirements spread across N different
> source files.

I was about to start looking at this when that other thread
(http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
started about related issues on other platforms. Seems we should have a
"coordinated fix" for this, so I'm going to wait and see what comes out
of that one. Unless I'm misunderstanding things and they're not related?

//Magnus



Magnus Hagander wrote:
> Tom Lane wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>> Tom Lane wrote:
>>>> What makes more sense to me is to add a table to encnames.c that
>>>> provides the gettext name of every encoding that we support.
>>> Do you mean a separate table there, or should we add a new column to one
>>> of the existing tables?
>> Whichever seems to make more sense is fine with me.  I just don't want
>> add-an-encoding maintenance requirements spread across N different
>> source files.
> 
> I was about to start looking at this when that other thread
> (http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
> started about related issues on other platforms. Seems we should have a
> "coordinated fix" for this, so I'm going to wait and see what comes out
> of that one. Unless I'm misunderstanding things and they're not related?

I've committed a fairly trivial patch per Peter's suggestion to fix the 
other thread's issue. I left the table as is, so whatever refactorings 
were planned can now be applied.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Heikki Linnakangas wrote:
> Magnus Hagander wrote:
>> Tom Lane wrote:
>>> Magnus Hagander <magnus@hagander.net> writes:
>>>> Tom Lane wrote:
>>>>> What makes more sense to me is to add a table to encnames.c that
>>>>> provides the gettext name of every encoding that we support.
>>>> Do you mean a separate table there, or should we add a new column to one
>>>> of the existing tables?
>>> Whichever seems to make more sense is fine with me.  I just don't want
>>> add-an-encoding maintenance requirements spread across N different
>>> source files.
>>
>> I was about to start looking at this when that other thread
>> (http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
>> started about related issues on other platforms. Seems we should have a
>> "coordinated fix" for this, so I'm going to wait and see what comes out
>> of that one. Unless I'm misunderstanding things and they're not related?
>
> I've committed a fairly trivial patch per Peter's suggestion to fix the
> other thread's issue. I left the table as is, so whatever refactorings
> were planned can now be applied.

Here's a patch that moves the table over to encnames.c, and renames it
to look like the others.

I don't know what it should be doing if it can't find a match, so I
haven't changed that behavior.

Comments?

//Magnus

*** a/src/backend/utils/mb/encnames.c
--- b/src/backend/utils/mb/encnames.c
***************
*** 431,436 **** pg_enc2name pg_enc2name_tbl[] =
--- 431,478 ----
  };

  /* ----------
+  * These are encoding names for gettext.
+  * ----------
+  */
+ pg_enc2gettext pg_enc2gettext_tbl[] =
+ {
+     {PG_UTF8, "UTF-8"},
+     {PG_LATIN1, "LATIN1"},
+     {PG_LATIN2, "LATIN2"},
+     {PG_LATIN3, "LATIN3"},
+     {PG_LATIN4, "LATIN4"},
+     {PG_ISO_8859_5, "ISO-8859-5"},
+     {PG_ISO_8859_6, "ISO_8859-6"},
+     {PG_ISO_8859_7, "ISO-8859-7"},
+     {PG_ISO_8859_8, "ISO-8859-8"},
+     {PG_LATIN5, "LATIN5"},
+     {PG_LATIN6, "LATIN6"},
+     {PG_LATIN7, "LATIN7"},
+     {PG_LATIN8, "LATIN8"},
+     {PG_LATIN9, "LATIN-9"},
+     {PG_LATIN10, "LATIN10"},
+     {PG_KOI8R, "KOI8-R"},
+     {PG_KOI8U, "KOI8-U"},
+     {PG_WIN1250, "CP1250"},
+     {PG_WIN1251, "CP1251"},
+     {PG_WIN1252, "CP1252"},
+     {PG_WIN1253, "CP1253"},
+     {PG_WIN1254, "CP1254"},
+     {PG_WIN1255, "CP1255"},
+     {PG_WIN1256, "CP1256"},
+     {PG_WIN1257, "CP1257"},
+     {PG_WIN1258, "CP1258"},
+     {PG_WIN866, "CP866"},
+     {PG_WIN874, "CP874"},
+     {PG_EUC_CN, "EUC-CN"},
+     {PG_EUC_JP, "EUC-JP"},
+     {PG_EUC_KR, "EUC-KR"},
+     {PG_EUC_TW, "EUC-TW"},
+     {PG_EUC_JIS_2004, "EUC-JP"}
+ };
+
+
+ /* ----------
   * Encoding checks, for error returns -1 else encoding id
   * ----------
   */
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 890,936 **** cliplen(const char *str, int len, int limit)
      return l;
  }

- #if defined(ENABLE_NLS)
- static const struct codeset_map {
-     int    encoding;
-     const char *codeset;
- } codeset_map_array[] = {
-     {PG_UTF8, "UTF-8"},
-     {PG_LATIN1, "LATIN1"},
-     {PG_LATIN2, "LATIN2"},
-     {PG_LATIN3, "LATIN3"},
-     {PG_LATIN4, "LATIN4"},
-     {PG_ISO_8859_5, "ISO-8859-5"},
-     {PG_ISO_8859_6, "ISO_8859-6"},
-     {PG_ISO_8859_7, "ISO-8859-7"},
-     {PG_ISO_8859_8, "ISO-8859-8"},
-     {PG_LATIN5, "LATIN5"},
-     {PG_LATIN6, "LATIN6"},
-     {PG_LATIN7, "LATIN7"},
-     {PG_LATIN8, "LATIN8"},
-     {PG_LATIN9, "LATIN-9"},
-     {PG_LATIN10, "LATIN10"},
-     {PG_KOI8R, "KOI8-R"},
-     {PG_KOI8U, "KOI8-U"},
-     {PG_WIN1250, "CP1250"},
-     {PG_WIN1251, "CP1251"},
-     {PG_WIN1252, "CP1252"},
-     {PG_WIN1253, "CP1253"},
-     {PG_WIN1254, "CP1254"},
-     {PG_WIN1255, "CP1255"},
-     {PG_WIN1256, "CP1256"},
-     {PG_WIN1257, "CP1257"},
-     {PG_WIN1258, "CP1258"},
-     {PG_WIN866, "CP866"},
-     {PG_WIN874, "CP874"},
-     {PG_EUC_CN, "EUC-CN"},
-     {PG_EUC_JP, "EUC-JP"},
-     {PG_EUC_KR, "EUC-KR"},
-     {PG_EUC_TW, "EUC-TW"},
-     {PG_EUC_JIS_2004, "EUC-JP"}
- };
- #endif /* ENABLE_NLS */
-
  void
  SetDatabaseEncoding(int encoding)
  {
--- 890,895 ----
***************
*** 969,980 **** pg_bind_textdomain_codeset(const char *domainname)
          return;
  #endif

!     for (i = 0; i < lengthof(codeset_map_array); i++)
      {
!         if (codeset_map_array[i].encoding == encoding)
          {
              if (bind_textdomain_codeset(domainname,
!                                         codeset_map_array[i].codeset) == NULL)
                  elog(LOG, "bind_textdomain_codeset failed");
              break;
          }
--- 928,939 ----
          return;
  #endif

!     for (i = 0; pg_enc2gettext_tbl[i].name != NULL; i++)
      {
!         if (pg_enc2gettext_tbl[i].encoding == encoding)
          {
              if (bind_textdomain_codeset(domainname,
!                                         pg_enc2gettext_tbl[i].name) == NULL)
                  elog(LOG, "bind_textdomain_codeset failed");
              break;
          }
*** a/src/include/mb/pg_wchar.h
--- b/src/include/mb/pg_wchar.h
***************
*** 262,267 **** typedef struct pg_enc2name
--- 262,278 ----
  extern pg_enc2name pg_enc2name_tbl[];

  /*
+  * Encoding names for gettext
+  */
+ typedef struct pg_enc2gettext
+ {
+     pg_enc        encoding;
+     const char *name;
+ } pg_enc2gettext;
+
+ extern pg_enc2gettext pg_enc2gettext_tbl[];
+
+ /*
   * pg_wchar stuff
   */
  typedef int (*mb2wchar_with_len_converter) (const unsigned char *from,

Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>>> What makes more sense to me is to add a table to encnames.c that
>>> provides the gettext name of every encoding that we support.

> Here's a patch that moves the table over to encnames.c, and renames it
> to look like the others.

I think you forgot to include the NULL terminating entry that the
loop seems to be expecting.  Also, why isn't the array "const"?

> I don't know what it should be doing if it can't find a match, so I
> haven't changed that behavior.

As things stand, it should throw an error, except in the case of SQL_ASCII;
there is no excuse for any other database encoding to not be in the
table.  However, what seems more worrisome to me is the prospect already
discussed that the codeset name we have in the table is not actually
recognized by gettext/iconv.  Did we have a solution for that?

Anyway, this fixes my immediate concern about where the info is located,
so you may as well apply it with the array-terminator fix.
        regards, tom lane
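
Editorial sketch of the three points in the reply above: a const table, a NULL-name terminator that the lookup loop can stop on, and an error for any missing encoding other than SQL_ASCII. It uses a cut-down, invented enum and hypothetical names (demo_enc, demo_enc2gettext_tbl, demo_gettext_codeset, demo_classify); it is not the committed code, and it returns a status value where the backend would presumably report through elog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few encoding ids; not PostgreSQL's pg_enc. */
typedef enum
{
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_MULE_INTERNAL          /* deliberately missing from the table */
} demo_enc;

typedef struct
{
    demo_enc    encoding;
    const char *name;           /* NULL marks the end of the table */
} demo_enc2gettext;

/* const: nothing modifies the table at run time. */
static const demo_enc2gettext demo_enc2gettext_tbl[] = {
    {DEMO_UTF8, "UTF-8"},
    {DEMO_LATIN1, "LATIN1"},
    {0, NULL}                   /* terminator the lookup loop relies on */
};

/* Return the gettext codeset name, or NULL if the encoding is not listed. */
const char *
demo_gettext_codeset(demo_enc encoding)
{
    int         i;

    for (i = 0; demo_enc2gettext_tbl[i].name != NULL; i++)
    {
        if (demo_enc2gettext_tbl[i].encoding == encoding)
            return demo_enc2gettext_tbl[i].name;
    }
    return NULL;
}

typedef enum
{
    DEMO_BIND_OK,               /* a codeset name was found */
    DEMO_BIND_SKIPPED,          /* SQL_ASCII: nothing to bind */
    DEMO_BIND_ERROR             /* any other encoding missing from the table */
} demo_bind_status;

/* Classify an encoding: SQL_ASCII is exempt, any other miss is an error. */
demo_bind_status
demo_classify(demo_enc encoding)
{
    const char *name;

    if (encoding == DEMO_SQL_ASCII)
        return DEMO_BIND_SKIPPED;

    name = demo_gettext_codeset(encoding);
    if (name == NULL)
        return DEMO_BIND_ERROR;

    assert(name[0] != '\0');    /* table rows must carry a real name */
    return DEMO_BIND_OK;
}
```

Without the terminator row, a loop written as `name != NULL` would read past the end of the array whenever the encoding is not found, which is the omission pointed out above.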


Tom Lane wrote:
> However, what seems more worrisome to me is the prospect already
> discussed that the codeset name we have in the table is not actually
> recognized by gettext/iconv.  Did we have a solution for that?

You get English.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
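
Editorial sketch, not something proposed in this thread: if silently falling back to English were considered too quiet, one possible way to notice an unrecognized codeset name up front is to probe it with POSIX iconv_open(). The helper name demo_codeset_is_known is hypothetical, and the sketch assumes that "UTF-8" is accepted as a conversion target and that gettext on the platform resolves names the same way as the system iconv, which may not hold everywhere.

```c
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether the local iconv implementation can
 * open a conversion from "codeset" to UTF-8.
 */
bool
demo_codeset_is_known(const char *codeset)
{
    iconv_t     cd;

    assert(codeset != NULL);

    cd = iconv_open("UTF-8", codeset);
    if (cd == (iconv_t) -1)
        return false;           /* conversion not available on this system */

    iconv_close(cd);
    return true;
}
```

A caller could run each name in the table through such a probe at startup and log the ones that fail. On platforms where iconv lives in a separate library, linking may need an extra flag such as -liconv.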


Tom Lane wrote:
>> I don't know what it should be doing if it can't find a match, so I
>> haven't changed that behavior.
> 
> As things stand, it should throw an error, except in the case of SQL_ASCII;
> there is no excuse for any other database encoding to not be in the
> table.  However, what seems more worrisome to me is the prospect already
> discussed that the codeset name we have in the table is not actually
> recognized by gettext/iconv.  Did we have a solution for that?
> 
> Anyway, this fixes my immediate concern about where the info is located,
> so you may as well apply it with the array-terminator fix.

Done.

//Magnus