Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding - Mailing list pgsql-patches

From Tatsuo Ishii
Subject Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding
Date
Msg-id 20051223.104207.129789350.t-ishii@sraoss.co.jp
Whole thread Raw
In response to Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding
List pgsql-patches
> Tom Lane wrote:
> > Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> > >> It looks like somebody rearranged the pg_enc enum without bothering to
> > >> fix the tables that are affected by this.
> >
> > > I will look into this.
> >
> > Thank you.  It might be worth adding a comment to pg_wchar.h listing all
> > the places that need to be fixed when enum pg_enc changes.
> >
>
> I have developed the following patch against CVS.  Tatsuo, you can use
> it as a starting point.  It adds a comment to encnames.c and reorders
> utf8_and_iso8859.c to match the existing order.  I also added the
> missing entries at the bottom.  I checked for pg_conv_map in the source
> code and only utf8_and_iso8859.c has that structure, so I assume it is
> the only one that also depends on the encnames.c ordering.

I think the current implementaion in utf8_and_iso8859.c is fast but
too fragile against rearranging of encoding id. I modify those functions
in utf8_and_iso8859.c to do a linear search with encoding id. With
this change developers feel free to rearrange encoding id, and this
kind of problems will be gone forever. The only penalty is the time of
searching 13 entries in the encoding map. We can do a quick sort but
it will need sorted entry by encoding id and may cause similar problem
in the future. So I'm not sure it's worth doing the quick sort.

Propsed patch attached.

> Looking at 8.0.X, it has the matching order, so we are OK there, but it
> doesn't have the trailing entries.  Tatsuo, are those needed?

I think it's OK, since the last missing entry will never be visited.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v
retrieving revision 1.16
diff -u -r1.16 utf8_and_iso8859.c
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c    22 Nov 2005 18:17:26 -0000    1.16
+++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c    23 Dec 2005 01:43:38 -0000
@@ -68,15 +68,6 @@
 } pg_conv_map;

 static pg_conv_map maps[] = {
-    {PG_SQL_ASCII},                /* SQL/ASCII */
-    {PG_EUC_JP},                /* EUC for Japanese */
-    {PG_EUC_CN},                /* EUC for Chinese */
-    {PG_EUC_KR},                /* EUC for Korean */
-    {PG_EUC_TW},                /* EUC for Taiwan */
-    {PG_JOHAB},                    /* EUC for Korean JOHAB */
-    {PG_UTF8},                    /* Unicode UTF8 */
-    {PG_MULE_INTERNAL},            /* Mule internal code */
-    {PG_LATIN1},                /* ISO-8859-1 Latin 1 */
     {PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2,
         sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf),
     sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)},    /* ISO-8859-2 Latin 2 */
@@ -104,12 +95,6 @@
     {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
         sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf),
     sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 Latin 10 */
-    {PG_WIN1256},                /* windows-1256 */
-    {PG_WIN1258},                /* Windows-1258 */
-    {PG_WIN874},                /* windows-874 */
-    {PG_KOI8R},                    /* KOI8-R */
-    {PG_WIN1251},                /* windows-1251 */
-    {PG_WIN866},                /* (MS-DOS CP866) */
     {PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5,
         sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf),
     sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)},    /* ISO-8859-5 */
@@ -131,11 +116,23 @@
     unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
     unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
     int            len = PG_GETARG_INT32(4);
+    int    i;

     Assert(PG_GETARG_INT32(1) == PG_UTF8);
     Assert(len >= 0);

-    LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, encoding, len);
+    for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
+    {
+        if (encoding == maps[i].encoding)
+        {
+            LocalToUtf(src, dest, maps[i].map1, maps[i].size1, encoding, len);
+            PG_RETURN_VOID();
+        }
+    }
+
+    ereport(ERROR,
+            (errcode(ERRCODE_INTERNAL_ERROR),
+             errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));

     PG_RETURN_VOID();
 }
@@ -147,11 +144,23 @@
     unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
     unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
     int            len = PG_GETARG_INT32(4);
+    int    i;

     Assert(PG_GETARG_INT32(0) == PG_UTF8);
     Assert(len >= 0);

-    UtfToLocal(src, dest, maps[encoding].map2, maps[encoding].size2, len);
+    for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
+    {
+        if (encoding == maps[i].encoding)
+        {
+            UtfToLocal(src, dest, maps[i].map2, maps[i].size2, len);
+            PG_RETURN_VOID();
+        }
+    }
+
+    ereport(ERROR,
+            (errcode(ERRCODE_INTERNAL_ERROR),
+             errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));

     PG_RETURN_VOID();
 }

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Disparity in search_path SHOW and SET
Next
From: Bruce Momjian
Date:
Subject: Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion