Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion |
Date | |
Msg-id | 200512230153.jBN1r9810536@candle.pha.pa.us Whole thread Raw |
In response to | Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding (Tatsuo Ishii <ishii@sraoss.co.jp>) |
Responses |
Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
|
List | pgsql-patches |
That is a nice solution --- instead of listing all the encodings, you listed just the ones that need to be used. The list is shorter and clearer. It seems like the right approach. Thanks. --------------------------------------------------------------------------- Tatsuo Ishii wrote: > > Tom Lane wrote: > > > Tatsuo Ishii <ishii@sraoss.co.jp> writes: > > > >> It looks like somebody rearranged the pg_enc enum without bothering to > > > >> fix the tables that are affected by this. > > > > > > > I will look into this. > > > > > > Thank you. It might be worth adding a comment to pg_wchar.h listing all > > > the places that need to be fixed when enum pg_enc changes. > > > > > > > I have developed the following patch against CVS. Tatsuo, you can use > > it as a starting point. It adds a comment to encnames.c and reorders > > utf8_and_iso8859.c to match the existing order. I also added the > > missing entries at the bottom. I checked for pg_conv_map in the source > > code and only utf8_and_iso8859.c has that structure, so I assume it is > > the only one that also depends on the encnames.c ordering. > > I think the current implementaion in utf8_and_iso8859.c is fast but > too fragile against rearranging of encoding id. I modify those functions > in utf8_and_iso8859.c to do a linear search with encoding id. With > this change developers feel free to rearrange encoding id, and this > kind of problems will be gone forever. The only penalty is the time of > searching 13 entries in the encoding map. We can do a quick sort but > it will need sorted entry by encoding id and may cause similar problem > in the future. So I'm not sure it's worth doing the quick sort. > > Propsed patch attached. > > > Looking at 8.0.X, it has the matching order, so we are OK there, but it > > doesn't have the trailing entries. Tatsuo, are those needed? > > I think it's OK, since the last missing entry will never be visited. > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v > retrieving revision 1.16 > diff -u -r1.16 utf8_and_iso8859.c > --- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 22 Nov 2005 18:17:26 -0000 1.16 > +++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 23 Dec 2005 01:43:38 -0000 > @@ -68,15 +68,6 @@ > } pg_conv_map; > > static pg_conv_map maps[] = { > - {PG_SQL_ASCII}, /* SQL/ASCII */ > - {PG_EUC_JP}, /* EUC for Japanese */ > - {PG_EUC_CN}, /* EUC for Chinese */ > - {PG_EUC_KR}, /* EUC for Korean */ > - {PG_EUC_TW}, /* EUC for Taiwan */ > - {PG_JOHAB}, /* EUC for Korean JOHAB */ > - {PG_UTF8}, /* Unicode UTF8 */ > - {PG_MULE_INTERNAL}, /* Mule internal code */ > - {PG_LATIN1}, /* ISO-8859-1 Latin 1 */ > {PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2, > sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf), > sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)}, /* ISO-8859-2 Latin 2 */ > @@ -104,12 +95,6 @@ > {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16, > sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf), > sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 Latin 10 */ > - {PG_WIN1256}, /* windows-1256 */ > - {PG_WIN1258}, /* Windows-1258 */ > - {PG_WIN874}, /* windows-874 */ > - {PG_KOI8R}, /* KOI8-R */ > - {PG_WIN1251}, /* windows-1251 */ > - {PG_WIN866}, /* (MS-DOS CP866) */ > {PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5, > sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf), > sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)}, /* ISO-8859-5 */ > @@ -131,11 +116,23 @@ > unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2); > unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3); > int len = PG_GETARG_INT32(4); > + int i; > > Assert(PG_GETARG_INT32(1) == PG_UTF8); > Assert(len >= 0); > > - LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, encoding, len); > + for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++) > + { > + if (encoding == maps[i].encoding) > + { > + LocalToUtf(src, dest, maps[i].map1, maps[i].size1, encoding, len); > + PG_RETURN_VOID(); > + } > + } > + > + ereport(ERROR, > + (errcode(ERRCODE_INTERNAL_ERROR), > + errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding))); > > PG_RETURN_VOID(); > } > @@ -147,11 +144,23 @@ > unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2); > unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3); > int len = PG_GETARG_INT32(4); > + int i; > > Assert(PG_GETARG_INT32(0) == PG_UTF8); > Assert(len >= 0); > > - UtfToLocal(src, dest, maps[encoding].map2, maps[encoding].size2, len); > + for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++) > + { > + if (encoding == maps[i].encoding) > + { > + UtfToLocal(src, dest, maps[i].map2, maps[i].size2, len); > + PG_RETURN_VOID(); > + } > + } > + > + ereport(ERROR, > + (errcode(ERRCODE_INTERNAL_ERROR), > + errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding))); > > PG_RETURN_VOID(); > } -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgsql-patches by date: