fix for multi-byte partial truncating - Mailing list pgsql-hackers

From: Tatsuo Ishii
Subject: fix for multi-byte partial truncating
Msg-id: 199809240920.SAA29742@srapc451.sra.co.jp
In response to: Re: [HACKERS] pg_dump, problem with user defined types?  ("Thomas G. Lockhart" <lockhart@alumni.caltech.edu>)
Responses: Re: [HACKERS] fix for multi-byte partial truncating  (Bruce Momjian <maillist@candle.pha.pa.us>)
List: pgsql-hackers

For the varchar(n)/char(n) types, an input string is silently truncated
if it is longer than n. A multi-byte character consists of several bytes,
and those bytes must not be split apart. Truncating unconditionally at
byte n can cut a multi-byte character in the middle and leave a partial
byte sequence at the end of the stored string.
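
To illustrate the truncation rule described above, here is a minimal
sketch, not the patch itself. It assumes UTF-8 input, whereas the backend
uses pg_mblen() for whatever encoding the database has. The names
utf8_char_len() and clip_len() are hypothetical and exist only for this
example.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF-8
 * character starting at s, judged from its lead byte only.
 */
static int
utf8_char_len(const unsigned char *s)
{
    if (*s < 0x80)
        return 1;
    if ((*s & 0xE0) == 0xC0)
        return 2;
    if ((*s & 0xF0) == 0xE0)
        return 3;
    if ((*s & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: count it as one byte */
}

/*
 * Return the largest byte count that is <= limit and ends on a
 * character boundary. s holds len bytes and need not be
 * NUL-terminated, but scanning also stops at a NUL byte.
 */
static int
clip_len(const unsigned char *s, int len, int limit)
{
    int clen = 0;

    while (len > 0 && *s)
    {
        int l = utf8_char_len(s);

        /* stop if the character is incomplete or would exceed the limit */
        if (l > len || clen + l > limit)
            break;
        clen += l;
        len -= l;
        s += l;
    }
    return clen;
}
```

With a string of two 3-byte characters, a limit of 4 or 5 bytes gives a
clipped length of 3, so the second character is dropped whole rather than
cut in half.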

The attached patch should fix the problem. It adds pg_mbcliplen() to
mbutils.c and calls it from varchar.c when MULTIBYTE is defined.

Index: backend/utils/adt/varchar.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v
retrieving revision 1.39
diff -c -r1.39 varchar.c
*** varchar.c    1998/09/01 04:32:53    1.39
--- varchar.c    1998/09/24 09:03:37
***************
*** 147,153 ****
--- 147,160 ----
      if ((len == -1) || (len == VARSIZE(s)))
          return s;

+ #ifdef MULTIBYTE
+     /* truncate multi-byte string in a way not to break
+        multi-byte boundary */
+     rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
+     len = rlen + VARHDRSZ;
+ #else
      rlen = len - VARHDRSZ;
+ #endif

      if (rlen > 4096)
          elog(ERROR, "bpchar: length of char() must be less than 4096");
***************
*** 367,373 ****
--- 374,387 ----

      /* only reach here if we need to truncate string... */

+ #ifdef MULTIBYTE
+     /* truncate multi-byte string in a way not to break
+        multi-byte boundary */
+     len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ);
+     slen = len + VARHDRSZ;
+ #else
      len = slen - VARHDRSZ;
+ #endif

      if (len > 4096)
          elog(ERROR, "varchar: length of varchar() must be less than 4096");
Index: backend/utils/mb/mbutils.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
retrieving revision 1.3
diff -c -r1.3 mbutils.c
*** mbutils.c    1998/09/01 04:33:22    1.3
--- mbutils.c    1998/09/24 09:03:38
***************
*** 202,207 ****
--- 202,235 ----
  }

  /*
+  * returns the length of a multi-byte string
+  * (not necessarily  NULL terminated)
+  * that is not longer than limit.
+  * this function does not break multi-byte word boundary.
+  */
+ int
+ pg_mbcliplen(const unsigned char *mbstr, int len, int limit)
+ {
+     int            clen = 0;
+     int            l;
+
+     while (*mbstr &&  len > 0)
+     {
+         l = pg_mblen(mbstr);
+         if ((clen + l) > limit) {
+             break;
+         }
+         clen += l;
+         if (clen == limit) {
+             break;
+         }
+         len -= l;
+         mbstr += l;
+     }
+     return (clen);
+ }
+
+ /*
   * fuctions for utils/init
   */
  static int    DatabaseEncoding = MULTIBYTE;
Index: include/mb/pg_wchar.h
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v
retrieving revision 1.4
diff -c -r1.4 pg_wchar.h
*** pg_wchar.h    1998/09/01 04:36:34    1.4
--- pg_wchar.h    1998/09/24 09:03:42
***************
*** 103,108 ****
--- 103,109 ----
  extern int    pg_mic_mblen(const unsigned char *);
  extern int    pg_mbstrlen(const unsigned char *);
  extern int    pg_mbstrlen_with_len(const unsigned char *, int);
+ extern int    pg_mbcliplen(const unsigned char *, int, int);
  extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int);
  extern bool show_client_encoding(void);
  extern bool reset_client_encoding(void);
