fix for multi-byte partial truncating - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | fix for multi-byte partial truncating |
Date | |
Msg-id | 199809240920.SAA29742@srapc451.sra.co.jp |
In response to | Re: [HACKERS] pg_dump, problem with user defined types? ("Thomas G. Lockhart" <lockhart@alumni.caltech.edu>) |
Responses | Re: [HACKERS] fix for multi-byte partial truncating |
List | pgsql-hackers |
For the varchar(n)/char(n) types, an input string is silently truncated if it is longer than n. A multi-byte character consists of several bytes, and those bytes must not be split apart. Unconditionally truncating at n bytes can leave a partial multi-byte character at the end of the string. The attached patches should fix the problem.

Index: backend/utils/adt/varchar.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v
retrieving revision 1.39
diff -c -r1.39 varchar.c
*** varchar.c 1998/09/01 04:32:53 1.39
--- varchar.c 1998/09/24 09:03:37
***************
*** 147,153 ****
--- 147,160 ----
      if ((len == -1) || (len == VARSIZE(s)))
          return s;

+ #ifdef MULTIBYTE
+     /* truncate multi-byte string in a way not to break
+        multi-byte boundary */
+     rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
+     len = rlen + VARHDRSZ;
+ #else
      rlen = len - VARHDRSZ;
+ #endif

      if (rlen > 4096)
          elog(ERROR, "bpchar: length of char() must be less than 4096");
***************
*** 367,373 ****
--- 374,387 ----

      /* only reach here if we need to truncate string... */

+ #ifdef MULTIBYTE
+     /* truncate multi-byte string in a way not to break
+        multi-byte boundary */
+     len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ);
+     slen = len + VARHDRSZ;
+ #else
      len = slen - VARHDRSZ;
+ #endif

      if (len > 4096)
          elog(ERROR, "varchar: length of varchar() must be less than 4096");
Index: backend/utils/mb/mbutils.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
retrieving revision 1.3
diff -c -r1.3 mbutils.c
*** mbutils.c 1998/09/01 04:33:22 1.3
--- mbutils.c 1998/09/24 09:03:38
***************
*** 202,207 ****
--- 202,235 ----
  }

  /*
+  * returns the length of a multi-byte string
+  * (not necessarily NULL terminated)
+  * that is not longer than limit.
+  * this function does not break multi-byte word boundary.
+  */
+ int
+ pg_mbcliplen(const unsigned char *mbstr, int len, int limit)
+ {
+     int clen = 0;
+     int l;
+
+     while (*mbstr && len > 0)
+     {
+         l = pg_mblen(mbstr);
+         if ((clen + l) > limit) {
+             break;
+         }
+         clen += l;
+         if (clen == limit) {
+             break;
+         }
+         len -= l;
+         mbstr += l;
+     }
+     return (clen);
+ }
+
+ /*
   * fuctions for utils/init
   */
  static int DatabaseEncoding = MULTIBYTE;
Index: include/mb/pg_wchar.h
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v
retrieving revision 1.4
diff -c -r1.4 pg_wchar.h
*** pg_wchar.h 1998/09/01 04:36:34 1.4
--- pg_wchar.h 1998/09/24 09:03:42
***************
*** 103,108 ****
--- 103,109 ----
  extern int pg_mic_mblen(const unsigned char *);
  extern int pg_mbstrlen(const unsigned char *);
  extern int pg_mbstrlen_with_len(const unsigned char *, int);
+ extern int pg_mbcliplen(const unsigned char *, int, int);
  extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int);
  extern bool show_client_encoding(void);
  extern bool reset_client_encoding(void);
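For anyone who wants to try the clipping loop outside the backend, here is a minimal standalone sketch. It is not part of the patch. utf8_mblen() is a hypothetical stand-in for pg_mblen() that assumes UTF-8 input (the real pg_mblen() depends on the database encoding), and mbcliplen_sketch() is an illustrative name for a copy of the pg_mbcliplen() loop.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the character
 * starting at s, assuming UTF-8. This helper exists only so the sketch
 * can be compiled on its own.
 */
int
utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0x00)
        return 1;
    if ((*s & 0xE0) == 0xC0)
        return 2;
    if ((*s & 0xF0) == 0xE0)
        return 3;
    if ((*s & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: count it as one byte */
}

/*
 * Same loop as pg_mbcliplen() in the patch: return the largest byte
 * count not exceeding limit that ends on a character boundary.
 */
int
mbcliplen_sketch(const unsigned char *mbstr, int len, int limit)
{
    int clen = 0;
    int l;

    assert(len >= 0 && limit >= 0);

    while (*mbstr && len > 0)
    {
        l = utf8_mblen(mbstr);
        if (clen + l > limit)
            break;              /* next character would cross the limit */
        clen += l;
        if (clen == limit)
            break;              /* exactly full, nothing more fits */
        len -= l;
        mbstr += l;
    }
    return clen;
}
```

With the 4-byte string "a", 0xC3 0xA9, "b", a limit of 2 returns 1, because the 2-byte character does not fit after "a". A limit of 3 returns 3. Plain byte truncation at 2 would have kept 0xC3 without its trailing byte.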