Re: [HACKERS] fix for multi-byte partial truncating - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] fix for multi-byte partial truncating |
Date | |
Msg-id | 199809250147.VAA22725@candle.pha.pa.us Whole thread Raw |
In response to | fix for multi-byte partial truncating (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses |
Re: [HACKERS] fix for multi-byte partial truncating
|
List | pgsql-hackers |
Applied, but for some reason patch did not like the normal cvs/rcs diff format. Not sure why. Please check to see it is OK. Looks OK here. > For varchar(n)/char(n) type, input string is silently truncated if it > is longer than n. A multi-byte letter consists of several bytes and > they should not be divided into pieces. Unconditional truncating > multi-byte letters would make partial multi-byte bytes. > > Attached patches should fix the problem. > > Index: backend/utils/adt/varchar.c > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v > retrieving revision 1.39 > diff -c -r1.39 varchar.c > *** varchar.c 1998/09/01 04:32:53 1.39 > --- varchar.c 1998/09/24 09:03:37 > *************** > *** 147,153 **** > --- 147,160 ---- > if ((len == -1) || (len == VARSIZE(s))) > return s; > > + #ifdef MULTIBYTE > + /* truncate multi-byte string in a way not to break > + multi-byte boundary */ > + rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ); > + len = rlen + VARHDRSZ; > + #else > rlen = len - VARHDRSZ; > + #endif > > if (rlen > 4096) > elog(ERROR, "bpchar: length of char() must be less than 4096"); > *************** > *** 367,373 **** > --- 374,387 ---- > > /* only reach here if we need to truncate string... */ > > + #ifdef MULTIBYTE > + /* truncate multi-byte string in a way not to break > + multi-byte boundary */ > + len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ); > + slen = len + VARHDRSZ; > + #else > len = slen - VARHDRSZ; > + #endif > > if (len > 4096) > elog(ERROR, "varchar: length of varchar() must be less than 4096"); > Index: backend/utils/mb/mbutils.c > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v > retrieving revision 1.3 > diff -c -r1.3 mbutils.c > *** mbutils.c 1998/09/01 04:33:22 1.3 > --- mbutils.c 1998/09/24 09:03:38 > *************** > *** 202,207 **** > --- 202,235 ---- > } > > /* > + * returns the length of a multi-byte string > + * (not necessarily NULL terminated) > + * that is not longer than limit. > + * this function does not break multi-byte word boundary. > + */ > + int > + pg_mbcliplen(const unsigned char *mbstr, int len, int limit) > + { > + int clen = 0; > + int l; > + > + while (*mbstr && len > 0) > + { > + l = pg_mblen(mbstr); > + if ((clen + l) > limit) { > + break; > + } > + clen += l; > + if (clen == limit) { > + break; > + } > + len -= l; > + mbstr += l; > + } > + return (clen); > + } > + > + /* > * fuctions for utils/init > */ > static int DatabaseEncoding = MULTIBYTE; > Index: include/mb/pg_wchar.h > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v > retrieving revision 1.4 > diff -c -r1.4 pg_wchar.h > *** pg_wchar.h 1998/09/01 04:36:34 1.4 > --- pg_wchar.h 1998/09/24 09:03:42 > *************** > *** 103,108 **** > --- 103,109 ---- > extern int pg_mic_mblen(const unsigned char *); > extern int pg_mbstrlen(const unsigned char *); > extern int pg_mbstrlen_with_len(const unsigned char *, int); > + extern int pg_mbcliplen(const unsigned char *, int, int); > extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int); > extern bool show_client_encoding(void); > extern bool reset_client_encoding(void); > > -- Bruce Momjian | maillist@candle.pha.pa.us 830 Blythe Avenue | http://www.op.net/~candle Drexel Hill, Pennsylvania 19026 | (610) 353-9879(w) + If your life is a hard drive, | (610) 853-3000(h) + Christ can be your backup. |
pgsql-hackers by date: