Thread: Re: [HACKERS] pg_dump, problem with user defined types?
Bruce, What I can't see from the postings, is what incident or problem prompted the change. Was there test or query that failed because it chose the wrong procedure? Do you have an example which failed with the old code and needed the changes to make it work? I see pg_proc has "pronargs" and "proargtypes" columns, is this not enough to ensure that the correct procedure is called? Or am I off at a tangent again!! Keith. > Bruce Momjian <maillist@candle.pha.pa.us> > > Bruce, > > > > If you have the relevant discussions in a neat bundle I'd like > > to have a read through them to see what the issues are. > > > > The other curiosity is the creation of an _<typname> type, which > > I believe is an array type, at the time of type creation. > > > > I can't remember seeing this in previous versions. > > > > Thanks, > > Keith. > > OK, here are the relivant postings. Please make a suggestion for a fix. >
> Bruce, > > What I can't see from the postings, is what incident or problem > prompted the change. > > Was there test or query that failed because it chose the wrong > procedure? > > Do you have an example which failed with the old code and needed > the changes to make it work? > > I see pg_proc has "pronargs" and "proargtypes" columns, is this > not enough to ensure that the correct procedure is called? > > Or am I off at a tangent again!! I found that the regprocin routine was doing a sequential scan of pg_proc, looking for a function matching the supplied name, and using the first matching entry of pg_proc.proname. Two problems: You can't use the system cache to look up the pg_proc value, because the cache only does unique lookups. Second, we have many functions that have multiple entries in the pg_proc table with the same name, but different arguments, so it was not really accurate. We certainly need to change what I have done, but I am not sure how to change it. We can put it back to the old code, or move everything to use only an oid, with no name, or we can somehow allow the user to supply the pg_proc.proname, and the argument types, and do a match that way. We could allow just the proname if there is only one entry with that proname. (Sequtial scan required, but not too bad. We could even use the existing index. It is not done that much.) If it is not unique, we would require the oid. I hope there is some good solution. The code currently supports input of oid, or proname_oid, and outputs proname_oid. -- Bruce Momjian | maillist@candle.pha.pa.us 830 Blythe Avenue | http://www.op.net/~candle Drexel Hill, Pennsylvania 19026 | (610) 353-9879(w) + If your life is a hard drive, | (610) 853-3000(h) + Christ can be your backup. |
> > What I can't see from the postings, is what incident or problem > > prompted the change. > > Was there test or query that failed because it chose the wrong > > procedure? > > Do you have an example which failed with the old code and needed > > the changes to make it work? > I found that the regprocin routine was doing a sequential scan of > pg_proc, looking for a function matching the supplied name, and using > the first matching entry of pg_proc.proname. > Two problems: You can't use the system cache to look up the pg_proc > value, because the cache only does unique lookups. Second, we have > many functions that have multiple entries in the pg_proc table with > the same name, but different arguments, so it was not really accurate. > We certainly need to change what I have done, but I am not sure how to > change it. I'm sorry Bruce for not keeping up, but could you please send me the old postings which summarize the symptoms of the problem? And any private response to Keith which answered his initial questions above? I understand that the regprocin support routine was apparently not using enough keys in the pg_proc lookup to find the unique entry match, but where is regprocin used? I assume that it is support code, but is it used internally only, or are you using it for some of the psql/pg_dump construction? From the code used in the parser to support functions and operators, I would have guessed that you can define multiple key fields for cache lookups; do those techniques fail in this case or are they not being used? Anyway, I'm concerned about the name mangling and the proximity to the v6.4 release, and would like to help work out the issues if I can... - Tom
For varchar(n)/char(n) type, input string is silently truncated if it is longer than n. A multi-byte letter consists of several bytes and they should not be divided into pieces. Unconditional truncating multi-byte letters would make partial multi-byte bytes. Attached patches should fix the problem. Index: backend/utils/adt/varchar.c =================================================================== RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v retrieving revision 1.39 diff -c -r1.39 varchar.c *** varchar.c 1998/09/01 04:32:53 1.39 --- varchar.c 1998/09/24 09:03:37 *************** *** 147,153 **** --- 147,160 ---- if ((len == -1) || (len == VARSIZE(s))) return s; + #ifdef MULTIBYTE + /* truncate multi-byte string in a way not to break + multi-byte boundary */ + rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ); + len = rlen + VARHDRSZ; + #else rlen = len - VARHDRSZ; + #endif if (rlen > 4096) elog(ERROR, "bpchar: length of char() must be less than 4096"); *************** *** 367,373 **** --- 374,387 ---- /* only reach here if we need to truncate string... */ + #ifdef MULTIBYTE + /* truncate multi-byte string in a way not to break + multi-byte boundary */ + len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ); + slen = len + VARHDRSZ; + #else len = slen - VARHDRSZ; + #endif if (len > 4096) elog(ERROR, "varchar: length of varchar() must be less than 4096"); Index: backend/utils/mb/mbutils.c =================================================================== RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v retrieving revision 1.3 diff -c -r1.3 mbutils.c *** mbutils.c 1998/09/01 04:33:22 1.3 --- mbutils.c 1998/09/24 09:03:38 *************** *** 202,207 **** --- 202,235 ---- } /* + * returns the length of a multi-byte string + * (not necessarily NULL terminated) + * that is not longer than limit. + * this function does not break multi-byte word boundary. + */ + int + pg_mbcliplen(const unsigned char *mbstr, int len, int limit) + { + int clen = 0; + int l; + + while (*mbstr && len > 0) + { + l = pg_mblen(mbstr); + if ((clen + l) > limit) { + break; + } + clen += l; + if (clen == limit) { + break; + } + len -= l; + mbstr += l; + } + return (clen); + } + + /* * fuctions for utils/init */ static int DatabaseEncoding = MULTIBYTE; Index: include/mb/pg_wchar.h =================================================================== RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v retrieving revision 1.4 diff -c -r1.4 pg_wchar.h *** pg_wchar.h 1998/09/01 04:36:34 1.4 --- pg_wchar.h 1998/09/24 09:03:42 *************** *** 103,108 **** --- 103,109 ---- extern int pg_mic_mblen(const unsigned char *); extern int pg_mbstrlen(const unsigned char *); extern int pg_mbstrlen_with_len(const unsigned char *, int); + extern int pg_mbcliplen(const unsigned char *, int, int); extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int); extern bool show_client_encoding(void); extern bool reset_client_encoding(void);
Applied, but for some reason patch did not like the normal cvs/rcs diff format. Not sure why. Please check to see it is OK. Looks OK here. > For varchar(n)/char(n) type, input string is silently truncated if it > is longer than n. A multi-byte letter consists of several bytes and > they should not be divided into pieces. Unconditional truncating > multi-byte letters would make partial multi-byte bytes. > > Attached patches should fix the problem. > > Index: backend/utils/adt/varchar.c > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v > retrieving revision 1.39 > diff -c -r1.39 varchar.c > *** varchar.c 1998/09/01 04:32:53 1.39 > --- varchar.c 1998/09/24 09:03:37 > *************** > *** 147,153 **** > --- 147,160 ---- > if ((len == -1) || (len == VARSIZE(s))) > return s; > > + #ifdef MULTIBYTE > + /* truncate multi-byte string in a way not to break > + multi-byte boundary */ > + rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ); > + len = rlen + VARHDRSZ; > + #else > rlen = len - VARHDRSZ; > + #endif > > if (rlen > 4096) > elog(ERROR, "bpchar: length of char() must be less than 4096"); > *************** > *** 367,373 **** > --- 374,387 ---- > > /* only reach here if we need to truncate string... */ > > + #ifdef MULTIBYTE > + /* truncate multi-byte string in a way not to break > + multi-byte boundary */ > + len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ); > + slen = len + VARHDRSZ; > + #else > len = slen - VARHDRSZ; > + #endif > > if (len > 4096) > elog(ERROR, "varchar: length of varchar() must be less than 4096"); > Index: backend/utils/mb/mbutils.c > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v > retrieving revision 1.3 > diff -c -r1.3 mbutils.c > *** mbutils.c 1998/09/01 04:33:22 1.3 > --- mbutils.c 1998/09/24 09:03:38 > *************** > *** 202,207 **** > --- 202,235 ---- > } > > /* > + * returns the length of a multi-byte string > + * (not necessarily NULL terminated) > + * that is not longer than limit. > + * this function does not break multi-byte word boundary. > + */ > + int > + pg_mbcliplen(const unsigned char *mbstr, int len, int limit) > + { > + int clen = 0; > + int l; > + > + while (*mbstr && len > 0) > + { > + l = pg_mblen(mbstr); > + if ((clen + l) > limit) { > + break; > + } > + clen += l; > + if (clen == limit) { > + break; > + } > + len -= l; > + mbstr += l; > + } > + return (clen); > + } > + > + /* > * fuctions for utils/init > */ > static int DatabaseEncoding = MULTIBYTE; > Index: include/mb/pg_wchar.h > =================================================================== > RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v > retrieving revision 1.4 > diff -c -r1.4 pg_wchar.h > *** pg_wchar.h 1998/09/01 04:36:34 1.4 > --- pg_wchar.h 1998/09/24 09:03:42 > *************** > *** 103,108 **** > --- 103,109 ---- > extern int pg_mic_mblen(const unsigned char *); > extern int pg_mbstrlen(const unsigned char *); > extern int pg_mbstrlen_with_len(const unsigned char *, int); > + extern int pg_mbcliplen(const unsigned char *, int, int); > extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int); > extern bool show_client_encoding(void); > extern bool reset_client_encoding(void); > > -- Bruce Momjian | maillist@candle.pha.pa.us 830 Blythe Avenue | http://www.op.net/~candle Drexel Hill, Pennsylvania 19026 | (610) 353-9879(w) + If your life is a hard drive, | (610) 853-3000(h) + Christ can be your backup. |
>Applied, but for some reason patch did not like the normal cvs/rcs diff >format. Not sure why. Please check to see it is OK. Looks OK here. Thank you, Bruce. Everything seems OK too. But I found a mistake with my patches. bpchar does not pad blanks anymore! Could you apply following patches to backend/utils/adt/varchar.c? (the diff is against the current source tree) *** varchar.c.orig Fri Sep 25 15:12:34 1998 --- varchar.c Fri Sep 25 17:59:47 1998 *************** *** 147,160 **** if ((len == -1) || (len == VARSIZE(s))) return s; - #ifdef MULTIBYTE - /* truncate multi-byte string in a way not to break - multi-byte boundary */ - rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ); - len = rlen + VARHDRSZ; - #else rlen = len - VARHDRSZ; - #endif if (rlen > 4096) elog(ERROR, "bpchar: length of char() must be less than 4096"); --- 147,153 ---- *************** *** 167,173 **** --- 160,172 ---- result = (char *) palloc(len); VARSIZE(result) = len; r = VARDATA(result); + #ifdef MULTIBYTE + /* truncate multi-byte string in a way not to break + multi-byte boundary */ + slen = pg_mbcliplen(VARDATA(s), rlen, rlen); + #else slen = VARSIZE(s) - VARHDRSZ; + #endif s = VARDATA(s); #ifdef STRINGDEBUG
Applied. > >Applied, but for some reason patch did not like the normal cvs/rcs diff > >format. Not sure why. Please check to see it is OK. Looks OK here. > > Thank you, Bruce. Everything seems OK too. > > But I found a mistake with my patches. bpchar does not pad blanks > anymore! Could you apply following patches to > backend/utils/adt/varchar.c? (the diff is against the current source > tree) > > *** varchar.c.orig Fri Sep 25 15:12:34 1998 > --- varchar.c Fri Sep 25 17:59:47 1998 > *************** > *** 147,160 **** > if ((len == -1) || (len == VARSIZE(s))) > return s; > > - #ifdef MULTIBYTE > - /* truncate multi-byte string in a way not to break > - multi-byte boundary */ > - rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ); > - len = rlen + VARHDRSZ; > - #else > rlen = len - VARHDRSZ; > - #endif > > if (rlen > 4096) > elog(ERROR, "bpchar: length of char() must be less than 4096"); > --- 147,153 ---- > *************** > *** 167,173 **** > --- 160,172 ---- > result = (char *) palloc(len); > VARSIZE(result) = len; > r = VARDATA(result); > + #ifdef MULTIBYTE > + /* truncate multi-byte string in a way not to break > + multi-byte boundary */ > + slen = pg_mbcliplen(VARDATA(s), rlen, rlen); > + #else > slen = VARSIZE(s) - VARHDRSZ; > + #endif > s = VARDATA(s); > > #ifdef STRINGDEBUG > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026