Thread: Re: [HACKERS] pg_dump, problem with user defined types?

Re: [HACKERS] pg_dump, problem with user defined types?

From
Keith Parks
Date:
Bruce,

What I can't see from the postings, is what incident or problem
prompted the change.

Was there test or query that failed because it chose the wrong
procedure?

Do you have an example which failed with the old code and needed
the changes to make it work?

I see pg_proc has "pronargs" and "proargtypes" columns, is this
not enough to ensure that the correct procedure is called?

Or am I off at a tangent again!!

Keith.

> Bruce Momjian <maillist@candle.pha.pa.us>
> > Bruce,
> >
> > If you have the relevant discussions in a neat bundle I'd like
> > to have a read through them to see what the issues are.
> >
> > The other curiosity is the creation of an _<typname> type, which
> > I believe is an array type, at the time of type creation.
> >
> > I can't remember seeing this in previous versions.
> >
> > Thanks,
> > Keith.
>
> OK, here are the relivant postings.  Please make a suggestion for a fix.
>


Re: [HACKERS] pg_dump, problem with user defined types?

From
Bruce Momjian
Date:
> Bruce,
>
> What I can't see from the postings, is what incident or problem
> prompted the change.
>
> Was there test or query that failed because it chose the wrong
> procedure?
>
> Do you have an example which failed with the old code and needed
> the changes to make it work?
>
> I see pg_proc has "pronargs" and "proargtypes" columns, is this
> not enough to ensure that the correct procedure is called?
>
> Or am I off at a tangent again!!

I found that the regprocin routine was doing a sequential scan of
pg_proc, looking for a function matching the supplied name, and using
the first matching entry of pg_proc.proname.

Two problems:  You can't use the system cache to look up the pg_proc
value, because the cache only does unique lookups.  Second, we have many
functions that have multiple entries in the pg_proc table with the same
name, but different arguments, so it was not really accurate.

We certainly need to change what I have done, but I am not sure how to
change it.

We can put it back to the old code, or move everything to use only an
oid, with no name, or we can somehow allow the user to supply the
pg_proc.proname, and the argument types, and do a match that way.  We
could allow just the proname if there is only one entry with that
proname.  (Sequtial scan required, but not too bad.  We could even use
the existing index.  It is not done that much.)  If it is not unique, we
would require the oid.

I hope there is some good solution.

The code currently supports input of oid, or proname_oid, and outputs
proname_oid.

--
  Bruce Momjian                        |  maillist@candle.pha.pa.us
  830 Blythe Avenue                    |  http://www.op.net/~candle
  Drexel Hill, Pennsylvania 19026      |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: [HACKERS] pg_dump, problem with user defined types?

From
"Thomas G. Lockhart"
Date:
> > What I can't see from the postings, is what incident or problem
> > prompted the change.
> > Was there test or query that failed because it chose the wrong
> > procedure?
> > Do you have an example which failed with the old code and needed
> > the changes to make it work?
> I found that the regprocin routine was doing a sequential scan of
> pg_proc, looking for a function matching the supplied name, and using
> the first matching entry of pg_proc.proname.
> Two problems:  You can't use the system cache to look up the pg_proc
> value, because the cache only does unique lookups.  Second, we have
> many functions that have multiple entries in the pg_proc table with
> the same name, but different arguments, so it was not really accurate.
> We certainly need to change what I have done, but I am not sure how to
> change it.

I'm sorry Bruce for not keeping up, but could you please send me the old
postings which summarize the symptoms of the problem? And any private
response to Keith which answered his initial questions above?

I understand that the regprocin support routine was apparently not using
enough keys in the pg_proc lookup to find the unique entry match, but
where is regprocin used? I assume that it is support code, but is it
used internally only, or are you using it for some of the psql/pg_dump
construction?

From the code used in the parser to support functions and operators, I
would have guessed that you can define multiple key fields for cache
lookups; do those techniques fail in this case or are they not being
used?

Anyway, I'm concerned about the name mangling and the proximity to the
v6.4 release, and would like to help work out the issues if I can...

                       - Tom

fix for multi-byte partial truncating

From
Tatsuo Ishii
Date:
For varchar(n)/char(n) type, input string is silently truncated if it
is longer than n. A multi-byte letter consists of several bytes and
they should not be divided into pieces. Unconditional truncating
multi-byte letters would make partial multi-byte bytes.

Attached patches should fix the problem.

Index: backend/utils/adt/varchar.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v
retrieving revision 1.39
diff -c -r1.39 varchar.c
*** varchar.c    1998/09/01 04:32:53    1.39
--- varchar.c    1998/09/24 09:03:37
***************
*** 147,153 ****
--- 147,160 ----
      if ((len == -1) || (len == VARSIZE(s)))
          return s;

+ #ifdef MULTIBYTE
+     /* truncate multi-byte string in a way not to break
+        multi-byte boundary */
+     rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
+     len = rlen + VARHDRSZ;
+ #else
      rlen = len - VARHDRSZ;
+ #endif

      if (rlen > 4096)
          elog(ERROR, "bpchar: length of char() must be less than 4096");
***************
*** 367,373 ****
--- 374,387 ----

      /* only reach here if we need to truncate string... */

+ #ifdef MULTIBYTE
+     /* truncate multi-byte string in a way not to break
+        multi-byte boundary */
+     len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ);
+     slen = len + VARHDRSZ;
+ #else
      len = slen - VARHDRSZ;
+ #endif

      if (len > 4096)
          elog(ERROR, "varchar: length of varchar() must be less than 4096");
Index: backend/utils/mb/mbutils.c
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
retrieving revision 1.3
diff -c -r1.3 mbutils.c
*** mbutils.c    1998/09/01 04:33:22    1.3
--- mbutils.c    1998/09/24 09:03:38
***************
*** 202,207 ****
--- 202,235 ----
  }

  /*
+  * returns the length of a multi-byte string
+  * (not necessarily  NULL terminated)
+  * that is not longer than limit.
+  * this function does not break multi-byte word boundary.
+  */
+ int
+ pg_mbcliplen(const unsigned char *mbstr, int len, int limit)
+ {
+     int            clen = 0;
+     int            l;
+
+     while (*mbstr &&  len > 0)
+     {
+         l = pg_mblen(mbstr);
+         if ((clen + l) > limit) {
+             break;
+         }
+         clen += l;
+         if (clen == limit) {
+             break;
+         }
+         len -= l;
+         mbstr += l;
+     }
+     return (clen);
+ }
+
+ /*
   * fuctions for utils/init
   */
  static int    DatabaseEncoding = MULTIBYTE;
Index: include/mb/pg_wchar.h
===================================================================
RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v
retrieving revision 1.4
diff -c -r1.4 pg_wchar.h
*** pg_wchar.h    1998/09/01 04:36:34    1.4
--- pg_wchar.h    1998/09/24 09:03:42
***************
*** 103,108 ****
--- 103,109 ----
  extern int    pg_mic_mblen(const unsigned char *);
  extern int    pg_mbstrlen(const unsigned char *);
  extern int    pg_mbstrlen_with_len(const unsigned char *, int);
+ extern int    pg_mbcliplen(const unsigned char *, int, int);
  extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int);
  extern bool show_client_encoding(void);
  extern bool reset_client_encoding(void);

Re: [HACKERS] fix for multi-byte partial truncating

From
Bruce Momjian
Date:
Applied, but for some reason patch did not like the normal cvs/rcs diff
format.  Not sure why.  Please check to see it is OK.  Looks OK here.


> For varchar(n)/char(n) type, input string is silently truncated if it
> is longer than n. A multi-byte letter consists of several bytes and
> they should not be divided into pieces. Unconditional truncating
> multi-byte letters would make partial multi-byte bytes.
>
> Attached patches should fix the problem.
>
> Index: backend/utils/adt/varchar.c
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v
> retrieving revision 1.39
> diff -c -r1.39 varchar.c
> *** varchar.c    1998/09/01 04:32:53    1.39
> --- varchar.c    1998/09/24 09:03:37
> ***************
> *** 147,153 ****
> --- 147,160 ----
>       if ((len == -1) || (len == VARSIZE(s)))
>           return s;
>
> + #ifdef MULTIBYTE
> +     /* truncate multi-byte string in a way not to break
> +        multi-byte boundary */
> +     rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
> +     len = rlen + VARHDRSZ;
> + #else
>       rlen = len - VARHDRSZ;
> + #endif
>
>       if (rlen > 4096)
>           elog(ERROR, "bpchar: length of char() must be less than 4096");
> ***************
> *** 367,373 ****
> --- 374,387 ----
>
>       /* only reach here if we need to truncate string... */
>
> + #ifdef MULTIBYTE
> +     /* truncate multi-byte string in a way not to break
> +        multi-byte boundary */
> +     len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ);
> +     slen = len + VARHDRSZ;
> + #else
>       len = slen - VARHDRSZ;
> + #endif
>
>       if (len > 4096)
>           elog(ERROR, "varchar: length of varchar() must be less than 4096");
> Index: backend/utils/mb/mbutils.c
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
> retrieving revision 1.3
> diff -c -r1.3 mbutils.c
> *** mbutils.c    1998/09/01 04:33:22    1.3
> --- mbutils.c    1998/09/24 09:03:38
> ***************
> *** 202,207 ****
> --- 202,235 ----
>   }
>
>   /*
> +  * returns the length of a multi-byte string
> +  * (not necessarily  NULL terminated)
> +  * that is not longer than limit.
> +  * this function does not break multi-byte word boundary.
> +  */
> + int
> + pg_mbcliplen(const unsigned char *mbstr, int len, int limit)
> + {
> +     int            clen = 0;
> +     int            l;
> +
> +     while (*mbstr &&  len > 0)
> +     {
> +         l = pg_mblen(mbstr);
> +         if ((clen + l) > limit) {
> +             break;
> +         }
> +         clen += l;
> +         if (clen == limit) {
> +             break;
> +         }
> +         len -= l;
> +         mbstr += l;
> +     }
> +     return (clen);
> + }
> +
> + /*
>    * fuctions for utils/init
>    */
>   static int    DatabaseEncoding = MULTIBYTE;
> Index: include/mb/pg_wchar.h
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v
> retrieving revision 1.4
> diff -c -r1.4 pg_wchar.h
> *** pg_wchar.h    1998/09/01 04:36:34    1.4
> --- pg_wchar.h    1998/09/24 09:03:42
> ***************
> *** 103,108 ****
> --- 103,109 ----
>   extern int    pg_mic_mblen(const unsigned char *);
>   extern int    pg_mbstrlen(const unsigned char *);
>   extern int    pg_mbstrlen_with_len(const unsigned char *, int);
> + extern int    pg_mbcliplen(const unsigned char *, int, int);
>   extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int);
>   extern bool show_client_encoding(void);
>   extern bool reset_client_encoding(void);
>
>


--
  Bruce Momjian                        |  maillist@candle.pha.pa.us
  830 Blythe Avenue                    |  http://www.op.net/~candle
  Drexel Hill, Pennsylvania 19026      |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |

Re: [HACKERS] fix for multi-byte partial truncating

From
Tatsuo Ishii
Date:
>Applied, but for some reason patch did not like the normal cvs/rcs diff
>format.  Not sure why.  Please check to see it is OK.  Looks OK here.

Thank you, Bruce. Everything seems OK too.

But I found a mistake with my patches. bpchar does not pad blanks
anymore! Could you apply following patches to
backend/utils/adt/varchar.c? (the diff is against the current source
tree)

*** varchar.c.orig    Fri Sep 25 15:12:34 1998
--- varchar.c    Fri Sep 25 17:59:47 1998
***************
*** 147,160 ****
      if ((len == -1) || (len == VARSIZE(s)))
          return s;

- #ifdef MULTIBYTE
-     /* truncate multi-byte string in a way not to break
-        multi-byte boundary */
-     rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
-     len = rlen + VARHDRSZ;
- #else
      rlen = len - VARHDRSZ;
- #endif

      if (rlen > 4096)
          elog(ERROR, "bpchar: length of char() must be less than 4096");
--- 147,153 ----
***************
*** 167,173 ****
--- 160,172 ----
      result = (char *) palloc(len);
      VARSIZE(result) = len;
      r = VARDATA(result);
+ #ifdef MULTIBYTE
+     /* truncate multi-byte string in a way not to break
+        multi-byte boundary */
+     slen = pg_mbcliplen(VARDATA(s), rlen, rlen);
+ #else
      slen = VARSIZE(s) - VARHDRSZ;
+ #endif
      s = VARDATA(s);

  #ifdef STRINGDEBUG

Re: [HACKERS] fix for multi-byte partial truncating

From
Bruce Momjian
Date:
Applied.



> >Applied, but for some reason patch did not like the normal cvs/rcs diff
> >format.  Not sure why.  Please check to see it is OK.  Looks OK here.
>
> Thank you, Bruce. Everything seems OK too.
>
> But I found a mistake with my patches. bpchar does not pad blanks
> anymore! Could you apply following patches to
> backend/utils/adt/varchar.c? (the diff is against the current source
> tree)
>
> *** varchar.c.orig    Fri Sep 25 15:12:34 1998
> --- varchar.c    Fri Sep 25 17:59:47 1998
> ***************
> *** 147,160 ****
>       if ((len == -1) || (len == VARSIZE(s)))
>           return s;
>
> - #ifdef MULTIBYTE
> -     /* truncate multi-byte string in a way not to break
> -        multi-byte boundary */
> -     rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
> -     len = rlen + VARHDRSZ;
> - #else
>       rlen = len - VARHDRSZ;
> - #endif
>
>       if (rlen > 4096)
>           elog(ERROR, "bpchar: length of char() must be less than 4096");
> --- 147,153 ----
> ***************
> *** 167,173 ****
> --- 160,172 ----
>       result = (char *) palloc(len);
>       VARSIZE(result) = len;
>       r = VARDATA(result);
> + #ifdef MULTIBYTE
> +     /* truncate multi-byte string in a way not to break
> +        multi-byte boundary */
> +     slen = pg_mbcliplen(VARDATA(s), rlen, rlen);
> + #else
>       slen = VARSIZE(s) - VARHDRSZ;
> + #endif
>       s = VARDATA(s);
>
>   #ifdef STRINGDEBUG
>


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026