Thread: Re: [HACKERS] [GENERAL] workaround for lack of REPLACE() function
Tatsuo Ishii wrote: > Joe Conway wrote: >>Any objection if I rework this function to meet SQL92 and fix the bug? > > I don't object. > >>Or is the SQL92 part not desirable because it breaks backward >>compatability? > > I don't think so. > >>In any case, can the #ifdef MULTIBYTE's be removed now in favor of a >>test for encoding max length? > > Sure. <sorry so long-winded> Attached is a patch that implements the above items wrt text_substr(). I also modified textlen(), textoctetlen(), byteaoctetlen(), and bytea_substr(). Here's a summary of the change to each: - text_substr(): rewrite function to meet SQL92, fix MB related bug, and remove #ifdef MULTIBYTE. - bytea_substr(): same as text_substr() enc max len == 1. - textoctetlen(), byteaoctetlen(): use toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ) to avoid detoasting. - textlen(): same as textoctetlen() for enc max len == 1, and remove #ifdef MULTIBYTE. I did some benchmarking to ensure no performance degradation, and to help me understand MB and related performance issues. The results were very enlightening: =================================================================== First test - textlen() (already reported, repeated here for completeness): ------------------------------------------------------------------- create table strtest(f1 text); do 100 times insert into strtest values('12345....'); -- 100000 characters loop do 1000 times select length(f1) from strtest; loop ------------------------------------------------------------------- Results: SQL_ASCII database, new code 2 seconds SQL_ASCII database, old code 66 seconds EUC_JP database, new & old code 469 seconds =================================================================== Second test - short string test: ------------------------------------------------------------------- create table parts(partnum text); <fill with ~220000 rows, 8 to 12 characters each> do 300 times select substr(partnum, 3, 3) from parts; loop ------------------------------------------------------------------- Results: SQL_ASCII database, old code 352 seconds SQL_ASCII database, new code 350 seconds EUC_JP database, old code 461 seconds EUC_JP database, new code 422 seconds =================================================================== Third test - long string, EXTENDED storage (EXTERNAL+COMPRESSED): ------------------------------------------------------------------- create table strtest(f1 text); do 100 times insert into strtest values('12345....'); -- 100000 characters loop do 1000 times select substr(f1, 89000, 10000) from strtest; loop ------------------------------------------------------------------- Results: SQL_ASCII database, old code 59 seconds SQL_ASCII database, new code 58 seconds EUC_JP database, old code 915 seconds EUC_JP database, new code 912 seconds =================================================================== Forth test - long string, EXTERNAL storage (not COMPRESSED) ------------------------------------------------------------------- create table strtest(f1 text); do 100 times insert into strtest values('12345....'); -- 100000 characters loop do 1000 times select substr(f1, 89000, 10000) from strtest; loop ------------------------------------------------------------------- Results: SQL_ASCII database, old code 17 seconds SQL_ASCII database, new code 17 seconds EUC_JP database, old code 918 seconds EUC_JP database, new code 911 seconds The only remaining problem is that this causes opr_sanity to fail based on this query: -- Considering only built-in procs (prolang = 12), look for multiple -- uses of the same internal function (ie, matching prosrc fields). -- It's OK to have several entries with different pronames for the same -- internal function, but conflicts in the number of arguments and other -- critical items should be complained of. SELECT p1.oid, p1.proname, p2.oid, p2.proname FROM pg_proc AS p1, pg_proc AS p2 WHERE p1.oid != p2.oid AND p1.prosrc = p2.prosrc AND p1.prolang = 12 AND p2.prolang = 12 AND (p1.prolang != p2.prolang OR p1.proisagg != p2.proisagg OR p1.prosecdef != p2.prosecdef OR p1.proisstrict != p2.proisstrict OR p1.proretset != p2.proretset OR p1.provolatile != p2.provolatile OR p1.pronargs != p2.pronargs); This fails because I implemented text_substr() and bytea_substr() to take either 2 or 3 args. This was necessary for SQL92 spec compliance. SQL92 requires L < 0 to throw an error, and L IS NULL to return NULL. It also requires that if L is not provided, the length to the end of the string is assumed. Current code handles L IS NULL correctly but not L < 0 -- it assumes L < 0 is the same as L is not provided. By allowing the function to determine if it was passed 2 or 3 args, this can be handled properly. So the question is, can/should I change opr_sanity to allow this case? I also still owe some additions to the strings regression test to make it cover toasted values. Other than those two issues, I think the patch is ready to go. I'm planning to take on the replace function next. Thanks, Joe Index: src/backend/utils/adt/varlena.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v retrieving revision 1.87 diff -c -r1.87 varlena.c *** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87 --- src/backend/utils/adt/varlena.c 14 Aug 2002 05:42:45 -0000 *************** *** 18,23 **** --- 18,24 ---- #include "mb/pg_wchar.h" #include "miscadmin.h" + #include "access/tuptoaster.h" #include "utils/builtins.h" #include "utils/pg_locale.h" *************** *** 285,303 **** Datum textlen(PG_FUNCTION_ARGS) { ! text *t = PG_GETARG_TEXT_P(0); ! #ifdef MULTIBYTE ! /* optimization for single byte encoding */ ! if (pg_database_encoding_max_length() <= 1) ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); ! ! PG_RETURN_INT32( ! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ) ! ); ! #else ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); ! #endif } /* --- 286,309 ---- Datum textlen(PG_FUNCTION_ARGS) { ! /* fastpath when max encoding length is one */ ! if (pg_database_encoding_max_length() == 1) ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); ! if (pg_database_encoding_max_length() > 1) ! { ! text *t = PG_GETARG_TEXT_P(0); ! ! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t), ! VARSIZE(t) - VARHDRSZ)); ! } ! ! /* should never get here */ ! elog(ERROR, "Invalid backend encoding; encoding max length " ! "is less than one."); ! ! /* notreached: suppress compiler warning */ ! PG_RETURN_NULL(); } /* *************** *** 308,316 **** Datum textoctetlen(PG_FUNCTION_ARGS) { ! text *arg = PG_GETARG_TEXT_P(0); ! ! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ); } /* --- 314,320 ---- Datum textoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); } /* *************** *** 358,363 **** --- 362,375 ---- PG_RETURN_TEXT_P(result); } + + #define PG_TEXTARG_GET_STR(arg_) \ + DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_))) + #define PG_TEXT_GET_STR(textp_) \ + DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_))) + #define PG_STR_GET_TEXT(str_) \ + DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_))) + /* * text_substr() * Return a substring starting at the specified position. *************** *** 382,471 **** * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 */ Datum text_substr(PG_FUNCTION_ARGS) { ! text *string; ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2); ! int32 sm; ! int32 sn; ! int eml = 1; ! #ifdef MULTIBYTE ! int i; ! int len; ! text *ret; ! char *p; ! #endif ! /* ! * starting position before the start of the string? then offset into ! * the string per SQL92 spec... ! */ ! if (m < 1) { ! n += (m - 1); ! m = 1; ! } ! /* Check for m > octet length is made in TOAST access routine */ ! /* m will now become a zero-based starting position */ ! sm = m - 1; ! sn = n; ! #ifdef MULTIBYTE ! eml = pg_database_encoding_max_length (); ! if (eml > 1) ! { ! sm = 0; ! if (n > -1) ! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */ else ! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */ ! } ! #endif ! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn); ! if (eml == 1) ! { ! PG_RETURN_TEXT_P (string); ! } ! #ifndef MULTIBYTE ! PG_RETURN_NULL(); /* notreached: suppress compiler warning */ ! #endif ! #ifdef MULTIBYTE ! if (n > -1) ! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3); ! else /* n < 0 is special-cased; need full string length */ ! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ); ! ! if (m > len) ! { ! m = 1; ! n = 0; ! } ! m--; ! if (((m + n) > len) || (n < 0)) ! n = (len - m); ! ! p = VARDATA(string); ! for (i = 0; i < m; i++) ! p += pg_mblen(p); ! m = p - VARDATA(string); ! for (i = 0; i < n; i++) ! p += pg_mblen(p); ! n = p - (VARDATA(string) + m); ! ret = (text *) palloc(VARHDRSZ + n); ! VARATT_SIZEP(ret) = VARHDRSZ + n; ! memcpy(VARDATA(ret), VARDATA(string) + m, n); ! PG_RETURN_TEXT_P(ret); ! #endif } /* --- 394,582 ---- * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 + * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change + * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw + * error; if E < 1, return '', not entire string). Fixed MB related bug when + * S > LC and < LC + 4 sometimes garbage characters are returned. + * - Joe Conway 2002-08-10 */ Datum text_substr(PG_FUNCTION_ARGS) { ! int S = PG_GETARG_INT32(1); /* start position */ ! int eml = pg_database_encoding_max_length(); ! int S1; /* adjusted start position */ ! int L1; /* adjusted substring length */ ! /* life is easy if the encoding max length is 1 */ ! if (eml == 1) { ! S1 = Max(S, 1); ! if (fcinfo->nargs == 2) ! { ! /* ! * Not passed a length - PG_GETARG_TEXT_P_SLICE() ! * grabs everything to the end of the string if we pass it ! * a negative value for length. ! */ ! L1 = -1; ! } ! else ! { ! /* end position */ ! int E = S + PG_GETARG_INT32(2); ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! ! L1 = E - S1; ! } ! ! /* ! * If the start position is past the end of the string, ! * SQL99 says to return a zero-length string -- ! * PG_GETARG_TEXT_P_SLICE() will do that for us. ! * Convert to zero-based starting position ! */ ! PG_RETURN_TEXT_P (PG_GETARG_TEXT_P_SLICE (0, S1 - 1, L1)); ! } ! else if (eml > 1) ! { ! /* ! * When encoding max length is > 1, we can't get LC without ! * detoasting, so we'll grab a conservatively large slice ! * now and go back later to do the right thing ! */ ! int slice_start; ! int slice_size; ! int slice_strlen; ! text *slice; ! int E1; ! int i; ! char *p; ! char *s; ! text *ret; ! ! /* ! * if S is past the end of the string, the tuple toaster ! * will return a zero-length string to us ! */ ! S1 = Max(S, 1); ! ! /* ! * We need to start at position zero because there is no ! * way to know in advance which byte offset corresponds to ! * the supplied start position. ! */ ! slice_start = 0; ! ! if (fcinfo->nargs == 2) ! { ! /* ! * If we were not passed a length, the spec says that ! * E = Max(LC + 1, S). Since we don't know LC yet, set ! * slice_size = -1 which will cause heap_tuple_untoast_attr_slice ! * to give use everything to the end of the string. ! * If S > LC + 1, we'll get back a zero length string anyway. ! */ ! slice_size = L1 = -1; ! } else ! { ! int E = S + PG_GETARG_INT32(2); ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! /* ! * if E is past the end of the string, the tuple toaster ! * will truncate the length for us ! */ ! L1 = E - S1; ! /* ! * Total slice size in bytes can't be any longer than the start ! * position plus substring length times the encoding max length. ! */ ! slice_size = (S1 + L1) * eml; ! } ! slice = PG_GETARG_TEXT_P_SLICE (0, slice_start, slice_size); ! /* see if we got back an empty string */ ! if ((VARSIZE(slice) - VARHDRSZ) == 0) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! ! /* Now we can get the actual length of the slice in MB characters */ ! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ); ! ! /* Check that the start position wasn't > slice_strlen. If so, ! * SQL99 says to return a zero-length string. ! */ ! if (S1 > slice_strlen) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! ! /* ! * Adjust L1 and E1 now that we know the slice string length. ! * Again remember that S1 is one based, and slice_start is zero based. ! */ ! if (L1 > -1) ! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen); ! else ! E1 = slice_start + 1 + slice_strlen; ! ! /* ! * Find the start position in the slice; ! * remember S1 is not zero based ! */ ! p = VARDATA(slice); ! for (i = 0; i < S1 - 1; i++) ! p += pg_mblen(p); ! ! /* hang onto a pointer to our start position */ ! s = p; ! ! /* ! * Count the actual bytes used by the substring of ! * the requested length. ! */ ! for (i = S1; i < E1; i++) ! p += pg_mblen(p); ! ! ret = (text *) palloc(VARHDRSZ + (p - s)); ! VARATT_SIZEP(ret) = VARHDRSZ + (p - s); ! memcpy(VARDATA(ret), s, (p - s)); ! ! PG_RETURN_TEXT_P(ret); ! } ! else ! elog(ERROR, "Invalid backend encoding; encoding max length " ! "is less than one."); ! ! /* notreached: suppress compiler warning */ ! PG_RETURN_NULL(); } /* *************** *** 758,766 **** Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! bytea *v = PG_GETARG_BYTEA_P(0); ! ! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ); } /* --- 869,875 ---- Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); } /* *************** *** 805,810 **** --- 914,921 ---- PG_RETURN_BYTEA_P(result); } + #define PG_STR_GET_BYTEA(str_) \ + DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_))) /* * bytea_substr() * Return a substring starting at the specified position. *************** *** 813,845 **** * Input: * - string * - starting position (is one-based) ! * - string length * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, return the remaining string. ! * */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2); ! ! /* ! * starting position before the start of the string? then offset into ! * the string per SQL92 spec... ! */ ! if (m < 1) ! { ! n += (m - 1); ! m = 1; } ! /* m will now become a zero-based starting position */ ! m--; ! ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n)); } /* --- 924,983 ---- * Input: * - string * - starting position (is one-based) ! * - string length (optional) * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, an ERROR is thrown. If no third argument ! * (length) is provided, the length to the end of the string is assumed. */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int S = PG_GETARG_INT32(1); /* start position */ ! int S1; /* adjusted start position */ ! int L1; /* adjusted substring length */ ! ! S1 = Max(S, 1); ! ! if (fcinfo->nargs == 2) ! { ! /* ! * Not passed a length - PG_GETARG_BYTEA_P_SLICE() ! * grabs everything to the end of the string if we pass it ! * a negative value for length. ! */ ! L1 = -1; } + else + { + /* end position */ + int E = S + PG_GETARG_INT32(2); ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA("")); ! ! L1 = E - S1; ! } ! ! /* ! * If the start position is past the end of the string, ! * SQL99 says to return a zero-length string -- ! * PG_GETARG_TEXT_P_SLICE() will do that for us. ! * Convert to zero-based starting position ! */ ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1)); } /* Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v retrieving revision 1.253 diff -c -r1.253 pg_proc.h *** src/include/catalog/pg_proc.h 9 Aug 2002 16:45:15 -0000 1.253 --- src/include/catalog/pg_proc.h 14 Aug 2002 02:34:38 -0000 *************** *** 2121,2127 **** DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); --- 2121,2127 ---- DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); *************** *** 2130,2136 **** DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_)); DESCR("return portion of string"); /* for multi-byte support */ --- 2130,2136 ---- DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr - _null_ )); DESCR("return portion of string"); /* for multi-byte support */ *************** *** 2778,2784 **** DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" -_null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); --- 2778,2784 ---- DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr - _null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring");
Joe Conway <mail@joeconway.com> writes: > The only remaining problem is that this causes opr_sanity to fail based > on this query: ... > This fails because I implemented text_substr() and bytea_substr() to > take either 2 or 3 args. This was necessary for SQL92 spec compliance. Rather than loosening the opr_sanity test, I'd suggest setting this up as two separate builtin functions. They can call a common implementation routine if you like. But a runtime test on the number of arguments doesn't offer any attractive improvement. > I'm planning to take on the replace function next. Isn't Gavin on that already? regards, tom lane
Tom Lane wrote: >>I'm planning to take on the replace function next. > Isn't Gavin on that already? No, sorry for the confusion. I meant: replace(bigstring, substr, newsubstr) which I discussed (mainly with with Thomas) a few weeks ago. This current work was a warmup, since I wasn't comfortable with MB character handling, and noticed some issues whilst studying it. Joe
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: > >>The only remaining problem is that this causes opr_sanity to fail based >>on this query: ... >>This fails because I implemented text_substr() and bytea_substr() to >>take either 2 or 3 args. This was necessary for SQL92 spec compliance. > > > Rather than loosening the opr_sanity test, I'd suggest setting this > up as two separate builtin functions. They can call a common > implementation routine if you like. But a runtime test on the number > of arguments doesn't offer any attractive improvement. I took Tom's advice and added wrapper functions around text_substr() and bytea_substr() to cover the 2 argument case. I also added tests to strings.sql to cover substr() on toasted columns of both text and bytea. If there are no objections, please apply. Thanks, Joe Index: src/backend/utils/adt/varlena.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v retrieving revision 1.87 diff -c -r1.87 varlena.c *** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87 --- src/backend/utils/adt/varlena.c 14 Aug 2002 16:44:36 -0000 *************** *** 18,23 **** --- 18,24 ---- #include "mb/pg_wchar.h" #include "miscadmin.h" + #include "access/tuptoaster.h" #include "utils/builtins.h" #include "utils/pg_locale.h" *************** *** 285,303 **** Datum textlen(PG_FUNCTION_ARGS) { ! text *t = PG_GETARG_TEXT_P(0); ! #ifdef MULTIBYTE ! /* optimization for single byte encoding */ ! if (pg_database_encoding_max_length() <= 1) ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); ! ! PG_RETURN_INT32( ! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ) ! ); ! #else ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); ! #endif } /* --- 286,309 ---- Datum textlen(PG_FUNCTION_ARGS) { ! /* fastpath when max encoding length is one */ ! if (pg_database_encoding_max_length() == 1) ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); ! if (pg_database_encoding_max_length() > 1) ! { ! text *t = PG_GETARG_TEXT_P(0); ! ! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t), ! VARSIZE(t) - VARHDRSZ)); ! } ! ! /* should never get here */ ! elog(ERROR, "Invalid backend encoding; encoding max length " ! "is less than one."); ! ! /* notreached: suppress compiler warning */ ! PG_RETURN_NULL(); } /* *************** *** 308,316 **** Datum textoctetlen(PG_FUNCTION_ARGS) { ! text *arg = PG_GETARG_TEXT_P(0); ! ! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ); } /* --- 314,320 ---- Datum textoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); } /* *************** *** 358,363 **** --- 362,375 ---- PG_RETURN_TEXT_P(result); } + + #define PG_TEXTARG_GET_STR(arg_) \ + DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_))) + #define PG_TEXT_GET_STR(textp_) \ + DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_))) + #define PG_STR_GET_TEXT(str_) \ + DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_))) + /* * text_substr() * Return a substring starting at the specified position. *************** *** 382,471 **** * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 */ Datum text_substr(PG_FUNCTION_ARGS) { ! text *string; ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2); ! int32 sm; ! int32 sn; ! int eml = 1; ! #ifdef MULTIBYTE ! int i; ! int len; ! text *ret; ! char *p; ! #endif ! /* ! * starting position before the start of the string? then offset into ! * the string per SQL92 spec... ! */ ! if (m < 1) { ! n += (m - 1); ! m = 1; ! } ! /* Check for m > octet length is made in TOAST access routine */ ! /* m will now become a zero-based starting position */ ! sm = m - 1; ! sn = n; ! #ifdef MULTIBYTE ! eml = pg_database_encoding_max_length (); ! if (eml > 1) ! { ! sm = 0; ! if (n > -1) ! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */ else ! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */ ! } ! #endif ! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn); ! if (eml == 1) ! { ! PG_RETURN_TEXT_P (string); ! } ! #ifndef MULTIBYTE ! PG_RETURN_NULL(); /* notreached: suppress compiler warning */ ! #endif ! #ifdef MULTIBYTE ! if (n > -1) ! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3); ! else /* n < 0 is special-cased; need full string length */ ! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ); ! ! if (m > len) ! { ! m = 1; ! n = 0; ! } ! m--; ! if (((m + n) > len) || (n < 0)) ! n = (len - m); ! ! p = VARDATA(string); ! for (i = 0; i < m; i++) ! p += pg_mblen(p); ! m = p - VARDATA(string); ! for (i = 0; i < n; i++) ! p += pg_mblen(p); ! n = p - (VARDATA(string) + m); ! ret = (text *) palloc(VARHDRSZ + n); ! VARATT_SIZEP(ret) = VARHDRSZ + n; ! memcpy(VARDATA(ret), VARDATA(string) + m, n); ! PG_RETURN_TEXT_P(ret); ! #endif } /* --- 394,593 ---- * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 + * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change + * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw + * error; if E < 1, return '', not entire string). Fixed MB related bug when + * S > LC and < LC + 4 sometimes garbage characters are returned. + * - Joe Conway 2002-08-10 */ Datum text_substr(PG_FUNCTION_ARGS) { ! int S = PG_GETARG_INT32(1); /* start position */ ! int eml = pg_database_encoding_max_length(); ! int S1; /* adjusted start position */ ! int L1; /* adjusted substring length */ ! /* life is easy if the encoding max length is 1 */ ! if (eml == 1) { ! S1 = Max(S, 1); ! ! if (fcinfo->nargs == 2) ! { ! /* ! * Not passed a length - PG_GETARG_TEXT_P_SLICE() ! * grabs everything to the end of the string if we pass it ! * a negative value for length. ! */ ! L1 = -1; ! } ! else ! { ! /* end position */ ! int E = S + PG_GETARG_INT32(2); ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! L1 = E - S1; ! } ! ! /* ! * If the start position is past the end of the string, ! * SQL99 says to return a zero-length string -- ! * PG_GETARG_TEXT_P_SLICE() will do that for us. ! * Convert to zero-based starting position ! */ ! PG_RETURN_TEXT_P (PG_GETARG_TEXT_P_SLICE (0, S1 - 1, L1)); ! } ! else if (eml > 1) ! { ! /* ! * When encoding max length is > 1, we can't get LC without ! * detoasting, so we'll grab a conservatively large slice ! * now and go back later to do the right thing ! */ ! int slice_start; ! int slice_size; ! int slice_strlen; ! text *slice; ! int E1; ! int i; ! char *p; ! char *s; ! text *ret; ! ! /* ! * if S is past the end of the string, the tuple toaster ! * will return a zero-length string to us ! */ ! S1 = Max(S, 1); ! ! /* ! * We need to start at position zero because there is no ! * way to know in advance which byte offset corresponds to ! * the supplied start position. ! */ ! slice_start = 0; ! ! if (fcinfo->nargs == 2) ! { ! /* ! * If we were not passed a length, the spec says that ! * E = Max(LC + 1, S). Since we don't know LC yet, set ! * slice_size = -1 which will cause heap_tuple_untoast_attr_slice ! * to give use everything to the end of the string. ! * If S > LC + 1, we'll get back a zero length string anyway. ! */ ! slice_size = L1 = -1; ! } else ! { ! int E = S + PG_GETARG_INT32(2); ! ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! /* ! * if E is past the end of the string, the tuple toaster ! * will truncate the length for us ! */ ! L1 = E - S1; ! ! /* ! * Total slice size in bytes can't be any longer than the start ! * position plus substring length times the encoding max length. ! */ ! slice_size = (S1 + L1) * eml; ! } ! slice = PG_GETARG_TEXT_P_SLICE (0, slice_start, slice_size); ! /* see if we got back an empty string */ ! if ((VARSIZE(slice) - VARHDRSZ) == 0) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! ! /* Now we can get the actual length of the slice in MB characters */ ! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ); ! ! /* Check that the start position wasn't > slice_strlen. If so, ! * SQL99 says to return a zero-length string. ! */ ! if (S1 > slice_strlen) ! PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); ! ! /* ! * Adjust L1 and E1 now that we know the slice string length. ! * Again remember that S1 is one based, and slice_start is zero based. ! */ ! if (L1 > -1) ! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen); ! else ! E1 = slice_start + 1 + slice_strlen; ! /* ! * Find the start position in the slice; ! * remember S1 is not zero based ! */ ! p = VARDATA(slice); ! for (i = 0; i < S1 - 1; i++) ! p += pg_mblen(p); ! ! /* hang onto a pointer to our start position */ ! s = p; ! ! /* ! * Count the actual bytes used by the substring of ! * the requested length. ! */ ! for (i = S1; i < E1; i++) ! p += pg_mblen(p); ! ! ret = (text *) palloc(VARHDRSZ + (p - s)); ! VARATT_SIZEP(ret) = VARHDRSZ + (p - s); ! memcpy(VARDATA(ret), s, (p - s)); ! PG_RETURN_TEXT_P(ret); ! } ! else ! elog(ERROR, "Invalid backend encoding; encoding max length " ! "is less than one."); ! ! /* notreached: suppress compiler warning */ ! PG_RETURN_NULL(); ! } ! ! /* ! * text_substr_no_len - ! * Wrapper to avoid opr_sanity failure due to ! * one function accepting a different number of args. ! */ ! Datum ! text_substr_no_len(PG_FUNCTION_ARGS) ! { ! return text_substr(fcinfo); } /* *************** *** 758,766 **** Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! bytea *v = PG_GETARG_BYTEA_P(0); ! ! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ); } /* --- 880,886 ---- Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); } /* *************** *** 805,810 **** --- 925,932 ---- PG_RETURN_BYTEA_P(result); } + #define PG_STR_GET_BYTEA(str_) \ + DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_))) /* * bytea_substr() * Return a substring starting at the specified position. *************** *** 813,845 **** * Input: * - string * - starting position (is one-based) ! * - string length * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, return the remaining string. ! * */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2); ! ! /* ! * starting position before the start of the string? then offset into ! * the string per SQL92 spec... ! */ ! if (m < 1) ! { ! n += (m - 1); ! m = 1; } ! /* m will now become a zero-based starting position */ ! m--; ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n)); } /* --- 935,1005 ---- * Input: * - string * - starting position (is one-based) ! * - string length (optional) * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, an ERROR is thrown. If no third argument ! * (length) is provided, the length to the end of the string is assumed. */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int S = PG_GETARG_INT32(1); /* start position */ ! int S1; /* adjusted start position */ ! int L1; /* adjusted substring length */ ! ! S1 = Max(S, 1); ! ! if (fcinfo->nargs == 2) ! { ! /* ! * Not passed a length - PG_GETARG_BYTEA_P_SLICE() ! * grabs everything to the end of the string if we pass it ! * a negative value for length. ! */ ! L1 = -1; } + else + { + /* end position */ + int E = S + PG_GETARG_INT32(2); ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA("")); ! ! L1 = E - S1; ! } ! ! /* ! * If the start position is past the end of the string, ! * SQL99 says to return a zero-length string -- ! * PG_GETARG_TEXT_P_SLICE() will do that for us. ! * Convert to zero-based starting position ! */ ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1)); ! } ! /* ! * bytea_substr_no_len - ! * Wrapper to avoid opr_sanity failure due to ! * one function accepting a different number of args. ! */ ! Datum ! bytea_substr_no_len(PG_FUNCTION_ARGS) ! { ! return bytea_substr(fcinfo); } /* Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v retrieving revision 1.253 diff -c -r1.253 pg_proc.h *** src/include/catalog/pg_proc.h 9 Aug 2002 16:45:15 -0000 1.253 --- src/include/catalog/pg_proc.h 14 Aug 2002 17:44:01 -0000 *************** *** 2121,2127 **** DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); --- 2121,2127 ---- DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); *************** *** 2130,2136 **** DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_)); DESCR("return portion of string"); /* for multi-byte support */ --- 2130,2136 ---- DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); /* for multi-byte support */ *************** *** 2778,2784 **** DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" -_null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); --- 2778,2788 ---- DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2084 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); Index: src/include/utils/builtins.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v retrieving revision 1.190 diff -c -r1.190 builtins.h *** src/include/utils/builtins.h 9 Aug 2002 16:45:16 -0000 1.190 --- src/include/utils/builtins.h 14 Aug 2002 16:35:26 -0000 *************** *** 447,452 **** --- 447,453 ---- extern Datum textoctetlen(PG_FUNCTION_ARGS); extern Datum textpos(PG_FUNCTION_ARGS); extern Datum text_substr(PG_FUNCTION_ARGS); + extern Datum text_substr_no_len(PG_FUNCTION_ARGS); extern Datum name_text(PG_FUNCTION_ARGS); extern Datum text_name(PG_FUNCTION_ARGS); extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2); *************** *** 476,481 **** --- 477,483 ---- extern Datum byteacat(PG_FUNCTION_ARGS); extern Datum byteapos(PG_FUNCTION_ARGS); extern Datum bytea_substr(PG_FUNCTION_ARGS); + extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS); /* version.c */ extern Datum pgsql_version(PG_FUNCTION_ARGS); Index: src/test/regress/expected/strings.out =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v retrieving revision 1.12 diff -c -r1.12 strings.out *** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12 --- src/test/regress/expected/strings.out 14 Aug 2002 17:52:39 -0000 *************** *** 573,575 **** --- 573,649 ---- text and varchar (1 row) + -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; Index: src/test/regress/sql/strings.sql =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v retrieving revision 1.8 diff -c -r1.8 strings.sql *** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8 --- src/test/regress/sql/strings.sql 14 Aug 2002 17:42:36 -0000 *************** *** 197,199 **** --- 197,249 ---- SELECT text 'text' || char(20) ' and characters' AS "Concat text to char"; SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar"; + + -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest; + + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest;
Joe Conway wrote: > I took Tom's advice and added wrapper functions around text_substr() and > bytea_substr() to cover the 2 argument case. > > I also added tests to strings.sql to cover substr() on toasted columns > of both text and bytea. > Please replace the original patch (substr.2002.08.14.1.patch) with the attached. It includes everything from the previous one, plus newly implemented builtin functions: replace(string, from, to) -- replaces all occurrences of "from" in "string" to "to" split(string, fldsep, column) -- splits "string" on "fldsep" and returns "column" number piece to_hex(int32_num) & to_hex(int64_num) -- takes integer number and returns as hex string All previously discussed on the list; see thread at: http://archives.postgresql.org/pgsql-hackers/2002-07/msg00511.php Examples: SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; ya123da123doo --------------- ya123da123doo (1 row) select split('joeuser@mydatabase','@',1) AS "joeuser"; joeuser --------- joeuser (1 row) select split('joeuser@mydatabase','@',2) AS "mydatabase"; mydatabase ------------ mydatabase (1 row) select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; ffffffff ---------- ffffffff (1 row) Tests have been added to the regression suite. Passes all regression tests. I've checked the strings.sql script in a multibyte database and it works fine also. I'd appreciate a good look by someone more familiar with multibyte related issues though. If it is OK, I'd like to hold off on docs until this is committed and after beta starts. If there are no objections, please apply. Thanks, Joe Index: src/backend/utils/adt/varlena.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v retrieving revision 1.87 diff -c -r1.87 varlena.c *** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87 --- src/backend/utils/adt/varlena.c 16 Aug 2002 19:54:03 -0000 *************** *** 18,23 **** --- 18,25 ---- #include "mb/pg_wchar.h" #include "miscadmin.h" + #include "access/tuptoaster.h" + #include "lib/stringinfo.h" #include "utils/builtins.h" #include "utils/pg_locale.h" *************** *** 27,34 **** --- 29,62 ---- #define DatumGetUnknownP(X) ((unknown *) PG_DETOAST_DATUM(X)) #define PG_GETARG_UNKNOWN_P(n) DatumGetUnknownP(PG_GETARG_DATUM(n)) #define PG_RETURN_UNKNOWN_P(x) PG_RETURN_POINTER(x) + #define PG_TEXTARG_GET_STR(arg_) \ + DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_))) + #define PG_TEXT_GET_STR(textp_) \ + DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_))) + #define PG_STR_GET_TEXT(str_) \ + DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_))) + #define TEXTLEN(textp) \ + text_length(PointerGetDatum(textp)) + #define TEXTPOS(buf_text, from_sub_text) \ + text_position(PointerGetDatum(buf_text), PointerGetDatum(from_sub_text), 1) + #define TEXTDUP(textp) \ + DatumGetTextPCopy(PointerGetDatum(textp)) + #define LEFT(buf_text, from_sub_text) \ + text_substring(PointerGetDatum(buf_text), \ + 1, \ + TEXTPOS(buf_text, from_sub_text) - 1, false) + #define RIGHT(buf_text, from_sub_text, from_sub_text_len) \ + text_substring(PointerGetDatum(buf_text), \ + TEXTPOS(buf_text, from_sub_text) + from_sub_text_len, \ + -1, true) static int text_cmp(text *arg1, text *arg2); + static int32 text_length(Datum str); + static int32 text_position(Datum str, Datum search_str, int matchnum); + static text *text_substring(Datum str, + int32 start, + int32 length, + bool length_not_specified); /***************************************************************************** *************** *** 285,303 **** Datum textlen(PG_FUNCTION_ARGS) { ! text *t = PG_GETARG_TEXT_P(0); ! #ifdef MULTIBYTE ! /* optimization for single byte encoding */ ! if (pg_database_encoding_max_length() <= 1) ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); ! ! PG_RETURN_INT32( ! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ) ! ); ! #else ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); ! #endif } /* --- 313,348 ---- Datum textlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(text_length(PG_GETARG_DATUM(0))); ! } ! /* ! * text_length - ! * Does the real work for textlen() ! * This is broken out so it can be called directly by other string processing ! * functions. ! */ ! static int32 ! text_length(Datum str) ! { ! /* fastpath when max encoding length is one */ ! if (pg_database_encoding_max_length() == 1) ! PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); ! ! if (pg_database_encoding_max_length() > 1) ! { ! text *t = DatumGetTextP(str); ! ! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t), ! VARSIZE(t) - VARHDRSZ)); ! } ! ! /* should never get here */ ! elog(ERROR, "Invalid backend encoding; encoding max length " ! "is less than one."); ! ! /* not reached: suppress compiler warning */ ! return 0; } /* *************** *** 308,316 **** Datum textoctetlen(PG_FUNCTION_ARGS) { ! text *arg = PG_GETARG_TEXT_P(0); ! ! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ); } /* --- 353,359 ---- Datum textoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); } /* *************** *** 382,471 **** * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 */ Datum text_substr(PG_FUNCTION_ARGS) { ! text *string; ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2); ! int32 sm; ! int32 sn; ! int eml = 1; ! #ifdef MULTIBYTE ! int i; ! int len; ! text *ret; ! char *p; ! #endif ! /* ! * starting position before the start of the string? then offset into ! * the string per SQL92 spec... ! */ ! if (m < 1) { ! n += (m - 1); ! m = 1; ! } ! /* Check for m > octet length is made in TOAST access routine */ ! /* m will now become a zero-based starting position */ ! sm = m - 1; ! sn = n; ! #ifdef MULTIBYTE ! eml = pg_database_encoding_max_length (); ! if (eml > 1) { ! sm = 0; ! if (n > -1) ! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */ else ! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */ ! } ! #endif ! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn); ! if (eml == 1) ! { ! PG_RETURN_TEXT_P (string); ! } ! #ifndef MULTIBYTE ! PG_RETURN_NULL(); /* notreached: suppress compiler warning */ ! #endif ! #ifdef MULTIBYTE ! if (n > -1) ! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3); ! else /* n < 0 is special-cased; need full string length */ ! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ); ! ! if (m > len) ! { ! m = 1; ! n = 0; ! } ! m--; ! if (((m + n) > len) || (n < 0)) ! n = (len - m); ! ! p = VARDATA(string); ! for (i = 0; i < m; i++) ! p += pg_mblen(p); ! m = p - VARDATA(string); ! for (i = 0; i < n; i++) ! p += pg_mblen(p); ! n = p - (VARDATA(string) + m); ! ret = (text *) palloc(VARHDRSZ + n); ! VARATT_SIZEP(ret) = VARHDRSZ + n; ! memcpy(VARDATA(ret), VARDATA(string) + m, n); ! PG_RETURN_TEXT_P(ret); ! #endif } /* --- 425,625 ---- * - Thomas Lockhart 1998-12-10 * Now uses faster TOAST-slicing interface * - John Gray 2002-02-22 + * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change + * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw + * error; if E < 1, return '', not entire string). Fixed MB related bug when + * S > LC and < LC + 4 sometimes garbage characters are returned. + * - Joe Conway 2002-08-10 */ Datum text_substr(PG_FUNCTION_ARGS) { ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), ! PG_GETARG_INT32(1), ! PG_GETARG_INT32(2), ! false)); ! } ! /* ! * text_substr_no_len - ! * Wrapper to avoid opr_sanity failure due to ! * one function accepting a different number of args. ! */ ! Datum ! text_substr_no_len(PG_FUNCTION_ARGS) ! { ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), ! PG_GETARG_INT32(1), ! -1, true)); ! } ! ! /* ! * text_substring - ! * Does the real work for text_substr() and text_substr_no_len() ! * This is broken out so it can be called directly by other string processing ! * functions. ! */ ! static text* ! text_substring(Datum str, int32 start, int32 length, bool length_not_specified) ! { ! int32 eml = pg_database_encoding_max_length(); ! int32 S = start; /* start position */ ! int32 S1; /* adjusted start position */ ! int32 L1; /* adjusted substring length */ ! ! /* life is easy if the encoding max length is 1 */ ! if (eml == 1) { ! S1 = Max(S, 1); ! if (length_not_specified) /* special case - get length to end of string */ ! L1 = -1; ! else ! { ! /* end position */ ! int E = S + length; ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! return PG_STR_GET_TEXT(""); ! ! L1 = E - S1; ! } ! ! /* ! * If the start position is past the end of the string, ! * SQL99 says to return a zero-length string -- ! * PG_GETARG_TEXT_P_SLICE() will do that for us. ! * Convert to zero-based starting position ! */ ! return DatumGetTextPSlice(str, S1 - 1, L1); ! } ! else if (eml > 1) { ! /* ! * When encoding max length is > 1, we can't get LC without ! * detoasting, so we'll grab a conservatively large slice ! * now and go back later to do the right thing ! */ ! int32 slice_start; ! int32 slice_size; ! int32 slice_strlen; ! text *slice; ! int32 E1; ! int32 i; ! char *p; ! char *s; ! text *ret; ! ! /* ! * if S is past the end of the string, the tuple toaster ! * will return a zero-length string to us ! */ ! S1 = Max(S, 1); ! ! /* ! * We need to start at position zero because there is no ! * way to know in advance which byte offset corresponds to ! * the supplied start position. ! */ ! slice_start = 0; ! ! if (length_not_specified) /* special case - get length to end of string */ ! slice_size = L1 = -1; else ! { ! int E = S + length; ! ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! return PG_STR_GET_TEXT(""); ! /* ! * if E is past the end of the string, the tuple toaster ! * will truncate the length for us ! */ ! L1 = E - S1; ! ! /* ! * Total slice size in bytes can't be any longer than the start ! * position plus substring length times the encoding max length. ! */ ! slice_size = (S1 + L1) * eml; ! } ! slice = DatumGetTextPSlice(str, slice_start, slice_size); ! /* see if we got back an empty string */ ! if ((VARSIZE(slice) - VARHDRSZ) == 0) ! return PG_STR_GET_TEXT(""); ! /* Now we can get the actual length of the slice in MB characters */ ! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ); ! /* Check that the start position wasn't > slice_strlen. If so, ! * SQL99 says to return a zero-length string. ! */ ! if (S1 > slice_strlen) ! return PG_STR_GET_TEXT(""); ! ! /* ! * Adjust L1 and E1 now that we know the slice string length. ! * Again remember that S1 is one based, and slice_start is zero based. ! */ ! if (L1 > -1) ! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen); ! else ! E1 = slice_start + 1 + slice_strlen; ! ! /* ! * Find the start position in the slice; ! * remember S1 is not zero based ! */ ! p = VARDATA(slice); ! for (i = 0; i < S1 - 1; i++) ! p += pg_mblen(p); ! ! /* hang onto a pointer to our start position */ ! s = p; ! ! /* ! * Count the actual bytes used by the substring of ! * the requested length. ! */ ! for (i = S1; i < E1; i++) ! p += pg_mblen(p); ! ! ret = (text *) palloc(VARHDRSZ + (p - s)); ! VARATT_SIZEP(ret) = VARHDRSZ + (p - s); ! memcpy(VARDATA(ret), s, (p - s)); ! ! return ret; ! } ! else ! elog(ERROR, "Invalid backend encoding; encoding max length " ! "is less than one."); ! ! /* not reached: suppress compiler warning */ ! return PG_STR_GET_TEXT(""); } /* *************** *** 481,536 **** Datum textpos(PG_FUNCTION_ARGS) { ! text *t1 = PG_GETARG_TEXT_P(0); ! text *t2 = PG_GETARG_TEXT_P(1); ! int pos; ! int px, ! p; ! int len1, len2; - pg_wchar *p1, - *p2; ! #ifdef MULTIBYTE ! pg_wchar *ps1, ! *ps2; ! #endif if (VARSIZE(t2) <= VARHDRSZ) PG_RETURN_INT32(1); /* result for empty pattern */ len1 = (VARSIZE(t1) - VARHDRSZ); len2 = (VARSIZE(t2) - VARHDRSZ); ! #ifdef MULTIBYTE ! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar)); ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1); ! len1 = pg_wchar_strlen(p1); ! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar)); ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2); ! len2 = pg_wchar_strlen(p2); ! #else ! p1 = VARDATA(t1); ! p2 = VARDATA(t2); ! #endif ! pos = 0; px = (len1 - len2); ! for (p = 0; p <= px; p++) { ! #ifdef MULTIBYTE ! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0)) ! #else ! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0)) ! #endif { ! pos = p + 1; ! break; ! }; ! p1++; ! }; ! #ifdef MULTIBYTE ! pfree(ps1); ! pfree(ps2); ! #endif PG_RETURN_INT32(pos); } --- 635,729 ---- Datum textpos(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(text_position(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1), 1)); ! } ! ! /* ! * text_position - ! * Does the real work for textpos() ! * This is broken out so it can be called directly by other string processing ! * functions. ! */ ! static int32 ! text_position(Datum str, Datum search_str, int matchnum) ! { ! int eml = pg_database_encoding_max_length(); ! text *t1 = DatumGetTextP(str); ! text *t2 = DatumGetTextP(search_str); ! int match = 0, ! pos = 0, ! p = 0, ! px, ! len1, len2; ! if(matchnum == 0) ! return 0; /* result for 0th match */ if (VARSIZE(t2) <= VARHDRSZ) PG_RETURN_INT32(1); /* result for empty pattern */ len1 = (VARSIZE(t1) - VARHDRSZ); len2 = (VARSIZE(t2) - VARHDRSZ); ! ! /* no use in searching str past point where search_str will fit */ px = (len1 - len2); ! ! if (eml == 1) /* simple case - single byte encoding */ { ! char *p1, ! *p2; ! ! p1 = VARDATA(t1); ! p2 = VARDATA(t2); ! ! for (p = 0; p <= px; p++) { ! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0)) ! { ! if (++match == matchnum) ! { ! pos = p + 1; ! break; ! } ! } ! p1++; ! } ! } ! else if (eml > 1) /* not as simple - multibyte encoding */ ! { ! pg_wchar *p1, ! *p2, ! *ps1, ! *ps2; ! ! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar)); ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1); ! len1 = pg_wchar_strlen(p1); ! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar)); ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2); ! len2 = pg_wchar_strlen(p2); ! ! for (p = 0; p <= px; p++) ! { ! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0)) ! { ! if (++match == matchnum) ! { ! pos = p + 1; ! break; ! } ! } ! p1++; ! } ! ! pfree(ps1); ! pfree(ps2); ! } ! else ! elog(ERROR, "Invalid backend encoding; encoding max length " ! "is less than one."); ! PG_RETURN_INT32(pos); } *************** *** 758,766 **** Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! bytea *v = PG_GETARG_BYTEA_P(0); ! ! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ); } /* --- 951,957 ---- Datum byteaoctetlen(PG_FUNCTION_ARGS) { ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); } /* *************** *** 805,810 **** --- 996,1003 ---- PG_RETURN_BYTEA_P(result); } + #define PG_STR_GET_BYTEA(str_) \ + DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_))) /* * bytea_substr() * Return a substring starting at the specified position. *************** *** 813,845 **** * Input: * - string * - starting position (is one-based) ! * - string length * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, return the remaining string. ! * */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int32 m = PG_GETARG_INT32(1); ! int32 n = PG_GETARG_INT32(2); ! /* ! * starting position before the start of the string? then offset into ! * the string per SQL92 spec... ! */ ! if (m < 1) { ! n += (m - 1); ! m = 1; } ! /* m will now become a zero-based starting position */ ! m--; ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n)); } /* --- 1006,1076 ---- * Input: * - string * - starting position (is one-based) ! * - string length (optional) * * If the starting position is zero or less, then return from the start of the string * adjusting the length to be consistent with the "negative start" per SQL92. ! * If the length is less than zero, an ERROR is thrown. If no third argument ! * (length) is provided, the length to the end of the string is assumed. */ Datum bytea_substr(PG_FUNCTION_ARGS) { ! int S = PG_GETARG_INT32(1); /* start position */ ! int S1; /* adjusted start position */ ! int L1; /* adjusted substring length */ ! S1 = Max(S, 1); ! ! if (fcinfo->nargs == 2) ! { ! /* ! * Not passed a length - PG_GETARG_BYTEA_P_SLICE() ! * grabs everything to the end of the string if we pass it ! * a negative value for length. ! */ ! L1 = -1; ! } ! else { ! /* end position */ ! int E = S + PG_GETARG_INT32(2); ! ! /* ! * A negative value for L is the only way for the end position ! * to be before the start. SQL99 says to throw an error. ! */ ! if (E < S) ! elog(ERROR, "negative substring length not allowed"); ! ! /* ! * A zero or negative value for the end position can happen if the start ! * was negative or one. SQL99 says to return a zero-length string. ! */ ! if (E < 1) ! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA("")); ! ! L1 = E - S1; } ! /* ! * If the start position is past the end of the string, ! * SQL99 says to return a zero-length string -- ! * PG_GETARG_TEXT_P_SLICE() will do that for us. ! * Convert to zero-based starting position ! */ ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1)); ! } ! /* ! * bytea_substr_no_len - ! * Wrapper to avoid opr_sanity failure due to ! * one function accepting a different number of args. ! */ ! Datum ! bytea_substr_no_len(PG_FUNCTION_ARGS) ! { ! return bytea_substr(fcinfo); } /* *************** *** 1422,1424 **** --- 1653,1834 ---- PG_RETURN_INT32(cmp); } + + /* + * replace_text + * replace all occurences of 'old_sub_str' in 'orig_str' + * with 'new_sub_str' to form 'new_str' + * + * returns 'orig_str' if 'old_sub_str' == '' or 'orig_str' == '' + * otherwise returns 'new_str' + */ + Datum + replace_text(PG_FUNCTION_ARGS) + { + text *left_text; + text *right_text; + text *buf_text; + text *ret_text; + int curr_posn; + text *src_text = PG_GETARG_TEXT_P(0); + int src_text_len = TEXTLEN(src_text); + text *from_sub_text = PG_GETARG_TEXT_P(1); + int from_sub_text_len = TEXTLEN(from_sub_text); + text *to_sub_text = PG_GETARG_TEXT_P(2); + char *to_sub_str = PG_TEXT_GET_STR(to_sub_text); + StringInfo str = makeStringInfo(); + + if (src_text_len == 0 || from_sub_text_len == 0) + PG_RETURN_TEXT_P(src_text); + + buf_text = TEXTDUP(src_text); + curr_posn = TEXTPOS(buf_text, from_sub_text); + + while (curr_posn > 0) + { + left_text = LEFT(buf_text, from_sub_text); + right_text = RIGHT(buf_text, from_sub_text, from_sub_text_len); + + appendStringInfo(str, PG_TEXT_GET_STR(left_text)); + appendStringInfo(str, to_sub_str); + + pfree(buf_text); + pfree(left_text); + buf_text = right_text; + curr_posn = TEXTPOS(buf_text, from_sub_text); + } + + appendStringInfo(str, PG_TEXT_GET_STR(buf_text)); + pfree(buf_text); + + ret_text = PG_STR_GET_TEXT(str->data); + pfree(str->data); + pfree(str); + + PG_RETURN_TEXT_P(ret_text); + } + + /* + * split_text + * parse input string + * return ord item (1 based) + * based on provided field separator + */ + Datum + split_text(PG_FUNCTION_ARGS) + { + text *inputstring = PG_GETARG_TEXT_P(0); + int inputstring_len = TEXTLEN(inputstring); + text *fldsep = PG_GETARG_TEXT_P(1); + int fldsep_len = TEXTLEN(fldsep); + int fldnum = PG_GETARG_INT32(2); + int start_posn = 0; + int end_posn = 0; + text *result_text; + + /* return empty string for empty input string */ + if (inputstring_len < 1) + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + + /* empty field separator */ + if (fldsep_len < 1) + { + if (fldnum == 1) /* first field - just return the input string */ + PG_RETURN_TEXT_P(inputstring); + else /* otherwise return an empty string */ + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + } + + /* field number is 1 based */ + if (fldnum < 1) + elog(ERROR, "field position must be > 0"); + + start_posn = text_position(PointerGetDatum(inputstring), + PointerGetDatum(fldsep), + fldnum - 1); + end_posn = text_position(PointerGetDatum(inputstring), + PointerGetDatum(fldsep), + fldnum); + + if ((start_posn == 0) && (end_posn == 0)) /* fldsep not found */ + { + if (fldnum == 1) /* first field - just return the input string */ + PG_RETURN_TEXT_P(inputstring); + else /* otherwise return an empty string */ + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); + } + else if ((start_posn != 0) && (end_posn == 0)) + { + /* last field requested */ + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, -1, true); + PG_RETURN_TEXT_P(result_text); + } + else if ((start_posn == 0) && (end_posn != 0)) + { + /* first field requested */ + result_text = LEFT(inputstring, fldsep); + PG_RETURN_TEXT_P(result_text); + } + else + { + /* prior to last field requested */ + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, end_posn - start_posn - fldsep_len,false); + PG_RETURN_TEXT_P(result_text); + } + } + + #define HEXBASE 16 + /* + * Convert a int32 to a string containing a base 16 (hex) representation of + * the number. + */ + Datum + to_hex32(PG_FUNCTION_ARGS) + { + static char digits[] = "0123456789abcdef"; + char buf[32]; /* bigger than needed, but reasonable */ + char *ptr, + *end; + text *result_text; + int32 value = PG_GETARG_INT32(0); + + end = ptr = buf + sizeof(buf) - 1; + *ptr = '\0'; + + do + { + *--ptr = digits[value % HEXBASE]; + value /= HEXBASE; + } while (ptr > buf && value); + + result_text = PG_STR_GET_TEXT(ptr); + PG_RETURN_TEXT_P(result_text); + } + + /* + * Convert a int64 to a string containing a base 16 (hex) representation of + * the number. + */ + Datum + to_hex64(PG_FUNCTION_ARGS) + { + static char digits[] = "0123456789abcdef"; + char buf[32]; /* bigger than needed, but reasonable */ + char *ptr, + *end; + text *result_text; + int64 value = PG_GETARG_INT64(0); + + end = ptr = buf + sizeof(buf) - 1; + *ptr = '\0'; + + do + { + *--ptr = digits[value % HEXBASE]; + value /= HEXBASE; + } while (ptr > buf && value); + + result_text = PG_STR_GET_TEXT(ptr); + PG_RETURN_TEXT_P(result_text); + } + Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v retrieving revision 1.254 diff -c -r1.254 pg_proc.h *** src/include/catalog/pg_proc.h 15 Aug 2002 02:51:27 -0000 1.254 --- src/include/catalog/pg_proc.h 16 Aug 2002 18:53:13 -0000 *************** *** 2121,2127 **** DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); --- 2121,2127 ---- DESCR("remove initial characters from string"); DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); DESCR("remove trailing characters from string"); ! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); DESCR("trim both ends of string"); *************** *** 2130,2137 **** DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_)); DESCR("return portion of string"); /* for multi-byte support */ --- 2130,2145 ---- DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); DESCR("return portion of string"); + DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25 25" replace_text - _null_ )); + DESCR("replace all occurrences of old_substr with new_substr in string"); + DATA(insert OID = 2088 ( split PGNSP PGUID 12 f f t f i 3 25 "25 25 23" split_text - _null_ )); + DESCR("split string by field_sep and return field_num"); + DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23" to_hex32 - _null_ )); + DESCR("convert int32 number to hex"); + DATA(insert OID = 2090 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "20" to_hex64 - _null_ )); + DESCR("convert int64 number to hex"); /* for multi-byte support */ *************** *** 2778,2784 **** DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)" -_null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); --- 2786,2796 ---- DESCR("concatenate"); DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); DESCR("return portion of string"); ! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); ! DESCR("return portion of string"); ! DATA(insert OID = 2086 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); DESCR("return portion of string"); DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); DESCR("return position of substring"); Index: src/include/utils/builtins.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v retrieving revision 1.191 diff -c -r1.191 builtins.h *** src/include/utils/builtins.h 15 Aug 2002 02:51:27 -0000 1.191 --- src/include/utils/builtins.h 16 Aug 2002 18:53:13 -0000 *************** *** 447,458 **** --- 447,463 ---- extern Datum textoctetlen(PG_FUNCTION_ARGS); extern Datum textpos(PG_FUNCTION_ARGS); extern Datum text_substr(PG_FUNCTION_ARGS); + extern Datum text_substr_no_len(PG_FUNCTION_ARGS); extern Datum name_text(PG_FUNCTION_ARGS); extern Datum text_name(PG_FUNCTION_ARGS); extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2); extern List *textToQualifiedNameList(text *textval, const char *caller); extern bool SplitIdentifierString(char *rawstring, char separator, List **namelist); + extern Datum replace_text(PG_FUNCTION_ARGS); + extern Datum split_text(PG_FUNCTION_ARGS); + extern Datum to_hex32(PG_FUNCTION_ARGS); + extern Datum to_hex64(PG_FUNCTION_ARGS); extern Datum unknownin(PG_FUNCTION_ARGS); extern Datum unknownout(PG_FUNCTION_ARGS); *************** *** 476,481 **** --- 481,487 ---- extern Datum byteacat(PG_FUNCTION_ARGS); extern Datum byteapos(PG_FUNCTION_ARGS); extern Datum bytea_substr(PG_FUNCTION_ARGS); + extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS); /* version.c */ extern Datum pgsql_version(PG_FUNCTION_ARGS); Index: src/test/regress/expected/strings.out =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v retrieving revision 1.12 diff -c -r1.12 strings.out *** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12 --- src/test/regress/expected/strings.out 16 Aug 2002 18:53:13 -0000 *************** *** 573,575 **** --- 573,738 ---- text and varchar (1 row) + -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + substr + -------- + 123 + 123 + (2 rows) + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + ERROR: negative substring length not allowed + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + substr + -------- + 567890 + 567890 + (2 rows) + + DROP TABLE toasttest; + -- + -- test length + -- + SELECT length('abcdef') AS "length_6"; + length_6 + ---------- + 6 + (1 row) + + -- + -- test strpos + -- + SELECT strpos('abcdef', 'cd') AS "pos_3"; + pos_3 + ------- + 3 + (1 row) + + SELECT strpos('abcdef', 'xy') AS "pos_0"; + pos_0 + ------- + 0 + (1 row) + + -- + -- test replace + -- + SELECT replace('abcdef', 'de', '45') AS "abc45f"; + abc45f + -------- + abc45f + (1 row) + + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; + ya123da123doo + --------------- + ya123da123doo + (1 row) + + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; + yaoo + ------ + yaoo + (1 row) + + -- + -- test split + -- + select split('joeuser@mydatabase','@',0) AS "an error"; + ERROR: field position must be > 0 + select split('joeuser@mydatabase','@',1) AS "joeuser"; + joeuser + --------- + joeuser + (1 row) + + select split('joeuser@mydatabase','@',2) AS "mydatabase"; + mydatabase + ------------ + mydatabase + (1 row) + + select split('joeuser@mydatabase','@',3) AS "empty string"; + empty string + -------------- + + (1 row) + + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; + joeuser + --------- + joeuser + (1 row) + + -- + -- test to_hex + -- + select to_hex(256*256*256 - 1) AS "ffffff"; + ffffff + -------- + ffffff + (1 row) + + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; + ffffffff + ---------- + ffffffff + (1 row) + Index: src/test/regress/sql/strings.sql =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v retrieving revision 1.8 diff -c -r1.8 strings.sql *** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8 --- src/test/regress/sql/strings.sql 16 Aug 2002 18:53:13 -0000 *************** *** 197,199 **** --- 197,292 ---- SELECT text 'text' || char(20) ' and characters' AS "Concat text to char"; SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar"; + + -- + -- test substr with toasted text values + -- + CREATE TABLE toasttest(f1 text); + + insert into toasttest values(repeat('1234567890',10000)); + insert into toasttest values(repeat('1234567890',10000)); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest; + + -- + -- test substr with toasted bytea values + -- + CREATE TABLE toasttest(f1 bytea); + + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); + + -- If the starting position is zero or less, then return from the start of the string + -- adjusting the length to be consistent with the "negative start" per SQL92. + SELECT substr(f1, -1, 5) from toasttest; + + -- If the length is less than zero, an ERROR is thrown. + SELECT substr(f1, 5, -1) from toasttest; + + -- If no third argument (length) is provided, the length to the end of the + -- string is assumed. + SELECT substr(f1, 99995) from toasttest; + + -- If start plus length is > string length, the result is truncated to + -- string length + SELECT substr(f1, 99995, 10) from toasttest; + + DROP TABLE toasttest; + + -- + -- test length + -- + + SELECT length('abcdef') AS "length_6"; + + -- + -- test strpos + -- + + SELECT strpos('abcdef', 'cd') AS "pos_3"; + + SELECT strpos('abcdef', 'xy') AS "pos_0"; + + -- + -- test replace + -- + SELECT replace('abcdef', 'de', '45') AS "abc45f"; + + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; + + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; + + -- + -- test split + -- + select split('joeuser@mydatabase','@',0) AS "an error"; + + select split('joeuser@mydatabase','@',1) AS "joeuser"; + + select split('joeuser@mydatabase','@',2) AS "mydatabase"; + + select split('joeuser@mydatabase','@',3) AS "empty string"; + + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; + + -- + -- test to_hex + -- + select to_hex(256*256*256 - 1) AS "ffffff"; + + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff";
Your patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Joe Conway wrote: > Joe Conway wrote: > > I took Tom's advice and added wrapper functions around text_substr() and > > bytea_substr() to cover the 2 argument case. > > > > I also added tests to strings.sql to cover substr() on toasted columns > > of both text and bytea. > > > > Please replace the original patch (substr.2002.08.14.1.patch) with the > attached. It includes everything from the previous one, plus newly > implemented builtin functions: > > replace(string, from, to) > -- replaces all occurrences of "from" in "string" to "to" > split(string, fldsep, column) > -- splits "string" on "fldsep" and returns "column" number piece > to_hex(int32_num) & to_hex(int64_num) > -- takes integer number and returns as hex string > > All previously discussed on the list; see thread at: > http://archives.postgresql.org/pgsql-hackers/2002-07/msg00511.php > > Examples: > > SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; > ya123da123doo > --------------- > ya123da123doo > (1 row) > > select split('joeuser@mydatabase','@',1) AS "joeuser"; > joeuser > --------- > joeuser > (1 row) > > select split('joeuser@mydatabase','@',2) AS "mydatabase"; > mydatabase > ------------ > mydatabase > (1 row) > > select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS > "ffffffff"; > ffffffff > ---------- > ffffffff > (1 row) > > Tests have been added to the regression suite. > > Passes all regression tests. I've checked the strings.sql script in a > multibyte database and it works fine also. I'd appreciate a good look by > someone more familiar with multibyte related issues though. > > If it is OK, I'd like to hold off on docs until this is committed and > after beta starts. > > If there are no objections, please apply. > > Thanks, > > Joe > Index: src/backend/utils/adt/varlena.c > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v > retrieving revision 1.87 > diff -c -r1.87 varlena.c > *** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87 > --- src/backend/utils/adt/varlena.c 16 Aug 2002 19:54:03 -0000 > *************** > *** 18,23 **** > --- 18,25 ---- > > #include "mb/pg_wchar.h" > #include "miscadmin.h" > + #include "access/tuptoaster.h" > + #include "lib/stringinfo.h" > #include "utils/builtins.h" > #include "utils/pg_locale.h" > > *************** > *** 27,34 **** > --- 29,62 ---- > #define DatumGetUnknownP(X) ((unknown *) PG_DETOAST_DATUM(X)) > #define PG_GETARG_UNKNOWN_P(n) DatumGetUnknownP(PG_GETARG_DATUM(n)) > #define PG_RETURN_UNKNOWN_P(x) PG_RETURN_POINTER(x) > + #define PG_TEXTARG_GET_STR(arg_) \ > + DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_))) > + #define PG_TEXT_GET_STR(textp_) \ > + DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_))) > + #define PG_STR_GET_TEXT(str_) \ > + DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_))) > + #define TEXTLEN(textp) \ > + text_length(PointerGetDatum(textp)) > + #define TEXTPOS(buf_text, from_sub_text) \ > + text_position(PointerGetDatum(buf_text), PointerGetDatum(from_sub_text), 1) > + #define TEXTDUP(textp) \ > + DatumGetTextPCopy(PointerGetDatum(textp)) > + #define LEFT(buf_text, from_sub_text) \ > + text_substring(PointerGetDatum(buf_text), \ > + 1, \ > + TEXTPOS(buf_text, from_sub_text) - 1, false) > + #define RIGHT(buf_text, from_sub_text, from_sub_text_len) \ > + text_substring(PointerGetDatum(buf_text), \ > + TEXTPOS(buf_text, from_sub_text) + from_sub_text_len, \ > + -1, true) > > static int text_cmp(text *arg1, text *arg2); > + static int32 text_length(Datum str); > + static int32 text_position(Datum str, Datum search_str, int matchnum); > + static text *text_substring(Datum str, > + int32 start, > + int32 length, > + bool length_not_specified); > > > /***************************************************************************** > *************** > *** 285,303 **** > Datum > textlen(PG_FUNCTION_ARGS) > { > ! text *t = PG_GETARG_TEXT_P(0); > > ! #ifdef MULTIBYTE > ! /* optimization for single byte encoding */ > ! if (pg_database_encoding_max_length() <= 1) > ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); > ! > ! PG_RETURN_INT32( > ! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ) > ! ); > ! #else > ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); > ! #endif > } > > /* > --- 313,348 ---- > Datum > textlen(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(text_length(PG_GETARG_DATUM(0))); > ! } > > ! /* > ! * text_length - > ! * Does the real work for textlen() > ! * This is broken out so it can be called directly by other string processing > ! * functions. > ! */ > ! static int32 > ! text_length(Datum str) > ! { > ! /* fastpath when max encoding length is one */ > ! if (pg_database_encoding_max_length() == 1) > ! PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); > ! > ! if (pg_database_encoding_max_length() > 1) > ! { > ! text *t = DatumGetTextP(str); > ! > ! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t), > ! VARSIZE(t) - VARHDRSZ)); > ! } > ! > ! /* should never get here */ > ! elog(ERROR, "Invalid backend encoding; encoding max length " > ! "is less than one."); > ! > ! /* not reached: suppress compiler warning */ > ! return 0; > } > > /* > *************** > *** 308,316 **** > Datum > textoctetlen(PG_FUNCTION_ARGS) > { > ! text *arg = PG_GETARG_TEXT_P(0); > ! > ! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ); > } > > /* > --- 353,359 ---- > Datum > textoctetlen(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); > } > > /* > *************** > *** 382,471 **** > * - Thomas Lockhart 1998-12-10 > * Now uses faster TOAST-slicing interface > * - John Gray 2002-02-22 > */ > Datum > text_substr(PG_FUNCTION_ARGS) > { > ! text *string; > ! int32 m = PG_GETARG_INT32(1); > ! int32 n = PG_GETARG_INT32(2); > ! int32 sm; > ! int32 sn; > ! int eml = 1; > ! #ifdef MULTIBYTE > ! int i; > ! int len; > ! text *ret; > ! char *p; > ! #endif > > ! /* > ! * starting position before the start of the string? then offset into > ! * the string per SQL92 spec... > ! */ > ! if (m < 1) > { > ! n += (m - 1); > ! m = 1; > ! } > ! /* Check for m > octet length is made in TOAST access routine */ > > ! /* m will now become a zero-based starting position */ > ! sm = m - 1; > ! sn = n; > > ! #ifdef MULTIBYTE > ! eml = pg_database_encoding_max_length (); > > ! if (eml > 1) > { > ! sm = 0; > ! if (n > -1) > ! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */ > else > ! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */ > ! } > ! #endif > > ! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn); > > ! if (eml == 1) > ! { > ! PG_RETURN_TEXT_P (string); > ! } > ! #ifndef MULTIBYTE > ! PG_RETURN_NULL(); /* notreached: suppress compiler warning */ > ! #endif > ! #ifdef MULTIBYTE > ! if (n > -1) > ! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3); > ! else /* n < 0 is special-cased; need full string length */ > ! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ); > ! > ! if (m > len) > ! { > ! m = 1; > ! n = 0; > ! } > ! m--; > ! if (((m + n) > len) || (n < 0)) > ! n = (len - m); > ! > ! p = VARDATA(string); > ! for (i = 0; i < m; i++) > ! p += pg_mblen(p); > ! m = p - VARDATA(string); > ! for (i = 0; i < n; i++) > ! p += pg_mblen(p); > ! n = p - (VARDATA(string) + m); > > ! ret = (text *) palloc(VARHDRSZ + n); > ! VARATT_SIZEP(ret) = VARHDRSZ + n; > > ! memcpy(VARDATA(ret), VARDATA(string) + m, n); > > ! PG_RETURN_TEXT_P(ret); > ! #endif > } > > /* > --- 425,625 ---- > * - Thomas Lockhart 1998-12-10 > * Now uses faster TOAST-slicing interface > * - John Gray 2002-02-22 > + * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change > + * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw > + * error; if E < 1, return '', not entire string). Fixed MB related bug when > + * S > LC and < LC + 4 sometimes garbage characters are returned. > + * - Joe Conway 2002-08-10 > */ > Datum > text_substr(PG_FUNCTION_ARGS) > { > ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), > ! PG_GETARG_INT32(1), > ! PG_GETARG_INT32(2), > ! false)); > ! } > > ! /* > ! * text_substr_no_len - > ! * Wrapper to avoid opr_sanity failure due to > ! * one function accepting a different number of args. > ! */ > ! Datum > ! text_substr_no_len(PG_FUNCTION_ARGS) > ! { > ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), > ! PG_GETARG_INT32(1), > ! -1, true)); > ! } > ! > ! /* > ! * text_substring - > ! * Does the real work for text_substr() and text_substr_no_len() > ! * This is broken out so it can be called directly by other string processing > ! * functions. > ! */ > ! static text* > ! text_substring(Datum str, int32 start, int32 length, bool length_not_specified) > ! { > ! int32 eml = pg_database_encoding_max_length(); > ! int32 S = start; /* start position */ > ! int32 S1; /* adjusted start position */ > ! int32 L1; /* adjusted substring length */ > ! > ! /* life is easy if the encoding max length is 1 */ > ! if (eml == 1) > { > ! S1 = Max(S, 1); > > ! if (length_not_specified) /* special case - get length to end of string */ > ! L1 = -1; > ! else > ! { > ! /* end position */ > ! int E = S + length; > > ! /* > ! * A negative value for L is the only way for the end position > ! * to be before the start. SQL99 says to throw an error. > ! */ > ! if (E < S) > ! elog(ERROR, "negative substring length not allowed"); > > ! /* > ! * A zero or negative value for the end position can happen if the start > ! * was negative or one. SQL99 says to return a zero-length string. > ! */ > ! if (E < 1) > ! return PG_STR_GET_TEXT(""); > ! > ! L1 = E - S1; > ! } > ! > ! /* > ! * If the start position is past the end of the string, > ! * SQL99 says to return a zero-length string -- > ! * PG_GETARG_TEXT_P_SLICE() will do that for us. > ! * Convert to zero-based starting position > ! */ > ! return DatumGetTextPSlice(str, S1 - 1, L1); > ! } > ! else if (eml > 1) > { > ! /* > ! * When encoding max length is > 1, we can't get LC without > ! * detoasting, so we'll grab a conservatively large slice > ! * now and go back later to do the right thing > ! */ > ! int32 slice_start; > ! int32 slice_size; > ! int32 slice_strlen; > ! text *slice; > ! int32 E1; > ! int32 i; > ! char *p; > ! char *s; > ! text *ret; > ! > ! /* > ! * if S is past the end of the string, the tuple toaster > ! * will return a zero-length string to us > ! */ > ! S1 = Max(S, 1); > ! > ! /* > ! * We need to start at position zero because there is no > ! * way to know in advance which byte offset corresponds to > ! * the supplied start position. > ! */ > ! slice_start = 0; > ! > ! if (length_not_specified) /* special case - get length to end of string */ > ! slice_size = L1 = -1; > else > ! { > ! int E = S + length; > ! > ! /* > ! * A negative value for L is the only way for the end position > ! * to be before the start. SQL99 says to throw an error. > ! */ > ! if (E < S) > ! elog(ERROR, "negative substring length not allowed"); > > ! /* > ! * A zero or negative value for the end position can happen if the start > ! * was negative or one. SQL99 says to return a zero-length string. > ! */ > ! if (E < 1) > ! return PG_STR_GET_TEXT(""); > > ! /* > ! * if E is past the end of the string, the tuple toaster > ! * will truncate the length for us > ! */ > ! L1 = E - S1; > ! > ! /* > ! * Total slice size in bytes can't be any longer than the start > ! * position plus substring length times the encoding max length. > ! */ > ! slice_size = (S1 + L1) * eml; > ! } > ! slice = DatumGetTextPSlice(str, slice_start, slice_size); > > ! /* see if we got back an empty string */ > ! if ((VARSIZE(slice) - VARHDRSZ) == 0) > ! return PG_STR_GET_TEXT(""); > > ! /* Now we can get the actual length of the slice in MB characters */ > ! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ); > > ! /* Check that the start position wasn't > slice_strlen. If so, > ! * SQL99 says to return a zero-length string. > ! */ > ! if (S1 > slice_strlen) > ! return PG_STR_GET_TEXT(""); > ! > ! /* > ! * Adjust L1 and E1 now that we know the slice string length. > ! * Again remember that S1 is one based, and slice_start is zero based. > ! */ > ! if (L1 > -1) > ! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen); > ! else > ! E1 = slice_start + 1 + slice_strlen; > ! > ! /* > ! * Find the start position in the slice; > ! * remember S1 is not zero based > ! */ > ! p = VARDATA(slice); > ! for (i = 0; i < S1 - 1; i++) > ! p += pg_mblen(p); > ! > ! /* hang onto a pointer to our start position */ > ! s = p; > ! > ! /* > ! * Count the actual bytes used by the substring of > ! * the requested length. > ! */ > ! for (i = S1; i < E1; i++) > ! p += pg_mblen(p); > ! > ! ret = (text *) palloc(VARHDRSZ + (p - s)); > ! VARATT_SIZEP(ret) = VARHDRSZ + (p - s); > ! memcpy(VARDATA(ret), s, (p - s)); > ! > ! return ret; > ! } > ! else > ! elog(ERROR, "Invalid backend encoding; encoding max length " > ! "is less than one."); > ! > ! /* not reached: suppress compiler warning */ > ! return PG_STR_GET_TEXT(""); > } > > /* > *************** > *** 481,536 **** > Datum > textpos(PG_FUNCTION_ARGS) > { > ! text *t1 = PG_GETARG_TEXT_P(0); > ! text *t2 = PG_GETARG_TEXT_P(1); > ! int pos; > ! int px, > ! p; > ! int len1, > len2; > - pg_wchar *p1, > - *p2; > > ! #ifdef MULTIBYTE > ! pg_wchar *ps1, > ! *ps2; > ! #endif > > if (VARSIZE(t2) <= VARHDRSZ) > PG_RETURN_INT32(1); /* result for empty pattern */ > > len1 = (VARSIZE(t1) - VARHDRSZ); > len2 = (VARSIZE(t2) - VARHDRSZ); > ! #ifdef MULTIBYTE > ! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1); > ! len1 = pg_wchar_strlen(p1); > ! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2); > ! len2 = pg_wchar_strlen(p2); > ! #else > ! p1 = VARDATA(t1); > ! p2 = VARDATA(t2); > ! #endif > ! pos = 0; > px = (len1 - len2); > ! for (p = 0; p <= px; p++) > { > ! #ifdef MULTIBYTE > ! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0)) > ! #else > ! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0)) > ! #endif > { > ! pos = p + 1; > ! break; > ! }; > ! p1++; > ! }; > ! #ifdef MULTIBYTE > ! pfree(ps1); > ! pfree(ps2); > ! #endif > PG_RETURN_INT32(pos); > } > > --- 635,729 ---- > Datum > textpos(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(text_position(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1), 1)); > ! } > ! > ! /* > ! * text_position - > ! * Does the real work for textpos() > ! * This is broken out so it can be called directly by other string processing > ! * functions. > ! */ > ! static int32 > ! text_position(Datum str, Datum search_str, int matchnum) > ! { > ! int eml = pg_database_encoding_max_length(); > ! text *t1 = DatumGetTextP(str); > ! text *t2 = DatumGetTextP(search_str); > ! int match = 0, > ! pos = 0, > ! p = 0, > ! px, > ! len1, > len2; > > ! if(matchnum == 0) > ! return 0; /* result for 0th match */ > > if (VARSIZE(t2) <= VARHDRSZ) > PG_RETURN_INT32(1); /* result for empty pattern */ > > len1 = (VARSIZE(t1) - VARHDRSZ); > len2 = (VARSIZE(t2) - VARHDRSZ); > ! > ! /* no use in searching str past point where search_str will fit */ > px = (len1 - len2); > ! > ! if (eml == 1) /* simple case - single byte encoding */ > { > ! char *p1, > ! *p2; > ! > ! p1 = VARDATA(t1); > ! p2 = VARDATA(t2); > ! > ! for (p = 0; p <= px; p++) > { > ! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0)) > ! { > ! if (++match == matchnum) > ! { > ! pos = p + 1; > ! break; > ! } > ! } > ! p1++; > ! } > ! } > ! else if (eml > 1) /* not as simple - multibyte encoding */ > ! { > ! pg_wchar *p1, > ! *p2, > ! *ps1, > ! *ps2; > ! > ! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1); > ! len1 = pg_wchar_strlen(p1); > ! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2); > ! len2 = pg_wchar_strlen(p2); > ! > ! for (p = 0; p <= px; p++) > ! { > ! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0)) > ! { > ! if (++match == matchnum) > ! { > ! pos = p + 1; > ! break; > ! } > ! } > ! p1++; > ! } > ! > ! pfree(ps1); > ! pfree(ps2); > ! } > ! else > ! elog(ERROR, "Invalid backend encoding; encoding max length " > ! "is less than one."); > ! > PG_RETURN_INT32(pos); > } > > *************** > *** 758,766 **** > Datum > byteaoctetlen(PG_FUNCTION_ARGS) > { > ! bytea *v = PG_GETARG_BYTEA_P(0); > ! > ! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ); > } > > /* > --- 951,957 ---- > Datum > byteaoctetlen(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); > } > > /* > *************** > *** 805,810 **** > --- 996,1003 ---- > PG_RETURN_BYTEA_P(result); > } > > + #define PG_STR_GET_BYTEA(str_) \ > + DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_))) > /* > * bytea_substr() > * Return a substring starting at the specified position. > *************** > *** 813,845 **** > * Input: > * - string > * - starting position (is one-based) > ! * - string length > * > * If the starting position is zero or less, then return from the start of the string > * adjusting the length to be consistent with the "negative start" per SQL92. > ! * If the length is less than zero, return the remaining string. > ! * > */ > Datum > bytea_substr(PG_FUNCTION_ARGS) > { > ! int32 m = PG_GETARG_INT32(1); > ! int32 n = PG_GETARG_INT32(2); > > ! /* > ! * starting position before the start of the string? then offset into > ! * the string per SQL92 spec... > ! */ > ! if (m < 1) > { > ! n += (m - 1); > ! m = 1; > } > > ! /* m will now become a zero-based starting position */ > ! m--; > > ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n)); > } > > /* > --- 1006,1076 ---- > * Input: > * - string > * - starting position (is one-based) > ! * - string length (optional) > * > * If the starting position is zero or less, then return from the start of the string > * adjusting the length to be consistent with the "negative start" per SQL92. > ! * If the length is less than zero, an ERROR is thrown. If no third argument > ! * (length) is provided, the length to the end of the string is assumed. > */ > Datum > bytea_substr(PG_FUNCTION_ARGS) > { > ! int S = PG_GETARG_INT32(1); /* start position */ > ! int S1; /* adjusted start position */ > ! int L1; /* adjusted substring length */ > > ! S1 = Max(S, 1); > ! > ! if (fcinfo->nargs == 2) > ! { > ! /* > ! * Not passed a length - PG_GETARG_BYTEA_P_SLICE() > ! * grabs everything to the end of the string if we pass it > ! * a negative value for length. > ! */ > ! L1 = -1; > ! } > ! else > { > ! /* end position */ > ! int E = S + PG_GETARG_INT32(2); > ! > ! /* > ! * A negative value for L is the only way for the end position > ! * to be before the start. SQL99 says to throw an error. > ! */ > ! if (E < S) > ! elog(ERROR, "negative substring length not allowed"); > ! > ! /* > ! * A zero or negative value for the end position can happen if the start > ! * was negative or one. SQL99 says to return a zero-length string. > ! */ > ! if (E < 1) > ! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA("")); > ! > ! L1 = E - S1; > } > > ! /* > ! * If the start position is past the end of the string, > ! * SQL99 says to return a zero-length string -- > ! * PG_GETARG_TEXT_P_SLICE() will do that for us. > ! * Convert to zero-based starting position > ! */ > ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1)); > ! } > > ! /* > ! * bytea_substr_no_len - > ! * Wrapper to avoid opr_sanity failure due to > ! * one function accepting a different number of args. > ! */ > ! Datum > ! bytea_substr_no_len(PG_FUNCTION_ARGS) > ! { > ! return bytea_substr(fcinfo); > } > > /* > *************** > *** 1422,1424 **** > --- 1653,1834 ---- > > PG_RETURN_INT32(cmp); > } > + > + /* > + * replace_text > + * replace all occurences of 'old_sub_str' in 'orig_str' > + * with 'new_sub_str' to form 'new_str' > + * > + * returns 'orig_str' if 'old_sub_str' == '' or 'orig_str' == '' > + * otherwise returns 'new_str' > + */ > + Datum > + replace_text(PG_FUNCTION_ARGS) > + { > + text *left_text; > + text *right_text; > + text *buf_text; > + text *ret_text; > + int curr_posn; > + text *src_text = PG_GETARG_TEXT_P(0); > + int src_text_len = TEXTLEN(src_text); > + text *from_sub_text = PG_GETARG_TEXT_P(1); > + int from_sub_text_len = TEXTLEN(from_sub_text); > + text *to_sub_text = PG_GETARG_TEXT_P(2); > + char *to_sub_str = PG_TEXT_GET_STR(to_sub_text); > + StringInfo str = makeStringInfo(); > + > + if (src_text_len == 0 || from_sub_text_len == 0) > + PG_RETURN_TEXT_P(src_text); > + > + buf_text = TEXTDUP(src_text); > + curr_posn = TEXTPOS(buf_text, from_sub_text); > + > + while (curr_posn > 0) > + { > + left_text = LEFT(buf_text, from_sub_text); > + right_text = RIGHT(buf_text, from_sub_text, from_sub_text_len); > + > + appendStringInfo(str, PG_TEXT_GET_STR(left_text)); > + appendStringInfo(str, to_sub_str); > + > + pfree(buf_text); > + pfree(left_text); > + buf_text = right_text; > + curr_posn = TEXTPOS(buf_text, from_sub_text); > + } > + > + appendStringInfo(str, PG_TEXT_GET_STR(buf_text)); > + pfree(buf_text); > + > + ret_text = PG_STR_GET_TEXT(str->data); > + pfree(str->data); > + pfree(str); > + > + PG_RETURN_TEXT_P(ret_text); > + } > + > + /* > + * split_text > + * parse input string > + * return ord item (1 based) > + * based on provided field separator > + */ > + Datum > + split_text(PG_FUNCTION_ARGS) > + { > + text *inputstring = PG_GETARG_TEXT_P(0); > + int inputstring_len = TEXTLEN(inputstring); > + text *fldsep = PG_GETARG_TEXT_P(1); > + int fldsep_len = TEXTLEN(fldsep); > + int fldnum = PG_GETARG_INT32(2); > + int start_posn = 0; > + int end_posn = 0; > + text *result_text; > + > + /* return empty string for empty input string */ > + if (inputstring_len < 1) > + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); > + > + /* empty field separator */ > + if (fldsep_len < 1) > + { > + if (fldnum == 1) /* first field - just return the input string */ > + PG_RETURN_TEXT_P(inputstring); > + else /* otherwise return an empty string */ > + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); > + } > + > + /* field number is 1 based */ > + if (fldnum < 1) > + elog(ERROR, "field position must be > 0"); > + > + start_posn = text_position(PointerGetDatum(inputstring), > + PointerGetDatum(fldsep), > + fldnum - 1); > + end_posn = text_position(PointerGetDatum(inputstring), > + PointerGetDatum(fldsep), > + fldnum); > + > + if ((start_posn == 0) && (end_posn == 0)) /* fldsep not found */ > + { > + if (fldnum == 1) /* first field - just return the input string */ > + PG_RETURN_TEXT_P(inputstring); > + else /* otherwise return an empty string */ > + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); > + } > + else if ((start_posn != 0) && (end_posn == 0)) > + { > + /* last field requested */ > + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, -1, true); > + PG_RETURN_TEXT_P(result_text); > + } > + else if ((start_posn == 0) && (end_posn != 0)) > + { > + /* first field requested */ > + result_text = LEFT(inputstring, fldsep); > + PG_RETURN_TEXT_P(result_text); > + } > + else > + { > + /* prior to last field requested */ > + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, end_posn - start_posn - fldsep_len,false); > + PG_RETURN_TEXT_P(result_text); > + } > + } > + > + #define HEXBASE 16 > + /* > + * Convert a int32 to a string containing a base 16 (hex) representation of > + * the number. > + */ > + Datum > + to_hex32(PG_FUNCTION_ARGS) > + { > + static char digits[] = "0123456789abcdef"; > + char buf[32]; /* bigger than needed, but reasonable */ > + char *ptr, > + *end; > + text *result_text; > + int32 value = PG_GETARG_INT32(0); > + > + end = ptr = buf + sizeof(buf) - 1; > + *ptr = '\0'; > + > + do > + { > + *--ptr = digits[value % HEXBASE]; > + value /= HEXBASE; > + } while (ptr > buf && value); > + > + result_text = PG_STR_GET_TEXT(ptr); > + PG_RETURN_TEXT_P(result_text); > + } > + > + /* > + * Convert a int64 to a string containing a base 16 (hex) representation of > + * the number. > + */ > + Datum > + to_hex64(PG_FUNCTION_ARGS) > + { > + static char digits[] = "0123456789abcdef"; > + char buf[32]; /* bigger than needed, but reasonable */ > + char *ptr, > + *end; > + text *result_text; > + int64 value = PG_GETARG_INT64(0); > + > + end = ptr = buf + sizeof(buf) - 1; > + *ptr = '\0'; > + > + do > + { > + *--ptr = digits[value % HEXBASE]; > + value /= HEXBASE; > + } while (ptr > buf && value); > + > + result_text = PG_STR_GET_TEXT(ptr); > + PG_RETURN_TEXT_P(result_text); > + } > + > Index: src/include/catalog/pg_proc.h > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v > retrieving revision 1.254 > diff -c -r1.254 pg_proc.h > *** src/include/catalog/pg_proc.h 15 Aug 2002 02:51:27 -0000 1.254 > --- src/include/catalog/pg_proc.h 16 Aug 2002 18:53:13 -0000 > *************** > *** 2121,2127 **** > DESCR("remove initial characters from string"); > DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); > DESCR("remove trailing characters from string"); > ! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_)); > DESCR("return portion of string"); > DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); > DESCR("trim both ends of string"); > --- 2121,2127 ---- > DESCR("remove initial characters from string"); > DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); > DESCR("remove trailing characters from string"); > ! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); > DESCR("return portion of string"); > DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); > DESCR("trim both ends of string"); > *************** > *** 2130,2137 **** > > DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_)); > DESCR("return portion of string"); > > /* for multi-byte support */ > > --- 2130,2145 ---- > > DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); > DESCR("return portion of string"); > + DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25 25" replace_text - _null_ )); > + DESCR("replace all occurrences of old_substr with new_substr in string"); > + DATA(insert OID = 2088 ( split PGNSP PGUID 12 f f t f i 3 25 "25 25 23" split_text - _null_ )); > + DESCR("split string by field_sep and return field_num"); > + DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23" to_hex32 - _null_ )); > + DESCR("convert int32 number to hex"); > + DATA(insert OID = 2090 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "20" to_hex64 - _null_ )); > + DESCR("convert int64 number to hex"); > > /* for multi-byte support */ > > *************** > *** 2778,2784 **** > DESCR("concatenate"); > DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)"- _null_ )); > DESCR("return portion of string"); > DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); > DESCR("return position of substring"); > --- 2786,2796 ---- > DESCR("concatenate"); > DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); > ! DESCR("return portion of string"); > ! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); > ! DESCR("return portion of string"); > ! DATA(insert OID = 2086 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); > DESCR("return portion of string"); > DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); > DESCR("return position of substring"); > Index: src/include/utils/builtins.h > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v > retrieving revision 1.191 > diff -c -r1.191 builtins.h > *** src/include/utils/builtins.h 15 Aug 2002 02:51:27 -0000 1.191 > --- src/include/utils/builtins.h 16 Aug 2002 18:53:13 -0000 > *************** > *** 447,458 **** > --- 447,463 ---- > extern Datum textoctetlen(PG_FUNCTION_ARGS); > extern Datum textpos(PG_FUNCTION_ARGS); > extern Datum text_substr(PG_FUNCTION_ARGS); > + extern Datum text_substr_no_len(PG_FUNCTION_ARGS); > extern Datum name_text(PG_FUNCTION_ARGS); > extern Datum text_name(PG_FUNCTION_ARGS); > extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2); > extern List *textToQualifiedNameList(text *textval, const char *caller); > extern bool SplitIdentifierString(char *rawstring, char separator, > List **namelist); > + extern Datum replace_text(PG_FUNCTION_ARGS); > + extern Datum split_text(PG_FUNCTION_ARGS); > + extern Datum to_hex32(PG_FUNCTION_ARGS); > + extern Datum to_hex64(PG_FUNCTION_ARGS); > > extern Datum unknownin(PG_FUNCTION_ARGS); > extern Datum unknownout(PG_FUNCTION_ARGS); > *************** > *** 476,481 **** > --- 481,487 ---- > extern Datum byteacat(PG_FUNCTION_ARGS); > extern Datum byteapos(PG_FUNCTION_ARGS); > extern Datum bytea_substr(PG_FUNCTION_ARGS); > + extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS); > > /* version.c */ > extern Datum pgsql_version(PG_FUNCTION_ARGS); > Index: src/test/regress/expected/strings.out > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v > retrieving revision 1.12 > diff -c -r1.12 strings.out > *** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12 > --- src/test/regress/expected/strings.out 16 Aug 2002 18:53:13 -0000 > *************** > *** 573,575 **** > --- 573,738 ---- > text and varchar > (1 row) > > + -- > + -- test substr with toasted text values > + -- > + CREATE TABLE toasttest(f1 text); > + insert into toasttest values(repeat('1234567890',10000)); > + insert into toasttest values(repeat('1234567890',10000)); > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + substr > + -------- > + 123 > + 123 > + (2 rows) > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + ERROR: negative substring length not allowed > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + DROP TABLE toasttest; > + -- > + -- test substr with toasted bytea values > + -- > + CREATE TABLE toasttest(f1 bytea); > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + substr > + -------- > + 123 > + 123 > + (2 rows) > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + ERROR: negative substring length not allowed > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + DROP TABLE toasttest; > + -- > + -- test length > + -- > + SELECT length('abcdef') AS "length_6"; > + length_6 > + ---------- > + 6 > + (1 row) > + > + -- > + -- test strpos > + -- > + SELECT strpos('abcdef', 'cd') AS "pos_3"; > + pos_3 > + ------- > + 3 > + (1 row) > + > + SELECT strpos('abcdef', 'xy') AS "pos_0"; > + pos_0 > + ------- > + 0 > + (1 row) > + > + -- > + -- test replace > + -- > + SELECT replace('abcdef', 'de', '45') AS "abc45f"; > + abc45f > + -------- > + abc45f > + (1 row) > + > + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; > + ya123da123doo > + --------------- > + ya123da123doo > + (1 row) > + > + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; > + yaoo > + ------ > + yaoo > + (1 row) > + > + -- > + -- test split > + -- > + select split('joeuser@mydatabase','@',0) AS "an error"; > + ERROR: field position must be > 0 > + select split('joeuser@mydatabase','@',1) AS "joeuser"; > + joeuser > + --------- > + joeuser > + (1 row) > + > + select split('joeuser@mydatabase','@',2) AS "mydatabase"; > + mydatabase > + ------------ > + mydatabase > + (1 row) > + > + select split('joeuser@mydatabase','@',3) AS "empty string"; > + empty string > + -------------- > + > + (1 row) > + > + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; > + joeuser > + --------- > + joeuser > + (1 row) > + > + -- > + -- test to_hex > + -- > + select to_hex(256*256*256 - 1) AS "ffffff"; > + ffffff > + -------- > + ffffff > + (1 row) > + > + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; > + ffffffff > + ---------- > + ffffffff > + (1 row) > + > Index: src/test/regress/sql/strings.sql > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v > retrieving revision 1.8 > diff -c -r1.8 strings.sql > *** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8 > --- src/test/regress/sql/strings.sql 16 Aug 2002 18:53:13 -0000 > *************** > *** 197,199 **** > --- 197,292 ---- > SELECT text 'text' || char(20) ' and characters' AS "Concat text to char"; > > SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar"; > + > + -- > + -- test substr with toasted text values > + -- > + CREATE TABLE toasttest(f1 text); > + > + insert into toasttest values(repeat('1234567890',10000)); > + insert into toasttest values(repeat('1234567890',10000)); > + > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + > + DROP TABLE toasttest; > + > + -- > + -- test substr with toasted bytea values > + -- > + CREATE TABLE toasttest(f1 bytea); > + > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + > + DROP TABLE toasttest; > + > + -- > + -- test length > + -- > + > + SELECT length('abcdef') AS "length_6"; > + > + -- > + -- test strpos > + -- > + > + SELECT strpos('abcdef', 'cd') AS "pos_3"; > + > + SELECT strpos('abcdef', 'xy') AS "pos_0"; > + > + -- > + -- test replace > + -- > + SELECT replace('abcdef', 'de', '45') AS "abc45f"; > + > + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; > + > + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; > + > + -- > + -- test split > + -- > + select split('joeuser@mydatabase','@',0) AS "an error"; > + > + select split('joeuser@mydatabase','@',1) AS "joeuser"; > + > + select split('joeuser@mydatabase','@',2) AS "mydatabase"; > + > + select split('joeuser@mydatabase','@',3) AS "empty string"; > + > + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; > + > + -- > + -- test to_hex > + -- > + select to_hex(256*256*256 - 1) AS "ffffff"; > + > + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Patch applied. Thanks. --------------------------------------------------------------------------- Joe Conway wrote: > Joe Conway wrote: > > I took Tom's advice and added wrapper functions around text_substr() and > > bytea_substr() to cover the 2 argument case. > > > > I also added tests to strings.sql to cover substr() on toasted columns > > of both text and bytea. > > > > Please replace the original patch (substr.2002.08.14.1.patch) with the > attached. It includes everything from the previous one, plus newly > implemented builtin functions: > > replace(string, from, to) > -- replaces all occurrences of "from" in "string" to "to" > split(string, fldsep, column) > -- splits "string" on "fldsep" and returns "column" number piece > to_hex(int32_num) & to_hex(int64_num) > -- takes integer number and returns as hex string > > All previously discussed on the list; see thread at: > http://archives.postgresql.org/pgsql-hackers/2002-07/msg00511.php > > Examples: > > SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; > ya123da123doo > --------------- > ya123da123doo > (1 row) > > select split('joeuser@mydatabase','@',1) AS "joeuser"; > joeuser > --------- > joeuser > (1 row) > > select split('joeuser@mydatabase','@',2) AS "mydatabase"; > mydatabase > ------------ > mydatabase > (1 row) > > select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS > "ffffffff"; > ffffffff > ---------- > ffffffff > (1 row) > > Tests have been added to the regression suite. > > Passes all regression tests. I've checked the strings.sql script in a > multibyte database and it works fine also. I'd appreciate a good look by > someone more familiar with multibyte related issues though. > > If it is OK, I'd like to hold off on docs until this is committed and > after beta starts. > > If there are no objections, please apply. > > Thanks, > > Joe > Index: src/backend/utils/adt/varlena.c > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/varlena.c,v > retrieving revision 1.87 > diff -c -r1.87 varlena.c > *** src/backend/utils/adt/varlena.c 4 Aug 2002 06:44:47 -0000 1.87 > --- src/backend/utils/adt/varlena.c 16 Aug 2002 19:54:03 -0000 > *************** > *** 18,23 **** > --- 18,25 ---- > > #include "mb/pg_wchar.h" > #include "miscadmin.h" > + #include "access/tuptoaster.h" > + #include "lib/stringinfo.h" > #include "utils/builtins.h" > #include "utils/pg_locale.h" > > *************** > *** 27,34 **** > --- 29,62 ---- > #define DatumGetUnknownP(X) ((unknown *) PG_DETOAST_DATUM(X)) > #define PG_GETARG_UNKNOWN_P(n) DatumGetUnknownP(PG_GETARG_DATUM(n)) > #define PG_RETURN_UNKNOWN_P(x) PG_RETURN_POINTER(x) > + #define PG_TEXTARG_GET_STR(arg_) \ > + DatumGetCString(DirectFunctionCall1(textout, PG_GETARG_DATUM(arg_))) > + #define PG_TEXT_GET_STR(textp_) \ > + DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp_))) > + #define PG_STR_GET_TEXT(str_) \ > + DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum(str_))) > + #define TEXTLEN(textp) \ > + text_length(PointerGetDatum(textp)) > + #define TEXTPOS(buf_text, from_sub_text) \ > + text_position(PointerGetDatum(buf_text), PointerGetDatum(from_sub_text), 1) > + #define TEXTDUP(textp) \ > + DatumGetTextPCopy(PointerGetDatum(textp)) > + #define LEFT(buf_text, from_sub_text) \ > + text_substring(PointerGetDatum(buf_text), \ > + 1, \ > + TEXTPOS(buf_text, from_sub_text) - 1, false) > + #define RIGHT(buf_text, from_sub_text, from_sub_text_len) \ > + text_substring(PointerGetDatum(buf_text), \ > + TEXTPOS(buf_text, from_sub_text) + from_sub_text_len, \ > + -1, true) > > static int text_cmp(text *arg1, text *arg2); > + static int32 text_length(Datum str); > + static int32 text_position(Datum str, Datum search_str, int matchnum); > + static text *text_substring(Datum str, > + int32 start, > + int32 length, > + bool length_not_specified); > > > /***************************************************************************** > *************** > *** 285,303 **** > Datum > textlen(PG_FUNCTION_ARGS) > { > ! text *t = PG_GETARG_TEXT_P(0); > > ! #ifdef MULTIBYTE > ! /* optimization for single byte encoding */ > ! if (pg_database_encoding_max_length() <= 1) > ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); > ! > ! PG_RETURN_INT32( > ! pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ) > ! ); > ! #else > ! PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ); > ! #endif > } > > /* > --- 313,348 ---- > Datum > textlen(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(text_length(PG_GETARG_DATUM(0))); > ! } > > ! /* > ! * text_length - > ! * Does the real work for textlen() > ! * This is broken out so it can be called directly by other string processing > ! * functions. > ! */ > ! static int32 > ! text_length(Datum str) > ! { > ! /* fastpath when max encoding length is one */ > ! if (pg_database_encoding_max_length() == 1) > ! PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); > ! > ! if (pg_database_encoding_max_length() > 1) > ! { > ! text *t = DatumGetTextP(str); > ! > ! PG_RETURN_INT32(pg_mbstrlen_with_len(VARDATA(t), > ! VARSIZE(t) - VARHDRSZ)); > ! } > ! > ! /* should never get here */ > ! elog(ERROR, "Invalid backend encoding; encoding max length " > ! "is less than one."); > ! > ! /* not reached: suppress compiler warning */ > ! return 0; > } > > /* > *************** > *** 308,316 **** > Datum > textoctetlen(PG_FUNCTION_ARGS) > { > ! text *arg = PG_GETARG_TEXT_P(0); > ! > ! PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ); > } > > /* > --- 353,359 ---- > Datum > textoctetlen(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); > } > > /* > *************** > *** 382,471 **** > * - Thomas Lockhart 1998-12-10 > * Now uses faster TOAST-slicing interface > * - John Gray 2002-02-22 > */ > Datum > text_substr(PG_FUNCTION_ARGS) > { > ! text *string; > ! int32 m = PG_GETARG_INT32(1); > ! int32 n = PG_GETARG_INT32(2); > ! int32 sm; > ! int32 sn; > ! int eml = 1; > ! #ifdef MULTIBYTE > ! int i; > ! int len; > ! text *ret; > ! char *p; > ! #endif > > ! /* > ! * starting position before the start of the string? then offset into > ! * the string per SQL92 spec... > ! */ > ! if (m < 1) > { > ! n += (m - 1); > ! m = 1; > ! } > ! /* Check for m > octet length is made in TOAST access routine */ > > ! /* m will now become a zero-based starting position */ > ! sm = m - 1; > ! sn = n; > > ! #ifdef MULTIBYTE > ! eml = pg_database_encoding_max_length (); > > ! if (eml > 1) > { > ! sm = 0; > ! if (n > -1) > ! sn = (m + n) * eml + 3; /* +3 to avoid mb characters overhanging slice end */ > else > ! sn = n; /* n < 0 is special-cased by heap_tuple_untoast_attr_slice */ > ! } > ! #endif > > ! string = PG_GETARG_TEXT_P_SLICE (0, sm, sn); > > ! if (eml == 1) > ! { > ! PG_RETURN_TEXT_P (string); > ! } > ! #ifndef MULTIBYTE > ! PG_RETURN_NULL(); /* notreached: suppress compiler warning */ > ! #endif > ! #ifdef MULTIBYTE > ! if (n > -1) > ! len = pg_mbstrlen_with_len (VARDATA (string), sn - 3); > ! else /* n < 0 is special-cased; need full string length */ > ! len = pg_mbstrlen_with_len (VARDATA (string), VARSIZE(string)-VARHDRSZ); > ! > ! if (m > len) > ! { > ! m = 1; > ! n = 0; > ! } > ! m--; > ! if (((m + n) > len) || (n < 0)) > ! n = (len - m); > ! > ! p = VARDATA(string); > ! for (i = 0; i < m; i++) > ! p += pg_mblen(p); > ! m = p - VARDATA(string); > ! for (i = 0; i < n; i++) > ! p += pg_mblen(p); > ! n = p - (VARDATA(string) + m); > > ! ret = (text *) palloc(VARHDRSZ + n); > ! VARATT_SIZEP(ret) = VARHDRSZ + n; > > ! memcpy(VARDATA(ret), VARDATA(string) + m, n); > > ! PG_RETURN_TEXT_P(ret); > ! #endif > } > > /* > --- 425,625 ---- > * - Thomas Lockhart 1998-12-10 > * Now uses faster TOAST-slicing interface > * - John Gray 2002-02-22 > + * Remove "#ifdef MULTIBYTE" and test for encoding_max_length instead. Change > + * behaviors conflicting with SQL92 to meet SQL92 (if E = S + L < S throw > + * error; if E < 1, return '', not entire string). Fixed MB related bug when > + * S > LC and < LC + 4 sometimes garbage characters are returned. > + * - Joe Conway 2002-08-10 > */ > Datum > text_substr(PG_FUNCTION_ARGS) > { > ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), > ! PG_GETARG_INT32(1), > ! PG_GETARG_INT32(2), > ! false)); > ! } > > ! /* > ! * text_substr_no_len - > ! * Wrapper to avoid opr_sanity failure due to > ! * one function accepting a different number of args. > ! */ > ! Datum > ! text_substr_no_len(PG_FUNCTION_ARGS) > ! { > ! PG_RETURN_TEXT_P(text_substring(PG_GETARG_DATUM(0), > ! PG_GETARG_INT32(1), > ! -1, true)); > ! } > ! > ! /* > ! * text_substring - > ! * Does the real work for text_substr() and text_substr_no_len() > ! * This is broken out so it can be called directly by other string processing > ! * functions. > ! */ > ! static text* > ! text_substring(Datum str, int32 start, int32 length, bool length_not_specified) > ! { > ! int32 eml = pg_database_encoding_max_length(); > ! int32 S = start; /* start position */ > ! int32 S1; /* adjusted start position */ > ! int32 L1; /* adjusted substring length */ > ! > ! /* life is easy if the encoding max length is 1 */ > ! if (eml == 1) > { > ! S1 = Max(S, 1); > > ! if (length_not_specified) /* special case - get length to end of string */ > ! L1 = -1; > ! else > ! { > ! /* end position */ > ! int E = S + length; > > ! /* > ! * A negative value for L is the only way for the end position > ! * to be before the start. SQL99 says to throw an error. > ! */ > ! if (E < S) > ! elog(ERROR, "negative substring length not allowed"); > > ! /* > ! * A zero or negative value for the end position can happen if the start > ! * was negative or one. SQL99 says to return a zero-length string. > ! */ > ! if (E < 1) > ! return PG_STR_GET_TEXT(""); > ! > ! L1 = E - S1; > ! } > ! > ! /* > ! * If the start position is past the end of the string, > ! * SQL99 says to return a zero-length string -- > ! * PG_GETARG_TEXT_P_SLICE() will do that for us. > ! * Convert to zero-based starting position > ! */ > ! return DatumGetTextPSlice(str, S1 - 1, L1); > ! } > ! else if (eml > 1) > { > ! /* > ! * When encoding max length is > 1, we can't get LC without > ! * detoasting, so we'll grab a conservatively large slice > ! * now and go back later to do the right thing > ! */ > ! int32 slice_start; > ! int32 slice_size; > ! int32 slice_strlen; > ! text *slice; > ! int32 E1; > ! int32 i; > ! char *p; > ! char *s; > ! text *ret; > ! > ! /* > ! * if S is past the end of the string, the tuple toaster > ! * will return a zero-length string to us > ! */ > ! S1 = Max(S, 1); > ! > ! /* > ! * We need to start at position zero because there is no > ! * way to know in advance which byte offset corresponds to > ! * the supplied start position. > ! */ > ! slice_start = 0; > ! > ! if (length_not_specified) /* special case - get length to end of string */ > ! slice_size = L1 = -1; > else > ! { > ! int E = S + length; > ! > ! /* > ! * A negative value for L is the only way for the end position > ! * to be before the start. SQL99 says to throw an error. > ! */ > ! if (E < S) > ! elog(ERROR, "negative substring length not allowed"); > > ! /* > ! * A zero or negative value for the end position can happen if the start > ! * was negative or one. SQL99 says to return a zero-length string. > ! */ > ! if (E < 1) > ! return PG_STR_GET_TEXT(""); > > ! /* > ! * if E is past the end of the string, the tuple toaster > ! * will truncate the length for us > ! */ > ! L1 = E - S1; > ! > ! /* > ! * Total slice size in bytes can't be any longer than the start > ! * position plus substring length times the encoding max length. > ! */ > ! slice_size = (S1 + L1) * eml; > ! } > ! slice = DatumGetTextPSlice(str, slice_start, slice_size); > > ! /* see if we got back an empty string */ > ! if ((VARSIZE(slice) - VARHDRSZ) == 0) > ! return PG_STR_GET_TEXT(""); > > ! /* Now we can get the actual length of the slice in MB characters */ > ! slice_strlen = pg_mbstrlen_with_len (VARDATA(slice), VARSIZE(slice) - VARHDRSZ); > > ! /* Check that the start position wasn't > slice_strlen. If so, > ! * SQL99 says to return a zero-length string. > ! */ > ! if (S1 > slice_strlen) > ! return PG_STR_GET_TEXT(""); > ! > ! /* > ! * Adjust L1 and E1 now that we know the slice string length. > ! * Again remember that S1 is one based, and slice_start is zero based. > ! */ > ! if (L1 > -1) > ! E1 = Min(S1 + L1 , slice_start + 1 + slice_strlen); > ! else > ! E1 = slice_start + 1 + slice_strlen; > ! > ! /* > ! * Find the start position in the slice; > ! * remember S1 is not zero based > ! */ > ! p = VARDATA(slice); > ! for (i = 0; i < S1 - 1; i++) > ! p += pg_mblen(p); > ! > ! /* hang onto a pointer to our start position */ > ! s = p; > ! > ! /* > ! * Count the actual bytes used by the substring of > ! * the requested length. > ! */ > ! for (i = S1; i < E1; i++) > ! p += pg_mblen(p); > ! > ! ret = (text *) palloc(VARHDRSZ + (p - s)); > ! VARATT_SIZEP(ret) = VARHDRSZ + (p - s); > ! memcpy(VARDATA(ret), s, (p - s)); > ! > ! return ret; > ! } > ! else > ! elog(ERROR, "Invalid backend encoding; encoding max length " > ! "is less than one."); > ! > ! /* not reached: suppress compiler warning */ > ! return PG_STR_GET_TEXT(""); > } > > /* > *************** > *** 481,536 **** > Datum > textpos(PG_FUNCTION_ARGS) > { > ! text *t1 = PG_GETARG_TEXT_P(0); > ! text *t2 = PG_GETARG_TEXT_P(1); > ! int pos; > ! int px, > ! p; > ! int len1, > len2; > - pg_wchar *p1, > - *p2; > > ! #ifdef MULTIBYTE > ! pg_wchar *ps1, > ! *ps2; > ! #endif > > if (VARSIZE(t2) <= VARHDRSZ) > PG_RETURN_INT32(1); /* result for empty pattern */ > > len1 = (VARSIZE(t1) - VARHDRSZ); > len2 = (VARSIZE(t2) - VARHDRSZ); > ! #ifdef MULTIBYTE > ! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1); > ! len1 = pg_wchar_strlen(p1); > ! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2); > ! len2 = pg_wchar_strlen(p2); > ! #else > ! p1 = VARDATA(t1); > ! p2 = VARDATA(t2); > ! #endif > ! pos = 0; > px = (len1 - len2); > ! for (p = 0; p <= px; p++) > { > ! #ifdef MULTIBYTE > ! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0)) > ! #else > ! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0)) > ! #endif > { > ! pos = p + 1; > ! break; > ! }; > ! p1++; > ! }; > ! #ifdef MULTIBYTE > ! pfree(ps1); > ! pfree(ps2); > ! #endif > PG_RETURN_INT32(pos); > } > > --- 635,729 ---- > Datum > textpos(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(text_position(PG_GETARG_DATUM(0), PG_GETARG_DATUM(1), 1)); > ! } > ! > ! /* > ! * text_position - > ! * Does the real work for textpos() > ! * This is broken out so it can be called directly by other string processing > ! * functions. > ! */ > ! static int32 > ! text_position(Datum str, Datum search_str, int matchnum) > ! { > ! int eml = pg_database_encoding_max_length(); > ! text *t1 = DatumGetTextP(str); > ! text *t2 = DatumGetTextP(search_str); > ! int match = 0, > ! pos = 0, > ! p = 0, > ! px, > ! len1, > len2; > > ! if(matchnum == 0) > ! return 0; /* result for 0th match */ > > if (VARSIZE(t2) <= VARHDRSZ) > PG_RETURN_INT32(1); /* result for empty pattern */ > > len1 = (VARSIZE(t1) - VARHDRSZ); > len2 = (VARSIZE(t2) - VARHDRSZ); > ! > ! /* no use in searching str past point where search_str will fit */ > px = (len1 - len2); > ! > ! if (eml == 1) /* simple case - single byte encoding */ > { > ! char *p1, > ! *p2; > ! > ! p1 = VARDATA(t1); > ! p2 = VARDATA(t2); > ! > ! for (p = 0; p <= px; p++) > { > ! if ((*p2 == *p1) && (strncmp(p1, p2, len2) == 0)) > ! { > ! if (++match == matchnum) > ! { > ! pos = p + 1; > ! break; > ! } > ! } > ! p1++; > ! } > ! } > ! else if (eml > 1) /* not as simple - multibyte encoding */ > ! { > ! pg_wchar *p1, > ! *p2, > ! *ps1, > ! *ps2; > ! > ! ps1 = p1 = (pg_wchar *) palloc((len1 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t1), p1, len1); > ! len1 = pg_wchar_strlen(p1); > ! ps2 = p2 = (pg_wchar *) palloc((len2 + 1) * sizeof(pg_wchar)); > ! (void) pg_mb2wchar_with_len((unsigned char *) VARDATA(t2), p2, len2); > ! len2 = pg_wchar_strlen(p2); > ! > ! for (p = 0; p <= px; p++) > ! { > ! if ((*p2 == *p1) && (pg_wchar_strncmp(p1, p2, len2) == 0)) > ! { > ! if (++match == matchnum) > ! { > ! pos = p + 1; > ! break; > ! } > ! } > ! p1++; > ! } > ! > ! pfree(ps1); > ! pfree(ps2); > ! } > ! else > ! elog(ERROR, "Invalid backend encoding; encoding max length " > ! "is less than one."); > ! > PG_RETURN_INT32(pos); > } > > *************** > *** 758,766 **** > Datum > byteaoctetlen(PG_FUNCTION_ARGS) > { > ! bytea *v = PG_GETARG_BYTEA_P(0); > ! > ! PG_RETURN_INT32(VARSIZE(v) - VARHDRSZ); > } > > /* > --- 951,957 ---- > Datum > byteaoctetlen(PG_FUNCTION_ARGS) > { > ! PG_RETURN_INT32(toast_raw_datum_size(PG_GETARG_DATUM(0)) - VARHDRSZ); > } > > /* > *************** > *** 805,810 **** > --- 996,1003 ---- > PG_RETURN_BYTEA_P(result); > } > > + #define PG_STR_GET_BYTEA(str_) \ > + DatumGetByteaP(DirectFunctionCall1(byteain, CStringGetDatum(str_))) > /* > * bytea_substr() > * Return a substring starting at the specified position. > *************** > *** 813,845 **** > * Input: > * - string > * - starting position (is one-based) > ! * - string length > * > * If the starting position is zero or less, then return from the start of the string > * adjusting the length to be consistent with the "negative start" per SQL92. > ! * If the length is less than zero, return the remaining string. > ! * > */ > Datum > bytea_substr(PG_FUNCTION_ARGS) > { > ! int32 m = PG_GETARG_INT32(1); > ! int32 n = PG_GETARG_INT32(2); > > ! /* > ! * starting position before the start of the string? then offset into > ! * the string per SQL92 spec... > ! */ > ! if (m < 1) > { > ! n += (m - 1); > ! m = 1; > } > > ! /* m will now become a zero-based starting position */ > ! m--; > > ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, m, n)); > } > > /* > --- 1006,1076 ---- > * Input: > * - string > * - starting position (is one-based) > ! * - string length (optional) > * > * If the starting position is zero or less, then return from the start of the string > * adjusting the length to be consistent with the "negative start" per SQL92. > ! * If the length is less than zero, an ERROR is thrown. If no third argument > ! * (length) is provided, the length to the end of the string is assumed. > */ > Datum > bytea_substr(PG_FUNCTION_ARGS) > { > ! int S = PG_GETARG_INT32(1); /* start position */ > ! int S1; /* adjusted start position */ > ! int L1; /* adjusted substring length */ > > ! S1 = Max(S, 1); > ! > ! if (fcinfo->nargs == 2) > ! { > ! /* > ! * Not passed a length - PG_GETARG_BYTEA_P_SLICE() > ! * grabs everything to the end of the string if we pass it > ! * a negative value for length. > ! */ > ! L1 = -1; > ! } > ! else > { > ! /* end position */ > ! int E = S + PG_GETARG_INT32(2); > ! > ! /* > ! * A negative value for L is the only way for the end position > ! * to be before the start. SQL99 says to throw an error. > ! */ > ! if (E < S) > ! elog(ERROR, "negative substring length not allowed"); > ! > ! /* > ! * A zero or negative value for the end position can happen if the start > ! * was negative or one. SQL99 says to return a zero-length string. > ! */ > ! if (E < 1) > ! PG_RETURN_BYTEA_P(PG_STR_GET_BYTEA("")); > ! > ! L1 = E - S1; > } > > ! /* > ! * If the start position is past the end of the string, > ! * SQL99 says to return a zero-length string -- > ! * PG_GETARG_TEXT_P_SLICE() will do that for us. > ! * Convert to zero-based starting position > ! */ > ! PG_RETURN_BYTEA_P(PG_GETARG_BYTEA_P_SLICE (0, S1 - 1, L1)); > ! } > > ! /* > ! * bytea_substr_no_len - > ! * Wrapper to avoid opr_sanity failure due to > ! * one function accepting a different number of args. > ! */ > ! Datum > ! bytea_substr_no_len(PG_FUNCTION_ARGS) > ! { > ! return bytea_substr(fcinfo); > } > > /* > *************** > *** 1422,1424 **** > --- 1653,1834 ---- > > PG_RETURN_INT32(cmp); > } > + > + /* > + * replace_text > + * replace all occurences of 'old_sub_str' in 'orig_str' > + * with 'new_sub_str' to form 'new_str' > + * > + * returns 'orig_str' if 'old_sub_str' == '' or 'orig_str' == '' > + * otherwise returns 'new_str' > + */ > + Datum > + replace_text(PG_FUNCTION_ARGS) > + { > + text *left_text; > + text *right_text; > + text *buf_text; > + text *ret_text; > + int curr_posn; > + text *src_text = PG_GETARG_TEXT_P(0); > + int src_text_len = TEXTLEN(src_text); > + text *from_sub_text = PG_GETARG_TEXT_P(1); > + int from_sub_text_len = TEXTLEN(from_sub_text); > + text *to_sub_text = PG_GETARG_TEXT_P(2); > + char *to_sub_str = PG_TEXT_GET_STR(to_sub_text); > + StringInfo str = makeStringInfo(); > + > + if (src_text_len == 0 || from_sub_text_len == 0) > + PG_RETURN_TEXT_P(src_text); > + > + buf_text = TEXTDUP(src_text); > + curr_posn = TEXTPOS(buf_text, from_sub_text); > + > + while (curr_posn > 0) > + { > + left_text = LEFT(buf_text, from_sub_text); > + right_text = RIGHT(buf_text, from_sub_text, from_sub_text_len); > + > + appendStringInfo(str, PG_TEXT_GET_STR(left_text)); > + appendStringInfo(str, to_sub_str); > + > + pfree(buf_text); > + pfree(left_text); > + buf_text = right_text; > + curr_posn = TEXTPOS(buf_text, from_sub_text); > + } > + > + appendStringInfo(str, PG_TEXT_GET_STR(buf_text)); > + pfree(buf_text); > + > + ret_text = PG_STR_GET_TEXT(str->data); > + pfree(str->data); > + pfree(str); > + > + PG_RETURN_TEXT_P(ret_text); > + } > + > + /* > + * split_text > + * parse input string > + * return ord item (1 based) > + * based on provided field separator > + */ > + Datum > + split_text(PG_FUNCTION_ARGS) > + { > + text *inputstring = PG_GETARG_TEXT_P(0); > + int inputstring_len = TEXTLEN(inputstring); > + text *fldsep = PG_GETARG_TEXT_P(1); > + int fldsep_len = TEXTLEN(fldsep); > + int fldnum = PG_GETARG_INT32(2); > + int start_posn = 0; > + int end_posn = 0; > + text *result_text; > + > + /* return empty string for empty input string */ > + if (inputstring_len < 1) > + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); > + > + /* empty field separator */ > + if (fldsep_len < 1) > + { > + if (fldnum == 1) /* first field - just return the input string */ > + PG_RETURN_TEXT_P(inputstring); > + else /* otherwise return an empty string */ > + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); > + } > + > + /* field number is 1 based */ > + if (fldnum < 1) > + elog(ERROR, "field position must be > 0"); > + > + start_posn = text_position(PointerGetDatum(inputstring), > + PointerGetDatum(fldsep), > + fldnum - 1); > + end_posn = text_position(PointerGetDatum(inputstring), > + PointerGetDatum(fldsep), > + fldnum); > + > + if ((start_posn == 0) && (end_posn == 0)) /* fldsep not found */ > + { > + if (fldnum == 1) /* first field - just return the input string */ > + PG_RETURN_TEXT_P(inputstring); > + else /* otherwise return an empty string */ > + PG_RETURN_TEXT_P(PG_STR_GET_TEXT("")); > + } > + else if ((start_posn != 0) && (end_posn == 0)) > + { > + /* last field requested */ > + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, -1, true); > + PG_RETURN_TEXT_P(result_text); > + } > + else if ((start_posn == 0) && (end_posn != 0)) > + { > + /* first field requested */ > + result_text = LEFT(inputstring, fldsep); > + PG_RETURN_TEXT_P(result_text); > + } > + else > + { > + /* prior to last field requested */ > + result_text = text_substring(PointerGetDatum(inputstring), start_posn + fldsep_len, end_posn - start_posn - fldsep_len,false); > + PG_RETURN_TEXT_P(result_text); > + } > + } > + > + #define HEXBASE 16 > + /* > + * Convert a int32 to a string containing a base 16 (hex) representation of > + * the number. > + */ > + Datum > + to_hex32(PG_FUNCTION_ARGS) > + { > + static char digits[] = "0123456789abcdef"; > + char buf[32]; /* bigger than needed, but reasonable */ > + char *ptr, > + *end; > + text *result_text; > + int32 value = PG_GETARG_INT32(0); > + > + end = ptr = buf + sizeof(buf) - 1; > + *ptr = '\0'; > + > + do > + { > + *--ptr = digits[value % HEXBASE]; > + value /= HEXBASE; > + } while (ptr > buf && value); > + > + result_text = PG_STR_GET_TEXT(ptr); > + PG_RETURN_TEXT_P(result_text); > + } > + > + /* > + * Convert a int64 to a string containing a base 16 (hex) representation of > + * the number. > + */ > + Datum > + to_hex64(PG_FUNCTION_ARGS) > + { > + static char digits[] = "0123456789abcdef"; > + char buf[32]; /* bigger than needed, but reasonable */ > + char *ptr, > + *end; > + text *result_text; > + int64 value = PG_GETARG_INT64(0); > + > + end = ptr = buf + sizeof(buf) - 1; > + *ptr = '\0'; > + > + do > + { > + *--ptr = digits[value % HEXBASE]; > + value /= HEXBASE; > + } while (ptr > buf && value); > + > + result_text = PG_STR_GET_TEXT(ptr); > + PG_RETURN_TEXT_P(result_text); > + } > + > Index: src/include/catalog/pg_proc.h > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/include/catalog/pg_proc.h,v > retrieving revision 1.254 > diff -c -r1.254 pg_proc.h > *** src/include/catalog/pg_proc.h 15 Aug 2002 02:51:27 -0000 1.254 > --- src/include/catalog/pg_proc.h 16 Aug 2002 18:53:13 -0000 > *************** > *** 2121,2127 **** > DESCR("remove initial characters from string"); > DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); > DESCR("remove trailing characters from string"); > ! DATA(insert OID = 883 ( substr PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substr($1, $2, -1)" - _null_)); > DESCR("return portion of string"); > DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); > DESCR("trim both ends of string"); > --- 2121,2127 ---- > DESCR("remove initial characters from string"); > DATA(insert OID = 882 ( rtrim PGNSP PGUID 14 f f t f i 1 25 "25" "select rtrim($1, \' \')" - _null_ )); > DESCR("remove trailing characters from string"); > ! DATA(insert OID = 883 ( substr PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); > DESCR("return portion of string"); > DATA(insert OID = 884 ( btrim PGNSP PGUID 12 f f t f i 2 25 "25 25" btrim - _null_ )); > DESCR("trim both ends of string"); > *************** > *** 2130,2137 **** > > DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 937 ( substring PGNSP PGUID 14 f f t f i 2 25 "25 23" "select substring($1, $2, -1)" - _null_)); > DESCR("return portion of string"); > > /* for multi-byte support */ > > --- 2130,2145 ---- > > DATA(insert OID = 936 ( substring PGNSP PGUID 12 f f t f i 3 25 "25 23 23" text_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 937 ( substring PGNSP PGUID 12 f f t f i 2 25 "25 23" text_substr_no_len - _null_ )); > DESCR("return portion of string"); > + DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25 25" replace_text - _null_ )); > + DESCR("replace all occurrences of old_substr with new_substr in string"); > + DATA(insert OID = 2088 ( split PGNSP PGUID 12 f f t f i 3 25 "25 25 23" split_text - _null_ )); > + DESCR("split string by field_sep and return field_num"); > + DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23" to_hex32 - _null_ )); > + DESCR("convert int32 number to hex"); > + DATA(insert OID = 2090 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "20" to_hex64 - _null_ )); > + DESCR("convert int64 number to hex"); > > /* for multi-byte support */ > > *************** > *** 2778,2784 **** > DESCR("concatenate"); > DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 2013 ( substring PGNSP PGUID 14 f f t f i 2 17 "17 23" "select substring($1, $2, -1)"- _null_ )); > DESCR("return portion of string"); > DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); > DESCR("return position of substring"); > --- 2786,2796 ---- > DESCR("concatenate"); > DATA(insert OID = 2012 ( substring PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); > DESCR("return portion of string"); > ! DATA(insert OID = 2013 ( substring PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); > ! DESCR("return portion of string"); > ! DATA(insert OID = 2085 ( substr PGNSP PGUID 12 f f t f i 3 17 "17 23 23" bytea_substr - _null_ )); > ! DESCR("return portion of string"); > ! DATA(insert OID = 2086 ( substr PGNSP PGUID 12 f f t f i 2 17 "17 23" bytea_substr_no_len - _null_ )); > DESCR("return portion of string"); > DATA(insert OID = 2014 ( position PGNSP PGUID 12 f f t f i 2 23 "17 17" byteapos - _null_ )); > DESCR("return position of substring"); > Index: src/include/utils/builtins.h > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/include/utils/builtins.h,v > retrieving revision 1.191 > diff -c -r1.191 builtins.h > *** src/include/utils/builtins.h 15 Aug 2002 02:51:27 -0000 1.191 > --- src/include/utils/builtins.h 16 Aug 2002 18:53:13 -0000 > *************** > *** 447,458 **** > --- 447,463 ---- > extern Datum textoctetlen(PG_FUNCTION_ARGS); > extern Datum textpos(PG_FUNCTION_ARGS); > extern Datum text_substr(PG_FUNCTION_ARGS); > + extern Datum text_substr_no_len(PG_FUNCTION_ARGS); > extern Datum name_text(PG_FUNCTION_ARGS); > extern Datum text_name(PG_FUNCTION_ARGS); > extern int varstr_cmp(char *arg1, int len1, char *arg2, int len2); > extern List *textToQualifiedNameList(text *textval, const char *caller); > extern bool SplitIdentifierString(char *rawstring, char separator, > List **namelist); > + extern Datum replace_text(PG_FUNCTION_ARGS); > + extern Datum split_text(PG_FUNCTION_ARGS); > + extern Datum to_hex32(PG_FUNCTION_ARGS); > + extern Datum to_hex64(PG_FUNCTION_ARGS); > > extern Datum unknownin(PG_FUNCTION_ARGS); > extern Datum unknownout(PG_FUNCTION_ARGS); > *************** > *** 476,481 **** > --- 481,487 ---- > extern Datum byteacat(PG_FUNCTION_ARGS); > extern Datum byteapos(PG_FUNCTION_ARGS); > extern Datum bytea_substr(PG_FUNCTION_ARGS); > + extern Datum bytea_substr_no_len(PG_FUNCTION_ARGS); > > /* version.c */ > extern Datum pgsql_version(PG_FUNCTION_ARGS); > Index: src/test/regress/expected/strings.out > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/test/regress/expected/strings.out,v > retrieving revision 1.12 > diff -c -r1.12 strings.out > *** src/test/regress/expected/strings.out 11 Jun 2002 15:41:38 -0000 1.12 > --- src/test/regress/expected/strings.out 16 Aug 2002 18:53:13 -0000 > *************** > *** 573,575 **** > --- 573,738 ---- > text and varchar > (1 row) > > + -- > + -- test substr with toasted text values > + -- > + CREATE TABLE toasttest(f1 text); > + insert into toasttest values(repeat('1234567890',10000)); > + insert into toasttest values(repeat('1234567890',10000)); > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + substr > + -------- > + 123 > + 123 > + (2 rows) > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + ERROR: negative substring length not allowed > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + DROP TABLE toasttest; > + -- > + -- test substr with toasted bytea values > + -- > + CREATE TABLE toasttest(f1 bytea); > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + substr > + -------- > + 123 > + 123 > + (2 rows) > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + ERROR: negative substring length not allowed > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + substr > + -------- > + 567890 > + 567890 > + (2 rows) > + > + DROP TABLE toasttest; > + -- > + -- test length > + -- > + SELECT length('abcdef') AS "length_6"; > + length_6 > + ---------- > + 6 > + (1 row) > + > + -- > + -- test strpos > + -- > + SELECT strpos('abcdef', 'cd') AS "pos_3"; > + pos_3 > + ------- > + 3 > + (1 row) > + > + SELECT strpos('abcdef', 'xy') AS "pos_0"; > + pos_0 > + ------- > + 0 > + (1 row) > + > + -- > + -- test replace > + -- > + SELECT replace('abcdef', 'de', '45') AS "abc45f"; > + abc45f > + -------- > + abc45f > + (1 row) > + > + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; > + ya123da123doo > + --------------- > + ya123da123doo > + (1 row) > + > + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; > + yaoo > + ------ > + yaoo > + (1 row) > + > + -- > + -- test split > + -- > + select split('joeuser@mydatabase','@',0) AS "an error"; > + ERROR: field position must be > 0 > + select split('joeuser@mydatabase','@',1) AS "joeuser"; > + joeuser > + --------- > + joeuser > + (1 row) > + > + select split('joeuser@mydatabase','@',2) AS "mydatabase"; > + mydatabase > + ------------ > + mydatabase > + (1 row) > + > + select split('joeuser@mydatabase','@',3) AS "empty string"; > + empty string > + -------------- > + > + (1 row) > + > + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; > + joeuser > + --------- > + joeuser > + (1 row) > + > + -- > + -- test to_hex > + -- > + select to_hex(256*256*256 - 1) AS "ffffff"; > + ffffff > + -------- > + ffffff > + (1 row) > + > + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; > + ffffffff > + ---------- > + ffffffff > + (1 row) > + > Index: src/test/regress/sql/strings.sql > =================================================================== > RCS file: /opt/src/cvs/pgsql-server/src/test/regress/sql/strings.sql,v > retrieving revision 1.8 > diff -c -r1.8 strings.sql > *** src/test/regress/sql/strings.sql 11 Jun 2002 15:41:38 -0000 1.8 > --- src/test/regress/sql/strings.sql 16 Aug 2002 18:53:13 -0000 > *************** > *** 197,199 **** > --- 197,292 ---- > SELECT text 'text' || char(20) ' and characters' AS "Concat text to char"; > > SELECT text 'text' || varchar ' and varchar' AS "Concat text to varchar"; > + > + -- > + -- test substr with toasted text values > + -- > + CREATE TABLE toasttest(f1 text); > + > + insert into toasttest values(repeat('1234567890',10000)); > + insert into toasttest values(repeat('1234567890',10000)); > + > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + > + DROP TABLE toasttest; > + > + -- > + -- test substr with toasted bytea values > + -- > + CREATE TABLE toasttest(f1 bytea); > + > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + insert into toasttest values(decode(repeat('1234567890',10000),'escape')); > + > + -- If the starting position is zero or less, then return from the start of the string > + -- adjusting the length to be consistent with the "negative start" per SQL92. > + SELECT substr(f1, -1, 5) from toasttest; > + > + -- If the length is less than zero, an ERROR is thrown. > + SELECT substr(f1, 5, -1) from toasttest; > + > + -- If no third argument (length) is provided, the length to the end of the > + -- string is assumed. > + SELECT substr(f1, 99995) from toasttest; > + > + -- If start plus length is > string length, the result is truncated to > + -- string length > + SELECT substr(f1, 99995, 10) from toasttest; > + > + DROP TABLE toasttest; > + > + -- > + -- test length > + -- > + > + SELECT length('abcdef') AS "length_6"; > + > + -- > + -- test strpos > + -- > + > + SELECT strpos('abcdef', 'cd') AS "pos_3"; > + > + SELECT strpos('abcdef', 'xy') AS "pos_0"; > + > + -- > + -- test replace > + -- > + SELECT replace('abcdef', 'de', '45') AS "abc45f"; > + > + SELECT replace('yabadabadoo', 'ba', '123') AS "ya123da123doo"; > + > + SELECT replace('yabadoo', 'bad', '') AS "yaoo"; > + > + -- > + -- test split > + -- > + select split('joeuser@mydatabase','@',0) AS "an error"; > + > + select split('joeuser@mydatabase','@',1) AS "joeuser"; > + > + select split('joeuser@mydatabase','@',2) AS "mydatabase"; > + > + select split('joeuser@mydatabase','@',3) AS "empty string"; > + > + select split('@joeuser@mydatabase@','@',2) AS "joeuser"; > + > + -- > + -- test to_hex > + -- > + select to_hex(256*256*256 - 1) AS "ffffff"; > + > + select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS "ffffffff"; > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073