Thread: 8.0.3 regexp_replace()...
I have a stupid problem. My server is running an old version of postgres (8.0.3) and therefore lacks the regexp_replace() function. It does however support substring and replace functions. So what I am trying to do is emulate the regexp_replace() function by creating a function which finds each matching substring and replaces them by hand in a loop. The loop is supposed to exit when there are no more matches to replace. The problem is that the function enters and infinate loop which brings down the server. The faulty code follows: -- PRE-PostgreSQL 8.1 regexp_replace() function called regexp_replacex() CREATE OR REPLACE FUNCTION "regexp_replacex" (source varchar, pattern varchar, replacement varchar) RETURNS varchar AS $body$ DECLARE retvalue VARCHAR; BEGIN retvalue = "source"; LOOP retvalue = REPLACE(retvalue, SUBSTRING(retvalue FROM "pattern"), "replacement"); EXIT WHEN retvalue = REPLACE(retvalue, SUBSTRING(retvalue FROM "pattern"), "replacement"); END LOOP; RETURN retvalue; END; $body$ LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER; Now first of all I realize that my code is stupid and the problem is stupid and if it were up to be I'd be running 8.1 anyway. But its not up to me but I really need this functionality. The parameters might be in a different order than the real 8.1 regexp_replace so pay attention if you test it and do expect to have to kill the server when it enters the infinate loop. I can not beleave that nobody has done this before and yet I can't find it anywhere on the net. I've been searching Google and Google Groups for "postgres replace substring return function" and nothing. Can anyone spot my folly?
I did get the code working. The function DDL follows: CREATE OR REPLACE FUNCTION "webadmin"."regexp_replacex" (source varchar, pattern varchar, replacement varchar) RETURNS varchar AS $body$ DECLARE retvalue VARCHAR; BEGIN retvalue = "source"; LOOP retvalue = REPLACE(retvalue, COALESCE(SUBSTRING(retvalue FROM "pattern"), ''), "replacement"); EXIT WHEN retvalue = REPLACE(retvalue, COALESCE(SUBSTRING(retvalue FROM "pattern"), ''), "replacement"); END LOOP; RETURN retvalue; END; $body$ LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER; The problem was that SUBSTRING returns NULL if it cannot find any matches for the pattern and when the second parameter to REPLACE returns NULL, REPLACE returns NULL (which is idiotic). Using COALESCE I ensure that is SUBSTRING cannot find a match that '' (empty string) is sent to REPLACE. REPLACE then behaves as expected and replaces nothing. Debugging PostgreSQL's retarded behaviour around NULL values can be a real pain. But at least I learned to use EMS PostgreSQL Manager for Windows' function debugger, which can step through a function while reporting the values of all variables, parameters and return values. Very handy. -Robert
"rlee0001" <robeddielee@hotmail.com> writes: > CREATE OR REPLACE FUNCTION "regexp_replacex" (source varchar, pattern > varchar, replacement varchar) RETURNS varchar AS > $body$ > DECLARE > retvalue VARCHAR; > BEGIN > retvalue = "source"; > LOOP > retvalue = REPLACE(retvalue, SUBSTRING(retvalue FROM "pattern"), > "replacement"); > EXIT WHEN retvalue = REPLACE(retvalue, SUBSTRING(retvalue FROM > "pattern"), "replacement"); > END LOOP; > RETURN retvalue; > END; > $body$ > LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER; You probably don't want to be double-quoting the variable names. Also, I think this should be marked STRICT IMMUTABLE rather than the options you have chosen. > if you test it and do expect to have to kill the server when it enters > the infinate loop. Control-C cancels the query just fine for me ... regards, tom lane
On Mon, Jan 30, 2006 at 11:27:23AM -0800, rlee0001 wrote: > The problem was that SUBSTRING returns NULL if it cannot find any > matches for the pattern and when the second parameter to REPLACE > returns NULL, REPLACE returns NULL (which is idiotic). Using COALESCE I > ensure that is SUBSTRING cannot find a match that '' (empty string) is > sent to REPLACE. REPLACE then behaves as expected and replaces nothing. Well, the rule for STRICT functions (which replace is) is that if any of the arguments are NULL, the result is NULL. Most of the time this is what you want. IMHO the problem above is substring returning null. NULL should generally mean "unknown" and a substr that doesn't match certainly isn't unknown. Question is, what should it return then? In SQL2003 standard terms this is a "null-call" function: 4.27 SQL-invoked routines ... A null-call function is an SQL-invoked function that is defined to return the null value if any of its input arguments is the null value. A null-call function is an SQL-invoked function whose <null-call clause> specifies RETURNS NULL ON NULL INPUT. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Mon, 30 Jan 2006, rlee0001 wrote: > I did get the code working. The function DDL follows: > > CREATE OR REPLACE FUNCTION "webadmin"."regexp_replacex" (source > varchar, pattern varchar, replacement varchar) RETURNS varchar AS > $body$ > DECLARE > retvalue VARCHAR; > BEGIN > retvalue = "source"; > LOOP > retvalue = REPLACE(retvalue, COALESCE(SUBSTRING(retvalue FROM > "pattern"), ''), "replacement"); > EXIT WHEN retvalue = REPLACE(retvalue, COALESCE(SUBSTRING(retvalue > FROM "pattern"), ''), "replacement"); > END LOOP; > RETURN retvalue; > END; > $body$ > LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER; > > The problem was that SUBSTRING returns NULL if it cannot find any > matches for the pattern and when the second parameter to REPLACE > returns NULL, REPLACE returns NULL (which is idiotic). Why do you say that? I'd say that's precisely the most reasonable answer. You're asking to replace an unknown portion of a string (since NULL is unknown, it might match some portion of the string, it might not) with something else. The answer to that seems pretty unknown to me.
On Tue, 31 Jan 2006, Martijn van Oosterhout wrote: > On Mon, Jan 30, 2006 at 11:27:23AM -0800, rlee0001 wrote: > > The problem was that SUBSTRING returns NULL if it cannot find any > > matches for the pattern and when the second parameter to REPLACE > > returns NULL, REPLACE returns NULL (which is idiotic). Using COALESCE I > > ensure that is SUBSTRING cannot find a match that '' (empty string) is > > sent to REPLACE. REPLACE then behaves as expected and replaces nothing. > > Well, the rule for STRICT functions (which replace is) is that if any > of the arguments are NULL, the result is NULL. Most of the time this is > what you want. IMHO the problem above is substring returning null. NULL > should generally mean "unknown" and a substr that doesn't match > certainly isn't unknown. Question is, what should it return then? Sadly, that seems to me to match the SQL2003 semantics for its regular expression substring search. Or at least I believe that's what 6.29 GR5g is implying.
Martijn, (Warning: This post contains somewhat of a long rant followed by a question.) I realize that NULL is the unknown value in SQL and that (most) functions therefore treat it as such. I have no problem with "RETURNS NULL ON NULL INPUT" except when a function returns NULL for no good reason. If I were the ruler of the world I would declare that: 1) All functions correctly treat NULL input as "unknown" and if this prevents them from returning a logical value they must return NULL. (SQL has this already) 2) If an error occurs within a function which prevents the function from returning a logical value (such as invalid inputs) the function must raise an exception and not return. IE: No value is returned at all -- not even NULL. PostgreSQL seems to already do this. 3) If a function can not calculate a return value due to some ambiguity it should return NULL. This is really just an extention of rule 1. 4) If a function can calculate a return value (no processing error, NULL input or other ambiguity exists) it MUST NOT return NULL. My problem is with SUBSTRING. When it fails to find a match for the regexp pattern within the source string it returns NULL. Why?! There is no ambiguity nor NULL inputs! The result of the function is known to logically by "No Results". So the only question is this: How do you logically return that from a function? a) An empty string is logically incorrect because an empty string is a value and returning it would imply that it was matched. b) NULL is logically incorrect because the result was known. NULL is supposed to be returned when the result is unknown. Does Postgres have a "EMPTY" or "NOTHING" return value? What does a SELECT return when there are no matching records? NULL? Hell no! Then what? Because whatever SELECT returns when it finds no matching records is what SUBSTRING should return when there are no matching substrings. Of course SELECT was never intended to return a SCALAR value either while SUBSTRING was. Maybe "EMPTY" is something that SCALAR values simply are no capable of specifying? I am suggesting that the behaviour of SUBSTRING returning NULL when no matches is found is either a bug in PostgreSQL or a flaw in the SQL specification. It is not logical. Another words I wanted by function to do this (in english): Replace the next occurance of <pattern> in <sourcestring> with <replacementstring>; if found. Repeat for all additional occurances; if any. At no point should any ambiguity ever be introduced into this function so long as none of the three inputs are NULL since no external data source is used. Therefore nothing should ever return NULL during the operation of the function. In my experience having to code "special cases" around the NULL value is one of the biggest annoyances in SQL. Special cases should never have to be coded for specific values that have no bearing on the business logic of the application. As a programmer I want to be reasonably assured that I can anticipate NULL popping up and that anywhere that it might pop up it will be handled gracefully. Obviously I'm somewhat new to SQL (less than 2 years) and especially to PostgreSQL (about 2 months). In addition to returning null on null input, what are the best programming practices for dealing with NULL within a procedure/query? Was coalesce the best way to handle this? Should I have just checked for NULL in my EXIT statement instead? -Robert
"rlee0001" <robeddielee@hotmail.com> writes: > My problem is with SUBSTRING. When it fails to find a match for the > regexp pattern within the source string it returns NULL. Why?! Because the SQL standard says so. Of course, you're free to wrap the built-in function in your own function that has behavior you like better for this case... regards, tom lane
On Tue, 31 Jan 2006, rlee0001 wrote: > I am suggesting that the behaviour of SUBSTRING returning NULL when no > matches is found is either a bug in PostgreSQL or a flaw in the SQL > specification. It is not logical. No, but sadly it seems to be what the SQL spec wants for its similar construct. --- In general, SQL's handling of NULLs is badly designed. Sometimes it's misused (like the substring case). Sometimes it's confusing (like the IN and NOT IN cases). Sometimes it's just bizarre. If cardinality of a table expression is important (ie, count(*) is meaningful), why is DISTINCT defined in a way that basically does not give results consistent with NULL being unknown. UNIQUE(q) should return NULL in the presence of NULLs rather than true, since the real result is well, unknown (the two of these together have the side effect of UNIQUE being true not guaranteeing that the cardinality of a subquery and the subquery with distinct being the same). ---
Stephan, How do IN and NOT IN treat NULLs? Don't these functions search an array for a specified value returning true or false? I guess the intuitive thing for IN and NOT IN to do would be to return NULL if NULL appears anywhere in the array since those elements values are "unknown". Personally I think treating NULL as "unknown" is rediculous. NULL is a value and its value is known to be NULL. I know what NULL is: NULL. How many NULLs do I have here {NULL, 'hi', NULL, NULL}? NULL NULLs? No, three NULLs. How many NULLs are in this string: 'hi'? NULL NULLs? No, zero NULLs. How many occurances of 'yo' are in 'hey'? NULL occurances? No, zero occurances. How many NULLs are in this paragraph? You better count them yourself because if you ask SQL you know what it will say. NULL. Or will it tell you? Who the hell knows!? Oh yeah, my favorite is this: NULL::VARCHAR? Nope, you can't do it. Not without creating your own CAST. Seems to me that an obvious value would be 'NULL'. Or maybe '' (empty string). I hate having to use COALESCE. It just reaks of a bad programming practice. Thats the way I see it. But if NULL has to mean "unknown" then all the functions should treat it as such. Also several other values including UNSPECIFIED and EMPTY should be provided. EMPTY should return an empty array {} and UNSPECIFIED should do what NULL is often used to mean (nothing). Then NULL should be renamed to UNKNOWN to clear up any confusion. :o) But I'm not really the ruler of the world. At least not yet. But maybe someday... -Robert
On Wed, 1 Feb 2006, rlee0001 wrote: > How do IN and NOT IN treat NULLs? Don't these functions search an array > for a specified value returning true or false? I guess the intuitive > thing for IN and NOT IN to do would be to return NULL if NULL appears > anywhere in the array since those elements values are "unknown". It's IN and NOT IN (subselect) that people often get confused by, exactly because it does return NULL which means that a row not selected by IN may also not be selected by NOT IN. > Personally I think treating NULL as "unknown" is rediculous. NULL is a > value and its value is known to be NULL. I know what NULL is: NULL. The problem is that NULL isn't a known string, numeric, time, etc value. Until you define semantics for it, you don't really have a value. Those semantics could be alot simpler than the SQL ones however. > Oh yeah, my favorite is this: NULL::VARCHAR? Nope, you can't do it. Not > without creating your own CAST. Seems to me that an obvious value would > be 'NULL'. Or maybe '' (empty string). I hate having to use COALESCE. > It just reaks of a bad programming practice. I don't see how using COALESCE is particularly worse than using CAST, honestly. CAST(NULL AS VARCHAR(n)) versus COALESCE(NULL, <what you want out>) seems pretty much a wash, unless you want it to happen implicitly. > Thats the way I see it. But if NULL has to mean "unknown" then all the > functions should treat it as such. Also several other values including > UNSPECIFIED and EMPTY should be provided. EMPTY should return an empty > array {} and UNSPECIFIED should do what NULL is often used to mean > (nothing). Then NULL should be renamed to UNKNOWN to clear up any > confusion. :o) Well, yes, keeping the separate uses of NULL separate would have been nice.
On 2006-02-01, rlee0001 <robeddielee@hotmail.com> wrote: > Stephan, > > How do IN and NOT IN treat NULLs? Don't these functions search an array > for a specified value returning true or false? I guess the intuitive > thing for IN and NOT IN to do would be to return NULL if NULL appears > anywhere in the array since those elements values are "unknown". foo IN (x1,x2,x3) is exactly equivalent to (foo = x1) OR (foo = x2) OR (foo = x3) foo NOT IN (x1,x2,x3) is likewise equivalent to (foo <> x1) AND (foo <> x2) AND (foo <> x3) In the first case, if one of the x? is null, then the result of the expression is true if any of the clauses is true, or null otherwise; TRUE OR NULL is true, while FALSE OR NULL is null. In the second case, the result is likewise determined by the logic of three-valued AND. Since TRUE AND NULL is null, and FALSE AND NULL is false, that means that the expression can never return true if any of the x? is null. > Oh yeah, my favorite is this: NULL::VARCHAR? Nope, you can't do it. => select null::varchar; varchar --------- (1 row) works for me. (Note: that's not an empty string; use \pset null in psql to see the difference.) > Not without creating your own CAST. Casting from what? NULL isn't a type... > Seems to me that an obvious value would be 'NULL'. Or maybe '' > (empty string). If NULL ever got converted to 'NULL' or '', how would you distinguish it from the literal 'NULL' or ''? -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services