Thread: migrating from mysql: need to convert empty string to null
Hi, We have an existing (PHP) code base that is being converted to use PostgreSQL from MySQL. In most places, our insert and update statements are formed using single quoted values, and when there is no value, the empty string is being passed in. PostgreSQL objects to empty strings for certain column types, particularly numeric columns. We're trying to get this conversion done quickly, and so a solution involving customization of string to X type conversion is what we're after. What I first tried to do was, CREATE OR REPLACE FUNCTION varchar_to_int_with_empty_string_handling(varchar) RETURNS integer AS $$ SELECT CASE WHEN $1 = '' THEN NULL ELSE pg_catalog.int4($1) END $$ IMMUTABLE LANGUAGE SQL; DROP CAST IF EXISTS (varchar AS integer); CREATE CAST (varchar AS integer) WITH FUNCTION varchar_to_int_with_empty_string_handling(varchar) AS ASSIGNMENT; This seems broken, when loading this file a second or subsequent time (we append mysql compat functions to it as we progress), there is an error saying pg_catalog.int4 doesn't exist. So somehow, the cast above is deleting/disabling/hiding pg_catalog.int4? While experimenting, trying to understand what I'm doing wrong, I ran this query: SELECT castsource::regtype, casttarget::regtype, castfunc::regprocedure, castcontext FROM pg_cast WHERE casttarget = 'int'::regtype and I notice that there isn't any rows specified for converting varchar or text to int. Which raises the question, if I run: SELECT '123'::int; What conversion is actually happening here? Any answers are much appreciated, thanks Dave
"Dave Lee" <davelee.com@gmail.com> writes: > and I notice that there isn't any rows specified for converting > varchar or text to int. Which raises the question, if I run: > SELECT '123'::int; > What conversion is actually happening here? None. There is no type coercion there --- what that is really specifying is invocation of the int data type's input function on the given string. I don't think your approach is going to work; you would have to make your client programs emit things like insert into mytab(integercol) values (''::varchar) in order to get a runtime varchar-to-int cast to be applied. At that point you might as well fix them properly. regards, tom lane
On Wed, Jun 18, 2008 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > None. There is no type coercion there --- what that is really > specifying is invocation of the int data type's input function > on the given string. I thought that something like this may be this was the case. > I don't think your approach is going to work; you would have to > make your client programs emit things like > > insert into mytab(integercol) values (''::varchar) > > in order to get a runtime varchar-to-int cast to be applied. > At that point you might as well fix them properly. I see. Other than directly modifying int4in (is this the one?), is there a way to plug-in our modified empty string handling logic? I'm picturing a scenario where we write write a wrapper function that tests for empty strings and returns NULL, else just calls into int4in, and then dynamically load this function, and finally update the corresponding rows in pg_proc? Is this even possible? thanks for your explanation Tom, Dave
"Dave Lee" <davelee.com@gmail.com> writes: > I see. Other than directly modifying int4in (is this the one?), is > there a way to plug-in our modified empty string handling logic? I'm > picturing a scenario where we write write a wrapper function that > tests for empty strings and returns NULL, else just calls into int4in, > and then dynamically load this function, and finally update the > corresponding rows in pg_proc? Is this even possible? There isn't any really nice way to do that :-(. You could put a wrapper function around int4in but it would not help, because the internal API for datatype input functions doesn't support having them return NULL (see InputFunctionCall in the sources). If you don't want to fix your clients then you'll really have little choice but to hack the Postgres source code. The good news is there are probably not very places you'd have to change, depending on exactly what contexts you need this to happen in. regards, tom lane
On Wed, Jun 18, 2008 at 3:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > There isn't any really nice way to do that :-(. You could put a wrapper > function around int4in but it would not help, because the internal API > for datatype input functions doesn't support having them return NULL > (see InputFunctionCall in the sources). I've read src/backend/utils/fmgr/README and it states that returning NULL is just a matter of setting isnull to true in the FunctionCallInfo struct, and provides a convenience macro, PG_RETURN_NULL. But then, in InputFunctionCall, I presume you're referring to the logic that that only allows NULL to be returned if the source input string is also NULL, otherwise raising an error. At a basic level, you're suggesting I need to take out this NULL check in InputFunctionCall to get the capability I'm looking for? Any idea why the input functions were designed this way? For users migrating from other RDBMs, if no one else, being able to return NULL from an input function seems to be a valuable feature. I might use this hacked functionality to return NULL for '0000-00-00' dates which MySQL foolishly accepts. Maybe it's only useful to former MySQL users thanks, Dave
"Dave Lee" <davelee.com@gmail.com> writes: > I've read src/backend/utils/fmgr/README and it states that returning > NULL is just a matter of setting isnull to true in the > FunctionCallInfo struct, and provides a convenience macro, > PG_RETURN_NULL. But then, in InputFunctionCall, I presume you're > referring to the logic that that only allows NULL to be returned if > the source input string is also NULL, otherwise raising an error. At a > basic level, you're suggesting I need to take out this NULL check in > InputFunctionCall to get the capability I'm looking for? No, I'm stating it as a fact that that won't work. You'd have to change InputFunctionCall's API so that it could pass back a null to its callers. > Any idea why the input functions were designed this way? Per above, and the fact that the behavior you're proposing is a complete violation of the SQL spec (yeah, I know Oracle does it too). regards, tom lane