Re: Funtion to clean up strings? - Mailing list pgsql-sql

From Andreas
Subject Re: Funtion to clean up strings?
Date
Msg-id 499AD87E.4010206@gmx.net
Whole thread Raw
In response to Re: Funtion to clean up strings?  (Raj Mathur <raju@linux-delhi.org>)
Responses Re: Funtion to clean up strings?
Re: Funtion to clean up strings?
List pgsql-sql
Hi
Thanks, that really works   :)

Now a last extension.
Some numbers were entered in a "110% perfect" way with an excessive (0).  
+49 (0) 123 / 456 789
I have to suspect the source liked to express that it's either +49  or  
0 if the +49 isn't applicable, but not both.
Both together are semantically wrong and your function results therefore 
to "00123456789".
Correct was "0123456789" or e.g. "+33123456789" if it were an 
international number.

This (0) should be silently dropped as long as the endresult has at 
least one 0 or + like in the allready covered cases.

I tried to use this RegEx magic myself as far as I could figure it out, 
yet and came up with replacing every p in your solution with another regex

case when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49'   then '0'||      regexp_replace(
regexp_replace(           regexp_replace(p, E'[^0-9+()]', '', 'g')         , '\\(0\\)||\\(||\\)', '', 'g')      ,
E'^(?:\\+|00)49(.*)',E'\\1') when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)'   then '+'||
regexp_replace(           regexp_replace(               regexp_replace(p, E'[^0-9+()]', '', 'g')            ,
'\\(0\\)||\\(||\\)','', 'g')      , E'^(?:\\+||00)(.*)', E'\\1') else   regexp_replace(p, E'[^0-9]', '', 'g')
 
end

That would catch the leading spaces in "  00 49 ( 0 ) 1 2 3 456 -0", too.
Creating a sql-function thows a WARNING:  nonstandard use of \\ in a 
string literal
but it still works. Do you know a better or more correct way to reach 
the same?

Perhaps one could find a way with less calls to regexp_replace ?

Regards
Andreas  :)


Raj Mathur wrote:
> On Friday 13 Feb 2009, Andreas wrote:
>   
>> now ... lets get more complicated.
>> Phone numbers are entered:
>> 0123/4567-89          national number
>> 0049/123/4567-89         the same number
>> +49/123/4567-89         still the same number
>>
>> should come out as   0123456789  to search in this column.
>> "0049" and "+49"  --> 0
>>
>> while international numbers
>> +33/123456789
>> 0033/123456789
>>
>> should come as
>> +33123456789
>>     
>
> TEST=> create table foo(p text);
>
> TEST=> insert into foo (select regexp_split_to_table('0123/4567-89 
> 0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' '));
>
> TEST=> select * from foo;
>         p
> ------------------
>  0123/4567-89
>  0049/123/4567-89
>  +49/123/4567-89
>  +33/123456789
>  0033/123456789
> (5 rows)
>
> TEST=> select
> (case
>   when p ~ E'^(\\+|00)49'
>     then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), 
> E'^(?:\\+|00)49(.*)', E'\\1')
>   when p ~ E'^(\\+|00)'
>     then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), 
> E'^(?:\\+||00)(.*)', E'\\1')
>   else
>     regexp_replace(p, E'[^0-9]', '', 'g')
> end)
> from foo;
>  regexp_replace
> ----------------
>  0123456789
>  0123456789
>  0123456789
>  +33123456789
>  +33123456789
> (5 rows)
>
> That do what you want?  (Apologies for the wrapped lines.)
>
> Regards,
>
> -- Raju
>   



pgsql-sql by date:

Previous
From: Richard Huxton
Date:
Subject: Re: How to find number of seconds between 2 timestamps
Next
From: Raj Mathur
Date:
Subject: Re: Funtion to clean up strings?