Re: Funtion to clean up strings? - Mailing list pgsql-sql
| From | Andreas |
|---|---|
| Subject | Re: Funtion to clean up strings? |
| Date | |
| Msg-id | 499AD87E.4010206@gmx.net Whole thread Raw |
| In response to | Re: Funtion to clean up strings? (Raj Mathur <raju@linux-delhi.org>) |
| Responses |
Re: Funtion to clean up strings?
Re: Funtion to clean up strings? |
| List | pgsql-sql |
Hi
Thanks, that really works :)
Now a last extension.
Some numbers were entered in a "110% perfect" way with an excessive (0).
+49 (0) 123 / 456 789
I have to suspect the source liked to express that it's either +49 or
0 if the +49 isn't applicable, but not both.
Both together are semantically wrong and your function results therefore
to "00123456789".
Correct was "0123456789" or e.g. "+33123456789" if it were an
international number.
This (0) should be silently dropped as long as the endresult has at
least one 0 or + like in the allready covered cases.
I tried to use this RegEx magic myself as far as I could figure it out,
yet and came up with replacing every p in your solution with another regex
case when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49' then '0'|| regexp_replace(
regexp_replace( regexp_replace(p, E'[^0-9+()]', '', 'g') , '\\(0\\)||\\(||\\)', '', 'g') ,
E'^(?:\\+|00)49(.*)',E'\\1') when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)' then '+'||
regexp_replace( regexp_replace( regexp_replace(p, E'[^0-9+()]', '', 'g') ,
'\\(0\\)||\\(||\\)','', 'g') , E'^(?:\\+||00)(.*)', E'\\1') else regexp_replace(p, E'[^0-9]', '', 'g')
end
That would catch the leading spaces in " 00 49 ( 0 ) 1 2 3 456 -0", too.
Creating a sql-function thows a WARNING: nonstandard use of \\ in a
string literal
but it still works. Do you know a better or more correct way to reach
the same?
Perhaps one could find a way with less calls to regexp_replace ?
Regards
Andreas :)
Raj Mathur wrote:
> On Friday 13 Feb 2009, Andreas wrote:
>
>> now ... lets get more complicated.
>> Phone numbers are entered:
>> 0123/4567-89 national number
>> 0049/123/4567-89 the same number
>> +49/123/4567-89 still the same number
>>
>> should come out as 0123456789 to search in this column.
>> "0049" and "+49" --> 0
>>
>> while international numbers
>> +33/123456789
>> 0033/123456789
>>
>> should come as
>> +33123456789
>>
>
> TEST=> create table foo(p text);
>
> TEST=> insert into foo (select regexp_split_to_table('0123/4567-89
> 0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' '));
>
> TEST=> select * from foo;
> p
> ------------------
> 0123/4567-89
> 0049/123/4567-89
> +49/123/4567-89
> +33/123456789
> 0033/123456789
> (5 rows)
>
> TEST=> select
> (case
> when p ~ E'^(\\+|00)49'
> then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'),
> E'^(?:\\+|00)49(.*)', E'\\1')
> when p ~ E'^(\\+|00)'
> then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'),
> E'^(?:\\+||00)(.*)', E'\\1')
> else
> regexp_replace(p, E'[^0-9]', '', 'g')
> end)
> from foo;
> regexp_replace
> ----------------
> 0123456789
> 0123456789
> 0123456789
> +33123456789
> +33123456789
> (5 rows)
>
> That do what you want? (Apologies for the wrapped lines.)
>
> Regards,
>
> -- Raju
>