Re: Funtion to clean up strings? - Mailing list pgsql-sql
From | Andreas |
---|---|
Subject | Re: Funtion to clean up strings? |
Date | |
Msg-id | 499AD87E.4010206@gmx.net Whole thread Raw |
In response to | Re: Funtion to clean up strings? (Raj Mathur <raju@linux-delhi.org>) |
Responses |
Re: Funtion to clean up strings?
Re: Funtion to clean up strings? |
List | pgsql-sql |
Hi Thanks, that really works :) Now a last extension. Some numbers were entered in a "110% perfect" way with an excessive (0). +49 (0) 123 / 456 789 I have to suspect the source liked to express that it's either +49 or 0 if the +49 isn't applicable, but not both. Both together are semantically wrong and your function results therefore to "00123456789". Correct was "0123456789" or e.g. "+33123456789" if it were an international number. This (0) should be silently dropped as long as the endresult has at least one 0 or + like in the allready covered cases. I tried to use this RegEx magic myself as far as I could figure it out, yet and came up with replacing every p in your solution with another regex case when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49' then '0'|| regexp_replace( regexp_replace( regexp_replace(p, E'[^0-9+()]', '', 'g') , '\\(0\\)||\\(||\\)', '', 'g') , E'^(?:\\+|00)49(.*)',E'\\1') when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)' then '+'|| regexp_replace( regexp_replace( regexp_replace(p, E'[^0-9+()]', '', 'g') , '\\(0\\)||\\(||\\)','', 'g') , E'^(?:\\+||00)(.*)', E'\\1') else regexp_replace(p, E'[^0-9]', '', 'g') end That would catch the leading spaces in " 00 49 ( 0 ) 1 2 3 456 -0", too. Creating a sql-function thows a WARNING: nonstandard use of \\ in a string literal but it still works. Do you know a better or more correct way to reach the same? Perhaps one could find a way with less calls to regexp_replace ? Regards Andreas :) Raj Mathur wrote: > On Friday 13 Feb 2009, Andreas wrote: > >> now ... lets get more complicated. >> Phone numbers are entered: >> 0123/4567-89 national number >> 0049/123/4567-89 the same number >> +49/123/4567-89 still the same number >> >> should come out as 0123456789 to search in this column. >> "0049" and "+49" --> 0 >> >> while international numbers >> +33/123456789 >> 0033/123456789 >> >> should come as >> +33123456789 >> > > TEST=> create table foo(p text); > > TEST=> insert into foo (select regexp_split_to_table('0123/4567-89 > 0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' ')); > > TEST=> select * from foo; > p > ------------------ > 0123/4567-89 > 0049/123/4567-89 > +49/123/4567-89 > +33/123456789 > 0033/123456789 > (5 rows) > > TEST=> select > (case > when p ~ E'^(\\+|00)49' > then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), > E'^(?:\\+|00)49(.*)', E'\\1') > when p ~ E'^(\\+|00)' > then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), > E'^(?:\\+||00)(.*)', E'\\1') > else > regexp_replace(p, E'[^0-9]', '', 'g') > end) > from foo; > regexp_replace > ---------------- > 0123456789 > 0123456789 > 0123456789 > +33123456789 > +33123456789 > (5 rows) > > That do what you want? (Apologies for the wrapped lines.) > > Regards, > > -- Raju >