Thread: Funtion to clean up strings?
Hi, I need a function that removes characters in strings that aren't in a given set of chars. e.g Input: 12-34/ 56(8) I want just numbers so Output should in this case be: 1234568 Is there a solution?
2009/2/12 Andreas <maps.on@gmx.net>: > Hi, > > I need a function that removes characters in strings that aren't in a given > set of chars. > e.g > Input: 12-34/ 56(8) > I want just numbers so Output should in this case be: 1234568 > > Is there a solution? One is: SELECT regexp_replace('12-34/ 56(8)',E'[^0-9]','','g'); -- Dickson S. Guedes - mail/xmpp: guedes@guedesoft.net - skype: guediz http://guedesoft.net - http://planeta.postgresql.org.br
Wow, that was fast THANKS :) now ... lets get more complicated. Phone numbers are entered: 0123/4567-89 national number 0049/123/4567-89 the same number +49/123/4567-89 still the same number should come out as 0123456789 to search in this column. "0049" and "+49" --> 0 while international numbers +33/123456789 0033/123456789 should come as +33123456789 I fear the hard part is that international codes have 1-3 digits :( Regards Andreas ivan Stoykov wrote: > regexp_replace('120323423 23(fdf_)fd','[^[:digit:]]','','gi') > > Andreas wrote: > >> Hi, >> >> I need a function that removes characters in strings that aren't in a >> given set of chars. >> e.g >> Input: 12-34/ 56(8) >> I want just numbers so Output should in this case be: 1234568 >> >> Is there a solution?
On Friday 13 Feb 2009, Andreas wrote: > now ... lets get more complicated. > Phone numbers are entered: > 0123/4567-89 national number > 0049/123/4567-89 the same number > +49/123/4567-89 still the same number > > should come out as 0123456789 to search in this column. > "0049" and "+49" --> 0 > > while international numbers > +33/123456789 > 0033/123456789 > > should come as > +33123456789 TEST=> create table foo(p text); TEST=> insert into foo (select regexp_split_to_table('0123/4567-89 0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' ')); TEST=> select * from foo; p ------------------0123/4567-890049/123/4567-89+49/123/4567-89+33/1234567890033/123456789 (5 rows) TEST=> select (case when p ~ E'^(\\+|00)49' then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), E'^(?:\\+|00)49(.*)', E'\\1') when p ~ E'^(\\+|00)' then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), E'^(?:\\+||00)(.*)', E'\\1') else regexp_replace(p, E'[^0-9]', '', 'g') end) from foo;regexp_replace ----------------012345678901234567890123456789+33123456789+33123456789 (5 rows) That do what you want? (Apologies for the wrapped lines.) Regards, -- Raju -- Raj Mathur raju@kandalaya.org http://kandalaya.org/ GPG: 78D4 FC67 367F 40E2 0DD5 0FEF C968 D0EFCC68 D17F PsyTrance & Chill: http://schizoid.in/ || It is the mind that moves
Hi Thanks, that really works :) Now a last extension. Some numbers were entered in a "110% perfect" way with an excessive (0). +49 (0) 123 / 456 789 I have to suspect the source liked to express that it's either +49 or 0 if the +49 isn't applicable, but not both. Both together are semantically wrong and your function results therefore to "00123456789". Correct was "0123456789" or e.g. "+33123456789" if it were an international number. This (0) should be silently dropped as long as the endresult has at least one 0 or + like in the allready covered cases. I tried to use this RegEx magic myself as far as I could figure it out, yet and came up with replacing every p in your solution with another regex case when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49' then '0'|| regexp_replace( regexp_replace( regexp_replace(p, E'[^0-9+()]', '', 'g') , '\\(0\\)||\\(||\\)', '', 'g') , E'^(?:\\+|00)49(.*)',E'\\1') when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)' then '+'|| regexp_replace( regexp_replace( regexp_replace(p, E'[^0-9+()]', '', 'g') , '\\(0\\)||\\(||\\)','', 'g') , E'^(?:\\+||00)(.*)', E'\\1') else regexp_replace(p, E'[^0-9]', '', 'g') end That would catch the leading spaces in " 00 49 ( 0 ) 1 2 3 456 -0", too. Creating a sql-function thows a WARNING: nonstandard use of \\ in a string literal but it still works. Do you know a better or more correct way to reach the same? Perhaps one could find a way with less calls to regexp_replace ? Regards Andreas :) Raj Mathur wrote: > On Friday 13 Feb 2009, Andreas wrote: > >> now ... lets get more complicated. >> Phone numbers are entered: >> 0123/4567-89 national number >> 0049/123/4567-89 the same number >> +49/123/4567-89 still the same number >> >> should come out as 0123456789 to search in this column. >> "0049" and "+49" --> 0 >> >> while international numbers >> +33/123456789 >> 0033/123456789 >> >> should come as >> +33123456789 >> > > TEST=> create table foo(p text); > > TEST=> insert into foo (select regexp_split_to_table('0123/4567-89 > 0049/123/4567-89 +49/123/4567-89 +33/123456789 0033/123456789',' ')); > > TEST=> select * from foo; > p > ------------------ > 0123/4567-89 > 0049/123/4567-89 > +49/123/4567-89 > +33/123456789 > 0033/123456789 > (5 rows) > > TEST=> select > (case > when p ~ E'^(\\+|00)49' > then '0'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), > E'^(?:\\+|00)49(.*)', E'\\1') > when p ~ E'^(\\+|00)' > then '+'||regexp_replace(regexp_replace(p, E'[^0-9+]', '', 'g'), > E'^(?:\\+||00)(.*)', E'\\1') > else > regexp_replace(p, E'[^0-9]', '', 'g') > end) > from foo; > regexp_replace > ---------------- > 0123456789 > 0123456789 > 0123456789 > +33123456789 > +33123456789 > (5 rows) > > That do what you want? (Apologies for the wrapped lines.) > > Regards, > > -- Raju >
On Tuesday 17 Feb 2009, Andreas wrote: > [snip] > case > when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49' > then '0'|| > regexp_replace( > regexp_replace( > regexp_replace(p, E'[^0-9+()]', '', 'g') > , '\\(0\\)||\\(||\\)', '', 'g') > , E'^(?:\\+|00)49(.*)', E'\\1') > when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)' > then '+'|| > regexp_replace( > regexp_replace( > regexp_replace(p, E'[^0-9+()]', '', 'g') > , '\\(0\\)||\\(||\\)', '', 'g') > , E'^(?:\\+||00)(.*)', E'\\1') > else > regexp_replace(p, E'[^0-9]', '', 'g') > end > > That would catch the leading spaces in " 00 49 ( 0 ) 1 2 3 456 -0", > too. Creating a sql-function thows a WARNING: nonstandard use of \\ > in a string literal > but it still works. Do you know a better or more correct way to reach > the same? > > Perhaps one could find a way with less calls to regexp_replace ? That is what I would have tried too :) The only improvement I can think of is to replace one instance of regex_replace with a string replace, since the string (0) is fixed. On the other hand, I'm not an expert at Pg functions by any means, so someone else may have a faster or more elegant solution. Regards, -- Raju -- Raj Mathur raju@kandalaya.org http://kandalaya.org/ GPG: 78D4 FC67 367F 40E2 0DD5 0FEF C968 D0EFCC68 D17F PsyTrance & Chill: http://schizoid.in/ || It is the mind that moves
string_replace would only replace one searchstring at a time. In this case I need to replace 3 : (0) ( ) because there could be some braces not just as (0) since the innermost replace spares braces regardless where they are. Could one express the following in one expression 1) remove all spaces 2) remove everything but 0-9+ or the first occurance (0) Leading spaces had to be removed anyway because the distract the CASE ... WHEN when it checks for "+49..." but gets " +49...". Then I could use string_replace at the end. Do you know the correct way to code the backslashes to avoid the Warning? Regards Andreas Raj Mathur schrieb: > On Tuesday 17 Feb 2009, Andreas wrote: > >> [snip] >> case >> when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)49' >> then '0'|| >> regexp_replace( >> regexp_replace( >> regexp_replace(p, E'[^0-9+()]', '', 'g') >> , '\\(0\\)||\\(||\\)', '', 'g') >> , E'^(?:\\+|00)49(.*)', E'\\1') >> when regexp_replace(p, E'[^0-9+]', '', 'g') ~ E'^(\\+|00)' >> then '+'|| >> regexp_replace( >> regexp_replace( >> regexp_replace(p, E'[^0-9+()]', '', 'g') >> , '\\(0\\)||\\(||\\)', '', 'g') >> , E'^(?:\\+||00)(.*)', E'\\1') >> else >> regexp_replace(p, E'[^0-9]', '', 'g') >> end >> >> That would catch the leading spaces in " 00 49 ( 0 ) 1 2 3 456 -0", >> too. Creating a sql-function thows a WARNING: nonstandard use of \\ >> in a string literal >> but it still works. Do you know a better or more correct way to reach >> the same? >> >> Perhaps one could find a way with less calls to regexp_replace ? >> > > That is what I would have tried too :) The only improvement I can think > of is to replace one instance of regex_replace with a string replace, > since the string (0) is fixed. > > On the other hand, I'm not an expert at Pg functions by any means, so > someone else may have a faster or more elegant solution. > > Regards, > > -- Raju >
Andreas <maps.on@gmx.net> writes: > Hi > Thanks, that really works :) > > Now a last extension. > Some numbers were entered in a "110% perfect" way with an excessive (0). +49 > (0) 123 / 456 789 > I have to suspect the source liked to express that it's either +49 or 0 if > the +49 isn't applicable, but not both. This is the standard format for phone numbers. Parenthesized digits -- as you suspected -- represent digits which must only be dialled when using the number locally and must be omitted from outside. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!