Thread: How to clean up phone-numbers with regex?
<font face="Arial">Hi<br /><br /> I need to clean up phone-numbers. Somehow I got a Excel list that has weird graphical characterstrailing some of the entries.</font><br /><font face="Arial"><font face="Arial">My DB is UTF8 so it would storethis mess but I don<small>'t like to import it in the first place.<br /><br /> OK, I know how to read the stuff intoa temporary table to clean it up before the actual import.<br /></small></font>How can I do an update on the column thatdeletes every char that is not in a given set of chars like '+- 0123456/()'?<br /><br /><small></small><br /> Secondbut similar question:<br /> How can I select records that have fields that contain characters not included in a givenalphabet?<br /> E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/?<br /><br /><br /> regards<br /> Andreas<br/></font>
Have you looked into regular expressions?
Sent from my iPhone
Sent from my iPhone
Hi
I need to clean up phone-numbers. Somehow I got a Excel list that has weird graphical characters trailing some of the entries.
My DB is UTF8 so it would store this mess but I don't like to import it in the first place.
OK, I know how to read the stuff into a temporary table to clean it up before the actual import.
How can I do an update on the column that deletes every char that is not in a given set of chars like '+- 0123456/()'?
Second but similar question:
How can I select records that have fields that contain characters not included in a given alphabet?
E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/?
regards
Andreas
<div class="moz-cite-prefix">On 05/19/2014 01:54 AM, Andreas wrote:<br /></div><blockquote cite="mid:5379C6C1.5020904@gmx.net"type="cite"><font face="Arial">Hi<br /><br /> I need to clean up phone-numbers. SomehowI got a Excel list that has weird graphical characters trailing some of the entries.</font><br /><font face="Arial"><fontface="Arial">My DB is UTF8 so it would store this mess but I don<small>'t like to import it in the firstplace.<br /><br /> OK, I know how to read the stuff into a temporary table to clean it up before the actual import.<br/></small></font>How can I do an update on the column that deletes every char that is not in a given set of charslike '+- 0123456/()'?<br /><br /></font></blockquote><br /> See: <a class="moz-txt-link-freetext" href="http://www.postgresql.org/docs/current/static/functions-matching.html">http://www.postgresql.org/docs/current/static/functions-matching.html</a><br /><br/> For the first case, the regexp_replace function is probably your best bet. But note that, depending on the qualityof your input, just removing characters outside that range may still not yield the desired result.<br /><br /> selectregexp_replace('(12s3)-456-635/6(a+sdk', '[^0-9()+-/]', '', 'g');<br /> regexp_replace <br /> -------------------<br/> (123)-456-635/6(+<br /><br /> You can remove all formatting by requiring only digits then checkand/or reformat later as desired.<br /> steve=> select regexp_replace('(12s3)-456-6356(a+sdk', '[^0-9]', '', 'g');<br/> regexp_replace <br /> ----------------<br /> 1234566356<br /><br /><blockquote cite="mid:5379C6C1.5020904@gmx.net"type="cite"><font face="Arial"><br /> Second but similar question:<br /> How can I selectrecords that have fields that contain characters not included in a given alphabet?<br /> E.G. find fields that containsome char not in 0-9,a-z,A-Z, +-()/?<br /></font><br /></blockquote> See regexp_match on the above-referenced page.<br/><br /> Cheers,<br /> Steve<br /><br />
On 05/19/2014 06:39 AM, Rob Sargent wrote:
I think his subject line answered that question...Have you looked into regular expressions?
Cheers,
Steve
On 05/19/2014 09:44 AM, Steve Crawford wrote:
You're right. Apologies.On 05/19/2014 06:39 AM, Rob Sargent wrote:I think his subject line answered that question...Have you looked into regular expressions?
Cheers,
Steve
Steve Crawford wrote > On 05/19/2014 01:54 AM, Andreas wrote: > >> >> Second but similar question: >> How can I select records that have fields that contain characters not >> included in a given alphabet? >> E.G. find fields that contain some char not in 0-9,a-z,A-Z, +-()/? >> > See regexp_match on the above-referenced page. > > Cheers, > Steve Actually, section "9.7.3. POSIX Regular Expressions" - specifically table 9-11 at the beginning of that section - is the most common way to perform the tests in a where clause. regexp_matches(...) is for when you want to extract data. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/How-to-clean-up-phone-numbers-with-regex-tp5804450p5804493.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.