Thread: selecting for type cast failures
Hi! I am working on updating some of our tables to use appropriate native data types; they were all defined as text when theywere created years ago. What I am running into, though, is there are some records that have bad data in them, where they can't be successfully convertedto int, or float, or boolean, for example. Is there a straightforward way to identify offending records? I've been able to identify some with things like "...not similar to '(0|1)'..." for the boolean fields, and "...not similarto '[0-9]{1,}'..." for int. Are regular expressions the best approach here or is there a better way? Thoughts? I've poked around on the internet and have found some people suggesting user-defined functions. I'd prefer to just use aquery, since it's a one-time clean-up. (I'm using postgres 9.2) Thanks! Natalie
On 3/7/2013 8:08 PM, Natalie Wenz wrote: > I am working on updating some of our tables to use appropriate native > data types; they were all defined as text when they were created > years ago. > > What I am running into, though, is there are some records that have > bad data in them, where they can't be successfully converted to int, > or float, or boolean, for example. > > Is there a straightforward way to identify offending records? > > I've been able to identify some with things like "...not similar to > '(0|1)'..." for the boolean fields, and "...not similar to > '[0-9]{1,}'..." for int. Are regular expressions the best approach > here or is there a better way? I did some quick searching also, looks like regular expressions are your way to go. Here is one for isInteger, for example: varchar ~ '^[0-9]+$' -- Guy Rouillier
On 03/07/2013 05:08 PM, Natalie Wenz wrote: > Hi! > > I am working on updating some of our tables to use appropriate native data types; they were all defined as text when theywere created years ago. > > What I am running into, though, is there are some records that have bad data in them, where they can't be successfullyconverted to int, or float, or boolean, for example. > > Is there a straightforward way to identify offending records? > > I've been able to identify some with things like "...not similar to '(0|1)'..." for the boolean fields, and "...not similarto '[0-9]{1,}'..." for int. > Are regular expressions the best approach here or is there a better way? > > Thoughts? My opinion, it would take more time to concoct regexes that cover all the corner cases than to write a script that walks the through the data , finds the problem data and flags them. > > I've poked around on the internet and have found some people suggesting user-defined functions. I'd prefer to just usea query, since it's a one-time clean-up. Again, most 'one time' things I have done turned out not to be:) > > (I'm using postgres 9.2) > > > Thanks! > Natalie > -- Adrian Klaver adrian.klaver@gmail.com
Adrian Klaver-3 wrote > My opinion, it would take more time to concoct regexes that cover all > the corner cases than to write a script that walks the through the data > , finds the problem data and flags them. ISTM that using regular expressions is necessary regardless of whether you put them into a function/script or otherwise use them interactively via queries... David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/selecting-for-type-cast-failures-tp5747875p5747890.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 03/07/2013 08:56 PM, David Johnston wrote: > Adrian Klaver-3 wrote >> My opinion, it would take more time to concoct regexes that cover all >> the corner cases than to write a script that walks the through the data >> , finds the problem data and flags them. > > ISTM that using regular expressions is necessary regardless of whether you > put them into a function/script or otherwise use them interactively via > queries... Not necessarily. I have done this sort of thing in Python by 'pre' casting, using Python casting to weed out the problem children. > > David J. > > > > > -- Adrian Klaver adrian.klaver@gmail.com