Thread: selecting for type cast failures

selecting for type cast failures

From
Natalie Wenz
Date:
Hi!

I am working on updating some of our tables to use appropriate native data types;  they were all defined as text when
theywere created years ago. 

What I am running into, though, is there are some records that have bad data in them, where they can't be successfully
convertedto int, or float, or boolean, for example.  

Is there a straightforward way to identify offending records?

I've been able to identify some with things like "...not similar to '(0|1)'..." for the boolean fields, and "...not
similarto '[0-9]{1,}'..." for int.  
Are regular expressions the best approach here or is there a better way?

Thoughts?

I've poked around on the internet and have found some people suggesting user-defined functions. I'd prefer to just use
aquery, since it's a one-time clean-up. 

(I'm using postgres 9.2)


Thanks!
Natalie

Re: selecting for type cast failures

From
Guy Rouillier
Date:
On 3/7/2013 8:08 PM, Natalie Wenz wrote:
> I am working on updating some of our tables to use appropriate native
> data types;  they were all defined as text when they were created
> years ago.
>
> What I am running into, though, is there are some records that have
> bad data in them, where they can't be successfully converted to int,
> or float, or boolean, for example.
>
> Is there a straightforward way to identify offending records?
>
> I've been able to identify some with things like "...not similar to
> '(0|1)'..." for the boolean fields, and "...not similar to
> '[0-9]{1,}'..." for int. Are regular expressions the best approach
> here or is there a better way?

I did some quick searching also, looks like regular expressions are your
way to go.  Here is one for isInteger, for example:

    varchar ~ '^[0-9]+$'

--
Guy Rouillier


Re: selecting for type cast failures

From
Adrian Klaver
Date:
On 03/07/2013 05:08 PM, Natalie Wenz wrote:
> Hi!
>
> I am working on updating some of our tables to use appropriate native data types;  they were all defined as text when
theywere created years ago. 
>
> What I am running into, though, is there are some records that have bad data in them, where they can't be
successfullyconverted to int, or float, or boolean, for example. 
>
> Is there a straightforward way to identify offending records?
>
> I've been able to identify some with things like "...not similar to '(0|1)'..." for the boolean fields, and "...not
similarto '[0-9]{1,}'..." for int. 
> Are regular expressions the best approach here or is there a better way?
>
> Thoughts?

My opinion, it would take more time to concoct regexes that cover all
the corner cases than to write a script that walks the through the data
, finds the problem data and flags them.

>
> I've poked around on the internet and have found some people suggesting user-defined functions. I'd prefer to just
usea query, since it's a one-time clean-up. 

Again, most 'one time' things I have done turned out not to be:)

>
> (I'm using postgres 9.2)
>
>
> Thanks!
> Natalie
>


--
Adrian Klaver
adrian.klaver@gmail.com


Re: selecting for type cast failures

From
David Johnston
Date:
Adrian Klaver-3 wrote
> My opinion, it would take more time to concoct regexes that cover all
> the corner cases than to write a script that walks the through the data
> , finds the problem data and flags them.

ISTM that using regular expressions is necessary regardless of whether you
put them into a function/script or otherwise use them interactively via
queries...

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/selecting-for-type-cast-failures-tp5747875p5747890.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: selecting for type cast failures

From
Adrian Klaver
Date:
On 03/07/2013 08:56 PM, David Johnston wrote:
> Adrian Klaver-3 wrote
>> My opinion, it would take more time to concoct regexes that cover all
>> the corner cases than to write a script that walks the through the data
>> , finds the problem data and flags them.
>
> ISTM that using regular expressions is necessary regardless of whether you
> put them into a function/script or otherwise use them interactively via
> queries...

Not necessarily. I have done this sort of thing in Python by 'pre'
casting, using Python casting to weed out the problem children.

>
> David J.
>
>
>
>
>

--
Adrian Klaver
adrian.klaver@gmail.com