type-casting failures - Mailing list pgsql-general

From Natalie Wenz
Subject type-casting failures
Date
Msg-id A875B074-9951-4B69-A251-3C48C2C4AE77@ebureau.com
Whole thread Raw
Responses Re: type-casting failures  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
Hi all!

I am working on migrating data in a very large database (22TB +) between two existing tables that use type text for almost every field, to two new tables that use native data type for dates, integers, etc.

The way I'm doing this is running a series of "Insert into new_table values (stuff)… select stuff::appropriate_cast  from old_table". I'm splitting the data up into chunks using my one index on a date field.

The problem I'm running into is I keep hitting data that won't cast, and then the insert fails with a message like this:

ERROR:  invalid input syntax for integer: "X"

I have asked the list how to select for things that are not integer in the past, and the suggestions I got were to use regular expressions (which seems like it would not do that well performance-wise, with the amount of data I'm dealing with), or to dump all the data out and write a program to scrub it. I'm not a terribly experienced programmer, and I don't have enough extra disk space to dump all of the data, read it in, write out a new copy, and then import it back into the database. That would put me at about four copies of the data, rather than two.

Has anyone else encountered this difficulty? Does anyone have any ideas? 

One additional thing that is driving me crazy is the error message for the type cast failure--it indicates the first bad value it found, and the type it was trying to convert it to, but does not indicate which column it was from. I have about 9 integer fields to choose from to find where the trouble is! It's not so bad when the inet cast fails--I only have one column of that type in my table. :)

Feature Request:
I think adding that information (which column had the bad data, not just the type it was unable to cast to) to the error message text would be incredibly helpful.

Even *more* amazing would be some kind of function you could call to determine if the data would cast correctly. Something like that could be used in conjunction with NULLIF to set the value to null if it will fail the casting, or NULLIF and COALESCE to set a default value other than null.  Or simply being able to return all of the rows where the type cast to integer, inet, date time, etc would fail. 

Could either, or both, of these things be candidates for adding to the project in the future?

Thanks a lot!

Natalie Wenz

pgsql-general by date:

Previous
From: Alan Nilsson
Date:
Subject: Re: variadic args to C functions
Next
From: itishree sukla
Date:
Subject: Re: File System backup