Thread: copy in date string "00-00-00 00:00:00"

copy in date string "00-00-00 00:00:00"

From
Mark Phillips
Date:
I am migrating a data set from Oracle 8i to PG 9.1. The process is to export data into csv files, then use the pg "copy
tablefrom file csv header" statement to load the tables. 

There are a number of date columns in the tables that include empty value (null), valid dates, and some with the time
componentonly. The empty values are being output as 00-00-00 00:00:00. 

The import is falling over on rows that contain these "zero" dates.

I can adjust the NLS session format of the date string, within a small range, in the oracle environment. However, each
formI have attempted still results in these "zero" date values in the csv file. 

I am thinking of run the csv files through a filter to change the "00-00-00 00:00:00" to an empty value.

Is there a way for postgres to handle this?

- Mark

Re: copy in date string "00-00-00 00:00:00"

From
Scott Marlowe
Date:
On Wed, Mar 14, 2012 at 6:47 PM, Mark Phillips
<Mark.Phillips@mophilly.com> wrote:
> I am migrating a data set from Oracle 8i to PG 9.1. The process is to export data into csv files, then use the pg
"copytable from file csv header" statement to load the tables. 
>
> There are a number of date columns in the tables that include empty value (null), valid dates, and some with the time
componentonly. The empty values are being output as 00-00-00 00:00:00. 
>
> The import is falling over on rows that contain these "zero" dates.
>
> I can adjust the NLS session format of the date string, within a small range, in the oracle environment. However,
eachform I have attempted still results in these "zero" date values in the csv file. 
>
> I am thinking of run the csv files through a filter to change the "00-00-00 00:00:00" to an empty value.
>
> Is there a way for postgres to handle this?

Can you run it through sed and replace the "0000-00-00 00:00:00" to
NULL (no quotes) ?  That should work.

Re: copy in date string "00-00-00 00:00:00"

From
Andy Colson
Date:
On 03/14/2012 08:16 PM, Scott Marlowe wrote:
> On Wed, Mar 14, 2012 at 6:47 PM, Mark Phillips
> <Mark.Phillips@mophilly.com>  wrote:
>> I am migrating a data set from Oracle 8i to PG 9.1. The process is to export data into csv files, then use the pg
"copytable from file csv header" statement to load the tables. 
>>
>> There are a number of date columns in the tables that include empty value (null), valid dates, and some with the
timecomponent only. The empty values are being output as 00-00-00 00:00:00. 
>>
>> The import is falling over on rows that contain these "zero" dates.
>>
>> I can adjust the NLS session format of the date string, within a small range, in the oracle environment. However,
eachform I have attempted still results in these "zero" date values in the csv file. 
>>
>> I am thinking of run the csv files through a filter to change the "00-00-00 00:00:00" to an empty value.
>>
>> Is there a way for postgres to handle this?
>
> Can you run it through sed and replace the "0000-00-00 00:00:00" to
> NULL (no quotes) ?  That should work.
>

I think COPY (depending on arguments) uses \N by default.

Another option is to pull it into a temp table and make fix it up from there.

-Andy

Re: copy in date string "00-00-00 00:00:00"

From
Andy Colson
Date:
On 03/14/2012 08:32 PM, Andy Colson wrote:
> On 03/14/2012 08:16 PM, Scott Marlowe wrote:
>> On Wed, Mar 14, 2012 at 6:47 PM, Mark Phillips
>> <Mark.Phillips@mophilly.com> wrote:
>>> I am migrating a data set from Oracle 8i to PG 9.1. The process is to export data into csv files, then use the pg
"copytable from file csv header" statement to load the tables. 
>>>
>>> There are a number of date columns in the tables that include empty value (null), valid dates, and some with the
timecomponent only. The empty values are being output as 00-00-00 00:00:00. 
>>>
>>> The import is falling over on rows that contain these "zero" dates.
>>>
>>> I can adjust the NLS session format of the date string, within a small range, in the oracle environment. However,
eachform I have attempted still results in these "zero" date values in the csv file. 
>>>
>>> I am thinking of run the csv files through a filter to change the "00-00-00 00:00:00" to an empty value.
>>>
>>> Is there a way for postgres to handle this?
>>
>> Can you run it through sed and replace the "0000-00-00 00:00:00" to
>> NULL (no quotes) ? That should work.
>>
>
> I think COPY (depending on arguments) uses \N by default.
>
> Another option is to pull it into a temp table and make fix it up from there.
>
> -Andy
>

humm.. and speaking of arguments, Mark, did you check the help?


where option can be one of:

     NULL 'null_string'

so, perhaps just:

COPY tbl from 'dump.csv' with NULL '00-00-00 00:00:00';

-Andy

Re: copy in date string "00-00-00 00:00:00"

From
Scott Marlowe
Date:
On Wed, Mar 14, 2012 at 7:37 PM, Andy Colson <andy@squeakycode.net> wrote:
> On 03/14/2012 08:32 PM, Andy Colson wrote:
>>
>> On 03/14/2012 08:16 PM, Scott Marlowe wrote:
>>>
>>> On Wed, Mar 14, 2012 at 6:47 PM, Mark Phillips
>>> <Mark.Phillips@mophilly.com> wrote:
>>>>
>>>> I am migrating a data set from Oracle 8i to PG 9.1. The process is to
>>>> export data into csv files, then use the pg "copy table from file csv
>>>> header" statement to load the tables.
>>>>
>>>> There are a number of date columns in the tables that include empty
>>>> value (null), valid dates, and some with the time component only. The empty
>>>> values are being output as 00-00-00 00:00:00.
>>>>
>>>> The import is falling over on rows that contain these "zero" dates.
>>>>
>>>> I can adjust the NLS session format of the date string, within a small
>>>> range, in the oracle environment. However, each form I have attempted still
>>>> results in these "zero" date values in the csv file.
>>>>
>>>> I am thinking of run the csv files through a filter to change the
>>>> "00-00-00 00:00:00" to an empty value.
>>>>
>>>> Is there a way for postgres to handle this?
>>>
>>>
>>> Can you run it through sed and replace the "0000-00-00 00:00:00" to
>>> NULL (no quotes) ? That should work.
>>>
>>
>> I think COPY (depending on arguments) uses \N by default.
>>
>> Another option is to pull it into a temp table and make fix it up from
>> there.
>>
>> -Andy
>>
>
> humm.. and speaking of arguments, Mark, did you check the help?
>
>
> where option can be one of:
>
>    NULL 'null_string'
>
> so, perhaps just:
>
> COPY tbl from 'dump.csv' with NULL '00-00-00 00:00:00';

Thought of that one too, but it'll break all the other fields if they
have NULLs in them.

Re: copy in date string "00-00-00 00:00:00"

From
Mark Phillips
Date:
On Mar 14, 2012, at 6:32 PM, Andy Colson wrote:

> On 03/14/2012 08:16 PM, Scott Marlowe wrote:
>>
>>
>> Can you run it through sed and replace the "0000-00-00 00:00:00" to
>> NULL (no quotes) ?  That should work.
>>
>
> I think COPY (depending on arguments) uses \N by default.
>
> Another option is to pull it into a temp table and make fix it up from there.
>
> -Andy


Thanks to all who responded. I appreciate it very much.

Yes, that is a good idea. I wasn't sure how to use the NULL clause of the copy command, but I did wonder about other
nullvalues in the data set. There are many as this database grew up over 15+ years of use and many renditions of the
clientapplication. 

I am not familiar with sed, except for some trivial bits I nicked off the web. Enough to know it works, and to be
dangerous.Nonetheless, using SED may be the way to go as there are two tables that contain a bit over 3,000,000 rows
each. 

 - Mark

Re: copy in date string "00-00-00 00:00:00"

From
Martin Gregorie
Date:
On Wed, 2012-03-14 at 21:52 -0700, Mark Phillips wrote:
> I am not familiar with sed, except for some trivial bits I nicked off
> the web. Enough to know it works, and to be dangerous. Nonetheless,
> using SED may be the way to go as there are two tables that contain a
> bit over 3,000,000 rows each.
>
You should also consider using awk/gawk with the field separator (FS
variable) set to match the one in your input (','). The advantages in
this case are that it can be made to work on specific fields in the CSV
file and not look at the rest, something like:

BEGIN { FS = ','; }                      # set field sep to comma
$5 == '0000-00-00 00:00:00' { $5 = '' }  # empty field 5 if it matches
                            {print }     # output all lines

Disclaimer: this is untested example code


Martin