Re: Date format for bulk copy - Mailing list pgsql-general

From David Rysdam
Subject Re: Date format for bulk copy
Date
Msg-id 416D6924.8090509@ll.mit.edu
Whole thread Raw
In response to Re: Date format for bulk copy  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
Greg Stark wrote:

>David Rysdam <drysdam@ll.mit.edu> writes:
>
>
>
>>In my brute force port, I just bulk copied the date
>>fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY
>>HH:MI:SS:MSAM').
>>
>>
>
>
>
>>Again, I created a temporary table and did a decode(field, 'hex') to the
>>real table.
>>
>>
>
>This is the standard approach. You're rather lucky these are the only
>data representation changes you've had to do so far. I fear you'll run into
>more and more complex changes over time and trying to avoid the temporary
>table will get harder and harder.
>
>
>
No, I think I'm OK there.  These are programmatically-generated values
and I've already been through them all once.  Just the millisecond issue
and the hex binary issue AFAIK.

>If it were me I would consider processing the files in perl. It should be
>pretty easy to do both of these modifications very quickly.
>
>
>
Very quick and easy to do one time.  A little trickier to handle in an
elegant, maintainable way for the dozens of data reloads I do every
month for GBs of data onto two different server types.

>If you really want to go with a custom C code then you might be able to just
>grab the byteain/byteaout functions from src/backend/util/adt/varlena into a
>separate module and create new functions with modified names. Load it with
>CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so' 'my_byteain';
>
>Or maybe create the function as my_byteain in postgres and then update the
>catalog entries somehow. I'm not sure how to do that but it shouldn't be too
>hard. And it might make it easier to do the substitution for the data load and
>then undo the change afterwards.
>
>
>
Why not create a type and then define the load function to be the
equivalent of "decode('hex')"?

>Doing the same for timmestamp is a bit trickier but you could copy
>ParseDateTime from datetime.c as a static function for your module.
>
>Be careful though, test this out thoroughly on a test database. I'm not sure
>of all the impacts of altering the in/out functions for data types. I expect
>it would break pg_dump, for example. And I would worry about the statistics
>tables too.
>
>
>
This is kind of a hybrid of my suggestions and the problems are a hybrid
as well.  :)

1) Just change the timestamp type so that it allows a ':' delimiter for
milliseconds.  Potential problems: Other parts of the code won't expect
it. People don't want that.

2) Create a new type.  Potential problem: Things like date ranges
probably wouldn't work anymore, since the server wouldn't know it's a
date now.



pgsql-general by date:

Previous
From: David Rysdam
Date:
Subject: Re: Date format for bulk copy
Next
From: nd02tsk@student.hig.se
Date:
Subject: Level of replication support?