Thread: Functions with COPY

Functions with COPY

From
Stephen Frost
Date:
Hi,
 Consider the following input data: 1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall
 The interpretation for the numbers is: 1234 = 12.34, 24.50 = 24.50 The interpretation for the dates is: January 10th,
2003,October 1st, 2003, October 1st, 2003 
 I don't believe it's possible, currently, to correctly import this data with copy.  I'm not sure the date fields would
evenbe accepted as date fields.  It'd be nice if this could be made to work.  From a user standpoint consider: 
 copy blah (   to_number(cost,'99V99'), -- I think that's right?   to_number(cost2,'99.99'),
to_date(install_date,'DD-Mon-YYYY'),  to_date(invoice_date,'MM/DD/YY'),   to_date(bill_date,'MM-DD-YYYY'),
service_type)from stdin; 
   Perhaps the actual syntax would be different, but the idea is there,   be able to specify a function with arguments
tohandle the   conversion from the text to the resulting data type in the table.   Right now casting is done but that
castinghas to be expected to   work for all input formats for each data type cast and that just   isn't reasonable to
tryand force something to do.  Instead, I   believe, the number of input formats accepted has been limited. 
 Now, that isn't an actual data set I have to deal with, but it's a good illustration of the problem I've had with some
ofmy data sets, mainly with the date fields though there is one data set that has the problem of having an implied
decimalpoint. 
Thanks,         Stephen

Re: Functions with COPY

From
Bruno Wolff III
Date:
On Thu, Nov 27, 2003 at 09:15:20 -0500, Stephen Frost <sfrost@snowman.net> wrote:
> 
>   I don't believe it's possible, currently, to correctly import this
>   data with copy.  I'm not sure the date fields would even be accepted
>   as date fields.  It'd be nice if this could be made to work.  From a
>   user standpoint consider:

You can write a filter program that reads the data and passes it off
to copy. Perl works pretty well for this.


Re: Functions with COPY

From
Stephen Frost
Date:
* Bruno Wolff III (bruno@wolff.to) wrote:
> On Thu, Nov 27, 2003 at 09:15:20 -0500,
>   Stephen Frost <sfrost@snowman.net> wrote:
> >   I don't believe it's possible, currently, to correctly import this
> >   data with copy.  I'm not sure the date fields would even be accepted
> >   as date fields.  It'd be nice if this could be made to work.  From a
> >   user standpoint consider:
>
> You can write a filter program that reads the data and passes it off
> to copy. Perl works pretty well for this.

I already did, but it's basically a poor duplication of what the
Postgres functions listed already do.  Not what I'd consider the best
scenario.  Additionally, overall I'd expect it to be less work to have
the conversion from text->data type done once and correctly instead of
run through a filter program to 'clean it up' for Postgres and then also
run through functions in Postgres (casts at least) to convert it.
Stephen

Re: Functions with COPY

From
Rod Taylor
Date:
On Thu, 2003-11-27 at 09:28, Stephen Frost wrote:
> * Bruno Wolff III (bruno@wolff.to) wrote:
> > On Thu, Nov 27, 2003 at 09:15:20 -0500,
> >   Stephen Frost <sfrost@snowman.net> wrote:
> > >   I don't believe it's possible, currently, to correctly import this
> > >   data with copy.  I'm not sure the date fields would even be accepted
> > >   as date fields.  It'd be nice if this could be made to work.  From a
> > >   user standpoint consider:
> > 
> > You can write a filter program that reads the data and passes it off
> > to copy. Perl works pretty well for this.
> 
> I already did, but it's basically a poor duplication of what the
> Postgres functions listed already do.  Not what I'd consider the best
> scenario.  Additionally, overall I'd expect it to be less work to have
> the conversion from text->data type done once and correctly instead of
> run through a filter program to 'clean it up' for Postgres and then also
> run through functions in Postgres (casts at least) to convert it.

How about COPY into a TEMP TABLE for 10k lines, then do an 
insert into real_table .... select .... from temp_table;
which converts the data?

You could of course thread the load so 2 or 3 processes execute the data
import.




Re: Functions with COPY

From
Dave Cramer
Date:
There is a patch floating around for informix load/unload

the syntax is load from 'file' insert into ...., and unload to 'file'
select "whatever you like"

Would this solve the problem?


Dave
On Thu, 2003-11-27 at 09:38, Rod Taylor wrote:
> On Thu, 2003-11-27 at 09:28, Stephen Frost wrote:
> > * Bruno Wolff III (bruno@wolff.to) wrote:
> > > On Thu, Nov 27, 2003 at 09:15:20 -0500,
> > >   Stephen Frost <sfrost@snowman.net> wrote:
> > > >   I don't believe it's possible, currently, to correctly import this
> > > >   data with copy.  I'm not sure the date fields would even be accepted
> > > >   as date fields.  It'd be nice if this could be made to work.  From a
> > > >   user standpoint consider:
> > > 
> > > You can write a filter program that reads the data and passes it off
> > > to copy. Perl works pretty well for this.
> > 
> > I already did, but it's basically a poor duplication of what the
> > Postgres functions listed already do.  Not what I'd consider the best
> > scenario.  Additionally, overall I'd expect it to be less work to have
> > the conversion from text->data type done once and correctly instead of
> > run through a filter program to 'clean it up' for Postgres and then also
> > run through functions in Postgres (casts at least) to convert it.
> 
> How about COPY into a TEMP TABLE for 10k lines, then do an 
> insert into real_table .... select .... from temp_table;
> which converts the data?
> 
> You could of course thread the load so 2 or 3 processes execute the data
> import.
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
> 
> 



Re: Functions with COPY

From
Stephen Frost
Date:
* Rod Taylor (pg@rbt.ca) wrote:
> How about COPY into a TEMP TABLE for 10k lines, then do an
> insert into real_table .... select .... from temp_table;
> which converts the data?
>
> You could of course thread the load so 2 or 3 processes execute the data
> import.

Sure, this would work, but it's a heck of alot more work from a
processing standpoint than either of the other options...
Stephen

Re: Functions with COPY

From
Stephen Frost
Date:
* Dave Cramer (pg@fastcrypt.com) wrote:
> There is a patch floating around for informix load/unload
>
> the syntax is load from 'file' insert into ...., and unload to 'file'
> select "whatever you like"
>
> Would this solve the problem?

I'm not sure.  It depends on what you can do with the '....' after
'insert into'.  If it's :
insert into blah (what here?) to_number(blah1,'99V99'),etc

Then I'd think it would work, but I don't know if that's what you're
saying or not.
Stephen

Re: Functions with COPY

From
Dave Cramer
Date:
Stephen,

You can do whatever you can do with an insert now, so yes, I think that
is possible. Even if the current patch doesn't do that, it would
certainly be a start.

Dave


On Thu, 2003-11-27 at 10:21, Stephen Frost wrote:
> * Dave Cramer (pg@fastcrypt.com) wrote:
> > There is a patch floating around for informix load/unload
> > 
> > the syntax is load from 'file' insert into ...., and unload to 'file'
> > select "whatever you like"
> > 
> > Would this solve the problem?
> 
> I'm not sure.  It depends on what you can do with the '....' after
> 'insert into'.  If it's :
> insert into blah (what here?) to_number(blah1,'99V99'),etc
> 
> Then I'd think it would work, but I don't know if that's what you're
> saying or not.
> 
>     Stephen



Re: Functions with COPY

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
>   Consider the following input data:
>   1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall

>   The interpretation for the numbers is:
>   1234 =3D 12.34, 24.50 =3D 24.50
>   The interpretation for the dates is:
>   January 10th, 2003, October 1st, 2003, October 1st, 2003

>   I don't believe it's possible, currently, to correctly import this
>   data with copy.  I'm not sure the date fields would even be accepted
>   as date fields.

Nonsense.

regression=# set datestyle to mdy;
SET
regression=# select '10-Jan-2003'::date;   date
------------2003-01-10
(1 row)

regression=# select '10/1/03'::date;   date
------------2003-10-01
(1 row)

regression=# select '10-01-2003'::date;   date
------------2003-10-01
(1 row)

I think you'd have to do some preprocessing on the numeric inputs if you
wanted implied decimal points inserted like that, but the dates look fine.
        regards, tom lane


Re: Functions with COPY

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> >   Consider the following input data:
> >   1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall
>
> >   The interpretation for the numbers is:
> >   1234 =3D 12.34, 24.50 =3D 24.50
> >   The interpretation for the dates is:
> >   January 10th, 2003, October 1st, 2003, October 1st, 2003
>
> >   I don't believe it's possible, currently, to correctly import this
> >   data with copy.  I'm not sure the date fields would even be accepted
> >   as date fields.
> Nonsense.
[...]
> I think you'd have to do some preprocessing on the numeric inputs if you
> wanted implied decimal points inserted like that, but the dates look fine.

I guess my example was lacking, I'm sure there are cases where the
text->date casting will end up being wrong or some date style won't be
accepted.  If the above was 'January 10th, 2003, October 1st, 2003,
January 1st, 2003', for example.  Thinking back I think that might have
been the situation I was thinking about (conflicting mdy and dmy) and
would have made more sense as an example.
Stephen

Re: Functions with COPY

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> I guess my example was lacking, I'm sure there are cases where the
> text->date casting will end up being wrong or some date style won't be
> accepted.  If the above was 'January 10th, 2003, October 1st, 2003,
> January 1st, 2003', for example.  Thinking back I think that might have
> been the situation I was thinking about (conflicting mdy and dmy) and
> would have made more sense as an example.

Then what are you after, some magically prescient input mode that will
guess the correct interpretation?
        regards, tom lane


Re: Functions with COPY

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I guess my example was lacking, I'm sure there are cases where the
> > text->date casting will end up being wrong or some date style won't be
> > accepted.  If the above was 'January 10th, 2003, October 1st, 2003,
> > January 1st, 2003', for example.  Thinking back I think that might have
> > been the situation I was thinking about (conflicting mdy and dmy) and
> > would have made more sense as an example.
>
> Then what are you after, some magically prescient input mode that will
> guess the correct interpretation?

No, I'm interested, as I discussed in my message[1], in the ability to
use functions in a copy statement to allow me to specify the conversion
from text to the appropriate data type.  Right now Postgres is using
casting which can end up being wrong.  That's not a fault or something
that can be fixed, the casting logic itself is fine but it's not always
appropriate to apply the same casting to all fields of a given type.

It would be nice to be able to choose what function is used and to pass
arguments to it.  This is a feature request and I'm not suggesting a
change in host the text->date casting is done.  From a programmatical
standpoint I see things like this:

Right now: text -> date : cast(text as date) text -> numeric : cast(text as numeric)

I want to be able to pop that out and put my own function in place of
it, like so: text -> date : to_date(text,'YYYY-Mon-DD') text -> numeric : to_numeric(text,'99V99')

My other message had an example syntax to do this.  I don't know if
that'd be the appropriate syntax or not but I thought it illustrated
what I was interested in.
Thanks,
    Stephen

[1] http://archives.postgresql.org/pgsql-hackers/2003-11/msg01456.php

Re: Functions with COPY

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> No, I'm interested, as I discussed in my message[1], in the ability to
> use functions in a copy statement to allow me to specify the conversion
> from text to the appropriate data type.

COPY is not intended to be that flexible; it's intended to be fast.
You can do any amount of processing you want in an INSERT statement,
though.

INSERT INTO mytable VALUES (mydatefunc('2001/01/03'), ... );
        regards, tom lane


Re: Functions with COPY

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > No, I'm interested, as I discussed in my message[1], in the ability to
> > use functions in a copy statement to allow me to specify the conversion
> > from text to the appropriate data type.
>
> COPY is not intended to be that flexible; it's intended to be fast.

I wouldn't expect much of a speed difference between to_date() and
cast(text as date).  Is there some reason I'm not seeing to expect it to
be much slower?  My guess was that supporting this wouldn't involve
that much code change either but I'm probably wrong.

> You can do any amount of processing you want in an INSERT statement,
> though.

Certainly, but for bulk loads that requires more pre-processing work for
the user and I believe results in more work for the server too (it
certainly takes longer...).
Stephen

Re: Functions with COPY

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> You can do any amount of processing you want in an INSERT statement,
>> though.

> Certainly, but for bulk loads that requires more pre-processing work for
> the user and I believe results in more work for the server too (it
> certainly takes longer...).

Have you batched multiple INSERTs into a transaction?  Also consider
using a prepared statement to eliminate parse/plan overhead.
        regards, tom lane