Thread: Functions with COPY
Hi, Consider the following input data: 1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall The interpretation for the numbers is: 1234 = 12.34, 24.50 = 24.50 The interpretation for the dates is: January 10th, 2003,October 1st, 2003, October 1st, 2003 I don't believe it's possible, currently, to correctly import this data with copy. I'm not sure the date fields would evenbe accepted as date fields. It'd be nice if this could be made to work. From a user standpoint consider: copy blah ( to_number(cost,'99V99'), -- I think that's right? to_number(cost2,'99.99'), to_date(install_date,'DD-Mon-YYYY'), to_date(invoice_date,'MM/DD/YY'), to_date(bill_date,'MM-DD-YYYY'), service_type)from stdin; Perhaps the actual syntax would be different, but the idea is there, be able to specify a function with arguments tohandle the conversion from the text to the resulting data type in the table. Right now casting is done but that castinghas to be expected to work for all input formats for each data type cast and that just isn't reasonable to tryand force something to do. Instead, I believe, the number of input formats accepted has been limited. Now, that isn't an actual data set I have to deal with, but it's a good illustration of the problem I've had with some ofmy data sets, mainly with the date fields though there is one data set that has the problem of having an implied decimalpoint. Thanks, Stephen
On Thu, Nov 27, 2003 at 09:15:20 -0500, Stephen Frost <sfrost@snowman.net> wrote: > > I don't believe it's possible, currently, to correctly import this > data with copy. I'm not sure the date fields would even be accepted > as date fields. It'd be nice if this could be made to work. From a > user standpoint consider: You can write a filter program that reads the data and passes it off to copy. Perl works pretty well for this.
* Bruno Wolff III (bruno@wolff.to) wrote: > On Thu, Nov 27, 2003 at 09:15:20 -0500, > Stephen Frost <sfrost@snowman.net> wrote: > > I don't believe it's possible, currently, to correctly import this > > data with copy. I'm not sure the date fields would even be accepted > > as date fields. It'd be nice if this could be made to work. From a > > user standpoint consider: > > You can write a filter program that reads the data and passes it off > to copy. Perl works pretty well for this. I already did, but it's basically a poor duplication of what the Postgres functions listed already do. Not what I'd consider the best scenario. Additionally, overall I'd expect it to be less work to have the conversion from text->data type done once and correctly instead of run through a filter program to 'clean it up' for Postgres and then also run through functions in Postgres (casts at least) to convert it. Stephen
On Thu, 2003-11-27 at 09:28, Stephen Frost wrote: > * Bruno Wolff III (bruno@wolff.to) wrote: > > On Thu, Nov 27, 2003 at 09:15:20 -0500, > > Stephen Frost <sfrost@snowman.net> wrote: > > > I don't believe it's possible, currently, to correctly import this > > > data with copy. I'm not sure the date fields would even be accepted > > > as date fields. It'd be nice if this could be made to work. From a > > > user standpoint consider: > > > > You can write a filter program that reads the data and passes it off > > to copy. Perl works pretty well for this. > > I already did, but it's basically a poor duplication of what the > Postgres functions listed already do. Not what I'd consider the best > scenario. Additionally, overall I'd expect it to be less work to have > the conversion from text->data type done once and correctly instead of > run through a filter program to 'clean it up' for Postgres and then also > run through functions in Postgres (casts at least) to convert it. How about COPY into a TEMP TABLE for 10k lines, then do an insert into real_table .... select .... from temp_table; which converts the data? You could of course thread the load so 2 or 3 processes execute the data import.
There is a patch floating around for informix load/unload the syntax is load from 'file' insert into ...., and unload to 'file' select "whatever you like" Would this solve the problem? Dave On Thu, 2003-11-27 at 09:38, Rod Taylor wrote: > On Thu, 2003-11-27 at 09:28, Stephen Frost wrote: > > * Bruno Wolff III (bruno@wolff.to) wrote: > > > On Thu, Nov 27, 2003 at 09:15:20 -0500, > > > Stephen Frost <sfrost@snowman.net> wrote: > > > > I don't believe it's possible, currently, to correctly import this > > > > data with copy. I'm not sure the date fields would even be accepted > > > > as date fields. It'd be nice if this could be made to work. From a > > > > user standpoint consider: > > > > > > You can write a filter program that reads the data and passes it off > > > to copy. Perl works pretty well for this. > > > > I already did, but it's basically a poor duplication of what the > > Postgres functions listed already do. Not what I'd consider the best > > scenario. Additionally, overall I'd expect it to be less work to have > > the conversion from text->data type done once and correctly instead of > > run through a filter program to 'clean it up' for Postgres and then also > > run through functions in Postgres (casts at least) to convert it. > > How about COPY into a TEMP TABLE for 10k lines, then do an > insert into real_table .... select .... from temp_table; > which converts the data? > > You could of course thread the load so 2 or 3 processes execute the data > import. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > >
* Rod Taylor (pg@rbt.ca) wrote: > How about COPY into a TEMP TABLE for 10k lines, then do an > insert into real_table .... select .... from temp_table; > which converts the data? > > You could of course thread the load so 2 or 3 processes execute the data > import. Sure, this would work, but it's a heck of alot more work from a processing standpoint than either of the other options... Stephen
* Dave Cramer (pg@fastcrypt.com) wrote: > There is a patch floating around for informix load/unload > > the syntax is load from 'file' insert into ...., and unload to 'file' > select "whatever you like" > > Would this solve the problem? I'm not sure. It depends on what you can do with the '....' after 'insert into'. If it's : insert into blah (what here?) to_number(blah1,'99V99'),etc Then I'd think it would work, but I don't know if that's what you're saying or not. Stephen
Stephen, You can do whatever you can do with an insert now, so yes, I think that is possible. Even if the current patch doesn't do that, it would certainly be a start. Dave On Thu, 2003-11-27 at 10:21, Stephen Frost wrote: > * Dave Cramer (pg@fastcrypt.com) wrote: > > There is a patch floating around for informix load/unload > > > > the syntax is load from 'file' insert into ...., and unload to 'file' > > select "whatever you like" > > > > Would this solve the problem? > > I'm not sure. It depends on what you can do with the '....' after > 'insert into'. If it's : > insert into blah (what here?) to_number(blah1,'99V99'),etc > > Then I'd think it would work, but I don't know if that's what you're > saying or not. > > Stephen
Stephen Frost <sfrost@snowman.net> writes: > Consider the following input data: > 1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall > The interpretation for the numbers is: > 1234 =3D 12.34, 24.50 =3D 24.50 > The interpretation for the dates is: > January 10th, 2003, October 1st, 2003, October 1st, 2003 > I don't believe it's possible, currently, to correctly import this > data with copy. I'm not sure the date fields would even be accepted > as date fields. Nonsense. regression=# set datestyle to mdy; SET regression=# select '10-Jan-2003'::date; date ------------2003-01-10 (1 row) regression=# select '10/1/03'::date; date ------------2003-10-01 (1 row) regression=# select '10-01-2003'::date; date ------------2003-10-01 (1 row) I think you'd have to do some preprocessing on the numeric inputs if you wanted implied decimal points inserted like that, but the dates look fine. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > Consider the following input data: > > 1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall > > > The interpretation for the numbers is: > > 1234 =3D 12.34, 24.50 =3D 24.50 > > The interpretation for the dates is: > > January 10th, 2003, October 1st, 2003, October 1st, 2003 > > > I don't believe it's possible, currently, to correctly import this > > data with copy. I'm not sure the date fields would even be accepted > > as date fields. > Nonsense. [...] > I think you'd have to do some preprocessing on the numeric inputs if you > wanted implied decimal points inserted like that, but the dates look fine. I guess my example was lacking, I'm sure there are cases where the text->date casting will end up being wrong or some date style won't be accepted. If the above was 'January 10th, 2003, October 1st, 2003, January 1st, 2003', for example. Thinking back I think that might have been the situation I was thinking about (conflicting mdy and dmy) and would have made more sense as an example. Stephen
Stephen Frost <sfrost@snowman.net> writes: > I guess my example was lacking, I'm sure there are cases where the > text->date casting will end up being wrong or some date style won't be > accepted. If the above was 'January 10th, 2003, October 1st, 2003, > January 1st, 2003', for example. Thinking back I think that might have > been the situation I was thinking about (conflicting mdy and dmy) and > would have made more sense as an example. Then what are you after, some magically prescient input mode that will guess the correct interpretation? regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > I guess my example was lacking, I'm sure there are cases where the > > text->date casting will end up being wrong or some date style won't be > > accepted. If the above was 'January 10th, 2003, October 1st, 2003, > > January 1st, 2003', for example. Thinking back I think that might have > > been the situation I was thinking about (conflicting mdy and dmy) and > > would have made more sense as an example. > > Then what are you after, some magically prescient input mode that will > guess the correct interpretation? No, I'm interested, as I discussed in my message[1], in the ability to use functions in a copy statement to allow me to specify the conversion from text to the appropriate data type. Right now Postgres is using casting which can end up being wrong. That's not a fault or something that can be fixed, the casting logic itself is fine but it's not always appropriate to apply the same casting to all fields of a given type. It would be nice to be able to choose what function is used and to pass arguments to it. This is a feature request and I'm not suggesting a change in host the text->date casting is done. From a programmatical standpoint I see things like this: Right now: text -> date : cast(text as date) text -> numeric : cast(text as numeric) I want to be able to pop that out and put my own function in place of it, like so: text -> date : to_date(text,'YYYY-Mon-DD') text -> numeric : to_numeric(text,'99V99') My other message had an example syntax to do this. I don't know if that'd be the appropriate syntax or not but I thought it illustrated what I was interested in. Thanks, Stephen [1] http://archives.postgresql.org/pgsql-hackers/2003-11/msg01456.php
Stephen Frost <sfrost@snowman.net> writes: > No, I'm interested, as I discussed in my message[1], in the ability to > use functions in a copy statement to allow me to specify the conversion > from text to the appropriate data type. COPY is not intended to be that flexible; it's intended to be fast. You can do any amount of processing you want in an INSERT statement, though. INSERT INTO mytable VALUES (mydatefunc('2001/01/03'), ... ); regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > No, I'm interested, as I discussed in my message[1], in the ability to > > use functions in a copy statement to allow me to specify the conversion > > from text to the appropriate data type. > > COPY is not intended to be that flexible; it's intended to be fast. I wouldn't expect much of a speed difference between to_date() and cast(text as date). Is there some reason I'm not seeing to expect it to be much slower? My guess was that supporting this wouldn't involve that much code change either but I'm probably wrong. > You can do any amount of processing you want in an INSERT statement, > though. Certainly, but for bulk loads that requires more pre-processing work for the user and I believe results in more work for the server too (it certainly takes longer...). Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> You can do any amount of processing you want in an INSERT statement, >> though. > Certainly, but for bulk loads that requires more pre-processing work for > the user and I believe results in more work for the server too (it > certainly takes longer...). Have you batched multiple INSERTs into a transaction? Also consider using a prepared statement to eliminate parse/plan overhead. regards, tom lane