Thread: psql copy errors
I'm finding the \copy is very brittle. It seems to stop for everyone little reason. Is there a way to tell it to be more forgiving -- for example, to ignore extra data fields that might exists on a line? Or, to have it just skip that offending record but continue on to the next. I've got a tab delimited file, but if \copy sees any extra tabs in the file it just stops at that record. I want to be able to control what pg does when it hits an exception. I'm curious what others do for bulk data migration. Since copy seems so brittle, there must be a better way... -- David Bear phone: 480-965-8257 fax: 480-965-9189 College of Public Programs/ASU Wilson Hall 232 Tempe, AZ 85287-0803 "Beware the IP portfolio, everyone will be suspect of trespassing"
On Thu, Jun 23, 2005 at 12:27:44PM -0700, David Bear wrote: > > I'm finding the \copy is very brittle. It seems to stop for everyone > little reason. Is there a way to tell it to be more forgiving -- for > example, to ignore extra data fields that might exists on a line? > > Or, to have it just skip that offending record but continue on to the > next. Calling this behavior "brittle" is debatable; some people would call it "reliable": either the whole input validates -- matches what's expected -- or the command rejects it. This might sound unreasonable if a partial load is acceptable, but it's desirable behavior if a partial load is unacceptable. Some would argue that the correct solution is to fix the data, not to make the command that loads the data more forgiving. The developers' TODO list has a few outstanding items to enhance COPY, but I don't know if anybody's working on them: http://www.postgresql.org/docs/faqs.TODO.html * Allow COPY to report error lines and continue This requires the use of a savepoint before each COPY line is processed, with ROLLBACK on COPY failure. * Have COPY return the number of rows loaded/unloaded (?) -- Michael Fuhr http://www.fuhr.org/~mfuhr/
On June 23, 2005 03:27 pm, David Bear wrote: > I'm finding the \copy is very brittle. It seems to stop for everyone > little reason. Is there a way to tell it to be more forgiving -- for > example, to ignore extra data fields that might exists on a line? > > Or, to have it just skip that offending record but continue on to the > next. > > I've got a tab delimited file, but if \copy sees any extra tabs in the > file it just stops at that record. I want to be able to control what > pg does when it hits an exception. > > I'm curious what others do for bulk data migration. Since copy seems > so brittle, there must be a better way... > You may use '-d' option of pg_dump in which case it dumps data into INSERT statements. In this case when you load the damped data it will process tabs properly and will fail any invalid records but finish theprocess itself. If you redirect output into a separate file you can analyze later how many records failed. May be that's what you need in your case. The only problem with this method I know is that it takes longer to load the data as it does full validation for each record. -- Vladimir Yevdokimov <vladimir@givex.com>
David Bear: Yes. I agree with you. \copy is really too brittle. I wonder why \copy is not like oracle's sqlldr? I think sqlldr is more powerful. When using sqlldr,we can specify the maximum error records we allow,and we can alsospecify the number we should commit in every transaction. I think PostgreSQL should consider this. Another aspect is also important. Oracle has better partition table facilities,it's especially suitable for largetables,as well as index partition concepts.But PostgreSQL has no such concepts.These are really important for largedatabase. I think if PostgreSQL can implement those above in the future.It will be more powerful and more suitable for largebusiness application. ======= 2005-06-24 03:27:44 you wrote:======= >I'm finding the \copy is very brittle. It seems to stop for everyone >little reason. Is there a way to tell it to be more forgiving -- for >example, to ignore extra data fields that might exists on a line? > >Or, to have it just skip that offending record but continue on to the >next. > >I've got a tab delimited file, but if \copy sees any extra tabs in the >file it just stops at that record. I want to be able to control what >pg does when it hits an exception. > >I'm curious what others do for bulk data migration. Since copy seems >so brittle, there must be a better way... > >-- >David Bear >phone: 480-965-8257 >fax: 480-965-9189 >College of Public Programs/ASU >Wilson Hall 232 >Tempe, AZ 85287-0803 > "Beware the IP portfolio, everyone will be suspect of trespassing" > >---------------------------(end of broadcast)--------------------------- >TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > = = = = = = = = = = = = = = = = = = = = Best regards! 李江华 Seamus Dean Alibaba.com TEL:0571-85022088-2287 ljh1469@alibaba-inc.com 2005-06-24
On Fri, 2005-06-24 at 00:52, 李江华 wrote: > David Bear: > > Yes. I agree with you. > \copy is really too brittle. > I wonder why \copy is not like oracle's sqlldr? > I think sqlldr is more powerful. When using sqlldr,we can specify the maximum error records we allow,and we canalso specify the number we should commit in every transaction. While I wouldn't mind have the versatility of sqlldr, I would HATE to have to deal with such a clunky interface. The only features of sqlldr I actually like are the ones that allow rejected records to go into a separate file. But like most of oracle's tools, it's awkward to use, and requires a lot of up front work even if you only want a simple load. Sure, add some features to \copy, but don't emulate the WAY those features are implemented oracle. > Another aspect is also important. Oracle has better partition table facilities,it's especially suitable for largetables,as well as index partition concepts.But PostgreSQL has no such concepts.These are really important for largedatabase. No one would complain if someone stepped up and programmed a decent table partitioning patch. I have a feeling it's one of the things on the TODO list that will happen in the next version or two.
On Fri, Jun 24, 2005 at 09:43:43AM -0500, Scott Marlowe wrote: > On Fri, 2005-06-24 at 00:52, ????????? wrote: > > David Bear: > > > > Yes. I agree with you. > > \copy is really too brittle. > > I wonder why \copy is not like oracle's sqlldr? > > I think sqlldr is more powerful. When using sqlldr,we can specify the maximum error records we allow,and we canalso specify the number we should commit in every transaction. > > While I wouldn't mind have the versatility of sqlldr, I would HATE to > have to deal with such a clunky interface. The only features of sqlldr > I actually like are the ones that allow rejected records to go into a > separate file. There's a PGLoader project in pgfoundry that may suit some people. > > Another aspect is also important. Oracle has better partition > > table facilities,it's especially suitable for large tables,as > > well as index partition concepts.But PostgreSQL has no such > > concepts.These are really important for large database. > > No one would complain if someone stepped up and programmed a decent > table partitioning patch. I have a feeling it's one of the things on > the TODO list that will happen in the next version or two. I think Simon Riggs and crew are working on this. -- Alvaro Herrera (<alvherre[a]surnet.cl>) Jude: I wish humans laid eggs Ringlord: Why would you want humans to lay eggs? Jude: So I can eat them