Re: Different length lines in COPY CSV - Mailing list pgsql-hackers

From Mike G.
Subject Re: Different length lines in COPY CSV
Date
Msg-id 20051212191428.GA13700@localhost.localdomain
Whole thread Raw
In response to Re: Different length lines in COPY CSV  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
I too have to deal with this issue daily since I deal with Windows daily.  It sounds like the source of the csv files
wereoriginally Excel files. I have never been able to figure out how Excel determines to quit putting null values in
emptycolumns after X rows and resume again later on.
 

If the file has less than 65000 rows I would suggest using OpenOffice 2.0 instead of Excel.  OpenOffice does not stop
fillingthe empty columns and with 2.0 it now supports the same maximum number of rows that Excel does.
 

I use Perl constantly to "reformat" files and import them as a csv using the COPY command.  I think the original poster
wouldprefer a php solution though...
 

While it is not a problem for me I do have other less technical users who don't know perl and this makes postgres much
moredifficult for them to use.  Most of them come from a M$ Access background which can handle importing of Excel files
directlythus don't have to deal with this issue.  
 

A file conversion utility would be very helpful for supporting Postgres with Windows especially if it could handle
Excelfiles in their native format.
 


Mike


On Mon, Dec 12, 2005 at 07:58:52PM +0100, Martijn van Oosterhout wrote:
> On Mon, Dec 12, 2005 at 10:15:03AM -0500, Pollard, Mike wrote:
> > Tom Lane wrote:
> > > What's been suggested in the past is some sort of standalone
> > > file-format-conversion utility, which could deal with this sort of
> > > stuff without having to also deal with all the backend-internal
> > > considerations that COPY must handle.  So (at least in theory) it'd
> > > be simpler and more maintainable.  That still seems like a good
> > > idea to me --- in fact, given my druthers I would rather have seen
> > > CSV support done in such an external program.
> > 
> > Why not add hooks into COPY to call the user's massage functions?  That
> > way you don't have to read and write the data, then read it again to
> > load it into the database.
> 
> Well, it does make you wonder about supporting something like (perl
> style):
> 
> \copy foo FROM 'myfilter dodgy-data.csv |'
> 
> or maybe
> 
> \copy foo from pipe 'myfilter dodgy-data.csv'
> 
> or possibly
> 
> \pipe foo from dodgy-data.csv using autodetecting-format-filter.pl
> 
> Which would cause psql to fork/exec the filter and pass the output data
> to the server. We could then provide all sorts of parsers for
> format-of-the-week. This would probably include the converse:
> 
> \pipe foo to table.xls using make-excel-spreadsheet.pl
> 
> Have a nice day,
> -- 
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for someone
> > else to do the other 95% so you can sue them.




pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pg_relation_size locking
Next
From: "Mike G."
Date:
Subject: PgInstaller error on upgrade