Thread: CSV import
Hi again!

After investigating a little bit further, my CSV import couldn't work for the following reasons:

1. CSV files are delimited with CR/LF
2. text fields are surrounded by double quotes

Is there a direct way to import such files into PostgreSQL?

I would like to have something like MySQL provides:

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [FIELDS
        [TERMINATED BY '\t']
        [[OPTIONALLY] ENCLOSED BY '']
        [ESCAPED BY '\\']
    ]
    [LINES TERMINATED BY '\n']
    [IGNORE number LINES]
    [(col_name,...)]

Has anybody written such a function already?

Regards,
Oliver

--
VECERNIK Datenerfassungssysteme
A-2560 Hernstein, Hofkogelgasse 17
Tel.: +43 2633 47530, Fax: DW 50
http://members.aon.at/vecernik
On Tue, 28 Jan 2003, Oliver Vecernik wrote:

> Hi again!
>
> After investigating a little bit further my CSV import couldn't work
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF

See below.

> 2. text fields are surrounded by double quotes

In vi:

:1,$ s/"//g

> Is there a direct way to import such files into PostgreSQL?
>
> I would like to have something like MySQL provides:
>
> LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
---%<...snip...
> [LINES TERMINATED BY '\n']

make it

[LINES TERMINATED BY '\r\n']

> [IGNORE number LINES]
> [(col_name,...)]
>
> Has anybody written such a function already?
>
> Regards,
> Oliver

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel: +30-10-8981112
fax: +30-10-8981877
email: achill@matrix.gatewaynet.com
       mantzios@softlab.ece.ntua.gr
Hi

You will need two text utilities {dos2unix and sed} to do this in the
simplest way. They are fairly standard text utilities and are probably
already on your machine.

This is how I would do it:

sed "s/\"//g" file_name.txt \
| dos2unix \
| psql -c "COPY table_name FROM STDIN USING DELIMITERS ',';" db

Where "file_name.txt" is the CSV file you want to import, "table_name" is
the previously created table you want to insert the data into, and db is
the database name.

How this works: "sed" {stream editor} removes all the double quote
characters '"', then pipes the output through "dos2unix", which converts
all the CRLF {DOS EOL} sequences into CR {UNIX EOL} characters, then pipes
the data to "psql" with a command that does a bulk insert into the table
of the database you have selected.

Guy

Oliver Vecernik wrote:
> Hi again!
>
> After investigating a little bit further my CSV import couldn't work
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF
> 2. text fields are surrounded by double quotes
>
> Is there a direct way to import such files into PostgreSQL?
---%<...snip...
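For readers without sed or dos2unix at hand, the two transformations in the
pipeline above (strip the quotes, convert DOS line endings) can be sketched
in Python; the function name and sample data here are made up for
illustration:

```python
def prepare_for_copy(text):
    """Strip double quotes and convert CRLF (DOS) line endings to LF,
    mirroring the sed | dos2unix pipeline above."""
    return text.replace('"', '').replace('\r\n', '\n')

sample = '"red","green"\r\n"blue","yellow"\r\n'
# The result is plain comma-delimited text, suitable for
# COPY ... FROM STDIN USING DELIMITERS ','
print(prepare_for_copy(sample))
```

Like the shell pipeline, this is only safe when no field contains a quote
or comma of its own.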
--- Oliver Vecernik <vecernik@aon.at> wrote:
> Is there a direct way to import such files into
> PostgreSQL?

As I believe others have replied: no, not yet.

If you are absolutely sure that your data will _never_ contain commas,
then the simple solution of just deleting all of the quotes, then using
COPY with comma delimiters, will work. Otherwise, parsing CSV files gets
just too complicated, and you are better off using an existing solution
(like a Perl module) to preprocess your data.
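To make that caveat concrete: once the quotes are stripped, an embedded
comma looks exactly like a delimiter. A hypothetical row:

```python
row = '"Smith, John","Vienna"'     # two fields, one with an embedded comma
naive = row.replace('"', '')       # the quote-stripping approach
# COPY would now see three fields instead of two
print(naive.split(','))            # -> ['Smith', ' John', 'Vienna']
```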
You can achieve the same result with:

tr -d '"\015' < file_name.txt | psql {etc...}

Unix EOL is LF not CR.

Guy Fraser wrote:
> This is how I would do it :
>
> sed "s/\"//g" file_name.txt \
> | dos2unix \
> | pgsql -c "COPY table_name FROM STDIN USING DELIMITERS ',';" db
>
> How this works is "sed" {stream editor} removes all the double quote
> characters '"' then pipes the output through "dos2unix" which converts
> all the CRLF {DOS EOL} sequences into CR {UNIX EOL} characters, then
> pipes the data to "pgsql" with a command that does a bulk insert into
> the table of the database you have selected.
>
> Guy
---%<...snip...
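In other words, `tr -d '"\015'` deletes every double quote and every CR
(octal 015) byte in a single pass. A Python sketch of the same filter
(the function name is invented here):

```python
def tr_strip(text):
    # delete all '"' and '\r' characters, like tr -d '"\015'
    return text.translate(str.maketrans('', '', '"\r'))

print(tr_strip('"a","b"\r\n"c","d"\r\n'))
```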
> --- Oliver Vecernik <vecernik@aon.at> wrote:
> > Is there a direct way to import such files into
> > PostgreSQL?
>
> As I believe others have replied: no, not yet.
>
> Otherwise, parsing CSV files gets just too complicated, and you are
> better off using an existing solution (like a Perl module) to
> preprocess your data.

The DBD::CSV module allows one to use a subset of SQL syntax on CSV
files, as an example. Docs are at

http://search.cpan.org/author/JZUCKER/DBD-CSV-0.2002/lib/DBD/CSV.pm

--
Rodger Donaldson        rodgerd@diaspora.gen.nz
Oliver Vecernik schrieb:
> Hi again!
>
> After investigating a little bit further my CSV import couldn't work
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF
> 2. text fields are surrounded by double quotes
>
> Is there a direct way to import such files into PostgreSQL?

The answer seems to be no. But after googling a bit I found a wonderful
Python module called csv at:

http://www.object-craft.com.au/projects/csv/

A minimal script called 'csv2tab.py' for conversion to a tab delimited
file could be:

#!/usr/bin/env python

import csv
import sys

def convert(file):
    try:
        f = open(file, 'r')
        lines = f.readlines()
        p = csv.parser()
        for line in lines:
            print '\t'.join(p.parse(line))
    except:
        print 'Error processing file!'

if __name__ == '__main__':
    convert(sys.argv[1])

Regards,
Oliver
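The module above is a third-party package from Object Craft; Python's
standard library later gained a `csv` module with a similar parser. With
it, the same conversion could be sketched roughly like this (function name
and sample data are invented for illustration):

```python
import csv

def csv_to_tab(lines):
    """Convert CSV records to tab-delimited rows.

    Unlike plain quote-stripping, csv.reader copes with embedded
    commas inside quoted fields."""
    return ['\t'.join(record) for record in csv.reader(lines)]

print(csv_to_tab(['"a","b,c"', '"d","e"']))
```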
On Wednesday 29 January 2003 5:50 am, Oliver Vecernik wrote:
> Oliver Vecernik schrieb:
> > 1. CSV files are delimited with CR/LF
> > 2. text fields are surrounded by double quotes
> >
> > Is there a direct way to import such files into PostgreSQL?

Here's a simple command that will take

"hello","world","splat","diddle"
"he said "hello world" to ","his mate"

and convert it to the following tab delimited file that can be COPYed
using psql. It even handles quotes inside fields. (^m and ^i are entered
by typing CTRL+V CTRL+M and CTRL+V CTRL+I)

hello	world	splat	diddle
he said "hello world" to	his mate

sed 's/^"//' <t.txt | sed 's/"^m$//' | sed 's/","/^i/g' >t1.txt

Gary

> The answer seems to be no. But after googeling a bit a found a wonderful
> Python module called csv at:
>
> http://www.object-craft.com.au/projects/csv/
---%<...snip...

--
Gary Stainburn

This email does not contain private or confidential material as it may
be snooped on by interested government parties for unknown and
undisclosed purposes - Regulation of Investigatory Powers Act, 2000
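The three sed substitutions can be mirrored in Python with the `re` module
(a sketch, not part of Gary's command). Because only the quotes adjacent
to a delimiter or a line boundary are touched, quotes inside fields
survive:

```python
import re

def csv_line_to_tabs(line):
    line = re.sub(r'^"', '', line)      # sed 's/^"//'     strip opening quote
    line = re.sub(r'"\r?$', '', line)   # sed 's/"^m$//'   strip closing quote and CR
    return re.sub(r'","', '\t', line)   # sed 's/","/^i/g' quoted delimiters -> tabs

print(csv_line_to_tabs('"he said "hello world" to ","his mate"\r'))
```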
> Unix EOL is LF not CR.

Is this the only difference between a dos and unix text file?

Thanks
Chad
Chad Thompson schrieb:
> > Unix EOL is LF not CR.
>
> Is this the only difference between a dos and unix text file?

Yes, but to be more precise:

dos:  CR + LF
unix: LF
mac:  CR

Oliver
In DOS and Windows, text lines end with <CR><LF>.
In Unix, text lines end with <LF> only.

                     hex      dec    oct
<CR> = CTRL-M  or   0x0D  or   13  or  015
<LF> = CTRL-J  or   0x0A  or   10  or  012

Chad Thompson wrote:
> > Unix EOL is LF not CR.
>
> Is this the only difference between a dos and unix text file?
>
> Thanks
> Chad
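Those byte values are easy to verify from any language; in Python, for
instance:

```python
# CR and LF with their hex, decimal, and octal values
for name, ch in (('CR', '\r'), ('LF', '\n')):
    code = ord(ch)
    print('%s  hex %#04x  dec %d  oct %#05o' % (name, code, code, code))
```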
FYI

In text files on a Mac, the EOL character is a <CR> only. What a messy
thing this whole EOL cruft is.

Converting between these text formats on Linux is easy if you have
dos2unix. The dos2unix on Linux can perform many format conversions to
and from unix, dos and mac formats. On BSD you need dos2unix to convert
from dos to unix and unix2dos to convert from unix to dos. You probably
need to get the GNU version of dos2unix or mac2unix to convert to or
from mac formatted text.

Guy

Jean-Luc Lachance wrote:
> In DOS and Windows, text lines end with <CR><LF>.
> In Unix, text lines end with <LF> only.
---%<...snip...