Thread: CSV import

CSV import

From
Oliver Vecernik
Date:
Hi again!

After investigating a little bit further, my CSV import couldn't work 
because of the following reasons:

1. CSV files are delimited with CR/LF
2. text fields are surrounded by double quotes

Is there a direct way to import such files into PostgreSQL?

I would like to have something like MySQL provides:

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [FIELDS
        [TERMINATED BY '\t']
        [[OPTIONALLY] ENCLOSED BY '']
        [ESCAPED BY '\\' ]
    ]
    [LINES TERMINATED BY '\n']
    [IGNORE number LINES]
    [(col_name,...)]

Has anybody written such a function already?

Regards,
Oliver

-- 
VECERNIK Datenerfassungssysteme
A-2560 Hernstein, Hofkogelgasse 17
Tel.: +43 2633 47530, Fax: DW 50
http://members.aon.at/vecernik




Re: CSV import

From
Achilleus Mantzios
Date:
On Tue, 28 Jan 2003, Oliver Vecernik wrote:

> Hi again!
>
> After investigating a little bit further my CSV import couldn't work
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF
See below

> 2. text fields are surrounded by double quotes

in vi
:1,$ s/"//g

>
> Is there a direct way to import such files into PostgreSQL?
>
> I would like to have something like MySQL provides:
>
> LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
>     [REPLACE | IGNORE]
>     INTO TABLE tbl_name
>     [FIELDS
>         [TERMINATED BY '\t']
>         [[OPTIONALLY] ENCLOSED BY '']
>         [ESCAPED BY '\\' ]
>     ]
>     [LINES TERMINATED BY '\n']

make it     [LINES TERMINATED BY '\r\n']

>     [IGNORE number LINES]
>     [(col_name,...)]
>
> Has anybody written such a function already?
>
> Regards,
> Oliver
>
> --
> VECERNIK Datenerfassungssysteme
> A-2560 Hernstein, Hofkogelgasse 17
> Tel.: +43 2633 47530, Fax: DW 50
> http://members.aon.at/vecernik
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com       mantzios@softlab.ece.ntua.gr



Re: CSV import

From
Guy Fraser
Date:
Hi

You will need two text utilities {dos2unix and sed} to do this in the simplest 
way. They are fairly standard text utilities and are probably already on your 
machine.

This is how I would do it :

sed "s/\"//g" file_name.txt \
        | dos2unix \
        | pgsql -c "COPY table_name FROM STDIN USING DELIMITERS ',';" db

Where "file_name.txt" is the csv file you want to import and "table_name" is 
the previously created table you want to insert the data into and db is the 
database name.

How this works is "sed" {stream editor} removes all the double quote 
characters '"' then pipes the output through "dos2unix" which converts all the 
CRLF {DOS EOL} sequences into CR {UNIX EOL} characters, then pipes the data to 
"pgsql"  with a command that does a bulk insert into the table of the database 
you have selected.
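The same two preprocessing steps can be sketched in a few lines of Python (a 
hypothetical `preprocess` helper, not part of the thread):

```python
def preprocess(text):
    # Mimic the sed and dos2unix steps: delete every double quote,
    # then turn CRLF (DOS EOL) into a bare LF (Unix EOL).
    return text.replace('"', '').replace('\r\n', '\n')

sample = '"hello","world"\r\n"foo","bar"\r\n'
print(preprocess(sample))  # hello,world and foo,bar with plain LF endings
```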


Guy

Oliver Vecernik wrote:
> Hi again!
> 
> After investigating a little bit further my CSV import couldn't work 
> because of following reasons:
> 
> 1. CSV files are delimited with CR/LF
> 2. text fields are surrounded by double quotes
> 
> Is there a direct way to import such files into PostgreSQL?
> 
> I would like to have something like MySQL provides:
> 
> LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
>    [REPLACE | IGNORE]
>    INTO TABLE tbl_name
>    [FIELDS
>        [TERMINATED BY '\t']
>        [[OPTIONALLY] ENCLOSED BY '']
>        [ESCAPED BY '\\' ]
>    ]
>    [LINES TERMINATED BY '\n']
>    [IGNORE number LINES]
>    [(col_name,...)]
> 
> Has anybody written such a function already?
> 
> Regards,
> Oliver
> 




Re: CSV import

From
Jeff Eckermann
Date:
--- Oliver Vecernik <vecernik@aon.at> wrote:
> Is there a direct way to import such files into
> PostgreSQL?
> 

As I believe others have replied: no, not yet.

If you are absolutely sure that your data will _never_
contain commas, then the simple solution of just
deleting all of the quotes, then using COPY with
comma delimiters, will work.  Otherwise, parsing CSV
files gets just too complicated, and you are better
off using an existing solution (like a Perl module) to
preprocess your data.
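The embedded-comma pitfall is easy to demonstrate. Here is a small comparison 
using Python's standard csv module (which appeared after this thread, but 
behaves like the parsers being recommended):

```python
import csv
import io

line = '"he said, hello",world\r\n'

# Naive preprocessing: delete the quotes and split on commas.
naive = line.strip().replace('"', '').split(',')
print(naive)   # three fields instead of two -- the embedded comma leaked

# A real CSV parser keeps the quoted comma inside its field.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```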



Re: CSV import

From
Jean-Luc Lachance
Date:
You can achieve the same result with:

tr -d '"\015' < file_name.txt | psql {etc...}

Unix EOL is LF not CR.
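For reference, tr -d '"\015' deletes every double quote and every carriage 
return (octal 015) in a single pass; the same operation in Python (a sketch 
with a hypothetical helper name):

```python
def tr_delete(text):
    # Equivalent of: tr -d '"\015' -- drop all '"' and CR characters.
    return text.translate(str.maketrans('', '', '"\r'))

print(tr_delete('"hello","world"\r\n'))  # hello,world with a bare LF
```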


Guy Fraser wrote:
> 
> Hi
> 
> You will need two text utilities {dos2unix and sed} to do this in the simplest
> way. They are fairly standard text utilities and are probably already on your
> machine.
> 
> This is how I would do it :
> 
> sed "s/\"//g" file_name.txt \
>         | dos2unix \
>         | pgsql -c "COPY table_name FROM STDIN USING DELIMITERS ',';" db
> 
> Where "file_name.txt" is the csv file you want to import and "table_name" is
> the previously created table you want to insert the data into and db is the
> database name.
> 
> How this works is "sed" {stream editor} removes all the double quote
> characters '"' then pipes the output through "dos2unix" which converts all the
> CRLF {DOS EOL} sequences into CR {UNIX EOL} characters, then pipes the data to
> "pgsql"  with a command that does a bulk insert into the table of the database
> you have selected.
> 
> Guy
> 
> Oliver Vecernik wrote:
> > Hi again!
> >
> > After investigating a little bit further my CSV import couldn't work
> > because of following reasons:
> >
> > 1. CSV files are delimited with CR/LF
> > 2. text fields are surrounded by double quotes
> >
> > Is there a direct way to import such files into PostgreSQL?
> >
> > I would like to have something like MySQL provides:
> >
> > LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
> >    [REPLACE | IGNORE]
> >    INTO TABLE tbl_name
> >    [FIELDS
> >        [TERMINATED BY '\t']
> >        [[OPTIONALLY] ENCLOSED BY '']
> >        [ESCAPED BY '\\' ]
> >    ]
> >    [LINES TERMINATED BY '\n']
> >    [IGNORE number LINES]
> >    [(col_name,...)]
> >
> > Has anybody written such a function already?
> >
> > Regards,
> > Oliver
> >
> 


Re: CSV import

From
"Rodger Donaldson"
Date:
> --- Oliver Vecernik <vecernik@aon.at> wrote:
> > Is there a direct way to import such files into
> > PostgreSQL?
> > 
> 
> As I believe others have replied: no, not yet.
> 
> Otherwise, parsing CSV
> files gets just too complicated, and you are better
> off using an existing solution (like a Perl module) to
> preprocess your data.

The DBD::CSV module allows one to use a subset of SQL syntax on CSV 
files, as an example.  Docs are at 
http://search.cpan.org/author/JZUCKER/DBD-CSV-0.2002/lib/DBD/CSV.pm

-- 
Rodger Donaldson
rodgerd@diaspora.gen.nz


Re: CSV import

From
Oliver Vecernik
Date:
Oliver Vecernik schrieb:

> Hi again!
>
> After investigating a little bit further my CSV import couldn't work 
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF
> 2. text fields are surrounded by double quotes
>
> Is there a direct way to import such files into PostgreSQL? 

The answer seems to be no. But after googling a bit I found a wonderful 
Python module called csv at:

http://www.object-craft.com.au/projects/csv/

A minimal script called 'csv2tab.py' for conversion to a tab delimited 
file could be:

#!/usr/bin/env python
import csv
import sys

def convert(file):
    try:
        f = open(file, 'r')
        lines = f.readlines()
        p = csv.parser()
        for line in lines:
            print '\t'.join(p.parse(line))
    except:
        print 'Error opening file!'

if __name__ == '__main__':
    convert(sys.argv[1]);
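A similar parser now ships in Python's standard library as the csv module; a 
rough modern-stdlib equivalent of the script above (with a hypothetical 
`csv_to_tab` helper working on an in-memory sample) is:

```python
import csv
import io

def csv_to_tab(text):
    # Parse each CSV record (quotes, embedded commas and all)
    # and re-emit it tab-delimited for PostgreSQL's COPY.
    rows = csv.reader(io.StringIO(text))
    return '\n'.join('\t'.join(row) for row in rows)

sample = '"hello","world"\r\n"he said ""hi""","there"\r\n'
print(csv_to_tab(sample))
```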

Regards,
Oliver

-- 
VECERNIK Datenerfassungssysteme
A-2560 Hernstein, Hofkogelgasse 17
Tel.: +43 2633 47530, Fax: DW 50
http://members.aon.at/vecernik





Re: CSV import

From
Gary Stainburn
Date:
On Wednesday 29 January 2003 5:50 am, Oliver Vecernik wrote:
> Oliver Vecernik schrieb:
> > Hi again!
> >
> > After investigating a little bit further my CSV import couldn't work
> > because of following reasons:
> >
> > 1. CSV files are delimited with CR/LF
> > 2. text fields are surrounded by double quotes
> >
> > Is there a direct way to import such files into PostgreSQL?

Here's a simple command that will take

"hello","world","splat","diddle"
"he said "hello world" to ","his mate"

and convert it to the following tab delimited file that can be COPYed using 
psql. It even handles quotes inside fields. (^m and ^i are done by typing 
CTRL+V CTRL+M and CTRL+V CTRL+I)

hello   world   splat   diddle
he said "hello world" to        his mate

sed 's/^"//' <t.txt|sed 's/"^m$//'|sed 's/","/^i/g'>t1.txt
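Gary's three sed passes translate directly to Python (a sketch, not from the 
thread; ^m above stands for a literal carriage return):

```python
import re

def strip_csv_quotes(line):
    # Pass 1: drop the leading quote; pass 2: drop the trailing
    # quote plus optional CR; pass 3: turn "," separators into tabs.
    line = re.sub(r'^"', '', line)
    line = re.sub(r'"\r?$', '', line)
    return re.sub(r'","', '\t', line)

print(strip_csv_quotes('"he said "hello world" to ","his mate"\r'))
```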

Gary

>
> The answer seems to be no. But after googling a bit I found a wonderful
> Python module called csv at:
>
> http://www.object-craft.com.au/projects/csv/
>
> A minimal script called 'csv2tab.py' for conversion to a tab delimited
> file could be:
>
> #!/usr/bin/env python
>
> import csv
> import sys
>
> def convert(file):
>     try:
>         f = open(file, 'r')
>         lines = f.readlines()
>         p = csv.parser()
>         for line in lines:
>             print '\t'.join(p.parse(line))
>     except:
>         print 'Error opening file!'
>
> if __name__ == '__main__':
>     convert(sys.argv[1]);
>
> Regards,
> Oliver

-- 
Gary Stainburn
This email does not contain private or confidential material as it
may be snooped on by interested government parties for unknown
and undisclosed purposes - Regulation of Investigatory Powers Act, 2000     



Re: CSV import

From
"Chad Thompson"
Date:

> 
> Unix EOL is LF not CR.
> 
> 

Is this the only difference between a dos and unix text file?

Thanks
Chad



Re: CSV import

From
Oliver Vecernik
Date:
Chad Thompson schrieb:

>  
>
>>Unix EOL is LF not CR.
>>
>>
>>    
>>
>
>Is this the only difference between a dos and unix text file?
>
Yes, but to be more precise:
dos: CR + LF
unix: LF
mac: CR

Oliver

-- 
VECERNIK Datenerfassungssysteme
A-2560 Hernstein, Hofkogelgasse 17
Tel.: +43 2633 47530, Fax: DW 50
http://members.aon.at/vecernik





Re: CSV import

From
Jean-Luc Lachance
Date:
In DOS and Windows, text lines end with <CR><LF>.
In Unix, text lines end with <LF> only.
               hex   dec    oct
<CR>=CTRL-M or 0x0D or 13 or 015
<LF>=CTRL-J or 0x0A or 10 or 012
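The values in the table are easy to confirm from Python, whose splitlines() 
accepts all of the historical line-ending conventions (a small editorial 
check, not from the thread):

```python
# CR and LF as Python sees them: hex, decimal, octal.
print(hex(ord('\r')), ord('\r'), oct(ord('\r')))  # CR
print(hex(ord('\n')), ord('\n'), oct(ord('\n')))  # LF

# splitlines() treats CRLF, LF and bare CR all as line breaks.
print('a\r\nb'.splitlines(), 'a\nb'.splitlines(), 'a\rb'.splitlines())
```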



Chad Thompson wrote:
> 
> >
> > Unix EOL is LF not CR.
> >
> >
> 
> Is this the only difference between a dos and unix text file?
> 
> Thanks
> Chad
> 


Re: CSV import

From
Guy Fraser
Date:
FYI

In text files on a Mac, the EOL character is a <CR> only.

What a messy thing this whole EOL cruft is.

To convert between these text formats on linux is easy if you have dos2unix.

The dos2unix on linux can perform many format conversions to and from unix, 
dos and mac formats.

On BSD you need dos2unix to convert from dos to unix and unix2dos to convert 
from unix to dos. You probably need to get the GNU version of dos2unix or 
mac2unix to convert to or from mac formatted text.


Guy

Jean-Luc Lachance wrote:
> In DOS and Windows, text lines end with <CR><LF>.
> In Unix, text lines end with <LF> only.
> 
>                 hex   dec    oct
> <CR>=CTRL-M or 0x0D or 13 or 015
> <LF>=CTRL-J or 0x0A or 10 or 012
> 
> 
> 
> Chad Thompson wrote:
> 
>>>Unix EOL is LF not CR.
>>>
>>>
>>
>>Is this the only difference between a dos and unix text file?
>>
>>Thanks
>>Chad
---%<...snip...