Re: COPY and double-quote-separated fields - Mailing list pgsql-general

From Jason Godden
Subject Re: COPY and double-quote-separated fields
Date
Msg-id 200308032108.37036.jasongodden@optushome.com.au
Whole thread Raw
In response to COPY and double-quote-separated fields  (Ron Johnson <ron.l.johnson@cox.net>)
Responses Re: COPY and double-quote-separated fields  (Ron Johnson <ron.l.johnson@cox.net>)
List pgsql-general
Hi Ron,

I have the same issue - best thing is to run the data through sed but pipe it
straight to the psql import table - that way you never have to worry about
creating misc. files.  I also always use an intermediate temporary import
table rather than populate my actual db tables and usually copy from stdin
rather than a file.  That way you can do your imports without having the data
in a directory readable by the pg process owner, ie:

cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g"
| psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"

The first sed will replace all instances of "," with a tab character
(I achieve tab in my shell scripts with a line like TAB=$(printf "\t"))

The second sed will remove the start and end " because the first only checks
for ",".

The third sed will remove the Windows carriage returns (if any - depends on
the platform the file was generated from.

Note here that my file format never contains any " in a field so I can safely
run the second sed.  If your import file contains these then you will need to
change the regex or use awk.

If you don't want to use sed at all however you can probably do:

cat unprocesseddata.txt | psql -dmydatabase -c "copy importtable from stdin
delimiter '\",\"';"

(you may have to investigate the escape characters around the " there)

and then run:

update importtable set firstfield = trim(beginning '\"' from firstfield);
update importtable set lastfield = trim(end '\"' from lastfield);

or if you're concerned about additional " in the import:

update importtable set firstfield =
substring(firstfield,2,length(firstfield)-1);
update importtable set lastfield =
substring(lastfield,1,length(firstfield)-1);

... within the same transaction.

hth,

Jason

On Sun, 3 Aug 2003 05:36 pm, Ron Johnson wrote:
> Hi,
>
> PostgreSQL 7.3.3 on i386-pc-linux-gnu, compiled by GCC gcc
> (GCC) 3.3 (Debian)
>
> I must load a file in which each field in each row is contained
> in double quotes, i.e.
> "1","foo"
> "2","bar"
> "3","puddle"
>
> test1=# copy foo from '/usr/local/data/zip/foo.unl';
> ERROR:  copy: line 1, pg_atoi: error in ""1","foo"": can't parse
> ""1","foo""
>
> I have a very large dataset (14.3GiB in 22 files) that has such
> a format, and would prefer not to sed each one of them, creating
> new files in the process.
>
> Any suggestions?
>
> Many TIA


pgsql-general by date:

Previous
From: Ron Johnson
Date:
Subject: COPY and double-quote-separated fields
Next
From: "Shridhar Daithankar"
Date:
Subject: Re: Monthly table partitioning for fast purges?