Home > mailing lists

Re: COPY and double-quote-separated fields - Mailing list pgsql-general

From	Jason Godden
Subject	Re: COPY and double-quote-separated fields
Date	August 3, 2003 08:08:53
Msg-id	200308032108.37036.jasongodden@optushome.com.au Whole thread Raw
In response to	COPY and double-quote-separated fields (Ron Johnson <ron.l.johnson@cox.net>)
Responses	Re: COPY and double-quote-separated fields
List	pgsql-general

Tree view

Hi Ron,

I have the same issue - best thing is to run the data through sed but pipe it
straight to the psql import table - that way you never have to worry about
creating misc. files.  I also always use an intermediate temporary import
table rather than populate my actual db tables and usually copy from stdin
rather than a file.  That way you can do your imports without having the data
in a directory readable by the pg process owner, ie:

cat unprocesseddata.txt | sed "s/\",\"/$TAB/g" | sed "s/\"//g" | sed "s/.$//g"
| psql -dmydatabase -c "copy importtable from stdin delimiter '\t';"

The first sed will replace all instances of "," with a tab character
(I achieve tab in my shell scripts with a line like TAB=$(printf "\t"))

The second sed will remove the start and end " because the first only checks
for ",".

The third sed will remove the Windows carriage returns (if any - depends on
the platform the file was generated from.

Note here that my file format never contains any " in a field so I can safely
run the second sed.  If your import file contains these then you will need to
change the regex or use awk.

If you don't want to use sed at all however you can probably do:

cat unprocesseddata.txt | psql -dmydatabase -c "copy importtable from stdin
delimiter '\",\"';"

(you may have to investigate the escape characters around the " there)

and then run:

update importtable set firstfield = trim(beginning '\"' from firstfield);
update importtable set lastfield = trim(end '\"' from lastfield);

or if you're concerned about additional " in the import:

update importtable set firstfield =
substring(firstfield,2,length(firstfield)-1);
update importtable set lastfield =
substring(lastfield,1,length(firstfield)-1);

... within the same transaction.

hth,

Jason

On Sun, 3 Aug 2003 05:36 pm, Ron Johnson wrote:
> Hi,
>
> PostgreSQL 7.3.3 on i386-pc-linux-gnu, compiled by GCC gcc
> (GCC) 3.3 (Debian)
>
> I must load a file in which each field in each row is contained
> in double quotes, i.e.
> "1","foo"
> "2","bar"
> "3","puddle"
>
> test1=# copy foo from '/usr/local/data/zip/foo.unl';
> ERROR:  copy: line 1, pg_atoi: error in ""1","foo"": can't parse
> ""1","foo""
>
> I have a very large dataset (14.3GiB in 22 files) that has such
> a format, and would prefer not to sed each one of them, creating
> new files in the process.
>
> Any suggestions?
>
> Many TIA

pgsql-general by date:

From: Ron Johnson
Date: 03 August 2003, 04:37:03
Subject: COPY and double-quote-separated fields

From: "Shridhar Daithankar"
Date: 03 August 2003, 08:30:08
Subject: Re: Monthly table partitioning for fast purges?

Re: COPY and double-quote-separated fields - Mailing list pgsql-general

Previous

Next