Re: request for help with COPY syntax - Mailing list pgsql-sql

From Chuck D.
Subject Re: request for help with COPY syntax
Date
Msg-id 200710251016.01844.pgsql-list@nullmx.com
Whole thread Raw
In response to Re: request for help with COPY syntax  ("Chuck D." <pgsql-list@nullmx.com>)
Responses Re: request for help with COPY syntax
List pgsql-sql
On October 25, 2007 09:35:23 am Chuck D. wrote:
> On October 24, 2007 01:10:59 am Paul Lambert wrote:
> > I get around this problem with my data loads by specifying some other
> > arbitrary character that I know won't appear in the data as the quote
> > character.
> >
> > Eg QUOTE E'\f' will specify form feed as the quote character, ergo any
> > data with double or single quotes will be loaded with those quote
> > characters in the string.
> >
> > Something similar may help with your case.
>
> This was the solution.  I specified a quote character that was not in the
> data and the data imported perfectly.  Without specifying any delimiter
> postgres defaults to one of the quotes (I forget which).
>
> Unfortunately, the data I imported wasn't good.  MaxMind, like the
> Geonames.org derivatives, uses FIPS code for a state identifier in the
> cities table for all countries EXCEPT USA in which case they use the iso
> code.  Both these data sets mix types within one column and I find that
> absolutely unacceptable.
>
> Back to my original problem, which was trying to COPY in some of the
> earth-info.nga.mil world city data.  This data is tab delimited, no quotes
> around fields, newline line terminated and UTF-8 encoded.
>
> Using a similar COPY statement with the defaults, it fails with this:
>
> COPY geo.orig_city FROM
> '/home/www/geo/DATA/nga.mil/geonames_no_header.txt';
>
> ERROR:  literal carriage return found in data
> HINT:  Use "\r" to represent carriage return.
> CONTEXT:  COPY orig_city, line 1071850
>
> And of course, at that line we find a field that has several lines which
> appear (using cat -A) to be terminated with a new line ($).  I originally
> deleted this line but there are others like it.  And the file is 2 Gigs in
> size so it isn't acceptable to comb through it.
>
> I believe this is a new problem because I have a vintage file dated early
> 2007 that didn't have this problem.  Does anyone know how to solve this
> COPY issue?
>


Pardon me on this, the cat -A report for the failed line (and subsequent 
lines) shows ^M$ within the field, not just $.

I assume that is probably a \r\n and postgres wants \r for field data and \n 
to end a line.

I've tried working this over with sed but can't get the syntax right.  I also 
have iconv installed if that would help any.  Are there any good tools that 
will tell me what this really is instead of just ^M$ ?


pgsql-sql by date:

Previous
From: "Chuck D."
Date:
Subject: Re: request for help with COPY syntax
Next
From: Steve Midgley
Date:
Subject: (repost) Help understanding expressions in order by clause