On October 25, 2007 09:35:23 am Chuck D. wrote:
> On October 24, 2007 01:10:59 am Paul Lambert wrote:
> > I get around this problem with my data loads by specifying some other
> > arbitrary character that I know won't appear in the data as the quote
> > character.
> >
> > Eg QUOTE E'\f' will specify form feed as the quote character, ergo any
> > data with double or single quotes will be loaded with those quote
> > characters in the string.
> >
> > Something similar may help with your case.
>
> This was the solution. I specified a quote character that was not in the
> data and the data imported perfectly. Without specifying any delimiter
> postgres defaults to one of the quotes (I forget which).
>
> Unfortunately, the data I imported wasn't good. MaxMind, like the
> Geonames.org derivatives, uses FIPS code for a state identifier in the
> cities table for all countries EXCEPT USA in which case they use the iso
> code. Both these data sets mix types within one column and I find that
> absolutely unacceptable.
>
> Back to my original problem, which was trying to COPY in some of the
> earth-info.nga.mil world city data. This data is tab delimited, no quotes
> around fields, newline line terminated and UTF-8 encoded.
>
> Using a similar COPY statement with the defaults, it fails with this:
>
> COPY geo.orig_city FROM
> '/home/www/geo/DATA/nga.mil/geonames_no_header.txt';
>
> ERROR: literal carriage return found in data
> HINT: Use "\r" to represent carriage return.
> CONTEXT: COPY orig_city, line 1071850
>
> And of course, at that line we find a field that has several lines which
> appear (using cat -A) to be terminated with a new line ($). I originally
> deleted this line but there are others like it. And the file is 2 Gigs in
> size so it isn't acceptable to comb through it.
>
> I believe this is a new problem because I have a vintage file dated early
> 2007 that didn't have this problem. Does anyone know how to solve this
> COPY issue?
>
Pardon me on this, the cat -A report for the failed line (and subsequent
lines) shows ^M$ within the field, not just $.
I assume that is probably a \r\n and postgres wants \r for field data and \n
to end a line.
I've tried working this over with sed but can't get the syntax right. I also
have iconv installed if that would help any. Are there any good tools that
will tell me what this really is instead of just ^M$ ?