Thread: problems with copy from file

problems with copy from file

From
Andreas Kretschmer
Date:
hello @ll,

i have a simpe table :

test=# create table httpd_log(id serial primary key, data text);
CREATE TABLE


and i'm trying to import from a httpd-log, that contains this:

kretschmer@tux:~$ cat test.log
...
domain.de aaa.63.xx.yy - - [06/Nov/2014:00:48:22 +0100] "GET /index.php/impressum2/year.listevents/2015/01/08/101
HTTP/1.0"200 28076 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)" "-" 
other-domain.de bb.243.xx.yyy - - [06/Nov/2014:00:48:22 +0100] "\x16\x03\x01\x01\xb1\x01" 501 1037 "-" "-" "-"
domain.tld cc.249.xx.yy - - [06/Nov/2014:00:48:22 +0100] "GET /leipzig/transport?start=Uranusstr.&ziel=Finkengrund
HTTP/1.1"200 5610 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-" 
...


Url's and IP's changed by me. As you can see: simple ASCII-Text


test=*# \copy httpd_log (data) from '~/test.log';
ERROR:  invalid byte sequence for encoding "UTF8": 0xb1
CONTEXT:  COPY httpd_log, line 3: "other-domain bb.243.xx.yyy - - [06/Nov/2014:00:48:22 +0100]
"\x16\x03\x01\x01\xb1\x01"501 10..." 



I have a solution:

test=*# \copy apache_log (data) from program 'sed -e "s/\\x/\\\\x/g" test.log';
COPY 5

The table contains the corrent content:

...
other-domain.de bb.243.xx.yyy - - [06/Nov/2014:00:48:22 +0100] "\x16\x03\x01\x01\xb1\x01" 501 1037 "-" "-" "-"
...

But is there a way to COPY the file without external tools, in this case sed?



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: problems with copy from file

From
Jim Nasby
Date:
On 10/14/15 11:40 AM, Andreas Kretschmer wrote:
> test=*# \copy httpd_log (data) from '~/test.log';
> ERROR:  invalid byte sequence for encoding "UTF8": 0xb1
> CONTEXT:  COPY httpd_log, line 3: "other-domain bb.243.xx.yyy - - [06/Nov/2014:00:48:22 +0100]
"\x16\x03\x01\x01\xb1\x01"501 10..." 

I don't think it's possible with COPY. The issue is that COPY is passing
the string to text_in() to process, and text_in is treating that as an
escape sequence (which it is...).

You could maybe create a raw_text type that had a different input
function and define a view that had that type and a trigger to put the
data in the table. You wouldn't want to use raw_text in a table though,
because you wouldn't be able to safely dump and restore it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com