On Mon, May 28, 2007 at 11:18:37AM -0400, Tom Lane wrote:
L> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Johann Spies wrote:
> >> #3 0x00000000005ff1c8 in perform_default_encoding_conversion (
> >> src=0x2b881e87f040 "\"2007-04-20\" \"18:07:06\" \"192.168.0.100\" \"73\"
\"http://www.kerkbode.co.za/kerkbode/images/blank.gif\\\"\"\n\"2007-04-20\"\"18:07:06\" \"192.168.0.100\" \"69\"
\"http://develop.christians.co.za/phpAdsNew/adx.js\"\n\"20"...,len=337739539, is_client_to_server=5 '\005') at
mbutils.c:461
>
>
> > Interesting. You are passing a single 300 MB string here, not
> > 90-something char lines. The problem is probably in newlines. Why are
> > they not getting used as line separators escapes me.
>
> Stripped of gdb's own backslashing, the data seems to be
>
> ...images/blank.gif\""
> "2007...
>
Thanks to you both. I have grepped -C 5 for the particuler line(s) and the result
was:
"2007-04-20" "18:07:06" "146.232.182.90" "0" "http://static.ak.facebook.com/images/x_to_hide_hover.gif"
"2007-04-20" "18:07:06" "192.168.0.100" "73" "http://www.kerkbode.co.za/kerkbode/images/blank.gif\""
"2007-04-20" "18:07:06" "192.168.0.100" "69" "http://develop.christians.co.za/phpAdsNew/adx.js"
"2007-04-20" "18:07:06" "146.232.186.248" "873"
"http://ads.cnn.com/html.ng/site=cnn&cnn_position=306x60_lft&cnn_rollup=homepage¶ms.styles=fs&tile=1177085201252&page.allowcompete=yes&domId=283029"
"2007-04-20" "18:07:06" "146.232.182.216" "13958"
"http://us.js2.yimg.com/us.js.yimg.com/lib/s2/yschx_intl_20070330.css"
"2007-04-20" "18:07:06" "146.232.141.240" "5246"
"http://photos-819.ak.facebook.com/photos-ak-sf2p/v75/18/87/46601522/a46601522_30445819_7495.jpg"
"2007-04-20" "18:07:06" "146.232.53.112" "30663"
"http://spe.atdmt.com/ds/YCMEAOAKLO07/oakley_sportsperformance/mlt_RI_300x250.jpg?"
> which leads me to think that the dump was generated on the assumption
> that backslash is the escape character, but is being reloaded under a
> different assumption. Once you got out-of-sync on that one data value,
> you'd have a mess --- all the intended line-ending newlines would be
> inside quotes, up till you hit another URL with an embedded quote
> (or newline?). CSV is not exactly a robust representation :-(
What alternative is there to import large amounts of data?
> So the short answer seems to be that we were all misled by the large
> alloc-request number into thinking there was some data-corruption
> behavior involved, when we should have been quizzing Johann on exactly
> what COPY parameters he was giving.
I think I gave it in my first email. Here it is:
\copy squidlogs from '/home/js/accesspg.csv' using delimiters ' ' csv quote as '"'
Regards
Johann
--
Johann Spies Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch
"Let no man despise thy youth; but be thou an example
of the believers, in word, in conversation, in
charity, in spirit, in faith, in purity."
I Timothy 4:12