Re: Parallel copy - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Parallel copy
Msg-id 20201030203730.eicjk6542pwoicvb@development
In response to Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Responses Re: Parallel copy
List pgsql-hackers
Hi,

I've done a bit more testing today, and I think the parsing is busted in
some way. Consider this:

     test=# create extension random;
     CREATE EXTENSION
     
     test=# create table t (a text);
     CREATE TABLE
     
     test=# insert into t select random_string(random_int(10, 256*1024)) from generate_series(1,10000);
     INSERT 0 10000
     
     test=# copy t to '/mnt/data/t.csv';
     COPY 10000
     
     test=# truncate t;
     TRUNCATE TABLE
     
     test=# copy t from '/mnt/data/t.csv';
     COPY 10000
     
     test=# truncate t;
     TRUNCATE TABLE
     
     test=# copy t from '/mnt/data/t.csv' with (parallel 2);
     ERROR:  invalid byte sequence for encoding "UTF8": 0x00
     CONTEXT:  COPY t, line 485: "m&\nh%_a"%r]>qtCl:Q5ltvF~;2oS6@HB>F>og,bD$Lw'nZY\tYl#BH\t{(j~ryoZ08"SGU~.}8CcTRk1\ts$@U3szCC+U1U3i@P..."
     parallel worker


The functions come from an extension I use to generate random data; I've
pushed it to GitHub [1]. random_string() generates a random string
of ASCII characters, symbols and a couple of special characters (\r\n\t).
The intent was to try loading data where a field may span multiple 64kB
blocks and may contain newlines etc.
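
To make the shape of that test data concrete, here is a rough Python
sketch. It is not the extension's code and not anything from the patch:
random_string() below is a hypothetical stand-in, the alphabet is an
assumption based on the description above, and the escaping assumes the
default COPY text format (the commands above specify no format option).
It builds a few rows, escapes them the way COPY TO would, and counts how
many encoded lines cross a 64kB block boundary.

```python
import random

BLOCK_SIZE = 64 * 1024  # block size mentioned in the message

# Assumed alphabet: printable ASCII plus the special characters \r, \n, \t.
ALPHABET = [chr(c) for c in range(32, 127)] + ["\r", "\n", "\t"]

# COPY text-format escapes for the characters in ALPHABET that would
# otherwise be read as row or column delimiters (or as an escape prefix).
ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def random_string(rng, min_len, max_len):
    """Hypothetical stand-in for the extension's random_string()."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choices(ALPHABET, k=length))


def copy_text_escape(value):
    """Escape one field value as it would appear in a COPY text-format file."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def rows_spanning_blocks(lines, block_size=BLOCK_SIZE):
    """Count encoded lines whose bytes cross at least one block boundary."""
    offset = 0
    spanning = 0
    for line in lines:
        size = len(line.encode("ascii")) + 1  # +1 for the terminating newline
        first_block = offset // block_size
        last_block = (offset + size - 1) // block_size
        if last_block > first_block:
            spanning += 1
        offset += size
    return spanning


rng = random.Random(0)
rows = [random_string(rng, 10, 256 * 1024) for _ in range(20)]
lines = [copy_text_escape(r) for r in rows]
print(rows_spanning_blocks(lines), "of", len(lines), "lines cross a 64kB boundary")
```

With values up to 256kB, many lines should end up spread over several
blocks, and every embedded newline shows up in the file as a backslash
followed by 'n' rather than as a real line terminator.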

The non-parallel copy works fine, the parallel one fails. I haven't
investigated the details, but I guess it gets confused about where a
string starts/ends, or something like that.


[1] https://github.com/tvondra/random


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


