Re: Two small questions re/ COPY CSV data into table - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Two small questions re/ COPY CSV data into table
Date
Msg-id CA+bJJbxJrNzcE-iqnRZ4g9zFGpkYZHxf7uwLvvs3i1yizhDvNA@mail.gmail.com
Whole thread Raw
In response to Two small questions re/ COPY CSV data into table  (Matthias Apitz <guru@unixarea.de>)
List pgsql-general
Mathias:

On Tue, Jun 4, 2019 at 8:35 PM Matthias Apitz <guru@unixarea.de> wrote:
> El día Tuesday, June 04, 2019 a las 05:41:43PM +0200, Francisco Olarte escribió:

> > On Tue, Jun 4, 2019 at 5:03 PM Matthias Apitz <guru@unixarea.de> wrote:
> > > Well, it's not strictly CSV, but it is what the Sybase tool produces. The
> > > delimiter is set to '|' and a data example line looks like:
> > > 0|1| cat $1 \| lpr -Pprinter |3|4
> > > I do load this now with COPY in mode TEXT and modify the data before
> > > with:
> > >  sed 's/|/\v/g' < table-from-Sybase  | sed 's/\\\v/|/g' > table-for-copy
> > You are lucky it is not CSV, it is much better, it's similar to the
> > text format originally used by postgres. Your problem is just it uses
> > | instead of TAB for field delimiter.

> > What I d not know is why you use \v, doesn't sed use \t for TAB?
> I was thinking about \t and was afraid, that in ~2 GByte char fields
> even some \t could be used in the data. That's why I decided to use \v
> which is more unlikely to be used in our data.

Fine then. I haven't got much sed-foo under my belt, but if you do
this in perl ( I learned perl a while ago, forgot all awk and sed to
free some brain cells ;-> ) or something similar you can use high code
points or control chars ( and I do not know if sed can do several
replacements in a pass, but perl certainly can ). Something like (
untested ) "perl -pe 's/\\\\/\001/g; s/\\\|/\002/; s/\|/\t/;
s/\002/|/; s/\001/\\\\/", the advantage over the pipe approach may be
noticeable on big files, as it cuts a lot of context switches ( OTOH
it'll run single-core ).

( the sample tries to do
    scaped backslash to soh
    scaped pipe to stx
    pipe to tab
    stx to unescaped pipe
    soh to escaped backlash
)

In fact you can do "tab to backslash t" after phase 2 and tab would be
ok in the input.

But, as I said before and others have also pointed, COPY is perfectly
happy to use pipe as delimiter and havinf it escaped with backslash on
input.

> (You didn't cc'ed the list, by intention?)

No, I forgot to hit reply-all because I did not pay enough attention,
my fault. Doing it now in case some one thinks this is interesting.

Francisco Olarte.



pgsql-general by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: Two small questions re/ COPY CSV data into table
Next
From: PegoraroF10
Date:
Subject: dump and restore ordered by schema or table name