Re: Different length lines in COPY CSV - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Different length lines in COPY CSV
Date
Msg-id 20051212203535.GD30160@svana.org
In response to Re: Different length lines in COPY CSV  (Tino Wildenhain <tino@wildenhain.de>)
List pgsql-hackers
On Mon, Dec 12, 2005 at 09:30:12PM +0100, Tino Wildenhain wrote:
> On Monday, 12.12.2005, at 15:08 -0500, Andrew Dunstan wrote:
> > You are probably right. The biggest wrinkle will be dealing with various
> > encodings, I suspect. That at least is one thing that doing CSV within
> > the backend bought us fairly painlessly. Perl's Text::CSV_XS module for
> > example simply handles this by declaring that only [\x09\x20-\x7f] are
> > valid in its non-binary mode, and in either mode appears to be MBCS
> > unaware. We should try to do better than that.
>
> Are there any test datafiles available in a repository?
> I could give it a shot I think.
>
> If not maybe we could set up something like that.

Note, recent versions of Perl allow you to specify the file encoding
when you open the file and will convert things to UTF-8 as appropriate.
So in theory it should be fairly simple to make a script that could
handle various encodings. The hardest part is always determining which
encoding a file uses in the first place...
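
As a rough illustration of the idea (in Python rather than Perl, purely
for the sake of a self-contained sketch), the script below decodes a CSV
file from a stated encoding and rewrites it as UTF-8. The function names
and the candidate list are hypothetical, not part of any existing tool.
The guess_encoding helper only shows why detection is the hard part: it
returns the first candidate that decodes without error, which proves
nothing about the file's real encoding, and latin-1 accepts any byte
sequence, so it must come last.

```python
import csv


def guess_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes the whole file.

    This is a heuristic only: a successful decode does not prove the
    guess is right. latin-1 never fails, so keep it last in the list.
    """
    with open(path, "rb") as f:
        data = f.read()
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


def transcode_csv_to_utf8(src_path, dst_path, src_encoding):
    """Read a CSV file in src_encoding and write it back out as UTF-8."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        rows = list(csv.reader(src))
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return rows
```

A caller who knows the encoding passes it directly, for example
transcode_csv_to_utf8("in.csv", "out.csv", "latin-1"), and falls back to
guess_encoding only when nothing better is available.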

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
