Re: multiline CSV fields - Mailing list pgsql-hackers

From	Patrick B Kelly
Subject	Re: multiline CSV fields
Date	November 12, 2004 15:47:59
Msg-id	ED80A94A-34C1-11D9-B14C-000A958A3956@patrickbkelly.org
In response to	Re: multiline CSV fields (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Nov 12, 2004, at 12:20 AM, Tom Lane wrote:

> Patrick B Kelly <pbk@patrickbkelly.org> writes:
>> I may not be explaining myself well or I may fundamentally
>> misunderstand how copy works.
>
> Well, you're definitely ignoring the character-set-conversion issue.
>

I was not trying to ignore the character set and encoding issues, but
perhaps my assumptions are naive or overly optimistic. I realize that
quotes are not encoded as consistently as the NL characters, but I was
assuming that some encodings would escape to ASCII or to a similar
encoding such as JIS Roman, which would simplify recognition of the
quote character. Unicode files make recognizing other punctuation, like
the quote, fairly straightforward. And to the naive observer, the code
in CopyReadLine as it is currently written appears to handle multi-byte
encodings such as SJIS that may present characters below 127 in
trailing bytes.
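
To make the trailing-byte concern concrete, here is a minimal Python sketch. It is not PostgreSQL code and is not how CopyReadLine is written; the helper names are hypothetical. It assumes the usual Shift_JIS layout, where lead bytes fall in 0x81-0x9F and 0xE0-0xFC and trailing bytes fall in 0x40-0x7E and 0x80-0xFC. Under that assumption the quote byte (0x22) cannot occur as a trailing byte, but the backslash byte (0x5C) can, so a scan that ignores character boundaries may report a backslash that is really half of a two-byte character.

```python
def is_sjis_lead_byte(b):
    """Return True if b starts a two-byte Shift_JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def find_single_byte(data, target):
    """Return offsets where `target` occurs as a standalone one-byte character."""
    hits = []
    i = 0
    while i < len(data):
        if is_sjis_lead_byte(data[i]):
            i += 2  # skip the lead byte together with its trailing byte
            continue
        if data[i] == target:
            hits.append(i)
        i += 1
    return hits


# b"\x95\x5c" is the Shift_JIS encoding of U+8868; its trailing byte is 0x5C.
# The sample is: quote, that two-byte character, quote, a real backslash, newline.
sample = b'"' + b"\x95\x5c" + b'"' + b"\\" + b"\n"

# A byte-wise scan also matches the trailing byte at offset 2.
naive_backslashes = [i for i, b in enumerate(sample) if b == 0x5C]

# The boundary-aware scan reports only the real backslash.
aware_backslashes = find_single_byte(sample, 0x5C)

# Quotes are found at the same offsets either way in this sample.
quotes = find_single_byte(sample, 0x22)
```

The sketch only walks character boundaries for one encoding. It says nothing about encodings with other byte layouts, which is the part of the question that remains open.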

As I said, perhaps I was oversimplifying. Is there a regression test
set of input files that I could review to see all of the supported
encodings?

Patrick B. Kelly
------------------------------------------------------
http://patrickbkelly.org
