Re: multiline CSV fields - Mailing list pgsql-hackers

From Patrick B Kelly
Subject Re: multiline CSV fields
Date
Msg-id D97EBB68-345B-11D9-B14C-000A958A3956@patrickbkelly.org
In response to Re: multiline CSV fields  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: multiline CSV fields
List pgsql-hackers
On Nov 11, 2004, at 10:07 PM, Andrew Dunstan wrote:

> Patrick B Kelly wrote:
>
>> My suggestion is to simply have CopyReadLine recognize these two 
>> states (in-field and out-of-field) and execute the current logic only 
>> while in the second state. It would not be too hard but as you 
>> mentioned it is non-trivial.
>
> We don't know what state we expect the end of line to be in until 
> after we have actually read the line. To know how to treat the end of 
> line on your scheme we would have to parse as we go rather than after 
> reading the line as now. Changing this would not only be 
> non-trivial but significantly invasive to the code.
>

Perhaps I am misunderstanding the code. As I read it, the code currently 
goes through the input character by character looking for NL and EOF 
characters. It appears to be very well structured for what I am 
proposing. The section in question is a small and clearly defined loop 
which reads the input one character at a time and decides when it has 
reached the end of the line or file. Each call of CopyReadLine attempts 
to get one more line. I would propose that each call starts out in 
the out-of-field state and that the state is toggled by each un-escaped 
quote encountered in the stream. While in the in-field state, it 
would only look for the next un-escaped quote; while in the 
out-of-field state, it would execute the existing logic as well as 
look for the next un-escaped quote.
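
To make the idea concrete, here is a minimal sketch of the two-state scan, 
written as a standalone function over an in-memory buffer. It is not taken 
from copy.c: the name csv_logical_line_length and its parameters (buf, len, 
quote, escape) are hypothetical, and it only looks for '\n' as the line 
terminator, whereas the real loop also deals with CR, EOF and the end-of-copy 
marker.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT PostgreSQL's CopyReadLine.
 *
 * Returns the offset of the first '\n' that lies outside a quoted field,
 * i.e. the length of the first logical CSV line in buf. Returns len if no
 * such newline exists in the buffer.
 */
static size_t
csv_logical_line_length(const char *buf, size_t len, char quote, char escape)
{
    bool   in_field = false;    /* every call starts out-of-field */
    size_t i;

    for (i = 0; i < len; i++)
    {
        char c = buf[i];

        /*
         * With a distinct escape character, skip whatever follows it inside
         * a quoted field so that an escaped quote does not toggle the state.
         */
        if (in_field && escape != quote && c == escape)
        {
            if (i + 1 < len)
                i++;
            continue;
        }

        /*
         * Toggle on every quote. When escape == quote, a doubled quote
         * toggles twice and so leaves the state unchanged.
         */
        if (c == quote)
        {
            in_field = !in_field;
            continue;
        }

        /* The existing end-of-line logic would run only in this state. */
        if (!in_field && c == '\n')
            return i;
    }

    return len;
}
```

For the input x,"multi<NL>line",y<NL>next the function returns the offset of 
the second newline, since the first one is seen while in the in-field state.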

I may not be explaining myself well or I may fundamentally 
misunderstand how copy works. I would be happy to code the change and 
send it to you for review, if you would be interested in looking it 
over and it is felt to be a worthwhile capability.



Patrick B. Kelly
------------------------------------------------------
http://patrickbkelly.org


