Re: utf8 COPY DELIMITER? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: utf8 COPY DELIMITER?
Date
Msg-id 4129.1176834498@sss.pgh.pa.us
In response to Re: utf8 COPY DELIMITER?  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: utf8 COPY DELIMITER?  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> Mark Dilger wrote:
>> I'm working on fixing bugs relating to multibyte character encodings.
>> I wasn't sure whether this was a bug or not.  I don't think we should
>> use the phrasing "COPY delimiter must be a single character" when, in 
>> utf8 land, I did in fact use a single character.  We might say "a 
>> single byte", or we might extend the functionality to handle multibyte 
>> characters.

> Doing the latter would be a feature, and so is of course right off the 
> table for this release. Changing the error messages to be clearer should 
> be fine.

+1 on changing the message: "character" is clearly less correct than "byte"
here.
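
To make the character-versus-byte distinction concrete, here is a minimal
Python sketch. It is not PostgreSQL code; the helper names
delimiter_lengths and is_single_byte_delimiter are made up for
illustration, and it assumes the delimiter is encoded as UTF-8.

```python
def delimiter_lengths(delim, encoding="utf-8"):
    """Return (character count, encoded byte count) for a candidate delimiter."""
    return len(delim), len(delim.encode(encoding))


def is_single_byte_delimiter(delim, encoding="utf-8"):
    """True only if the delimiter encodes to exactly one byte.

    Hypothetical check, mirroring the "single byte" wording proposed above.
    """
    return len(delim.encode(encoding)) == 1


if __name__ == "__main__":
    # "|" and tab are one byte each; U+00E9 and U+20AC are one character
    # each but encode to more than one byte in UTF-8.
    for d in ["|", "\t", "\u00e9", "\u20ac"]:
        chars, nbytes = delimiter_lengths(d)
        print(repr(d), chars, nbytes, is_single_byte_delimiter(d))
```

A delimiter such as U+20AC is a single character by any reasonable
reading, yet it would fail a one-byte check, which is why "byte" is the
more accurate word in the message.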

I doubt that supporting a single multibyte character would be an
interesting extension.  If we wanted to do anything at all there, we'd
just generalize the delimiter to be an arbitrary string.  But that would
certainly slow down COPY by some amount, and COPY is an area where
performance losses draw push-back, so you'd need to make a convincing
use-case for it.
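
As a rough illustration of the difference between the two approaches,
the following Python sketch contrasts splitting on one delimiter byte
with splitting on an arbitrary byte string. It is only a sketch under
stated assumptions: the function names split_on_byte and split_on_string
are hypothetical, it ignores the escaping and quoting that real COPY
input has to handle, and it says nothing about how large any slowdown in
COPY would actually be.

```python
def split_on_byte(line, delim):
    """Split `line` (bytes) on a single delimiter byte given as an int.

    Each position needs only one byte comparison.
    """
    fields, start = [], 0
    for i, b in enumerate(line):
        if b == delim:
            fields.append(line[start:i])
            start = i + 1
    fields.append(line[start:])
    return fields


def split_on_string(line, delim):
    """Split `line` (bytes) on an arbitrary non-empty byte string.

    Each position needs a comparison against the whole delimiter.
    """
    if not delim:
        raise ValueError("delimiter must not be empty")
    fields, start, i, n = [], 0, 0, len(delim)
    while i <= len(line) - n:
        if line[i:i + n] == delim:
            fields.append(line[start:i])
            i += n          # skip past the matched delimiter
            start = i
        else:
            i += 1
    fields.append(line[start:])
    return fields


if __name__ == "__main__":
    print(split_on_byte(b"a|b|c", ord("|")))
    euro = "\u20ac".encode("utf-8")  # one character, three bytes
    print(split_on_string(b"a" + euro + b"b" + euro + b"c", euro))
```

The string version covers the single multibyte character case for free,
since such a character is just a short byte string once encoded.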
        regards, tom lane