Re: Support UTF-8 files with BOM in COPY FROM - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Support UTF-8 files with BOM in COPY FROM
Date
Msg-id 20110927.000909.594224957113812106.t-ishii@sraoss.co.jp
In response to Re: Support UTF-8 files with BOM in COPY FROM  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Support UTF-8 files with BOM in COPY FROM
Re: Support UTF-8 files with BOM in COPY FROM
List pgsql-hackers
> "David E. Wheeler" <david@kineticode.com> writes:
>> On Sep 25, 2011, at 9:58 PM, Itagaki Takahiro wrote:
>>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
> 
>> I think it would have to be optional, since "some recipients of UTF-8 encoded data do not expect a BOM."
> 
> Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
> that Microsloth does it does not make it standards-conformant.
> 
> I think that accepting it on input can be sensible, on the principle of
> "be liberal in what you accept", but the other side of that is "be
> conservative in what you send".  No BOMs in output, please.

Suppose a user has a brain-dead editor that does not accept UTF-8
without a BOM.  He saves his editor data into PostgreSQL using COPY
FROM, then extracts it using COPY TO.  Now he finds that his stupid
editor does not accept his data any more.

So I think if we decide to accept UTF-8 with a BOM, we should keep the
BOM when importing the data and output the data with the BOM.  If we
don't want to output UTF-8 with a BOM, we should not accept UTF-8 with
a BOM.  It seems we don't have much choice...
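
A minimal sketch of the round-trip concern, in plain Python rather than
PostgreSQL's actual COPY code. The helpers read_rows and write_rows and
the with_bom flag are hypothetical names invented for this illustration;
they only model "strip the BOM on input" versus "remember it and write
it back".

```python
import codecs


def read_rows(raw: bytes):
    """Decode UTF-8 input, accepting an optional leading BOM.

    Returns the decoded lines and whether a BOM was present.
    (Hypothetical helper; not how COPY FROM is implemented.)
    """
    had_bom = raw.startswith(codecs.BOM_UTF8)
    if had_bom:
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8").splitlines(), had_bom


def write_rows(rows, with_bom: bool = False) -> bytes:
    """Encode rows as UTF-8, optionally prefixing a BOM.

    (Hypothetical helper; with_bom stands in for an assumed output option.)
    """
    body = "".join(row + "\n" for row in rows).encode("utf-8")
    return codecs.BOM_UTF8 + body if with_bom else body


# A file as saved by an editor that insists on a BOM.
original = codecs.BOM_UTF8 + "1\tfoo\n2\tbar\n".encode("utf-8")

rows, had_bom = read_rows(original)

# "Liberal in, conservative out": the BOM is silently dropped.
lossy = write_rows(rows)

# Remembering the BOM and emitting it again restores the original bytes.
faithful = write_rows(rows, with_bom=had_bom)
```

Here lossy no longer starts with the BOM, so the editor in the scenario
above would reject it, while faithful is byte-identical to the input.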
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

