
Re: Support UTF-8 files with BOM in COPY FROM - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: Support UTF-8 files with BOM in COPY FROM
Date	September 26, 2011 15:10:05
Msg-id	20110927.000909.594224957113812106.t-ishii@sraoss.co.jp
In response to	Re: Support UTF-8 files with BOM in COPY FROM (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Support UTF-8 files with BOM in COPY FROM Re: Support UTF-8 files with BOM in COPY FROM
List	pgsql-hackers


> "David E. Wheeler" <david@kineticode.com> writes:
>> On Sep 25, 2011, at 9:58 PM, Itagaki Takahiro wrote:
>>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
> 
>> I think it would have to be optional, since "some recipients of UTF-8 encoded data do not expect a BOM."
> 
> Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
> that Microsloth does it does not make it standards-conformant.
> 
> I think that accepting it on input can be sensible, on the principle of
> "be liberal in what you accept", but the other side of that is "be
> conservative in what you send".  No BOMs in output, please.

Suppose a user has a brain-dead editor that does not accept UTF-8
without a BOM.  He saves his editor data into PostgreSQL using
COPY FROM, then extracts the data using COPY TO. Now he finds that his
stupid editor no longer accepts his data.

So I think if we decide to accept UTF-8 with a BOM, we should keep the BOM
when importing the data and output the data with the BOM. If we don't want
to output UTF-8 with a BOM, we should not accept UTF-8 with a BOM. It
seems we don't have much choice...
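
To make the round-trip concern concrete, here is a minimal sketch in Python. It is not PostgreSQL code and does not reflect how COPY is implemented; the helper names `strip_bom` and `export` are hypothetical and only model the two behaviours discussed above (accept the BOM and discard it, versus accept it and remember it for output).

```python
import codecs
from typing import Tuple

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> Tuple[bytes, bool]:
    """Hypothetical import step: drop a leading UTF-8 BOM if present.

    Returns the payload and a flag recording whether a BOM was seen.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


def export(payload: bytes, with_bom: bool) -> bytes:
    """Hypothetical export step: prepend a BOM only when asked to."""
    return UTF8_BOM + payload if with_bom else payload


# A tab-separated input file as the BOM-requiring editor would save it.
original = UTF8_BOM + "1\tfoo\n".encode("utf-8")
payload, had_bom = strip_bom(original)

# Accept the BOM on input but never emit it: the output differs from the input.
lossy = export(payload, with_bom=False)

# Accept the BOM and emit it again: the output is byte-identical to the input.
faithful = export(payload, with_bom=had_bom)
```

In this sketch only the second variant gives the editor back a file it can still open, which is the asymmetry the argument above rests on.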
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
