Re: Support UTF-8 files with BOM in COPY FROM - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Support UTF-8 files with BOM in COPY FROM
Msg-id CA+Tgmoa7SzcuViKfdbmWWeRmzZnjo93AmbhiOHaO9E=330PFow@mail.gmail.com
In response to Re: Support UTF-8 files with BOM in COPY FROM  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
On Mon, Sep 26, 2011 at 11:09 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> "David E. Wheeler" <david@kineticode.com> writes:
>>> On Sep 25, 2011, at 9:58 PM, Itagaki Takahiro wrote:
>>>> I'm thinking about only COPY FROM for reads, but if someone wants to add
>>>> BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
>>
>>> I think it would have to be optional, since "some recipients of UTF-8 encoded data do not expect a BOM."
>>
>> Putting a BOM into UTF8 data is flat out invalid per spec --- the fact
>> that Microsloth does it does not make it standards-conformant.
>>
>> I think that accepting it on input can be sensible, on the principle of
>> "be liberal in what you accept", but the other side of that is "be
>> conservative in what you send".  No BOMs in output, please.
>
> Suppose a user uses a brain-dead editor that does not accept UTF-8
> without a BOM. He decides to save his editor data into PostgreSQL using
> COPY FROM. He extracts the data using COPY TO. Now he finds that his
> stupid editor does not accept his data any more.
>
> So I think if we decide to accept UTF-8 with a BOM, we should keep the
> BOM when importing the data and output the data with the BOM. If we don't
> want to output UTF-8 with a BOM, we should not accept UTF-8 with a BOM.
> It seems we don't have much choice...
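The input-side half of this debate (accept a BOM without storing it, per "be liberal in what you accept") comes down to checking for the three bytes EF BB BF, the UTF-8 encoding of U+FEFF, at the very start of the input. The Python sketch below is illustrative only: `strip_utf8_bom` is a hypothetical helper, not PostgreSQL code. It also reports whether a BOM was seen, which is the one piece of state Ishii's round-trip scenario would need to preserve somewhere; the thread does not settle where that would live.

```python
from __future__ import annotations

# UTF-8 encoding of U+FEFF, the byte order mark.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> tuple[bytes, bool]:
    """Drop one leading UTF-8 BOM, if present, and report whether it was there.

    Only the start of the input is checked; a U+FEFF appearing later in the
    data is left untouched.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


# Hypothetical input file that a BOM-writing editor produced.
payload, had_bom = strip_utf8_bom(b"\xef\xbb\xbfid\tname\n1\tfoo\n")
print(payload)  # b'id\tname\n1\tfoo\n'
print(had_bom)  # True
```

Only a single leading BOM is removed; anything after it is passed through unchanged.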

Maybe this needs to be an optional behavior, controlled by some COPY option.
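Making the behavior optional could be sketched as a single boolean switch on the output side, defaulting to "no BOM" as Tom asks. The flag name `with_bom` and the function `encode_copy_output` are hypothetical illustrations, not an existing COPY option or PostgreSQL function. The row serialization is deliberately simplified: real COPY text format also escapes backslashes, newlines, carriage returns and the delimiter, and writes NULL as `\N`, none of which is done here.

```python
from typing import Iterable, Sequence

# UTF-8 encoding of U+FEFF, the byte order mark.
UTF8_BOM = b"\xef\xbb\xbf"


def encode_copy_output(
    rows: Iterable[Sequence[str]],
    with_bom: bool = False,  # hypothetical option; off by default
    delimiter: str = "\t",
) -> bytes:
    """Serialize rows as delimiter-separated UTF-8 lines.

    Simplified: no escaping and no NULL handling, unlike real COPY text format.
    A BOM is written only when the caller explicitly asks for one.
    """
    body = "".join(delimiter.join(row) + "\n" for row in rows).encode("utf-8")
    return (UTF8_BOM + body) if with_bom else body


# Default output carries no BOM; opting in prepends exactly one.
plain = encode_copy_output([["1", "foo"]])
marked = encode_copy_output([["1", "foo"]], with_bom=True)
print(plain)   # b'1\tfoo\n'
print(marked)  # b'\xef\xbb\xbf1\tfoo\n'
```

Keeping the default off means existing consumers that "do not expect a BOM" see no change unless someone opts in.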

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

