Re: Using psql -f to load a UTF8 file - Mailing list pgsql-general

From Chris Angelico
Subject Re: Using psql -f to load a UTF8 file
Date
Msg-id CAPTjJmrx3Njx30=F9indfZZ5_8v5xfWsWZqD2aLiLLXmu78O_w@mail.gmail.com
In response to Re: Using psql -f to load a UTF8 file  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-general
On Fri, Sep 21, 2012 at 11:21 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> I strongly disagree. The BOM provides a useful and standard way to
> differentiate UTF-8 encoded text files from the random pile of encodings
> that any given file could be.

The only reliable way to ascertain the encoding of a hunk of data is
with something out-of-band. Relying on the first three bytes being
\xEF\xBB\xBF is not much more reliable than detecting based on octet
frequency, which is what leads to the "Bush hid the facts" hack in
Notepad. This is why many Internet protocols carry metadata along
with the file (e.g. Content-Type in HTTP), rather than relying on
internal evidence.
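
To make the distinction concrete, here is a minimal Python sketch, not anything psql does. The helper name charset_from_content_type is made up for illustration. It reads the declared charset from a Content-Type header value (out-of-band), and then shows that the three "BOM" bytes are also perfectly valid Latin-1 text, so their presence is a hint rather than proof.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset declared in a Content-Type header value.

    Hypothetical helper for illustration: the encoding comes from
    metadata travelling with the data, not from the data itself.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value; returns `default` if absent.
    return msg.get_content_charset(default)


# Out-of-band: the sender states the encoding explicitly.
declared = charset_from_content_type("text/plain; charset=UTF-8")

# In-band: the same three bytes decode cleanly under another encoding,
# so a leading EF BB BF cannot by itself prove the file is UTF-8.
as_latin1 = b"\xef\xbb\xbf".decode("latin-1")
```

A header with no charset parameter simply yields the default, which is the honest answer: the encoding was not declared.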

> psql should accept UTF-8 with BOM.

However, this I would agree with. It's cheap enough to detect, and
aside from arbitrarily trying to kill Notepad (which won't happen
anyway), there's not a lot of reason to choke on the BOM. But it's not
a big deal.
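
As a rough illustration of how cheap the check is, here is a Python sketch of skipping a leading UTF-8 BOM. This is not psql's code; strip_utf8_bom is a hypothetical name, and it assumes the caller already knows (out-of-band) that the file is meant to be UTF-8.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged.

    Hypothetical helper: assumes the input is already known to be UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A script saved by Notepad with a BOM, and the same script without one.
with_bom = codecs.BOM_UTF8 + b"SELECT 1;"
without_bom = b"SELECT 1;"
```

Only a BOM at the very start is removed; the same bytes appearing later in the file are left alone.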

ChrisA

