On Fri, Sep 21, 2012 at 09:21:36AM +0800, Craig Ringer wrote:
> On 09/20/2012 11:44 PM, Leif Biberg Kristensen wrote:
> > On Thursday 20 September 2012 at 16.56.16, Alan Millington wrote:
> >>psql". But how am I supposed to remove the byte order mark from a UTF8
> >>file? I thought that the whole point of the byte order mark was to tell
> >>programs what the file encoding is. Other programs, such as Python, rely
> >>on this.
> >
> >http://en.wikipedia.org/wiki/Byte_order_mark
> >
> >While the Byte Order Mark is important for UTF-16, it's totally irrelevant to
> >the UTF-8 encoding.
>
> I strongly disagree. The BOM provides a useful and standard way to
> differentiate UTF-8 encoded text files from the random pile of
> encodings that any given file could be.

Use of the BOM in UTF-8 causes a host of display and interoperability
problems, and is considered by many to be a broken practice. It's
also pointless as a byte order indicator, since UTF-8 has no byte
ordering issues. Best not to use it at all.

In any case, the BOM byte sequence (EF BB BF) does not unambiguously
identify UTF-8; the same bytes are equally valid in 8-bit charsets
(in ISO-8859-1 they read as "ï»¿"), so an external means of
specifying the encoding is preferable and more robust.
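
To make both points concrete, here is a minimal Python sketch. It is only
an illustration, not a recommendation of any particular tool:
strip_utf8_bom is a hypothetical helper name, and the SQL string is a
made-up stand-in for the contents of a file fed to psql. It shows that
the BOM bytes decode without error as ISO-8859-1, and two ways of
dropping a leading BOM from UTF-8 data.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

# The same bytes are perfectly valid ISO-8859-1 text, so their presence
# alone cannot prove that a file is UTF-8.
as_latin1 = BOM.decode("latin-1")


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if one is present."""
    if data.startswith(BOM):
        return data[len(BOM):]
    return data


# Made-up sample standing in for a BOM-prefixed SQL script.
sample = BOM + "SELECT 1;\n".encode("utf-8")

# Option 1: remove the BOM at the byte level.
cleaned = strip_utf8_bom(sample)

# Option 2: let Python's "utf-8-sig" codec skip the BOM while decoding.
decoded = sample.decode("utf-8-sig")
```

Data without a BOM passes through both options unchanged, so either one
should be safe to apply to files whose origin is unknown, provided they
really are UTF-8.
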
Regards,
Roger
--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' schroot and sbuild http://alioth.debian.org/projects/buildd-tools
`- GPG Public Key F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800