Re: Using psql -f to load a UTF8 file - Mailing list pgsql-general

From Roger Leigh
Subject Re: Using psql -f to load a UTF8 file
Msg-id 20120921094044.GA18133@codelibre.net
In response to Re: Using psql -f to load a UTF8 file  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-general
On Fri, Sep 21, 2012 at 09:21:36AM +0800, Craig Ringer wrote:
> On 09/20/2012 11:44 PM, Leif Biberg Kristensen wrote:
> >  On Thursday 20 September 2012 16:56:16, Alan Millington wrote:
> >>psql". But how am I supposed to remove the byte order mark from a UTF8
> >>file? I thought that the whole point of the byte order mark was to tell
> >>programs what the file encoding is. Other programs, such as Python, rely
> >>on this.
> >
> >http://en.wikipedia.org/wiki/Byte_order_mark
> >
> >While the Byte Order Mark is important for UTF-16, it's totally irrelevant to
> >the UTF-8 encoding.
>
> I strongly disagree. The BOM provides a useful and standard way to
> differentiate UTF-8 encoded text files from the random pile of
> encodings that any given file could be.
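The quoted question asks how to remove the byte order mark from a UTF-8 file before loading it with psql -f. Below is a minimal sketch in Python, assuming the file is read and rewritten as raw bytes. The helper names strip_utf8_bom and strip_bom_in_file and the script name strip_bom.py are hypothetical and are not part of psql or PostgreSQL. Only the three leading bytes EF BB BF (codecs.BOM_UTF8) are dropped. Everything else is copied unchanged.

```python
import codecs
import sys


# Hypothetical helper, not part of psql or PostgreSQL.
def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical helper: copies src_path to dst_path minus any leading BOM.
def strip_bom_in_file(src_path: str, dst_path: str) -> bool:
    """Copy src_path to dst_path without a leading UTF-8 BOM.

    Returns True if a BOM was removed. The file is handled as raw bytes,
    so the remaining content is copied byte-for-byte whatever its encoding.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    cleaned = strip_utf8_bom(data)
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return len(cleaned) != len(data)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python strip_bom.py input.sql output.sql
    removed = strip_bom_in_file(sys.argv[1], sys.argv[2])
    print("BOM removed" if removed else "no BOM found")
```

The cleaned file can then be passed to psql -f as usual. Python code that has to read such files can also decode them with the standard "utf-8-sig" codec. That codec discards a leading BOM if one is present and otherwise behaves like "utf-8".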

Use of the BOM in UTF-8 causes a host of display and interoperability
problems, and is considered by many to be a broken practice.  It's
also pointless since there are no byte ordering issues with UTF-8.
Best to not use it at all.  In any case, the BOM byte sequence does
not unambiguously identify UTF-8; the same three bytes (EF BB BF) are
equally valid in 8-bit charsets such as ISO-8859-1, where they read as
"ï»¿", so an external means of specifying the encoding is
preferable and more robust.
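To illustrate the point about ambiguity, here is a small Python sketch that decodes the same bytes (a UTF-8 BOM followed by "café" encoded as UTF-8) under several encodings. The helper name decodings and the choice of encodings are illustrative assumptions. Every attempt succeeds, so the bytes alone cannot say which decoding was intended.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


# Illustrative helper: try each encoding and record the text, or None if decoding fails.
def decodings(data: bytes, encodings=("utf-8", "latin-1", "cp1252")) -> dict:
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


sample = BOM + "café".encode("utf-8")
for enc, text in decodings(sample).items():
    # UTF-8 yields U+FEFF plus "café"; the 8-bit charsets yield mojibake, not an error.
    print(f"{enc:8} -> {text!r}")
```

Since none of the decodings fails, the BOM works at best as a heuristic. Naming the encoding explicitly, for example through psql's client encoding setting, avoids the guesswork.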


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800

