Re: psql blows up on BOM character sequence - Mailing list pgsql-hackers

From Tom Lane
Subject Re: psql blows up on BOM character sequence
Date
Msg-id 24831.1395702319@sss.pgh.pa.us
In response to Re: psql blows up on BOM character sequence  (Jim Nasby <jim@nasby.net>)
Responses Re: psql blows up on BOM character sequence  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Jim Nasby <jim@nasby.net> writes:
> Wait... I thought that was one of the objections... that we wanted to
> leave a BOM in something like a COPY untouched?

I think most of us are okay with stripping a BOM that appears at the
*beginning* of a text file (assuming there's reason to believe the file
is in UTF8 encoding).  BOM sequences embedded later in the file are a lot
more debatable, and I for one don't want to assume those can be dropped.
I don't know of any legitimate usage of such cases, and think it's
probably better to report an encoding error.
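
To make that policy concrete, here is a minimal sketch in Python (not psql's
actual code). It assumes the input is already believed to be UTF8; the name
strip_leading_bom is hypothetical. It drops a single BOM at offset 0 and
treats a BOM anywhere later as an error rather than silently removing it.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_leading_bom(data: bytes) -> bytes:
    """Drop one UTF-8 BOM at offset 0; reject a BOM anywhere else.

    Hypothetical helper for illustration only.
    """
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # In valid UTF-8 the bytes EF BB BF can only be an encoded U+FEFF,
    # so a plain byte search is enough to find an embedded BOM.
    if UTF8_BOM in data:
        raise ValueError("UTF-8 BOM found after the start of the input")
    return data
```

With this sketch, strip_leading_bom(b"\xef\xbb\xbfSELECT 1;") returns
b"SELECT 1;", input without a BOM passes through unchanged, and a BOM in the
middle of the input raises ValueError.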

> Uh... could we just treat BOM as another whitespace character?

A BOM is *most certainly not* whitespace.  The only even semi-legitimate
usage it has in UTF8 is as a file encoding marker.  You can bet that the
user whose text editor made the file did not think he had whitespace at
the front.  Anyway, your proposition that leading whitespace is ignorable
fails completely for data files.
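
A small Python sketch of both points. It relies on Python's own Unicode
classification, which is only one data point and says nothing about how
psql's lexer behaves; the sample data line is made up for illustration.

```python
BOM = "\ufeff"  # U+FEFF, the character a UTF-8 BOM decodes to

# Python's Unicode tables do not classify U+FEFF as whitespace ...
bom_is_space = BOM.isspace()

# ... so whitespace stripping leaves a leading BOM in place.
stripped_bom = (BOM + "abc").strip()

# In a tab-separated data line, leading spaces are part of the value,
# so dropping them would change the data.
line = "  padded value\tx"
first_field = line.split("\t")[0]
value_changes_if_stripped = first_field.strip() != first_field
```

Here bom_is_space is False, stripped_bom still begins with U+FEFF, and
value_changes_if_stripped is True.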
        regards, tom lane


