Re: psql blows up on BOM character sequence - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: psql blows up on BOM character sequence
Date
Msg-id CAHyXU0wd8WAgexhKShb9jBMv5aN1avhEp=18EgpPYB7xGQkphA@mail.gmail.com
In response to Re: psql blows up on BOM character sequence  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: psql blows up on BOM character sequence  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Mon, Mar 24, 2014 at 2:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I suspect trying to do this in the parser will be quite messy.
>> This needs to happen before the input is converted to the server
>> encoding, I think.
>
> Indeed --- what if the server isn't using utf8 internally?
>
> And a larger point is that the server has no idea where the file
> boundaries are.  If we were to do this server-side, we'd essentially
> end up discarding BOM anywhere, which is more libertine than I care
> to be.

Right -- I had a feeling you'd say that.  That's why the best solution
ISTM is to allow psql to be invoked in such a way that it *does* know
the file boundaries for consolidated scripts; this means better
handling of multiple file arguments.  psql -1 already requires '-f' to
work (vs cat foo.sql | psql) and that's pretty reasonable.  BOM
handling should probably be applied only where the exact start of a
file is known, which AFAICT means \i and -f.
So, in short, it seems prudent to:

1. make multiple -f invocations work (with -1 spanning all of them)
2. strip a leading BOM from any file read via -f or \i foo.sql

That will fix all non-redirection usages.  Cases involving redirection
are not psql's bailiwick.
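
To illustrate the "strip only at a known file start" rule, here is a
rough sketch in Python rather than psql's C.  It is not psql code; the
names strip_leading_bom and load_scripts are made up for illustration,
and it assumes the only BOM of interest is the UTF-8 one (EF BB BF).

```python
import codecs


def strip_leading_bom(data: bytes) -> bytes:
    """Drop a UTF-8 BOM only if it is the very first thing in the buffer.

    A BOM that appears later in the buffer is left untouched, because at
    that point we cannot tell whether it marks a file boundary.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def load_scripts(paths):
    """Hypothetical stand-in for handling several -f arguments.

    Each file is opened separately, so the start of every file is known
    and the BOM check can be applied once per file.
    """
    chunks = []
    for path in paths:
        with open(path, "rb") as fh:
            chunks.append(strip_leading_bom(fh.read()))
    return chunks
```

With cat a.sql b.sql | psql, the second file's BOM lands in the middle
of one stream, so a check like this one would (deliberately) leave it
alone; only the per-file path sees each file's start.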

merlin


