Re: UTF8 with BOM support in psql - Mailing list pgsql-hackers

From Tom Lane
Subject Re: UTF8 with BOM support in psql
Date
Msg-id 29075.1258473002@sss.pgh.pa.us
Whole thread Raw
In response to Re: UTF8 with BOM support in psql  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Peter Eisentraut <peter_e@gmx.net> writes:
> I think I could support using the presence of the BOM as a fall-back
> indicator of encoding in absence of any other declaration.  It seems to
> me, however, that the description above ignores the existence of
> encodings other than SQL_ASCII and UTF8.

Yeah.  This entire proposal rests on the assumption that UTF8 is the
only encoding that really matters, and introducing a possibility of
breaking things for users of other encodings is acceptable damage.
I do not think that supporting a deprecated-by-standards behavior
is worth that.

Even assuming that we had consensus on a behavior that involved
silently changing client_encoding, I do not believe that it's practical
to implement it in an acceptable fashion.  Just issuing a SET behind the
user's back will not work in a number of scenarios:

* We are inside a transaction when \i is called, and the file contains
a ROLLBACK.

* We are inside a failed transaction when \i is called --- the SET won't
even work at all.

* Same two cases inside a savepoint.

* The file contains a \c command.

If you expect that the previous client_encoding should be restored at
the end of the \i inclusion (as I certainly would) then you have the
first three hazards at file end as well, except that now the odds of
being inside a failed transaction are significantly higher.  Also,
what if the file contained a SET CLIENT_ENCODING command itself?
How should that interact with this?

Lastly, a silent change of client_encoding would also affect the
encoding of notice and error messages that come out while the \i
file is running.  I fail to find that non-astonishing, either.

I think that the only way this sort of behavior could be implemented
without a bunch of broken corner cases would be if we put the
responsibility of encoding conversion inside psql, so that switching its
idea of the encoding was just a local change rather than something it
had to ask the backend to do, and it could be careful to apply the
encoding only to the data coming from the \i file.  Which is possible,
perhaps, but it hardly seems that slightly-more-convenient BOM handling
is worth it.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Timezones (in 8.5?)
Next
From: Alex Hunsaker
Date:
Subject: Re: Writeable CTE patch