Howdy,
Can anyone explain to me when psql tries to convert between encodings?
It seems to disregard encodings set with SET CLIENT_ENCODING.
The following reproduces the behaviour I'm seeing:
1. create a UNICODE database
2. run the following:
     set client_encoding to latin1;
     create table bla(a text);
     insert into bla values('meëep');
3. try the following from psql:

     Welcome to psql 7.3.4, the PostgreSQL interactive terminal.

     Type:  \copyright for distribution terms
            \h for help with SQL commands
            \? for help on internal slash commands
            \g or terminate with semicolon to execute query
            \q to quit

     mathijs=# select * from bla;
        a
     -------
      meëep
     (1 row)

     mathijs=# set client_encoding = latin1;
     SET
     mathijs=# select * from bla;
       a
     ------
      meep
     (1 row)

     mathijs=# \encoding latin1
     mathijs=# select * from bla;
        a
     -------
      meëep
     (1 row)
After setting CLIENT_ENCODING, the middle character gets dropped. To me
it seems that psql still treats the data it gets from the server as
UTF8: it tries to interpret it as UTF8, sees the Latin-1 ë (a single
byte, 0xEB, which is indeed not a valid UTF8 sequence here) and drops it.
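
The suspected behaviour can be imitated outside psql. The following is a
minimal Python sketch of that hypothesis only; it is not psql's actual
code, and the helper name display_as_utf8 is made up for illustration. It
assumes a client that decodes whatever arrives as UTF8 and silently
discards bytes that do not form a valid sequence.

```python
# Sketch of the suspected behaviour, NOT psql's real implementation.
# display_as_utf8 is a hypothetical helper invented for this example.

def display_as_utf8(wire_bytes: bytes) -> str:
    """Decode bytes as UTF-8 and silently drop anything that is invalid."""
    return wire_bytes.decode("utf-8", errors="ignore")


text = "me\u00ebep"  # the test string 'meëep'

# Bytes the client would receive for each client_encoding setting.
utf8_bytes = text.encode("utf-8")      # b'me\xc3\xabep' (ë is two bytes)
latin1_bytes = text.encode("latin-1")  # b'me\xebep'     (ë is one byte)

# UTF-8 data survives a UTF-8 decode unchanged.
print(display_as_utf8(utf8_bytes))

# Latin-1 data loses the lone 0xEB byte, leaving 'meep'.
print(display_as_utf8(latin1_bytes))
```

The second call prints meep, which matches the output seen after SET
CLIENT_ENCODING in step 3 and is consistent with the client still
assuming UTF8.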
My question is: why does psql seem to think it's receiving UTF8 data
-after- I've changed the client_encoding? I've checked with a network
sniffer that the results returned by the server are the same with or
without \encoding (as expected). Is this behaviour a bug? If not, it
is not very obvious to me; I would expect psql to keep track of the
encoding set between the server and the client.
Cheers,
Mathijs