Re: Client Messages - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Client Messages
Date
Msg-id 4F21A272.3000703@enterprisedb.com
In response to Re: Client Messages  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Client Messages  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
On 26.01.2012 17:31, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> The thing is, there's currently no encoding conversion happening, so if
>> you have one database in LATIN1 encoding and another in UTF-8, for
>> example, whatever you put in your postgresql.conf is going to be wrong
>> for one database. I'm happy to just document the issue for per-database
>> messages: with "ALTER DATABASE ... SET welcome_message", the encoding used
>> there needs to match the encoding of the database, or it's displayed as
>> garbage. But what about per-user messages, when the user has access to
>> several databases, or postgresql.conf?
>
> I've not looked at the patch, but what exactly will happen if the string
> has the wrong encoding?

You get an incorrectly encoded string, i.e. garbage, in your console
when you log in with psql.

You can also use current_setting() to copy the incorrectly-encoded 
string elsewhere in the system. If you insert it into a table and run 
pg_dump, I think the dump might not be restorable. That's a bit of a 
stretch, perhaps, but it would be nice to avoid that.

BTW, you can already do that if you set e.g. default_text_search_config 
to something non-ASCII in postgresql.conf. Or if you do it with 
search_path, you get a warning at login. For example, I did "ALTER USER 
foouser set search_path ='kääk';" in a LATIN1 database, and then 
connected to a UTF-8 database and got:

$ ~/pgsql.master/bin/psql postgres foouser
WARNING:  invalid value for parameter "search_path": ""k��k""
DETAIL:  schema "k��k" does not exist
psql (9.2devel)
Type "help" for help.

(In case that didn't get across right: I set the search_path to a string
containing two a-with-umlauts, and in the warning each of them got replaced
with a question mark in inverse colors, which is apparently the character
that the console uses to display bytes that are not valid UTF-8.)
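
As a rough illustration of the effect (plain Python, not PostgreSQL code, 
and assuming the value is stored as raw LATIN1 bytes and later shown by a 
UTF-8 console), the substitution can be reproduced like this:

```python
# Illustration only: mimic a value written from a LATIN1 session and later
# displayed by a UTF-8 terminal. No PostgreSQL code is involved.

# "kääk" written with escapes so the example does not depend on the
# encoding of this file; in LATIN1 each a-with-umlaut is the single byte 0xE4.
stored_bytes = "k\u00e4\u00e4k".encode("latin-1")

# A lone 0xE4 byte is not a valid UTF-8 sequence, so a lenient UTF-8 decoder
# substitutes U+FFFD (the replacement character) for each of them.
shown = stored_bytes.decode("utf-8", errors="replace")

print(stored_bytes)
print(shown)
```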

The problem with welcome_message would look just like that. No one is
likely to run into that with search_path, but it's quite reasonable and
expected to use your native language in a welcome message.

> The idea that occurs to me is to have the code that uses the GUC do a
> verify_mbstr(noerror) on it, and silently ignore it if it doesn't pass
> (maybe with a LOG message).  This would have to be documented of course,
> but it seems better than the potential consequences of trying to send a
> wrongly-encoded string.

Hmm, fine with me. It would be nice to plug the hole that these bogus 
characters can leak elsewhere into the system through current_setting, 
though. Perhaps we could put the verify_mbstr() call somewhere in guc.c,
to forbid incorrectly encoded characters from being stored in the GUC
variable in the first place.
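
To make the difference between the two places for the check concrete, here 
is a minimal sketch in Python. It is only an analogy, not the patch and not 
how guc.c works: is_valid_in(), message_to_send() and store_setting() are 
hypothetical names, and Python's strict decoder stands in for 
verify_mbstr().

```python
import logging
from typing import Dict, Optional

log = logging.getLogger("welcome_message_sketch")


def is_valid_in(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes cleanly in encoding; never raises on bad input."""
    try:
        raw.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def message_to_send(raw: bytes, db_encoding: str) -> Optional[bytes]:
    """Check at use time: skip a mis-encoded message and leave a log entry."""
    if not is_valid_in(raw, db_encoding):
        log.warning("ignoring welcome message: not valid %s", db_encoding)
        return None
    return raw


def store_setting(settings: Dict[str, bytes], name: str,
                  raw: bytes, db_encoding: str) -> None:
    """Check at assignment time: refuse to store a mis-encoded value at all."""
    if not is_valid_in(raw, db_encoding):
        raise ValueError(
            f"invalid {db_encoding} byte sequence for parameter {name!r}")
    settings[name] = raw
```

With the first variant the bad bytes still sit in the setting and only the 
welcome message is suppressed; with the second they never get stored, so 
nothing can pick them up later.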

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

