Re: String encoding during connection "handshake" - Mailing list pgsql-hackers

From sulfinu@gmail.com
Subject Re: String encoding during connection "handshake"
Msg-id 200711281754.05364.sulfinu@gmail.com
In response to Re: String encoding during connection "handshake"  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: String encoding during connection "handshake"  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: String encoding during connection "handshake"  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Martijn,

:) don't take it personally, I am just trying to confirm that I have
understood the problem correctly. After all, it's just that C has a very
outdated notion of "char"s (and no notion of Unicode). I was naively under
the impression that "char"s had evolved in modern C.

Regarding the problem of "One True Encoding", the answer seems obvious to me: 
use only one encoding per database cluster, either UTF-8 or UTF-16 or another 
Unicode-aware scheme, whichever yields a statistically smaller database for 
the languages employed by the users in their data. This encoding should be a 
one-time choice! De facto, this is already the case, because one cannot 
change collation rules after a cluster has been created.
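
As a rough illustration of the "statistically smaller" criterion, here is a
minimal Python sketch that compares the encoded size of a few strings in
UTF-8 and UTF-16. The sample strings and the helper name encoded_sizes are
made up for the example; real data would have to be measured.

```python
# Hypothetical sample strings; real user data would give different ratios.
samples = {
    "English": "The quick brown fox",
    "Romanian": "șapte țări îndepărtate",
    "Japanese": "日本語のテキスト",
}

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for language, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{language}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

ASCII-heavy text tends to be smaller in UTF-8, while text made mostly of
CJK characters tends to be smaller in UTF-16.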

During the handshake, all clients should be assumed to serve data in the 
cluster's encoding. 
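
To show why the assumption matters, here is a small Python sketch of the
mismatch discussed in this thread: the same password typed under two client
encodings produces different bytes, so a byte-wise comparison (or a hash of
the bytes) fails. The password, the helper name password_digest and the
choice of plain MD5 are assumptions for illustration only, not the server's
actual authentication code.

```python
import hashlib

def password_digest(password, encoding):
    """Hash the bytes a client would send if it used `encoding`."""
    return hashlib.md5(password.encode(encoding)).hexdigest()

password = "pământ"  # hypothetical password with non-ASCII characters

stored = password_digest(password, "utf-8")        # as set by the administrator
attempt = password_digest(password, "iso-8859-2")  # as typed on a Latin-2 client

print(password.encode("utf-8"))
print(password.encode("iso-8859-2"))
print("match:", stored == attempt)
```

A pure-ASCII password yields identical bytes in both encodings, which is
why the problem only shows up with non-ASCII names and passwords.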

Have a nice day, too.

On Wednesday 28 November 2007, Martijn van Oosterhout wrote:
> On Wed, Nov 28, 2007 at 11:39:33AM +0200, sulfinu@gmail.com wrote:
> > During the authentication phase, no such conversion takes place - you
> > were right and I couldn't believe it! In the case when your database
> > name, your user name or password contain non-ASCII characters, you're out
> > of luck if the stored values were submitted in another encoding by the
> > administrator.
>
> The problem is, what conversion? You don't know the encoding of the
> server yet (because you haven't selected a DB) and you don't know the
> encoding of the client. The only real possibility is to declare One
> True Encoding and decree every username/password be in that. But you're
> never going to get people to agree on that.
>
> > I assume that no name conversion takes place between client and cluster
> > metadata when a role is created (CREATE ROLE... PASSWORD...) or when a
> > database is created (CREATE DATABASE...). Or does it? In that case, the
> > names are encoded in the encoding of the database that the administrator
> > was connected to.
>
> Honestly, UNIX usernames/passwords have always worked like this so
> we're not really doing anything weird by doing it this way. Users need
> to type the password in the same encoding it was added in. It's not usually
> a big deal because people set their own passwords...
>
> Have a nice day,
