Re: pgadmin3 clientencoding - Mailing list pgadmin-hackers

From Jean-Michel POURE
Subject Re: pgadmin3 clientencoding
Date
Msg-id 200306101449.07803.jm.poure@freesurf.fr
Whole thread Raw
In response to Re: pgadmin3 clientencoding  (Andreas Pflug <Andreas.Pflug@web.de>)
Responses Re: pgadmin3 clientencoding
List pgadmin-hackers
Dear Andreas,

I am not sure I understand you correctly. If I don't, please disregard my message.
Here is my point of view:

> The longer I think about this, the more the current implementation
> appears wrong to me. The decisive factor is not a user's wish, but the
> ability of our charset conversion ability, and that's pretty clear:
> wxString can convert unicode to ascii and back, nothing else. Since
> unicode will be the recommended setup for non-ascii databases, the
> client encoding should be unicode for all connections. This should
> enable correct schema and property display. Allowing the connection to
> be something different would mean wxString needs to know how to convert
> from xxx to unicode, i.e. implementing a client side conversion, which
> doesn't make sense. This means: client encoding=SQL_ASCII for
> non-unicode, and UNICODE for unicode compiled pgAdmin3.

I am absolutely sure that we cannot rely on recommendations such as "create a
UNICODE database for multi-byte data and SQL_ASCII otherwise".

A central feature of PostgreSQL is its ability to store and manage various
encodings. For example, in Japan, many databases are stored under EUC_JP and
SJIS. You won't ask users to migrate their databases to UTF-8.

Therefore, pgAdmin3 must manage encodings transparently. This is a ***key
feature***. Don't get me wrong; here is what I propose:

1) Always compile pgAdmin3 with Unicode support. (By the way, I would also be
delighted if all .po files were stored in UTF-8.)

2) Always "set client_encoding=Unicode" so that data streams are recoded at
the backend level. This is 100% safe for data viewing; I have never had any
problem with this feature, which is bug-free.

PostgreSQL is the only database in the world with such on-the-fly conversion
at the data-stream level, so why not use it?
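Point (2) amounts to a single per-session command. A minimal sketch of what pgAdmin3 would issue right after connecting (on later PostgreSQL releases the same encoding is spelled UTF8, with UNICODE kept as an alias):

```sql
-- Ask the backend to recode all data streams to/from Unicode for this
-- session; every result pgAdmin3 reads then arrives as UTF-8.
SET client_encoding TO 'UNICODE';

-- Verify that the setting took effect.
SHOW client_encoding;
```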

3) We only need to check whether the data entered in the grid can be (a)
converted from UTF-8 into the database encoding and (b) back from the
database encoding into Unicode.

The iconv (http://www.gnu.org/software/libiconv) or recode
(http://www.iro.umontreal.ca/contrib/recode/HTML/readme.html) libraries can
be used for that. In case of license incompatibilities, we can always fall
back on the iconv and recode binary executables. Both are installed by
default on all GNU/Linux distributions.

Alternatively, we could borrow the PostgreSQL backend's validation code. I know
this code exists because in some cases PostgreSQL refuses to enter euro signs
into a Latin1 database and returns an error.

The only other way would be to add native multi-byte support (SJIS, etc.) to
the wxWindows widgets, which is impossible. So the only remaining solution is
to view all data as UTF-8 Unicode.

> The remaining problem is that of text entered by the user. This
> separates into two categories:
> 1) freetext entry from frmQuery. The user is responsible to use correct
> settings and input representation
> 2) guided entry, here we hopefully know what may be entered, and check
> ourselves for legal characters.

We cannot expect a user to know that the Euro sign (€) does not belong to
Latin1. There are hundreds of examples like that, so it is impossible to
maintain a list of legal/forbidden characters.

The only ways to test for correct entry are:
- to convert the entry as explained in (3), or
- to use the PostgreSQL backend code.

Maybe we should ask for information on the hackers list. What do you think?

Cheers,
Jean-Michel
