Thread: client encodings

client encodings

From
Dennis Björklund
Date:
I've fixed the problems in psql that were there before:
* psql alters the strings in a PQresult
* psql sends non-validating strings to the server
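A "non-validating" string here means bytes that are not well formed in the client encoding. The following is a minimal sketch that assumes the client encoding is UTF-8. It shows the kind of check that could run before a string is sent to the server. `utf8_is_valid()` is a hypothetical helper, not psql's actual code. It rejects stray continuation bytes, truncated sequences, overlong forms, surrogates and code points above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s[0..len) is well-formed UTF-8, else 0. */
static int utf8_is_valid(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) { i++; continue; }              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;     /* stray continuation byte or invalid lead byte */

        if (i + n >= len)
            return 0;      /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;  /* expected a continuation byte 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* reject overlong encodings, UTF-16 surrogates and out-of-range values */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        if (cp > 0x10FFFF)
            return 0;

        i += n + 1;
    }
    return 1;
}
```

For example, "Björklund" encoded as UTF-8 (`"Bj\xC3\xB6rklund"`) passes. The same name in Latin-1 (`"Bj\xF6rklund"`) is rejected, because 0xF6 is not followed by continuation bytes.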

This is however not the solution to the general problem with client 
encodings. When you normally run psql in a terminal, the encoding used by 
that terminal is the only reasonable encoding one can use. However, if you 
redirect the output you very well might want to produce a utf-8 file even 
if the terminal does not support it. So it could be useful to be able to
change the client encoding in psql.

However, if you want to produce a utf-8 file, how should that work with 
respect to gettext()? If the message catalog is in latin1 then we need to 
know that and convert that into utf-8.

The easiest way as I see it is to demand that all po files are stored in
utf-8 and then you can convert that into whatever client encoding you have
set in psql. Of course you can't make that translation lossless in
general, but if you have a language that demands some characters that
don't exist in the target charset you have lost anyway. The best you can
do is to convert it to something similar (or even just throw it away).
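This is a minimal sketch of the lossy step, assuming utf-8 catalog strings and a Latin-1 (ISO-8859-1) client encoding. `utf8_to_latin1()` is a hypothetical helper. Characters that Latin-1 cannot represent are either replaced by a placeholder or thrown away. A placeholder such as `?` is a cruder stand-in for "something similar". Real transliteration would need a mapping table, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hypothetical helper: converts UTF-8 `in` (assumed already valid) to Latin-1 in `out`.
 * Code points above U+00FF have no Latin-1 equivalent: each is replaced by
 * `replacement`, or dropped entirely when `replacement` is '\0'.
 * `out` must hold at least strlen(in) + 1 bytes; Latin-1 output is never longer
 * than the UTF-8 input. Returns the length of the output. */
static size_t utf8_to_latin1(const char *in, char *out, char replacement)
{
    const unsigned char *s = (const unsigned char *) in;
    size_t o = 0;

    while (*s) {
        unsigned long cp;
        size_t n;  /* continuation bytes following the lead byte */

        if (*s < 0x80)                { cp = *s;        n = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; n = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; n = 2; }
        else                          { cp = *s & 0x07; n = 3; }
        s++;

        /* stop early at the terminator so a truncated sequence cannot overrun */
        for (size_t k = 0; k < n && *s; k++, s++)
            cp = (cp << 6) | (*s & 0x3F);

        if (cp <= 0xFF)
            out[o++] = (char) cp;      /* U+0000..U+00FF map 1:1 onto Latin-1 */
        else if (replacement != '\0')
            out[o++] = replacement;    /* "convert it to something similar" */
        /* else: throw the character away */
    }
    out[o] = '\0';
    return o;
}
```

With `'?'`, the utf-8 string "€5" becomes "?5". With `'\0'`, it becomes "5". Names such as "Björklund" come through unchanged, because ö exists in Latin-1.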

Storing all po files as utf-8 is not a big problem. The translator can
very well still work in some other charset and then use iconv to
convert the file before checking it in. As long as the file isn't changed
to use other characters, the translator can later use iconv again to
get it back into his charset. The good thing about this is that psql knows
what charset all strings are in and can convert them when needed.
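The round trip is lossless because each Latin-1 byte corresponds to exactly one code point, U+0000..U+00FF. This is a minimal sketch that assumes Latin-1 is the translator's charset. It hand-rolls both directions as a stand-in for what `iconv -f ISO-8859-1 -t UTF-8` and its reverse would do. `latin1_to_utf8()` and `utf8_to_latin1_strict()` are hypothetical helpers. The reverse direction fails if the file has gained characters that Latin-1 cannot hold.

```c
#include <stddef.h>

/* Latin-1 -> UTF-8. Every byte maps to one code point, so this cannot fail.
 * `out` must hold at least 2 * strlen(in) + 1 bytes. Returns the output length. */
static size_t latin1_to_utf8(const char *in, char *out)
{
    size_t o = 0;
    for (const unsigned char *s = (const unsigned char *) in; *s; s++) {
        if (*s < 0x80) {
            out[o++] = (char) *s;                     /* ASCII is unchanged */
        } else {
            out[o++] = (char) (0xC0 | (*s >> 6));     /* lead byte 0xC2 or 0xC3 */
            out[o++] = (char) (0x80 | (*s & 0x3F));   /* continuation byte */
        }
    }
    out[o] = '\0';
    return o;
}

/* UTF-8 -> Latin-1, accepting only ASCII and 2-byte sequences for U+0080..U+00FF.
 * Returns (size_t) -1 if any other character appears, i.e. one Latin-1 cannot hold. */
static size_t utf8_to_latin1_strict(const char *in, char *out)
{
    size_t o = 0;
    const unsigned char *s = (const unsigned char *) in;
    while (*s) {
        if (*s < 0x80) {
            out[o++] = (char) *s++;
        } else if ((*s == 0xC2 || *s == 0xC3) && (s[1] & 0xC0) == 0x80) {
            out[o++] = (char) (((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
            s += 2;
        } else {
            return (size_t) -1;  /* not representable in Latin-1 */
        }
    }
    out[o] = '\0';
    return o;
}
```

"Björklund" grows from 9 bytes in Latin-1 to 10 bytes in utf-8 and converts back byte for byte. A Euro sign added in utf-8 cannot go back, so the translator would see the conversion fail instead of silently losing it.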

Would it be acceptable to have all po-files as utf-8?

-- 
/Dennis



Re: client encodings

From
Peter Eisentraut
Date:
Dennis Björklund writes:

> However, if you want to produce a utf-8 file, how should that work with
> respect to gettext()? If the message catalog is in latin1 then we need to
> know that and convert that into utf-8.

I don't think all gettext implementations support automatic character set
conversion.  We might have to roll our own sometime, but for now it's not
an option.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: client encodings

From
Dennis Björklund
Date:
On Mon, 16 Jun 2003, Peter Eisentraut wrote:

> > However, if you want to produce a utf-8 file, how should that work with
> > respect to gettext()? If the message catalog is in latin1 then we need to
> > know that and convert that into utf-8.
> 
> I don't think all gettext implementations support automatic character set
> conversion.

I agree. They don't.

>  We might have to roll our own sometime

That was why I asked if we could simply have all message catalogs as 
utf-8, then we know what charset the strings are in and can easily convert 
it to whatever we have set our client encoding to.

> but for now it's not an option.

What has to be decided is whether we are going to generate output that is
only in the client encoding or not. If we just output the strings from the
message catalog as they are, the output will not necessarily be valid in the
client encoding. Then the best thing we can do is simply to take the message
catalog string and discard everything that does not work in the current
client encoding.

-- 
/Dennis