Thread: Translations in the distributions

Translations in the distributions

From
Dennis Björklund
Date:
The default installation in Fedora does not work very well for
non-English speakers. For example, if I run psql and type COMMIT, I get:

dennis=# commit;
WARNING:  COMMIT: ingen transaktion p g

while it should say

dennis=# commit;
WARNING:  COMMIT: ingen transaktion pågår

And those spaces in the first version are no spaces at all but some 
strange characters.

However, I have the CVS version compiled and installed, and it seems to
work just fine. Is this because pg has been fixed lately (I don't remember
any such discussions), or something with the packaging, or something else?

What I want is that future fedora/redhat versions work out of the box.
Most people use distributions and it's no fun to translate postgresql if
people are annoyed with the result :-)

-- 
/Dennis Björklund



Re: Translations in the distributions

From
Tom Lane
Date:
Dennis Björklund <db@zigo.dhs.org> writes:
> The default installation in Fedora does not work very well for
> non-English speakers.

I seem to recall some discussion to the effect that the message catalog
files have to be in the same encoding the database is using, because
there's no provision in the backend for converting them on-the-fly.
Peter E. would be the person to ask though.
        regards, tom lane


Re: Translations in the distributions

From
Dennis Björklund
Date:
On Fri, 9 Jan 2004, Tom Lane wrote:

> I seem to recall some discussion to the effect that the message catalog
> files have to be in the same encoding the database is using, because
> there's no provision in the backend for converting them on-the-fly.

Still, my cvs tree seems to work. The catalogues are still in latin1 and 
fedora still uses utf-8. So something seems to have made it work (probably 
Peter).

I know we have had some discussions in the past but I've never really got
the whole picture of the problem. In any case, now that distributions are
starting to switch to utf-8, it puts greater demands on us since one
encoding might not work as well anymore (it never really worked, but that
is another issue).

Maybe it all just works now and when redhat/fedora starts to use 7.4 all
will be fine. All I want is to make sure that it works. If it's not
working, it's something that I might spend some time on trying to fix.

-- 
/Dennis Björklund



Re: Translations in the distributions

From
Peter Eisentraut
Date:
Am Freitag, 9. Januar 2004 08:08 schrieb Dennis Björklund:
> The default installation in Fedora does not work very well for
> non-English speakers. For example, if I run psql and type COMMIT, I get:
>
> dennis=# commit;
> WARNING:  COMMIT: ingen transaktion p g
>
> while it should say
>
> dennis=# commit;
> WARNING:  COMMIT: ingen transaktion pågår

Remember that gettext will automatically recode the strings depending on what 
it thinks is the display character set, determined via LC_CTYPE (of course, a 
useless concept for server software).  After that, PostgreSQL's own client/
server recoding will happen.  So somewhere along the line something 
might get lost.  Either the RPM package uses a different locale, or it has 
bugs in gettext or iconv.
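
A minimal Python sketch of the two recoding stages described above may
help show where characters can get lost.  It is a model only, not
PostgreSQL or gettext code: the function deliver() and its parameter
names are hypothetical, and replacing undecodable bytes with U+FFFD is
an assumption made so that the damage is visible.

```python
def deliver(message, gettext_charset, database_encoding, client_encoding):
    """Model a translated message travelling from catalog to client.

    All names here are hypothetical; this only mimics the two stages.
    """
    # Stage 1 (gettext): the catalog text is recoded to the character
    # set implied by LC_CTYPE.
    raw = message.encode(gettext_charset)

    # Stage 2 (backend): the bytes are taken to be in the database
    # encoding and converted to the client encoding. Undecodable bytes
    # become U+FFFD here (an assumption of this sketch).
    assumed_text = raw.decode(database_encoding, errors="replace")
    wire = assumed_text.encode(client_encoding, errors="replace")

    # The client finally interprets the bytes in its own encoding.
    return wire.decode(client_encoding)


msg = "ingen transaktion pågår"

# All three settings agree: the message survives.
ok = deliver(msg, "utf-8", "utf-8", "utf-8")

# gettext produces Latin-1, but the backend assumes UTF-8: the two
# non-ASCII characters are lost.
lost = deliver(msg, "latin-1", "utf-8", "utf-8")

# gettext produces UTF-8, but the backend assumes Latin-1: each
# non-ASCII character turns into two wrong ones.
doubled = deliver(msg, "utf-8", "latin-1", "utf-8")

print(ok)
print(lost)
print(doubled)
```

The point of the sketch is only that the message arrives intact when the
character set chosen by gettext and the database encoding agree.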



Re: Translations in the distributions

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Freitag, 9. Januar 2004 15:51 schrieb Tom Lane:
>> Hmm.  So the problem would appear if LC_CTYPE is different from the
>> database encoding?  Could we fix it by forcing LC_CTYPE to the database
>> encoding during startup?

> That would resolve quite a few problems, but I don't think there's a way to 
> know what encoding a given LC_CTYPE value will result in.

Hmm.  Actually it looks like we already do what I had in mind:

ReadControlFile():
    if (setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
        ereport(FATAL, ...

So the problem really occurs when database_encoding is set to an
encoding that is incompatible with the one implied by the initdb-time
LC_CTYPE ... which we have no good way to check.  Ugh.

I have some vague recollection that glibc offers an API extension that
allows this to be checked.  Is it worth having a solution that catches
the problem on glibc only?
        regards, tom lane
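
One candidate for such a check, assumed here rather than named in the
thread, is the POSIX nl_langinfo(CODESET) call, which Python exposes as
locale.nl_langinfo.  The sketch below asks the C library which character
set a given LC_CTYPE value implies.  The helper name charset_for_locale
is hypothetical, and the call is only expected to work on POSIX-like
systems.

```python
import locale


def charset_for_locale(locale_name):
    """Return the character set the C library reports for locale_name.

    Hypothetical helper; temporarily switches LC_CTYPE and restores it.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        # Raises locale.Error if the locale is not installed.
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


# The "C" locale is always available; other names depend on the system.
print(charset_for_locale("C"))
```

The name that comes back is whatever the platform chooses to call the
character set, so it still has to be matched against the database
encoding by some other means.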


Re: Translations in the distributions

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Remember that gettext will automatically recode the strings depending
> on what it thinks is the display character set, determined via
> LC_CTYPE (of course, a useless concept for server software).

Hmm.  So the problem would appear if LC_CTYPE is different from the
database encoding?  Could we fix it by forcing LC_CTYPE to the database
encoding during startup?
        regards, tom lane


Re: Translations in the distributions

From
Dennis Björklund
Date:
On Fri, 9 Jan 2004, Tom Lane wrote:

> > on what it thinks is the display character set, determined via
> > LC_CTYPE (of course, a useless concept for server software).
> 
> Hmm.  So the problem would appear if LC_CTYPE is different from the
> database encoding?  Could we fix it by forcing LC_CTYPE to the database
> encoding during startup?

What does database encoding have to do with error messages and the display 
character set?

-- 
/Dennis Björklund



Re: Translations in the distributions

From
Peter Eisentraut
Date:
Am Freitag, 9. Januar 2004 15:51 schrieb Tom Lane:
> Hmm.  So the problem would appear if LC_CTYPE is different from the
> database encoding?  Could we fix it by forcing LC_CTYPE to the database
> encoding during startup?

That would resolve quite a few problems, but I don't think there's a way to 
know what encoding a given LC_CTYPE value will result in.



Re: Translations in the distributions

From
Peter Eisentraut
Date:
Am Freitag, 9. Januar 2004 16:28 schrieb Dennis Björklund:
> What does database encoding have to do with error messages and the display
> character set?

When they are sent over the wire, the messages are converted from server 
(=database) encoding to client encoding.



Re: Translations in the distributions

From
Peter Eisentraut
Date:
Tom Lane wrote:
> So the problem really occurs when database_encoding is set to an
> encoding that is incompatible with the one implied by the initdb-time
> LC_CTYPE ... which we have no good way to check.  Ugh.
>
> I have some vague recollection that glibc offers an API extension
> that allows this to be checked.  Is it worth having a solution that
> catches the problem on glibc only?

The problem is more likely to be that it will be hard to match up the 
different encoding names.  For example, if you set LC_CTYPE=C, then the 
system encoding is reported as

$ locale charmap
ANSI_X3.4-1968

whereas the closest thing in PostgreSQL would be SQL_ASCII.
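
The name-matching problem can be sketched in Python, whose codec
registry already knows many charmap aliases.  Everything PostgreSQL-side
here is an assumption for illustration: the table PG_TO_CODEC is
hypothetical and hand-written, it covers three names only, and treating
SQL_ASCII as plain ASCII follows the "closest thing" reading above.

```python
import codecs

# Hypothetical, hand-maintained table from PostgreSQL encoding names to
# Python codec names. Illustrative only; not an exhaustive or official list.
PG_TO_CODEC = {
    "SQL_ASCII": "ascii",
    "LATIN1": "iso-8859-1",
    "UNICODE": "utf-8",
}


def same_encoding(locale_charmap, pg_encoding):
    """Return True if both names resolve to the same Python codec."""
    codec_name = PG_TO_CODEC.get(pg_encoding.upper())
    if codec_name is None:
        return False  # unknown PostgreSQL name: cannot claim a match
    try:
        # codecs.lookup() canonicalizes aliases to one codec name.
        return codecs.lookup(locale_charmap).name == codecs.lookup(codec_name).name
    except LookupError:
        return False  # the charmap name is not known to Python


print(same_encoding("ANSI_X3.4-1968", "SQL_ASCII"))
print(same_encoding("UTF-8", "LATIN1"))
```

A real check would need a table like this for every platform's naming
habits, which is the difficulty being described.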

It might already help if we allowed LC_CTYPE to be attached to a 
database rather than the entire cluster, and make the user match them 
up manually.  The only drawback would be that indexes on global tables 
involving upper() or lower() would no longer work reliably.



Re: Translations in the distributions

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> It might already help if we allowed LC_CTYPE to be attached to a 
> database rather than the entire cluster, and make the user match them 
> up manually.  The only drawback would be that indexes on global tables 
> involving upper() or lower() would no longer work reliably.

Make that "indexes on global tables involving any text wouldn't work".
Everyone has to have the same notion of the sort order, or the index is
corrupt from someone's point of view, and soon from everyone's point of
view.  upper/lower isn't needed to cause a problem.
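
A small Python model of why a shared sort order matters.  A sorted list
stands in for an index, and two key functions stand in for two locales;
both are assumptions of this sketch (code point order for a C-like
locale, case-insensitive order for some other locale), as is the helper
index_lookup.

```python
def index_lookup(index, value, sort_key):
    """Binary search that trusts index to be sorted by sort_key."""
    lo, hi = 0, len(index)
    target = sort_key(value)
    while lo < hi:
        mid = (lo + hi) // 2
        if sort_key(index[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == value


def c_order(s):
    return s  # plain code point comparison, as in a C-like locale


def other_order(s):
    return s.casefold()  # stand-in for a locale that ignores case


words = ["Zebra", "apple", "Mango", "banana"]

# The "index" is built by a session that sorts in code point order.
index = sorted(words, key=c_order)

found_same_locale = index_lookup(index, "apple", c_order)
found_other_locale = index_lookup(index, "apple", other_order)

print(found_same_locale)
print(found_other_locale)
```

The second lookup misses an entry that is present, with no upper() or
lower() involved in the index definition.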

However ... we do not have any global tables with indexed text columns.
Only name columns, and name comparisons are presently not locale-aware
(they're just strncmp()).  I think it wouldn't be unreasonable to
legislate that this remain true forevermore, and then it would be safe
to allow different DBs to run in different locales.  That would be a big
step forward, for sure.

[ thinks more... ]  Actually it's a bigger restriction than that.
Imagine that you create some tables with text data in template1, and
then index them.  The indexes would be corrupt if you cloned template1
and assigned the result a different locale.  So to make this work, we'd
actually need the following restrictions:

* No system table can ever have an index on a text/varchar/char column;
  only name columns, and name has to remain locale-unaware.

* You can't assign a new locale to a cloned database if the source has
  any text/varchar/char indexes.

The simplest implementation restriction I can think of to guarantee
point 2 is to allow changing the locale only when cloning template0,
not when cloning anything else.  Or we could just warn people that
they'd better reindex after changing the locale.
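
The two ways of enforcing the second restriction can be written down as
predicates.  This is a sketch in Python with hypothetical function
names, not a proposal for the actual implementation.

```python
def may_change_locale_strict(source_db):
    """Simplest rule: a new locale is allowed only when cloning template0."""
    return source_db == "template0"


def may_change_locale_precise(text_index_count):
    """Exact rule: allowed when the source has no text/varchar/char indexes.

    text_index_count is assumed to be supplied by the caller.
    """
    return text_index_count == 0


print(may_change_locale_strict("template0"))
print(may_change_locale_strict("template1"))
print(may_change_locale_precise(0))
```

The strict rule needs no inspection of the source database, which is
what makes it the simplest one to implement.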

It does seem like this might be a reasonable path to take.  Thoughts?
        regards, tom lane