Thread: Translations in the distributions
The default installation in Fedora does not work very well for non-English users. For example, if I run psql and type COMMIT I get:

    dennis=# commit;
    WARNING: COMMIT: ingen transaktion p g

while it should say:

    dennis=# commit;
    WARNING: COMMIT: ingen transaktion pågår

The apparent spaces in the first version are not spaces at all but some strange characters.

However, I have the CVS version compiled and installed, and it seems to work just fine. Is this because pg has been fixed lately (I don't remember any such discussions), or is it something with the packaging, or something else?

What I want is for future Fedora/Red Hat versions to work out of the box. Most people use distributions, and it's no fun to translate PostgreSQL if people are annoyed with the result :-)

--
/Dennis Björklund
Dennis Björklund <db@zigo.dhs.org> writes:
> The default installation in Fedora does not work very well for
> non-English users.

I seem to recall some discussion to the effect that the message catalog files have to be in the same encoding the database is using, because there's no provision in the backend for converting them on-the-fly. Peter E. would be the person to ask though.

regards, tom lane
On Fri, 9 Jan 2004, Tom Lane wrote:
> I seem to recall some discussion to the effect that the message catalog
> files have to be in the same encoding the database is using, because
> there's no provision in the backend for converting them on-the-fly.

Still, my CVS tree seems to work. The catalogs are still in Latin-1 and Fedora still uses UTF-8. So something seems to have made it work (probably Peter).

I know we have had some discussions in the past, but I've never really got the whole picture of the problem. In any case, now that distributions are starting to switch to UTF-8, it puts greater demands on us, since a single encoding might not work as well anymore (it never really worked, but that is another issue).

Maybe it all just works now, and when Red Hat/Fedora starts to use 7.4 all will be fine. All I want is to make sure that it works. If it's not working, it's something that I might spend some time trying to fix.

--
/Dennis Björklund
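[Editor's illustration] The symptom reported at the top of the thread is consistent with Latin-1 bytes reaching a UTF-8 terminal unconverted. The Python sketch below shows that general mechanism only; it is not a confirmed diagnosis of the Fedora package. The helper name `show_on_terminal` is hypothetical, and `errors="replace"` merely stands in for however a real terminal renders invalid bytes.

```python
def show_on_terminal(raw: bytes, terminal_encoding: str) -> str:
    """Decode bytes as a terminal would, marking invalid sequences with U+FFFD."""
    return raw.decode(terminal_encoding, errors="replace")


message = "ingen transaktion pågår"

# Assumption: the catalog stores the text as Latin-1, the terminal expects UTF-8.
latin1_bytes = message.encode("latin-1")
utf8_bytes = message.encode("utf-8")

garbled = show_on_terminal(latin1_bytes, "utf-8")  # each lone 0xE5 byte is invalid UTF-8
correct = show_on_terminal(utf8_bytes, "utf-8")

print(garbled)
print(correct)
```

In Latin-1 each "å" is the single byte 0xE5, which is not a complete UTF-8 sequence, so the terminal has nothing sensible to draw in its place.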
On Friday, 9 January 2004 08:08, Dennis Björklund wrote:
> The default installation in Fedora does not work very well for
> non-English users. For example, if I run psql and type COMMIT I get:
>
> dennis=# commit;
> WARNING: COMMIT: ingen transaktion p g
>
> while it should say
>
> dennis=# commit;
> WARNING: COMMIT: ingen transaktion pågår

Remember that gettext will automatically recode the strings depending on what it thinks is the display character set, determined via LC_CTYPE (of course, a useless concept for server software). After that, PostgreSQL's own client/server recoding will happen. So somewhere along the line something might get lost. Either the RPM package uses a different locale, or it has bugs in gettext or iconv.
Peter Eisentraut <peter_e@gmx.net> writes:
> On Friday, 9 January 2004 15:51, Tom Lane wrote:
>> Hmm. So the problem would appear if LC_CTYPE is different from the
>> database encoding? Could we fix it by forcing LC_CTYPE to the database
>> encoding during startup?

> That would resolve quite a few problems, but I don't think there's a way to
> know what encoding a given LC_CTYPE value will result in.

Hmm. Actually it looks like we already do what I had in mind:

ReadControlFile():
    if (setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
        ereport(FATAL, ...

So the problem really occurs when database_encoding is set to an encoding that is incompatible with the one implied by the initdb-time LC_CTYPE ... which we have no good way to check. Ugh.

I have some vague recollection that glibc offers an API extension that allows this to be checked. Is it worth having a solution that catches the problem on glibc only?

regards, tom lane
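[Editor's illustration] One interface that reports the codeset implied by an LC_CTYPE setting on many Unix-like systems is `nl_langinfo(CODESET)`. Whether that is the API recalled above is an assumption. The Python sketch below wraps it in a hypothetical helper, `codeset_for_lc_ctype`; it is Unix-only, and the name it returns depends on the platform's C library.

```python
import locale
from typing import Optional


def codeset_for_lc_ctype(locale_name: str) -> Optional[str]:
    """Return the codeset the C library reports for locale_name, or None
    if that locale is not available on this system (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore the old locale


print(codeset_for_lc_ctype("C"))
```

A startup check could compare this value with the database encoding, provided the two naming schemes can be reconciled.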
Peter Eisentraut <peter_e@gmx.net> writes:
> Remember that gettext will automatically recode the strings depending
> on what it thinks is the display character set, determined via
> LC_CTYPE (of course, a useless concept for server software).

Hmm. So the problem would appear if LC_CTYPE is different from the database encoding? Could we fix it by forcing LC_CTYPE to the database encoding during startup?

regards, tom lane
On Fri, 9 Jan 2004, Tom Lane wrote:
> > on what it thinks is the display character set, determined via
> > LC_CTYPE (of course, a useless concept for server software).
>
> Hmm. So the problem would appear if LC_CTYPE is different from the
> database encoding? Could we fix it by forcing LC_CTYPE to the database
> encoding during startup?

What does the database encoding have to do with error messages and the display character set?

--
/Dennis Björklund
On Friday, 9 January 2004 15:51, Tom Lane wrote:
> Hmm. So the problem would appear if LC_CTYPE is different from the
> database encoding? Could we fix it by forcing LC_CTYPE to the database
> encoding during startup?

That would resolve quite a few problems, but I don't think there's a way to know what encoding a given LC_CTYPE value will result in.
On Friday, 9 January 2004 16:28, Dennis Björklund wrote:
> What does the database encoding have to do with error messages and the
> display character set?

When they are sent over the wire, the messages are converted from server (=database) encoding to client encoding.
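[Editor's illustration] The two recoding stages described in this thread (gettext recoding to the LC_CTYPE charset, then the server converting from database encoding to client encoding) can be modelled roughly in Python. All function names are hypothetical. Using `errors="replace"` is a simplification: the real gettext and PostgreSQL conversion routines may transliterate or raise an error instead.

```python
def gettext_lookup(msg: str, ctype_charset: str) -> bytes:
    """Stage 1: gettext returns the translation encoded in the LC_CTYPE charset."""
    return msg.encode(ctype_charset, errors="replace")


def send_to_client(raw: bytes, database_encoding: str, client_encoding: str) -> bytes:
    """Stage 2: the server assumes raw is in database encoding and converts it."""
    text = raw.decode(database_encoding, errors="replace")
    return text.encode(client_encoding, errors="replace")


def what_client_sees(msg: str, ctype_charset: str,
                     database_encoding: str, client_encoding: str) -> str:
    wire = send_to_client(gettext_lookup(msg, ctype_charset),
                          database_encoding, client_encoding)
    return wire.decode(client_encoding, errors="replace")


msg = "ingen transaktion pågår"

# LC_CTYPE charset and database encoding agree: the message survives.
matched = what_client_sees(msg, "latin-1", "latin-1", "utf-8")

# They disagree: stage 2 misreads Latin-1 bytes as UTF-8.
mismatched = what_client_sees(msg, "latin-1", "utf-8", "utf-8")

# LC_CTYPE implies plain ASCII: the non-ASCII letters are lost in stage 1.
ascii_ctype = what_client_sees(msg, "ascii", "utf-8", "utf-8")

print(matched, mismatched, ascii_ctype, sep="\n")
```

The message only arrives intact when the charset gettext produces is the one the server believes the database uses, which is why the database encoding matters for error messages.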
Tom Lane wrote:
> So the problem really occurs when database_encoding is set to an
> encoding that is incompatible with the one implied by the initdb-time
> LC_CTYPE ... which we have no good way to check. Ugh.
>
> I have some vague recollection that glibc offers an API extension
> that allows this to be checked. Is it worth having a solution that
> catches the problem on glibc only?

The problem is more likely to be that it will be hard to match up the different encoding names. For example, if you set LC_CTYPE=C, then the system encoding is reported as

    $ locale charmap
    ANSI_X3.4-1968

whereas the closest thing in PostgreSQL would be SQL_ASCII.

It might already help if we allowed LC_CTYPE to be attached to a database rather than the entire cluster, and made the user match them up manually. The only drawback would be that indexes on global tables involving upper() or lower() would no longer work reliably.
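[Editor's illustration] The name-matching problem can be sketched in Python by normalizing the system's charmap name through Python's own codec aliases and then consulting a small lookup table. The table `PG_ENCODING_BY_CODEC` and the function `pg_encoding_for_charmap` are hypothetical and deliberately incomplete. Python's alias list is used only as a convenient stand-in for a real mapping, which PostgreSQL would have to maintain itself.

```python
import codecs
from typing import Optional

# Hypothetical, tiny table: canonical Python codec name -> PostgreSQL encoding name.
PG_ENCODING_BY_CODEC = {
    "ascii": "SQL_ASCII",   # "closest thing", as noted in the thread
    "iso8859-1": "LATIN1",
    "utf-8": "UNICODE",     # name used at the time; later releases call it UTF8
}


def pg_encoding_for_charmap(charmap: str) -> Optional[str]:
    """Map a `locale charmap` style name to a PostgreSQL encoding name,
    or return None when no match is known."""
    try:
        canonical = codecs.lookup(charmap).name  # resolves aliases such as ANSI_X3.4-1968
    except LookupError:
        return None
    return PG_ENCODING_BY_CODEC.get(canonical)


print(pg_encoding_for_charmap("ANSI_X3.4-1968"))
print(pg_encoding_for_charmap("ISO-8859-1"))
```

Any charmap name missing from the table yields None, which is the case where the user would have to match things up manually.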
Peter Eisentraut <peter_e@gmx.net> writes:
> It might already help if we allowed LC_CTYPE to be attached to a
> database rather than the entire cluster, and make the user match them
> up manually. The only drawback would be that indexes on global tables
> involving upper() or lower() would no longer work reliably.

Make that "indexes on global tables involving any text wouldn't work". Everyone has to have the same notion of the sort order, or the index is corrupt from someone's point of view, and soon from everyone's point of view. upper/lower isn't needed to cause a problem.

However ... we do not have any global tables with indexed text columns. Only name columns, and name comparisons are presently not locale-aware (they're just strncmp()). I think it wouldn't be unreasonable to legislate that this remain true forevermore, and then it would be safe to allow different DBs to run in different locales. That would be a big step forward, for sure.

[ thinks more... ] Actually it's a bigger restriction than that. Imagine that you create some tables with text data in template1, and then index them. The indexes would be corrupt if you cloned template1 and assigned the result a different locale. So to make this work, we'd actually need the following restrictions:

* No system table can ever have an index on a text/varchar/char column; only name columns, and name has to remain locale-unaware.

* You can't assign a new locale to a cloned database if the source has any text/varchar/char indexes.

The simplest implementation restriction I can think of to guarantee point 2 is to allow changing the locale only when cloning template0, not when cloning anything else. Or we could just warn people that they'd better reindex after changing the locale.

It does seem like this might be a reasonable path to take. Thoughts?

regards, tom lane
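[Editor's illustration] The point that an index becomes corrupt once readers disagree about sort order can be shown with a toy sorted list standing in for a B-tree. In this Python sketch, `codepoint_order` plays the role of a strncmp()-style comparison and `case_insensitive` is a crude stand-in for a locale-aware collation. Both names, and the whole model, are assumptions made for illustration.

```python
import bisect
from typing import Callable, List

SortKey = Callable[[str], str]


def build_index(values: List[str], sort_key: SortKey) -> List[str]:
    """Build a toy 'index': the values sorted under one notion of order."""
    return sorted(values, key=sort_key)


def index_contains(index: List[str], value: str, sort_key: SortKey) -> bool:
    """Binary-search the index, trusting that it is sorted under sort_key."""
    keys = [sort_key(v) for v in index]
    pos = bisect.bisect_left(keys, sort_key(value))
    return pos < len(index) and index[pos] == value


def codepoint_order(s: str) -> str:
    return s


def case_insensitive(s: str) -> str:
    return s.casefold()


# Index built in one database under codepoint order.
index = build_index(["apple", "Banana", "cherry", "Zebra"], codepoint_order)

found_same_order = index_contains(index, "Zebra", codepoint_order)
# A clone searches the same index but assumes a different sort order.
found_other_order = index_contains(index, "Zebra", case_insensitive)

print(found_same_order, found_other_order)
```

The entry is physically present in both cases, but the second search looks in the wrong place and misses it. That is the failure a reindex after changing the locale would repair.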