Thread: Transparent i18n?
I've recently been trying to implement some i18n functionality as simply as possible in my application. I have a lot of lookup values and such in the DB that need to be translated, and I would rather not do it in the calling client. A friend and I put our heads together and came up with what seemed like a ridiculously elegant way of doing it using PG's SELECT rules and some kind of per-connection "session" information. I got the connection session part figured out, thanks to Richard Huxton. Apparently, though, SELECT rules can only be used to create views and cannot actually modify the data returned by a table?

I was trying to do something along the lines of a translations table that houses all translations for a given string. My lookup tables would then reference the specific strings they need. I was then going to place a SELECT rule on the lookup table that would perform a join (based on the lookup word and the current session language) and return the translation in the field of the lookup value.

Is my only option to use a separate view? What techniques do the rest of you use for storing translation info in the DB?

Thanks,
Steve
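[Editor's sketch] The join-on-session-language idea Steve describes can indeed be done with a view rather than a SELECT rule. Here is a minimal runnable sketch using Python's built-in sqlite3 as a stand-in for Postgres, with a one-row session_lang table faking the per-connection language setting; all table and column names here are hypothetical, not Steve's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE session_lang (lang TEXT);   -- one row: the "session" language
INSERT INTO session_lang VALUES ('fr');

CREATE TABLE translations (
    orig TEXT, lang TEXT, trans TEXT,
    PRIMARY KEY (orig, lang)
);
INSERT INTO translations VALUES
    ('open',   'fr', 'ouvert'),
    ('closed', 'fr', 'ferme');

CREATE TABLE status_lookup (id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO status_lookup VALUES (1, 'open'), (2, 'closed');

-- The view stands in for the SELECT rule: it joins each lookup value
-- to its translation for the current session language, falling back
-- to the untranslated string when none exists.
CREATE VIEW status_i18n AS
SELECT l.id, COALESCE(t.trans, l.status) AS status
FROM status_lookup l
LEFT JOIN translations t
       ON t.orig = l.status
      AND t.lang = (SELECT lang FROM session_lang);
""")
rows = cur.execute("SELECT id, status FROM status_i18n ORDER BY id").fetchall()
print(rows)  # [(1, 'ouvert'), (2, 'ferme')]
```

In real Postgres the single-row session table could instead be a custom GUC or a function reading per-connection state, which is essentially the direction the GNUmed code discussed later in the thread takes.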
Hi Steve. I have been puzzling over a similar issue - not i18n for the interface, but for text data - and trying to sort out a solution, so I will be interested to hear additional advice as well. When I wrote to the list a couple of weeks back (look for my posting around the 17th) I was looking at doing something with a normalized structure, but now I don't think that is going to work that well. It was suggested that I look at an array, and I am now looking at a multidimensional array to do this. I am just reading up on Postgres support for arrays. I think my table will be pretty simple:

CREATE TABLE multi_language (
  id SERIAL,
  lang_code_and_text TEXT[][]
);

So records would look like:

1, {{'en','the brown cow'},{'fr','la vache brun'}}
2, {{'en','the blue turkey'},{'fr','la dandon bleu'}}

I have another table with language codes, i.e. en, fr, etc. When languages are added, I would just append to the array for the whole table. The trouble for me is more about getting the data out of Postgres, because the raw array syntax is incompatible with Python and I would have to manipulate the results. Quite frankly, I want this to be done in Postgres so that I only have to retrieve query results. If I can't, it will be a pain unless I can think of something else, because the issue is going to be keeping the keys and values in my languages table in sync with the array. For example, if I have a serial table containing my languages and add two entries, English and French, I would then have two elements in my array, and it wouldn't be so bad because I could use the id as a key to get the value back out through a query. But say I delete French (and remove the second element in each entry of my table) and then add Spanish: now I have a language id of 3 but only two elements in my array, so the ids and array positions no longer line up. In Python, associative arrays are called dictionaries, and you can easily grab an element by doing something like lang_code_and_text['en'] to get the value of the en (English) key.
I was hoping you could pull the multi-language text out of the array with a text key instead of a numeric index, but it appears Postgres will only let you do it positionally, or get results from slices, as far as I can tell. Maybe someone else on the list has some advice to offer here. i.e.

SELECT language_text[1][1] AS language_code,
       language_text[1][2] AS text
FROM language_text;

Regards, David
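[Editor's sketch] The key-based lookup David wants is easy to recover on the client side by parsing the array literal into a Python dict. A minimal sketch, following the single-quoted form used in the examples above (actual Postgres output quotes elements with double quotes and allows escapes, so the pattern would need adjusting for real server output):

```python
import re

def array_to_dict(raw):
    """Parse a two-dimensional array literal such as
    {{'en','the brown cow'},{'fr','la vache brun'}} into a dict
    keyed by language code. Handles only simple quoted pairs
    with no embedded quotes."""
    pairs = re.findall(r"\{'([^']*)','([^']*)'\}", raw)
    return dict(pairs)

row = "{{'en','the brown cow'},{'fr','la vache brun'}}"
translations = array_to_dict(row)
print(translations['en'])  # the brown cow
print(translations['fr'])  # la vache brun
```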
David Pratt <fairwinds@eastlink.ca> writes:

> It was suggested that I look at an array.

I think that was me. I tried not to say there's only one way to do it - only that I chose to go this way, and I think it has worked a lot better for me. Having the text right there in the column saves a *lot* of work dealing with the tables, especially since many tables would have multiple localized strings.

> I think my table will be pretty simple:
> CREATE TABLE multi_language (
>   id SERIAL,
>   lang_code_and_text TEXT[][]
> );
>
> So records would look like:
>
> 1, {{'en','the brown cow'},{'fr','la vache brun'}}
> 2, {{'en','the blue turkey'},{'fr','la dandon bleu'}}

That's a lot more complicated than my model. Postgres doesn't have any functions for handling arrays like these as associative arrays the way you might want, and as you've discovered it's not so easy to ship the whole array to your client where it might be easier to work with.

I just have things like (hypothetically):

CREATE TABLE states (
  abbrev text,
  state_name text[],
  state_capitol text[]
)

And then in my application code's data layer I mark all "internationalized columns", and the object that builds the actual SELECT automatically appends "[$lang_id]" to every column in that list. The list of languages supported, and the mapping of languages to array positions, is fixed. I can grow it later, but I can't reorganize it. This is fine for me, since pretty much everything has exactly two languages.

-- 
greg
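[Editor's sketch] Greg's data-layer trick - appending a fixed array index to each internationalized column while building the SELECT - can be sketched in a few lines of client code. The table, column list, and language map below are made up for illustration, not Greg's actual implementation:

```python
# Fixed mapping of language code to array position (1-based, as in Postgres).
LANG_INDEX = {"en": 1, "fr": 2}

# Columns flagged as internationalized for the hypothetical "states" table.
I18N_COLUMNS = {"state_name", "state_capitol"}

def build_select(table, columns, lang):
    """Build a SELECT that appends [lang_id] to each internationalized
    column, leaving ordinary columns untouched."""
    idx = LANG_INDEX[lang]
    parts = [
        f"{col}[{idx}] AS {col}" if col in I18N_COLUMNS else col
        for col in columns
    ]
    return f"SELECT {', '.join(parts)} FROM {table}"

sql = build_select("states", ["abbrev", "state_name", "state_capitol"], "fr")
print(sql)
# SELECT abbrev, state_name[2] AS state_name, state_capitol[2] AS state_capitol FROM states
```

The cost of this design, as Greg notes, is that array positions are permanent: languages can be appended but never reordered or removed.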
> SELECT language_text[1][1] AS language_code,
>        language_text[1][2] AS text
> FROM language_text;

The way we do that in GNUmed:

select lookup_val, _(lookup_val) from lookup_table where ...;

If you want to know how, see here:

http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup

Feel free to ask for clarification.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Many thanks Karsten for some insight into how you are handling this.

Regards, David

On Saturday, July 2, 2005, at 06:08 AM, Karsten Hilbert wrote:

>> SELECT language_text[1][1] AS language_code,
>> language_text[1][2] AS text
>> FROM language_text;
>
> The way we do that in GNUmed:
>
> select lookup_val, _(lookup_val) from lookup_table where ...;
>
> If you want to know how, see here:
>
> http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup
>
> Feel free to ask for clarification.
>
> Karsten
Hi Greg. Well, I'm about half way there, but I think what I am doing could work out. I have an iso_languages table, a languages table for the languages in use, and a multi_language table for storing the values of my text fields. I choose my languages from iso_languages. Any table that needs a multi-language field gets one by id, with referential integrity to a multi_language table id, since this is a direct relationship. Thanks for the idea of using an array, BTW - referential integrity could not work with my first model.

I am taking the array text and parsing the result in Python to get the key positions. This is possible with a query using the array_to_string function to get the text of any multi_language field. Then I put the result into a dictionary, get its length, and add one to get the new key value when a new language is added. Using this key, an array is appended to the existing array in each row of the multi_language table (in the lang_code_and_text field). So the length of the outer array of the multidimensional array grows by one, for the new language, in each record of the multi_language table. I can also look up the English (en) value, so that I can use the English text as the default for the new language when inserting its array into lang_code_and_text. For example, if Spanish (es) is added, the new key is 3, so after the insert each record looks something like this:

1, {{'en','the brown cow'},{'fr','la vache brun'},{'es','the brown cow'}}
2, {{'en','the blue turkey'},{'fr','la dandon bleu'},{'es','the blue turkey'}}

In my forms, I am using a template to display entry fields for each language in use. English will be the default for newly added languages, so there is something in these fields to start with, and it should update properly based on the correct key values. In my languages table, I am storing the current key positions for each language used in my app.
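[Editor's sketch] The add-a-language step David describes - compute the next key, then append a copy of the English text under the new code to every row - can be sketched client-side like this. Everything here is hypothetical: rows are shown as already-parsed Python lists rather than raw Postgres array literals, and the real version would issue UPDATEs back to the multi_language table:

```python
def add_language(rows, new_code, default_code="en"):
    """Append [new_code, default text] to each row's language array,
    using the default_code entry as the starting text. Returns the
    new key (1-based array position) for the added language."""
    for pairs in rows:
        default_text = next(t for code, t in pairs if code == default_code)
        pairs.append([new_code, default_text])
    return len(rows[0]) if rows else 1

table = [
    [["en", "the brown cow"], ["fr", "la vache brun"]],
    [["en", "the blue turkey"], ["fr", "la dandon bleu"]],
]
new_key = add_language(table, "es")
print(new_key)      # 3
print(table[0][2])  # ['es', 'the brown cow']
```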
I have an i18n layer for Zope, and based on the language code I will pass the language id, so you see the right language in both the interface and the data. When updating or deleting records, I will be making a trigger to remove the array that represents a translation after the update. It then has to update my languages table to provide the updated key values for my languages. I am working on my first functions and triggers in PL/pgSQL. This is where I may need help from the list if I get stuck, but so far so good - though I'm not finished yet.

Does anyone have any comments on scalability? I don't really see a problem, since there is no real risk of my needing more than 10-15 languages or so, max, out of the maybe 300 languages in the world. I think 15 entries in an array is very small, so I can't see any reason for this not to work well.

>> I think my table will be pretty simple:
>> CREATE TABLE multi_language (
>>   id SERIAL,
>>   lang_code_and_text TEXT[][]
>> );
>>
>> So records would look like:
>>
>> 1, {{'en','the brown cow'},{'fr','la vache brun'}}
>> 2, {{'en','the blue turkey'},{'fr','la dandon bleu'}}
>
> That's a lot more complicated than my model.
>
> Postgres doesn't have any functions for handling arrays like these as
> associative arrays like you might want. And as you've discovered it's not so
> easy to ship the whole array to your client where it might be easier to work
> with.

Yes. This is a bit of a complication, since if those functions existed it would be really nice to work with arrays.

> I just have things like (hypothetically):
>
> CREATE TABLE states (
>   abbrev text,
>   state_name text[],
>   state_capitol text[]
> )
>
> And then in my application code data layer I mark all "internationalized
> columns" and the object that handles creating the actual select automatically
> includes a "[$lang_id]" after every column in that list.
>
> The list of languages supported and the mapping of languages to array
> positions is fixed.
> I can grow it later but I can't reorganize them. This is
> fine for me since pretty much everything has exactly two languages.

That is pretty cool. The one advantage of what I am doing is that you can add languages at any time, and there should be no huge load on Postgres as far as I can tell, since the multi_language table has only two fields and one record for each multi-language field referenced from my other tables, and calls to it are direct by id. I think this should work, but it is a puzzler for sure!

Regards, David
On Sat, Jul 02, 2005 at 05:00:50PM -0300, David Pratt wrote:

>> http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup
> Many thanks Karsten for some insight into how you are handling this.

David, if you go to the Developers Corner in our Wiki at

http://salaam.homeunix.com/twiki/bin/view/Gnumed/WebHome

you'll find an explanation of how we use this. Feel free to ask for comments if that doesn't suffice. (I am offline, so I can't give the precise URL.)

Karsten
Many thanks, Karsten. I am going to look at your example closely.

Regards, David

On Sunday, July 3, 2005, at 09:50 AM, Karsten Hilbert wrote:

> On Sat, Jul 02, 2005 at 05:00:50PM -0300, David Pratt wrote:
>
>>> http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup
>> Many thanks Karsten for some insight into how you are handling this.
>
> David, if you go to the Developers Corner in our Wiki at
>
> http://salaam.homeunix.com/twiki/bin/view/Gnumed/WebHome
>
> you'll find an explanation of how we use this. Feel free to
> ask for comments if that doesn't suffice.
>
> (I am offline so can't give the precise URL.)
>
> Karsten
I wonder if you could make an SQL type that used text[] as its storage format but had an output function that displayed the correct text for the "current locale" - where "current locale" could be something you set by calling a function at the beginning of the transaction.

Do pg_dump and all the important things use the send/receive functions rather than the input/output functions, so that even though this output function loses information it wouldn't cause serious problems?

You would still need a way to retrieve all the languages for cases like administrative interfaces for updating the information. I'm not entirely convinced this would be any better than the alternative of retrieving all of them by default and having a function to retrieve only the correct language.

-- 
greg
Hi Greg. I'm not sure about this one, since I have never made my own type. Do you mean something like an ip-to-country lookup to guess the locale? If so, I am using an ip-to-country table to look up the IP from the request and get the country, so the proper language can be selected automatically for display (but I need some translation work done first before I can activate this). I will also use this table for blacklisting purposes and other things, so it is multi-purpose.

I have got a good part of what I wanted working so far. I am just working on the language update/delete trigger, since there does not appear to be a direct way of surgically removing a specific element from an array in Postgres, unless I have missed something. For example, if I knew Spanish was the 3rd sub-array in a multidimensional array of, say, 10 lang/translation pairs, there is no way to remove just that one without rewriting the whole array and updating the field (which is what I am hoping to complete today).

So my language update/delete trigger needs to scan the array for the lang/translation pair to delete, update the language key for each remaining language in a reference field (other than for the language being deleted), rewrite the array without the deleted pair, and then update the field with the rewritten array. It sounds worse than it really is, since the multidimensional array containing the lang/translation pairs is the same length in every row, and you perform this by iterating with a loop through the records of the multi_language table. Further, each translation can be compared by key (for me this is the ISO language code). Also, realistically, how many times do you need to add and drop languages? The number of languages in use for me will likely never exceed, say, 20. So this process, even with large numbers of multi-language fields, should not be that problematic, even if you had a few thousand text fields you wanted translations available for. I think you would still be looking at milliseconds to perform this.
This will be an AFTER trigger (after deletion). I guess I will see what the performance is like when I am finished - so far it is pretty fast for adding.

You also get a sensible structure for multi-language fields, where each one is referenced to the multi_language table by id (normalized) with referential integrity (something I was seeking). The only thing not normalized is the translations, which is okay with me, since the array structure is dynamic yet the keys give you exactly what you want. I am also going to look at Karsten's material shortly to see how his system works, but I am interested in following through with the arrays approach I started, since I am happy with what I am seeing.

Regards, David

On Monday, July 4, 2005, at 12:06 PM, Greg Stark wrote:

> I wonder if you could make an SQL type that used text[] as its storage format
> but had an output function that displayed the correct text for the "current
> locale". Where "current locale" could be something you set by calling a
> function at the beginning of the transaction.
>
> Do pg_dump and all the important things use the send/receive functions not the
> input/output functions? so even though this output function loses information
> it wouldn't cause serious problems?
>
> You would still need a way to retrieve all the languages for the cases like
> administrative interfaces for updating the information. I'm not entirely
> convinced this would be any better than the alternative of retrieving all of
> them by default and having a function to retrieve only the correct language.
>
> -- 
> greg
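[Editor's sketch] The delete step David describes - rewrite each row's array without the dropped language, then recompute the key positions stored in the languages table - can be sketched client-side. This is hypothetical: rows are shown as parsed Python lists, and David's real version would run inside a PL/pgSQL trigger:

```python
def drop_language(rows, code):
    """Rewrite each row's language array without the given language
    code, and return the new 1-based key positions for the codes
    that remain (what David keeps in his languages table)."""
    for i, pairs in enumerate(rows):
        rows[i] = [p for p in pairs if p[0] != code]
    remaining = [p[0] for p in rows[0]] if rows else []
    return {c: k for k, c in enumerate(remaining, start=1)}

table = [
    [["en", "the brown cow"], ["fr", "la vache brun"], ["es", "the brown cow"]],
    [["en", "the blue turkey"], ["fr", "la dandon bleu"], ["es", "the blue turkey"]],
]
keys = drop_language(table, "fr")
print(table[0])  # [['en', 'the brown cow'], ['es', 'the brown cow']]
print(keys)      # {'en': 1, 'es': 2}
```

Recomputing the keys after every drop is exactly the bookkeeping that a key=>value store such as hstore (mentioned below in the thread) avoids.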
Hi there. Sorry if I'm just misunderstanding, but we have contrib/hstore available from

http://www.sai.msu.su/~megera/postgres/gist/

which could be used for storing as many languages as you need. It's sort of a Perl hash.

On Mon, 4 Jul 2005, David Pratt wrote:

> I have got a good part of what I wanted working so far. I am just working on
> language update delete trigger since there does not appear to be a direct way
> of surgically removing a specific element from an array in postgres unless I
> have missed something.

Regards, Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
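[Editor's sketch] For contrast with the positional-array schemes above: hstore's key=>value model makes the key-position bookkeeping David is writing triggers for disappear, because the language code itself is the key. The shape of the data can be modelled with a plain dict (a sketch with invented data; in hstore SQL the lookup would be something along the lines of lang_code_and_text -> 'fr'):

```python
# One row's translations as a flat code => text mapping, which is the
# shape hstore stores, instead of a 2-D positional array.
row = {"en": "the brown cow", "fr": "la vache brun"}

# Lookup by language code, falling back to English -- no positional
# keys to maintain anywhere:
print(row.get("fr", row["en"]))  # la vache brun

# Adding or dropping a language is a plain key operation; the other
# languages' "positions" are unaffected:
row["es"] = row["en"]
del row["fr"]
print(sorted(row))  # ['en', 'es']
```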
Oleg Bartunov <oleg@sai.msu.su> writes:

> sorry if just misunderstanding but we have contrib/hstore available from
> http://www.sai.msu.su/~megera/postgres/gist/
> which could be used for storing as many languages as you need.
> It's sort of perl hash.

Huh. That's pretty neat. I don't really need it, since I can just assign fixed array indexes for each locale and use arrays. But for someone who has to support lots of different sets of locales it could be useful. Or for someone who has to index these columns using GiST.

-- 
greg
On Mon, Jul 04, 2005 at 03:27:59PM -0300, David Pratt wrote:

> I am also going to look at Karsten's material shortly to see how his system works

I am still away from the net, but here is how to find the description in our Wiki: go to user support, user guide, scroll down to developers guide, go to backend I18N.

Please point out anything you find difficult to figure out.

Karsten
Many thanks Karsten. I got a system working with arrays yesterday, but I will still be examining your code. I guess the next challenge is to see how well the multidimensional array can be searched. I could make indexes on an expression to retrieve the language for a specific key, since each element array of the multidimensional array is a translation that includes the ISO code and the text of the translation. It is pretty light and quick. I am open to examining anything that will help me learn more about doing this well.

Regards, David

On Wednesday, July 6, 2005, at 11:19 AM, Karsten Hilbert wrote:

> On Mon, Jul 04, 2005 at 03:27:59PM -0300, David Pratt wrote:
>
>> I am also going to look at Karsten's material shortly to see how his
>> system works
>
> I am still away from the net but here is how to find the
> description in our Wiki:
>
> Go to user support, user guide, scroll down to developers
> guide, go to backend I18N.
>
> Please point out anything you find difficult to figure out.
>
> Karsten