Hi,
On 14.07.2016 01:16, Stefan Keller wrote:
> Hi,
>
> I have a text corpus which contains either German or English docs and
> I expect queries where I don't know if it's German or English. So I'd
> like e.g. that a query "forest" matches "forest" in body_en but also
> "Wald" in body_de.
>
> I created a table with attributes body_en and body_de (type "text"). I
> will use ts_vector/ts_query on the fly (don't need yet an index
> (attributes)).
>
> * Can FTS handle this multilingual situation?
In my opinion, PostgreSQL can't handle this. It can't translate words
from one language to another; it only stems a word from its original
form to its base form. You would first need to translate the word from
English to German, and then search for the translated word in the
body_de attribute.
The issue is further complicated by the fact that one word can have
different meanings in the other language.
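
A minimal sketch of the "translate first, then search" idea, done on the
client side. Everything here is an assumption for illustration: the
table name "docs", the tiny hand-made EN_TO_DE map, and the %s
placeholder style (as used by psycopg2). It only builds the SQL and the
parameter list; it does not connect to a database.

```python
# Hypothetical, hand-made translation table (English -> German).
EN_TO_DE = {"forest": "Wald", "tree": "Baum"}

# Hypothetical table name; it has the body_en and body_de columns.
TABLE = "docs"


def build_bilingual_query(word):
    """Return (sql, params) that match `word` in body_en and, if a
    translation is known, the translated word in body_de."""
    conditions = [
        "to_tsvector('english', body_en) @@ plainto_tsquery('english', %s)"
    ]
    params = [word]

    translation = EN_TO_DE.get(word.lower())
    if translation is not None:
        # Search the German column with the German configuration.
        conditions.append(
            "to_tsvector('german', body_de) @@ plainto_tsquery('german', %s)"
        )
        params.append(translation)

    sql = "SELECT * FROM " + TABLE + " WHERE " + " OR ".join(conditions)
    return sql, params


if __name__ == "__main__":
    sql, params = build_bilingual_query("forest")
    print(sql)
    print(params)
```

Each column is stemmed with its own configuration, so the translation
step stays outside of PostgreSQL. A word with several meanings would
need several entries in the map, which is the ambiguity problem
mentioned above.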
> * How to setup a text search configuration which e.g. stems en and de words?
> * Should I create a synonym dictionary which contains word
> translations en-de instead of synonyms en-en?
Such a synonym dictionary would contain thousands of entries, so
building it would require a great deal of effort.
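
To give an idea of what such a dictionary would look like: a file for
the synonym template has one "word synonym" pair per line, separated by
white space, and is referenced from CREATE TEXT SEARCH DICTIONARY with
TEMPLATE = synonym. Below is a rough sketch that formats translation
pairs this way. The pairs and the function name are hypothetical, and
lower-casing and skipping multi-word entries are choices made for this
sketch only.

```python
# Hypothetical translation pairs (English word -> German word).
PAIRS = {"forest": "Wald", "tree": "Baum", "pine tree": "Kiefer"}


def synonym_file_lines(pairs):
    """Format pairs as 'word synonym' lines for a synonym file,
    skipping entries where either side is not a single word."""
    lines = []
    for source, target in sorted(pairs.items()):
        if len(source.split()) != 1 or len(target.split()) != 1:
            continue  # one word on each side of a line
        lines.append(source.lower() + " " + target.lower())
    return lines


if __name__ == "__main__":
    for line in synonym_file_lines(PAIRS):
        print(line)
```

Even with the file generated, the translation pairs themselves still
have to come from somewhere, and the German side probably has to be in
the form that the German stemmer produces for body_de.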
> * Any hints to related work where FTS has been used in a multilingual context?
>
> :Stefan
>
>
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company