Re: FTS with more than one language in body and with unknown query language? - Mailing list pgsql-general

From Stefan Keller
Subject Re: FTS with more than one language in body and with unknown query language?
Date
Msg-id CAFcOn29yEES4Y=E1c0Nj__8o1Kb_RB4Ey2NZtTYuy7w2DRcjag@mail.gmail.com
In response to Re: FTS with more than one language in body and with unknown query language?  (Artur Zakirov <a.zakirov@postgrespro.ru>)
Responses Re: FTS with more than one language in body and with unknown query language?
List pgsql-general
Hello, Artur!

Thanks for your explanations.

2016-07-14 17:20 GMT+02:00 Artur Zakirov <a.zakirov@postgrespro.ru>:
> On 14.07.2016 01:16, Stefan Keller wrote:
...
>> * Should I create a synonym dictionary which contains word
>> translations en-de instead of synonyms en-en?
>
> This synonym dictionary will contain thousands of entries. So it will require
> a great effort to make this dictionary.

It's a domain-specific corpus of at most 1000 records of descriptive
text (metadata) about geographic data, such as topographic maps, land
use planning, etc.

...
>> * How to set up a text search configuration which e.g. stems en and de
>> words?

I would still like to give FTS a try with an en-de synonym dictionary.
Now I'm wondering how to set up the configuration. I've seen examples
that process English, German, or Russian alone, but I have not yet
found any documentation on how to set up a text search configuration
for a corpus that contains two (or more) languages at the same time in
one table (body_en and body_de).
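
One possible way to make the en-de idea concrete, as a minimal sketch
and not a tested setup: the Python below takes a small, hypothetical
translation map (EN_DE, invented for illustration), renders it as
whitespace-separated "word replacement" lines of the kind PostgreSQL's
synonym dictionary template reads, and alternatively expands a query on
the client side so that "forest" also searches for "wald". The function
names are assumptions, not part of any library.

```python
# Hypothetical en -> de pairs; a real list would come from a domain glossary.
EN_DE = {
    "forest": "wald",
    "map": "karte",
    "river": "fluss",
}


def synonym_file_lines(pairs):
    """Render pairs as 'word replacement' lines, one pair per line."""
    return [f"{en} {de}" for en, de in sorted(pairs.items())]


def expand_query(query, pairs):
    """Build a tsquery-style string that ORs each term with its translation.

    Terms are lowercased and split on whitespace; terms are AND-ed together.
    No escaping of tsquery operators is attempted in this sketch.
    """
    reverse = {de: en for en, de in pairs.items()}
    groups = []
    for term in query.lower().split():
        other = pairs.get(term) or reverse.get(term)
        if other:
            groups.append(f"({term} | {other})")
        else:
            groups.append(term)
    return " & ".join(groups)


if __name__ == "__main__":
    print("\n".join(synonym_file_lines(EN_DE)))
    print(expand_query("forest map", EN_DE))
```

The expanded string could then be passed to to_tsquery('english', ...)
for body_en and to to_tsquery('german', ...) for body_de, with the two
matches combined by OR. Whether stemming of the foreign-language term
behaves acceptably in each configuration would need to be checked
against the actual corpus.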

:Stefan

2016-07-14 17:20 GMT+02:00 Artur Zakirov <a.zakirov@postgrespro.ru>:
> Hi,
>
> On 14.07.2016 01:16, Stefan Keller wrote:
>>
>> Hi,
>>
>> I have a text corpus which contains either German or English docs and
>> I expect queries where I don't know if it's German or English. So I'd
>> like e.g. that a query "forest" matches "forest" in body_en but also
>> "Wald" in body_de.
>>
>> I created a table with attributes body_en and body_de (type "text"). I
>> will use tsvector/tsquery on the fly (I don't need an index
>> (attributes) yet).
>>
>> * Can FTS handle this multilingual situation?
>
>
> In my opinion, PostgreSQL can't handle it. It can't translate words from one
> language to another; it just stems a word from its original form to its base
> form. First you need to translate the word from English to German, then
> search for it in the body_de attribute.
>
> And the issue is complicated by the fact that one word could have different
> meanings in the other language.
>
>> * How to set up a text search configuration which e.g. stems en and de
>> words?
>> * Should I create a synonym dictionary which contains word
>> translations en-de instead of synonyms en-en?
>
>
> This synonym dictionary will contain thousands of entries. So it will require
> a great effort to make this dictionary.
>
>
>> * Any hints to related work where FTS has been used in a multilingual
>> context?
>>
>> :Stefan
>>
>>
>
> --
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company

