Re: FTS with more than one language in body and with unknown query language? - Mailing list pgsql-general

From Artur Zakirov
Subject Re: FTS with more than one language in body and with unknown query language?
Msg-id 54fd4e6a-93a6-90a8-111a-cb773a7a52ef@postgrespro.ru
In response to FTS with more than one language in body and with unknown query language?  (Stefan Keller <sfkeller@gmail.com>)
List pgsql-general
Hi,

On 14.07.2016 01:16, Stefan Keller wrote:
> Hi,
>
> I have a text corpus which contains either German or English docs and
> I expect queries where I don't know if it's German or English. So I'd
> like e.g. that a query "forest" matches "forest" in body_en but also
> "Wald" in body_de.
>
> I created a table with attributes body_en and body_de (type "text"). I
> will use ts_vector/ts_query on the fly (don't need yet an index
> (attributes)).
>
> * Can FTS handle this multilingual situation?

In my opinion, PostgreSQL can't handle this. It can't translate words from
one language to another; it only stems a word from its original form to its
base form. You would first need to translate the word from English to
German, and then search for it in the body_de attribute.

The issue is further complicated by the fact that a single word can have
a different meaning in the other language.
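
To make the "translate first, then search" step concrete, here is a rough
sketch in Python that only builds the SQL text and its parameters. The
table name "docs", the column "id", the EN_TO_DE glossary and the
build_search() helper are all hypothetical; body_en and body_de are the
attributes from the question, and 'english' and 'german' are PostgreSQL's
built-in text search configurations. The %s placeholders assume a DB-API
style driver.

```python
# Hypothetical English -> German glossary; PostgreSQL does not provide one,
# so a real application would need its own translation source.
EN_TO_DE = {"forest": ["Wald", "Forst"]}


def build_search(term, glossary=EN_TO_DE):
    """Return (sql, params) that match `term` in body_en and each known
    German translation of it in body_de."""
    # The original term is stemmed and matched with the English configuration.
    clauses = [
        "to_tsvector('english', body_en) @@ plainto_tsquery('english', %s)"
    ]
    params = [term]
    # Each translation is stemmed and matched with the German configuration.
    for german in glossary.get(term.lower(), []):
        clauses.append(
            "to_tsvector('german', body_de) @@ plainto_tsquery('german', %s)"
        )
        params.append(german)
    # "docs" and "id" are placeholder names for the table and its key.
    sql = "SELECT id FROM docs WHERE " + " OR ".join(clauses)
    return sql, params


sql, params = build_search("forest")
```

A term that is missing from the glossary is searched in body_en only,
which is exactly the gap described above: the translation has to come
from somewhere outside PostgreSQL.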

> * How to setup a text search configuration which e.g. stems en and de words?
> * Should I create a synonym dictionary which contains word
> translations en-de instead of synonyms en-en?

Such a synonym dictionary would contain thousands of entries, so
building it would require a great deal of effort.
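
For a sense of what that effort looks like: PostgreSQL's synonym
dictionary template reads a file with one "word synonym" pair per line,
separated by white space. The sketch below renders such lines from a
small glossary. The GLOSSARY contents and the render_synonym_file()
helper are made up for illustration, and mapping German words onto an
English word is only one possible convention.

```python
def render_synonym_file(glossary):
    """Render 'word synonym' lines, one pair per line, mapping each German
    word onto its English counterpart."""
    lines = []
    for english, german_words in sorted(glossary.items()):
        for german in german_words:
            # Both sides are lowercased here for a consistent file.
            lines.append(f"{german.lower()} {english.lower()}")
    return "\n".join(lines) + "\n"


# Hypothetical glossary. Note that each inflected form ("Waldes") is listed
# as its own entry in this sketch.
GLOSSARY = {"forest": ["Wald", "Waldes"]}

syn_text = render_synonym_file(GLOSSARY)
```

The resulting text would be saved as a .syn file in the tsearch_data
directory and referenced from CREATE TEXT SEARCH DICTIONARY with
TEMPLATE = synonym. Listing every form of every word by hand is where
the thousands of entries come from.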

> * Any hints to related work where FTS has been used in a multilingual context?
>
> :Stefan
>
>

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

