Re: fts, compond words? - Mailing list pgsql-general

From Teodor Sigaev
Subject Re: fts, compond words?
Msg-id 43980BE7.9000601@sigaev.ru
In response to Re: fts, compond words?  (Mike Rylander <mrylander@gmail.com>)
Responses Re: fts, compond words?
List pgsql-general
> hrm... that is a problem.  Though, I think that's a case of how the
> compiled expression is built from user input.  Unless I'm mistaken
>
>   a + ( foo1 | foo2 )
>
> is exactly equal to
>
>   (a + foo1) | (a + foo2)
>
>
> Ahhh... but then there is the more complex example of
>
>   a + foonish + bar
>
> becoming
>
>   a + (foo1 | foo2) + bar
>
> .... but I guess that could be
>
> (a + foo1 + bar) | (a + foo2 + bar)

That is a simple case, but what about languages such as Norwegian or German? They
have compound words, and an ispell dictionary can split them into lexemes. But usually
there is more than one variant of separation:

forbruksvaremerkelov
    forbruk vare merke lov
    forbruk vare merkelov
    forbruk varemerke lov
    forbruk varemerkelov
    forbruksvare merke lov
    forbruksvare merkelov
(Note: I don't know the translation, it is just an example. When we were working on
compound word support we found a word which has 24 variants of separation!)
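
To illustrate how one word yields several separations, here is a minimal Python
sketch. It is not the ispell dictionary code: the function name `segmentations`,
the toy vocabulary `VOCAB`, and the rule that a linking "s" may be skipped between
parts are all assumptions made for this example.

```python
from functools import lru_cache

# Toy vocabulary (an assumption for this sketch, not a real ispell dictionary).
VOCAB = frozenset({
    "forbruk", "forbruksvare", "vare", "varemerke",
    "varemerkelov", "merke", "merkelov", "lov",
})


def segmentations(word, vocab=VOCAB, linker="s"):
    """Return every way to split `word` into vocabulary parts.

    A linking element (here "s") may be skipped between two parts;
    this rule is a simplification chosen for the example.
    """

    @lru_cache(maxsize=None)
    def split(start):
        if start == len(word):
            return [()]  # one way to split the empty remainder
        results = []
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in vocab:
                continue
            # Continue directly after this part.
            for rest in split(end):
                results.append((part,) + rest)
            # Or skip a linking element, but only if more text follows it.
            if word.startswith(linker, end) and end + len(linker) < len(word):
                for rest in split(end + len(linker)):
                    results.append((part,) + rest)
        return results

    return split(0)


if __name__ == "__main__":
    for parts in segmentations("forbruksvaremerkelov"):
        print(" ".join(parts))
```

With this toy vocabulary the sketch finds the six separations listed above. A
real dictionary has many more entries, so the number of variants can grow much
faster.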

So the query 'a + forbruksvaremerkelov' will be awful:

a + ( (forbruk & vare & merke & lov) | (forbruk & vare & merkelov) | ... )

Of course, that is an example just off the top of my head, but a solution for phrase
search should work reasonably with such corner cases.
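
To make the size of the problem concrete, here is a rough Python sketch that
builds the expanded expression as a string and counts how many alternatives
appear if it is distributed the way the quoted message suggests. The operators
`+`, `&` and `|` are only the notation used in this thread, not existing query
syntax, and the names `SPLITS`, `phrase_expr` and `alternatives` are
hypothetical.

```python
from math import prod

# The six separations from the example above, one tuple of lexemes each.
SPLITS = [
    ("forbruk", "vare", "merke", "lov"),
    ("forbruk", "vare", "merkelov"),
    ("forbruk", "varemerke", "lov"),
    ("forbruk", "varemerkelov"),
    ("forbruksvare", "merke", "lov"),
    ("forbruksvare", "merkelov"),
]


def variant_expr(variants):
    """Render the variants of one query word as an OR of AND-groups."""
    groups = [
        "(" + " & ".join(v) + ")" if len(v) > 1 else v[0]
        for v in variants
    ]
    if len(groups) == 1:
        return groups[0]
    return "( " + " | ".join(groups) + " )"


def phrase_expr(slots):
    """Join the per-word expressions with the thread's '+' phrase operator."""
    return " + ".join(variant_expr(v) for v in slots)


def alternatives(slots):
    """Number of OR branches after distributing '+' over every '|'."""
    return prod(len(v) for v in slots)


if __name__ == "__main__":
    query = [[("a",)], SPLITS]
    print(phrase_expr(query))
    print(alternatives(query), "alternatives after distribution")
```

The count is the product of the variant counts of all words in the phrase, so
two compound words with 24 separations each would already give 576
alternatives.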



--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
