Re: Tsearch limitations - Mailing list pgsql-general

From Teodor Sigaev
Subject Re: Tsearch limitations
Date
Msg-id 3F3901DF.1020900@sigaev.ru
In response to Re: Tsearch limitations  (Mike Benoit <mikeb@netnation.com>)
List pgsql-general

Mike Benoit wrote:
> Oleg,
>
>     Is it possible to have Tsearch support soundex, or levenshtein
> (http://ca3.php.net/manual/en/function.levenshtein.php) when searching?
Sorry, no.


The function that calculates Levenshtein distance is defined as
int levenshtein ( string str1, string str2)

It needs two strings to compare, so it can't be used as a dictionary. :(

The index stores only a signature of each lexized word, and we can't compute the
distance between a query word and a signature.
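
To illustrate the point, here is a minimal Python sketch (not tsearch code). `levenshtein` is the classic two-argument edit distance; `toy_signature` is a hypothetical stand-in for an index signature, assumed here to be a single hashed bit, and is not the real on-disk format. The distance needs both original strings, and the signature no longer contains one.

```python
import zlib


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two complete strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute ca -> cb
        prev = cur
    return prev[-1]


def toy_signature(word: str, bits: int = 64) -> int:
    """Hypothetical signature: one bit picked by a hash of the word.

    The original characters cannot be recovered from this value, so no
    edit distance against a query word can be computed from it.
    """
    return 1 << (zlib.crc32(word.encode("utf-8")) % bits)
```

With both strings available, `levenshtein("addres", "address")` can be evaluated; with only `toy_signature("address")` there is nothing to compare the query word against.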

>
> I've never used Tsearch before, but I assume this might just be a matter
> of writing a different parser to add soundex'd versions of words to the
> index, then modify the query functions to search on both versions of the
> word?

To work with tsearch2, a dictionary must return the "canonical" form of each input
lexeme (usually the infinitive). If you can write a function that corrects some
mistakes in a word, then you can use it in tsearch.
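
A rough Python sketch of that idea, assuming a dictionary behaves like a one-argument function from token to canonical lexeme. The names `CORRECTIONS`, `STOP_WORDS`, `normalize` and `lexemes` are hypothetical and are not part of the tsearch2 API; a real dictionary would be written against tsearch2's own interface.

```python
from typing import Optional, Set

# Hypothetical correction table; a real one would be far larger.
CORRECTIONS = {"addres": "address", "adress": "address"}
# Hypothetical stop-word list.
STOP_WORDS = {"my", "is"}


def normalize(token: str) -> Optional[str]:
    """Map one token to its canonical lexeme, or None for a stop word."""
    word = token.lower()
    if word in STOP_WORDS:
        return None
    return CORRECTIONS.get(word, word)


def lexemes(text: str) -> Set[str]:
    """Normalize every whitespace-separated token and drop stop words."""
    return {lex for lex in map(normalize, text.split()) if lex is not None}
```

Because the same function is applied to the document and to the query, a misspelled "adress" in either one ends up as the lexeme "address", and an exact lexeme match then succeeds.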



>
>
> On Mon, 2003-08-11 at 07:30, Oleg Bartunov wrote:
>
>>On Mon, 11 Aug 2003 psql-mail@freeuk.com wrote:
>>
>>
>>>Oleg,
>>>
>>>I understand (I think) how the parser breaks up the input into words
>>>and builds tsvectors.
>>>
>>>And I understand how to do queries as described in the documentation.
>>>(I have read it!)
>>>
>>>SELECT * FROM vectors WHERE vector @@ to_tsquery('(leads|forks) & ! crawl')
>>>
>>>But I haven't seen any mention of whether, if I add the word:
>>>
>>>cathedral
>>>
>>>there is any query that will match when I search for "thed".
>>
>>No, tsearch2 is a word-oriented search. It doesn't support substring
>>search.
>>
>>
>>>The documentation seems to say that this cannot be done - but I'd just
>>>like to check. Tsearch2 does everything I want except this.
>>>
>>>"remember that the search operator @@ finds only exact matches between
>>>query lexemes and vector lexemes - if they are not exactly the same
>>>string, they will not be considered a match"
>>>
>>>
>>>
>>>>Mat,
>>>>
>>>>there are several functions you may use to see this (please read the
>>>>documentation):
>>>>
>>>>apod=# select to_tsvector('Hi my email addres is psql-mail@freeuk.com');
>>>>                    to_tsvector
>>>>----------------------------------------------------
>>>> 'hi':1 'addr':4 'email':3 'psql-mail@freeuk.com':6
>>>>(1 row)
>>>>
>>>>or, even better
>>>>
>>>>apod=# select * from ts_debug('Hi my email addres is psql-mail@freeuk.com');
>>>>     ts_name     | tok_type | description |        token         | dict_name |        tsvector
>>>>-----------------+----------+-------------+----------------------+-----------+------------------------
>>>> default_russian | lword    | Latin word  | Hi                   | {en_stem} | 'hi'
>>>> default_russian | lword    | Latin word  | my                   | {en_stem} |
>>>> default_russian | lword    | Latin word  | email                | {en_stem} | 'email'
>>>> default_russian | lword    | Latin word  | addres               | {en_stem} | 'addr'
>>>> default_russian | lword    | Latin word  | is                   | {en_stem} |
>>>> default_russian | email    | Email       | psql-mail@freeuk.com | {simple}  | 'psql-mail@freeuk.com'
>>>>(6 rows)
>>>>
>>>>You may write your own parser or preprocess text before tsearch.
>>>>
>>>>    Oleg
>>>>On Mon, 11 Aug 2003, Mat wrote:
>>>>
>>>>
>>>>>Can Tsearch be used to return substring matches?
>>>>>
>>>>>i.e
>>>>>
>>>>>Text to search: Hi my email addres is psql-mail@freeuk.com
>>>>>
>>>>>Query "psql" would match the email address?
>>>>>
>>>>>Query "mail" would also match?
>>>>>
>>>>>Query "reeu" would also match?
>>>>>
>>>>>Or is tsearch not suitable for this type of query? Should I use FTI
>>>>>instead?
>>>>>
>>>>>Thanks.
>>>>>
>>>>>
>>>>>---------------------------(end of broadcast)---------------------------
>>>
>>>>>TIP 6: Have you searched our list archives?
>>>>>
>>>>>               http://archives.postgresql.org
>>>>>
>>>>
>>>>    Regards,
>>>>        Oleg
>>>>_____________________________________________________________
>>>>Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>>>Sternberg Astronomical Institute, Moscow University (Russia)
>>>>Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>>phone: +007(095)939-16-83, +007(095)939-23-83
>>>>
>>>
>>>
>>    Regards,
>>        Oleg
>>

--
Teodor Sigaev                                  E-mail: teodor@sigaev.ru

