Re: Fuzzy substring searching with the pg_trgm extension - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: Fuzzy substring searching with the pg_trgm extension
Date
Msg-id 56AB73F6.7050200@sigaev.ru
In response to Re: Fuzzy substring searching with the pg_trgm extension  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Fuzzy substring searching with the pg_trgm extension  (Artur Zakirov <a.zakirov@postgrespro.ru>)
Re: Fuzzy substring searching with the pg_trgm extension  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Fuzzy substring searching with the pg_trgm extension  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
> The behavior of this function is surprising to me.
>
> select substring_similarity('dog' ,  'hotdogpound') ;
>
>   substring_similarity
> ----------------------
>                   0.25
>
Substring search was designed to find a similar word within a string:
contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
 substring_similarity
----------------------
                 0.75

contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
 substring_similarity
----------------------
                    1
It seems to me that users search for words in a long string. But I agree that a
more detailed explanation is needed and, maybe, we need to change the feature
name to fuzzywordsearch or something else; I can't think of a better name.
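
To make the numbers in this thread easier to follow, here is a rough Python sketch of how such a word-oriented score could arise. It is not the pg_trgm C code. It assumes that each word is lower-cased and padded with two leading spaces and one trailing space before trigrams are extracted, and that the score is the best ratio of shared to total distinct trigrams over any contiguous run of the second string's trigrams. The names word_trigrams, text_trigrams and substring_similarity_sketch are made up for this illustration.

```python
import re


def word_trigrams(word):
    # Assumed padding: two spaces before the word, one after.
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def text_trigrams(text):
    # Split into alphanumeric words and keep the trigrams in text order.
    trigrams = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        trigrams.extend(word_trigrams(word))
    return trigrams


def substring_similarity_sketch(query, text):
    # Best shared/union ratio between the query's trigram set and any
    # contiguous extent of the text's ordered trigram list.
    q = set(text_trigrams(query))
    t = text_trigrams(text)
    best = 0.0
    for i in range(len(t)):
        extent = set()
        for j in range(i, len(t)):
            extent.add(t[j])
            best = max(best, len(q & extent) / len(q | extent))
    return best


print(substring_similarity_sketch("dog", "hotdogpound"))    # 0.25
print(substring_similarity_sketch("dog", "hot dogpound"))   # 0.75
print(substring_similarity_sketch("dog", "hot dog pound"))  # 1.0
```

Under these assumptions 'dog' yields four trigrams. 'hotdogpound' shares only "dog" with it, 'hot dogpound' also shares the two word-start trigrams, and 'hot dog pound' shares all four, which is why a word boundary changes the score so much.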


>
> Also, should we have a function which indicates the position in the
> 2nd string at which the most similar match to the 1st argument occurs?
>
> select substring_similarity_pos('dog' ,  'hotdogpound') ;
>
> answering: 4
Interesting. I think it would be useful in some cases.
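
As a rough illustration of what such a position function might return, here is a small Python sketch. It is only one possible reading of the proposal: it slides a window of the query's length over the second string, compares plain unpadded character trigrams, and reports the 1-based start of the best window. The name best_match_pos and the windowing rule are assumptions made for this example, not part of pg_trgm.

```python
def char_trigrams(s):
    # Plain character trigrams, no padding (an assumption of this sketch).
    return {s[i:i + 3] for i in range(len(s) - 2)}


def best_match_pos(query, text):
    # Return (1-based position, similarity) of the best window,
    # or (0, 0.0) if no window shares a trigram with the query.
    q = char_trigrams(query.lower())
    n = len(query)
    best_pos, best_sim = 0, 0.0
    for i in range(len(text) - n + 1):
        w = char_trigrams(text[i:i + n].lower())
        union = q | w
        sim = len(q & w) / len(union) if union else 0.0
        if sim > best_sim:
            best_pos, best_sim = i + 1, sim
    return best_pos, best_sim


print(best_match_pos("dog", "hotdogpound"))  # (4, 1.0)
```

A real implementation would more likely derive the position from the trigram extent that gave the best score, so this sketch shows only the intended result, not a suggested algorithm.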

>
> We could call them <<-> and <->> , where the first corresponds to <%
> and the second to %>
Agreed.
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


