Re: Fuzzy substring searching with the pg_trgm extension - Mailing list pgsql-hackers

From Artur Zakirov
Subject Re: Fuzzy substring searching with the pg_trgm extension
Date
Msg-id 56BB5873.1020503@postgrespro.ru
In response to Re: Fuzzy substring searching with the pg_trgm extension  (Artur Zakirov <a.zakirov@postgrespro.ru>)
Responses Re: Fuzzy substring searching with the pg_trgm extension  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
On 02.02.2016 15:45, Artur Zakirov wrote:
> On 01.02.2016 20:12, Artur Zakirov wrote:
>>
>> I have changed the patch:
>> 1 - trgm2.data was corrected, duplicates were deleted.
>> 2 - I have added operators <<-> and <->> with GiST index support. A
>> regression test will pass only with the patch
>> http://www.postgresql.org/message-id/CAPpHfdt19FwQXarYjkzxb3oxmv-KAn3FLuZrooARE_U3H3CV9g@mail.gmail.com
>>
>>
>> 3 - the function substring_similarity() was renamed to
>> subword_similarity().
>>
>> But there is not a function substring_similarity_pos() yet. It is not
>> trivial.
>>
>
> Sorry, there was a typo in the previous patch. Here is the fixed patch.
>

I have attached a new version of the patch. It fixes errors in the operators
<->> and %>:
- operator <->> did not pass the regression test on 32-bit CentOS (gcc
4.4.7 20120313).
- operator %> did not pass the regression test on 32-bit FreeBSD (gcc
4.2.1 20070831).

Both failures were caused by the way gcc optimized a variable.

This patch also corrects the pg_trgm documentation. The operators are now
written as %> and <->> (not <% and <<->).

There is a problem with adding the substring_similarity_pos() function: it
can bring additional overhead, because we would also need to store character
positions, including spaces. Spaces between words are lost in the
current implementation.
Is this function actually needed?
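
To illustrate why positions are hard to recover, here is a minimal Python
sketch of pg_trgm-style trigram extraction. It is not the extension's C code.
It assumes ASCII letters and digits are the only word characters, and the
helper name trigrams() is hypothetical. Each word is padded with two leading
spaces and one trailing space, as the pg_trgm documentation describes, so the
amount of whitespace between words never reaches the trigram set.

```python
import re


def trigrams(text):
    """Return the set of trigrams for text, one padded word at a time.

    Assumption: words are runs of ASCII letters and digits; everything
    else (including any number of spaces) only separates words.
    """
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Two leading spaces and one trailing space per word.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


# The original spacing is not recoverable from the trigram sets:
same = trigrams("foo bar") == trigrams("foo     bar")
```

Since both strings yield the same set, a position-reporting function would
need character offsets stored alongside the trigrams, which is the extra
overhead mentioned above.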


In conclusion, this patch introduces:
1 - functions:
     - subword_similarity()
2 - operators:
     - %>
     - <->>
3 - GUC variables:
     - pg_trgm.sml_limit
     - pg_trgm.subword_limit
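
As a rough sketch of how these pieces might relate, the Python below models
the function, the two operators, and the limit. It is a simplification and
not the algorithm in the patch: it takes the best Jaccard similarity between
the query's trigrams and any contiguous run of words in the text. All names,
the argument order, the 0.6 limit, and the ">=" comparison are assumptions.
The operator analogues follow the way the existing % and <-> operators relate
to similarity().

```python
import re

# Hypothetical stand-in for pg_trgm.subword_limit; the value is assumed.
SUBWORD_LIMIT = 0.6


def words(text):
    # Assumption: words are runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_trigrams(word):
    # Pad with two leading spaces and one trailing space, as pg_trgm does.
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def subword_similarity_sketch(query, text):
    """Best Jaccard similarity of query against any contiguous word run."""
    q = set().union(*(word_trigrams(w) for w in words(query)))
    per_word = [word_trigrams(w) for w in words(text)]
    best = 0.0
    for start in range(len(per_word)):
        run = set()
        for end in range(start, len(per_word)):
            run |= per_word[end]
            best = max(best, len(q & run) / len(q | run))
    return best


def subword_match(query, text, limit=SUBWORD_LIMIT):
    # Analogue of %>: true when the similarity reaches the limit.
    return subword_similarity_sketch(query, text) >= limit


def subword_distance(query, text):
    # Analogue of <->>: one minus the similarity.
    return 1.0 - subword_similarity_sketch(query, text)
```

For example, subword_match("words", "two words here") is true in this sketch
because one word of the text matches the query exactly, even though the
whole-string similarity would be much lower.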

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
