Re: Fuzzy substring searching with the pg_trgm extension - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Fuzzy substring searching with the pg_trgm extension
Date
Msg-id CAMkU=1wutmCDfEsDp+KDwUHbTGCBRwh_xzf0tp=3E5FA5OEcug@mail.gmail.com
Whole thread Raw
In response to Re: Fuzzy substring searching with the pg_trgm extension  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Fuzzy substring searching with the pg_trgm extension  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Dec 26, 2015 at 9:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Dec 18, 2015 at 11:43 AM, Artur Zakirov
> <a.zakirov@postgrespro.ru> wrote:
>> Hello.
>>
>> PostgreSQL has a contrib module named pg_trgm. It is used to the fuzzy text
>> search. It provides some functions and operators for determining the
>> similarity of the given texts using trigram matching.
>>
>> At the moment, in pg_trgm both the similarity function and the % operator
>> match two strings expecting that they are similar entirely. But they give
>> bad results if we want to find documents by a query which is substring of a
>> document.
>
> This is very interesting.  I suspect the index will not be very useful
> in cases where the full string is much larger than the substring,
> because the limit will not be met often enough to rule out many rows
> just based on the index data.  I have a pretty good test case to see.

My test case is turning out to be harder to evaluate than I thought,
because it was more complicated than I remembered it being.  Hopefully
I can provide some more feedback on it next week.

In the meantime, I had a question about bumping the version to 1.3.

Version 1.2 of pg_trgm has never been included in a community release
(because it didn't make the 9.5 cutoff).  So should we really bump the
version to 1.3, or just merge the changes here directly into the
existing 1.2 HEAD code?

I think it would be pretty odd for 9.5. to come with pg_trgm 1.1 and
for 9.6 to come with pg_trgm 1.3.

Thanks,

Jeff



pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: pgindent-polluted commits
Next
From: Jim Nasby
Date:
Subject: Re: Weird behavior during CREATE EXTENSION