
Re: Improving docs for strict_word_similarity() - Mailing list pgsql-docs

From	Alexander Korotkov
Subject	Re: Improving docs for strict_word_similarity()
Date	June 1, 2018 18:39:11
Msg-id	CAPpHfdumsXfLUhtuiwDWU+Gf-KYkkqHCvMvRggYOugt-FBjfFg@mail.gmail.com
In response to	Improving docs for strict_word_similarity() (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Improving docs for strict_word_similarity() Re: Improving docs for strict_word_similarity()
List	pgsql-docs


Hi, Bruce!

On Sat, May 26, 2018 at 7:56 PM Bruce Momjian <bruce@momjian.us> wrote:

While creating the release notes, I was confused by the description for
strict_word_similarity(), particularly "extent boundaries". The
attached patch clarifies, at least for me, how word_similarity() and
strict_word_similarity() differ.

Thank you for your efforts to improve the pg_trgm documentation.

However, I don't think all of the edits are correct. I have the following
notes on the edits you propose.

--- 112,119 ----

</entry>

<entry>

! Same as <function>word_similarity(text, text)</function>, but

! considers the set of trigrams to be of the same length.

</entry>

</row>

<row>

This doesn't look like a correct description. In short,
strict_word_similarity() searches the second string for the extent of words
that best matches the first string. So this function takes care to use whole
words from the second string, not parts of words. However, this is not a
matter of the length of the trigram sets.

--- 164,182 ----

This function returns a value that can be approximately understood as the

greatest similarity between the first string and any substring of the second

string. However, this function does not add padding to the boundaries of

! the extent. Thus, the number of additional characters present in the

! second string is not considered, except for the mismatched word boundary.

</para>

This looks correct to me.
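
To make the word_similarity() behaviour concrete, here is a rough Python model. It is a simplification for illustration only, not the pg_trgm C code; the helper names and the ASCII-only word splitting are assumptions. It takes the ordered trigrams of the second string and tries every continuous extent of them, without padding the extent boundaries:

```python
import re

def ordered_trigrams(text):
    """Trigrams in order of appearance; each word is lowercased and padded
    with two spaces in front and one behind (assumed pg_trgm-style padding)."""
    grams = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def word_similarity_model(query, text):
    """Best ratio of shared to distinct trigrams between the query and any
    continuous extent of the text's ordered trigrams (brute force)."""
    q = set(ordered_trigrams(query))
    t = ordered_trigrams(text)
    best = 0.0
    for start in range(len(t)):
        for end in range(start + 1, len(t) + 1):
            extent = set(t[start:end])
            best = max(best, len(q & extent) / len(q | extent))
    return best

print(word_similarity_model("word", "two words"))  # prints 0.8
```

For 'word' against 'two words' the best extent in this model is the four trigrams "  w", " wo", "wor", "ord". It stops inside 'words', so the trailing "rds" and "ds " trigrams do not count against the match.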

! The function <function>strict_word_similarity(text, text)</function>

! does consider additional characters in the second string. In the

! example above, <function>strict_word_similarity(text, text)</function>

! would use the full trigram for the second string when computing

! similarity, not just the part of the trigram that matches the

! first string. For example, it would use the <literal>{"  w"," wo",

! "wor","ord","rds","ds "}</literal>, which corresponds to the whole

! word <literal>'words'</literal>.

After your edits, it looks as if strict_word_similarity() matches the full
set of trigrams of the first string against the full set of trigrams of the
second string. However, that is a description of the plain similarity()
function. Actually, strict_word_similarity() matches the set of trigrams of
the first string against the set of trigrams of a contiguous run of words
from the second string.
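
The difference from similarity() can be sketched in Python. This is a simplified model, not the actual implementation; the function names and the ASCII-only word splitting are assumptions made for illustration. similarity() compares against all trigrams of the second string, while the strict variant compares against the trigrams of each run of consecutive whole words and keeps the best result:

```python
import re

def words_of(text):
    """Lowercased alphanumeric words (ASCII only, an assumption of this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def trigrams(text):
    """Trigram set; each word is padded with two spaces in front, one behind."""
    grams = set()
    for word in words_of(text):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity_model(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def strict_word_similarity_model(query, text):
    """Best similarity between the query and any run of consecutive whole words."""
    words = words_of(text)
    best = 0.0
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            run = " ".join(words[start:end])
            best = max(best, similarity_model(query, run))
    return best

print(round(similarity_model("word", "two words"), 6))              # prints 0.363636
print(round(strict_word_similarity_model("word", "two words"), 6))  # prints 0.571429
```

In this model the best run for 'word' against 'two words' is the single word 'words', so the strict result equals similarity('word', 'words'), whereas similarity() on the whole second string is also penalized for the trigrams of 'two'.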

--- 189,197 ----

<para>

Thus, the <function>strict_word_similarity(text, text)</function> function

! is useful for finding the similarity to whole words, while

<function>word_similarity(text, text)</function> is more suitable for

! finding the similarity for parts of words.

</para>

This also looks correct to me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
