Re: Improving docs for strict_word_similarity() - Mailing list pgsql-docs

From Alexander Korotkov
Subject Re: Improving docs for strict_word_similarity()
Msg-id CAPpHfdumsXfLUhtuiwDWU+Gf-KYkkqHCvMvRggYOugt-FBjfFg@mail.gmail.com
In response to Improving docs for strict_word_similarity()  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Improving docs for strict_word_similarity()  (Alexander Korotkov <aekorotkov@gmail.com>)
Re: Improving docs for strict_word_similarity()  (Bruce Momjian <bruce@momjian.us>)
List pgsql-docs
Hi, Bruce!

On Sat, May 26, 2018 at 7:56 PM Bruce Momjian <bruce@momjian.us> wrote:
While creating the release notes, I was confused by the description for
strict_word_similarity(), particularly "extent boundaries".  The
attached patch clarifies, at least for me, how word_similarity() and
strict_word_similarity() differ.

Thank you for your efforts to improve the pg_trgm documentation.
However, I don't find all of the edits correct.  I have the following notes
regarding the changes you propose.

--- 112,119 ----
        </entry>
        <entry><type>real</type></entry>
        <entry>
!        Same as <function>word_similarity(text, text)</function>, but
!        considers the set of trigrams to be of the same length.
        </entry>
       </row>
       <row>

This doesn't look like a correct description.  In short, strict_word_similarity() searches
for the extent of words in the second string that best matches the first string.
So this function takes care to use whole words from the second string,
not parts of words.  However, this is not a matter of the length of the trigram sets.
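
To make the point about trigram sets concrete, here is a minimal Python sketch of how the sets are built.  It is a simplified model, not the pg_trgm source: the helper name trigrams() is made up for illustration, it handles only ASCII letters and digits, and it assumes the usual padding of two spaces before and one space after each word.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text (simplified, ASCII-only model)."""
    result = set()
    # Split into lowercase alphanumeric words; everything else is a separator.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Assumed padding: two spaces in front of the word, one space after.
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


first = trigrams("word")        # the first string
whole_word = trigrams("words")  # one whole word taken from the second string

print(sorted(first))
print(sorted(whole_word))
print(len(first), len(whole_word))  # 5 6
```

The two sets have different sizes (5 and 6 trigrams), and nothing requires them to be equal.  What matters is that the second set covers a whole word rather than a fragment of one.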

--- 164,182 ----
     This function returns a value that can be approximately understood as the
     greatest similarity between the first string and any substring of the second
     string.  However, this function does not add padding to the boundaries of
!    the extent.  Thus, the number of additional characters present in the
!    second string is not considered, except for the mismatched word boundry.
    </para>

This looks correct to me.

!    The function <function>strict_word_similarity(text, text)</function>
!    does consider additional characters in the second string.  In the
!    example above, <function>strict_word_similarity(text, text)</function>
!    would use the full trigram for the second string when computing
!    similarity, not just the part of the trigram that matches the
!    first string. For example, it would use the <literal>{" w","
!    wo","wor","ord","rds","ds "}</literal>, which corresponds to the whole
!    word <literal>'words'</literal>.

After your edits, it reads as if strict_word_similarity() matches the full
set of first-string trigrams against the full set of second-string trigrams.  However,
that is a description of the plain similarity() function.  Actually,
strict_word_similarity() matches the set of trigrams of the first string against the
set of trigrams of a contiguous subset of the second string's words.
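
The difference can be sketched in Python roughly as follows.  This is only a simplified model under stated assumptions, not the actual pg_trgm algorithm: all function names are hypothetical, trigrams are treated as plain sets, only ASCII letters and digits are handled, and the best extent is found by brute force over every contiguous run of whole words.

```python
import re


def split_words(text):
    """Lowercase alphanumeric words of text (ASCII-only assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def trigram_set(word_list):
    """Trigrams of the given words, each padded as '  word ' (assumed padding)."""
    result = set()
    for word in word_list:
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def jaccard(a, b):
    """Shared trigrams divided by all distinct trigrams."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_model(first, second):
    """Model of similarity(): full trigram set against full trigram set."""
    return jaccard(trigram_set(split_words(first)),
                   trigram_set(split_words(second)))


def strict_word_similarity_model(first, second):
    """Model of strict_word_similarity(): best match over whole-word runs."""
    target = trigram_set(split_words(first))
    second_words = split_words(second)
    best = 0.0
    # Try every contiguous run of whole words from the second string.
    for start in range(len(second_words)):
        for end in range(start + 1, len(second_words) + 1):
            run = trigram_set(second_words[start:end])
            best = max(best, jaccard(target, run))
    return best


print(round(similarity_model("word", "two words"), 6))              # 0.363636
print(round(strict_word_similarity_model("word", "two words"), 6))  # 0.571429
```

In this model the best run for 'word' against 'two words' is the single word 'words': 4 shared trigrams out of 7 distinct ones.  Matching against the full second string, as similarity() does, gives 4 out of 11.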

--- 189,197 ----
  
    <para>
     Thus, the <function>strict_word_similarity(text, text)</function> function
!    is useful for finding the similarity to whole words, while
     <function>word_similarity(text, text)</function> is more suitable for
!    finding the similarity for parts of words.
    </para>

This also looks correct to me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
