Re: Improving docs for strict_word_similarity() - Mailing list pgsql-docs

From Alexander Korotkov
Subject Re: Improving docs for strict_word_similarity()
Date
Msg-id CAPpHfds38hGF9_Qs3Up4Dx4vuvVWJkqdCyAPYo7nZvo_5eebkA@mail.gmail.com
In response to Re: Improving docs for strict_word_similarity()  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-docs
On Fri, Jun 1, 2018 at 6:39 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Sat, May 26, 2018 at 7:56 PM Bruce Momjian <bruce@momjian.us> wrote:
>>
>> While creating the release notes, I was confused by the description for
>> strict_word_similarity(), particularly "extent boundaries".  The
>> attached patch clarifies, at least for me, how word_similarity() and
>> strict_word_similarity() differ.
>
>
> Thank you for your efforts to improve the pg_trgm documentation.
> However, I don't think all of the changes are correct.  Here are my notes
> on the edits you propose.
>
> --- 112,119 ----
>         </entry>
>         <entry><type>real</type></entry>
>         <entry>
> !        Same as <function>word_similarity(text, text)</function>, but
> !        considers the set of trigrams to be of the same length.
>         </entry>
>        </row>
>        <row>
>
> This doesn't look like a correct description.  In short, strict_word_similarity()
> searches the second string for the extent of words that best matches the first string.
> So this function takes care to use whole words from the second string,
> not parts of words.  However, this is not a matter of the length of the trigram sets.
>
> --- 164,182 ----
>      This function returns a value that can be approximately understood as the
>      greatest similarity between the first string and any substring of the second
>      string.  However, this function does not add padding to the boundaries of
> !    the extent.  Thus, the number of additional characters present in the
> !    second string is not considered, except for the mismatched word boundry.
>     </para>
>
> This looks correct to me.
>
> !    The function <function>strict_word_similarity(text, text)</function>
> !    does consider additional characters in the second string.  In the
> !    example above, <function>strict_word_similarity(text, text)</function>
> !    would use the full trigram for the second string when computing
> !    similarity, not just the part of the trigram that matches the
> !    first string. For example, it would use the <literal>{" w","
> !    wo","wor","ord","rds","ds "}</literal>, which corresponds to the whole
> !    word <literal>'words'</literal>.
>
> After your edits, it reads as if strict_word_similarity() matches the full
> set of first-string trigrams against the full set of second-string trigrams.
> However, that is a description of the plain similarity() function.  Actually,
> strict_word_similarity() matches the set of trigrams of the first string against
> the set of trigrams of a contiguous run of words in the second string.
>
> --- 189,197 ----
>
>     <para>
>      Thus, the <function>strict_word_similarity(text, text)</function> function
> !    is useful for finding the similarity to whole words, while
>      <function>word_similarity(text, text)</function> is more suitable for
> !    finding the similarity for parts of words.
>     </para>
>
> This also looks correct to me.

I've edited the places that looked incorrect to me, and tried my best to
make them as clear as possible.  Bruce, could you please take a look at
them?
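
To make the distinction discussed above concrete, here is a minimal Python
sketch.  It is a simplified model, not the pg_trgm C code.  It assumes that
words are separated by whitespace and lower-cased, that each word is padded
with two leading spaces and one trailing space, and that similarity is the
number of shared trigrams divided by the number of distinct trigrams in
both sets.  All function names are hypothetical and exist only for this
illustration.

```python
def word_trigrams(word):
    # Assumption: pad each word with two leading spaces and one trailing space.
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def text_trigrams(text):
    # Ordered trigram list for the whole string, word by word.
    out = []
    for word in text.split():
        out.extend(word_trigrams(word))
    return out


def jaccard(a, b):
    # Shared trigrams divided by all distinct trigrams.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_sketch(first, second):
    # Full trigram set of the first string vs. full set of the second.
    return jaccard(text_trigrams(first), text_trigrams(second))


def word_similarity_sketch(first, second):
    # Best match over any contiguous extent of the second string's ordered
    # trigrams; an extent may start or end in the middle of a word.
    q, t = text_trigrams(first), text_trigrams(second)
    return max(
        (jaccard(q, t[i:j])
         for i in range(len(t)) for j in range(i + 1, len(t) + 1)),
        default=0.0,
    )


def strict_word_similarity_sketch(first, second):
    # Best match over contiguous runs of whole words of the second string,
    # so extent boundaries always coincide with word boundaries.
    q, words = text_trigrams(first), second.split()
    return max(
        (jaccard(q, text_trigrams(" ".join(words[i:j])))
         for i in range(len(words)) for j in range(i + 1, len(words) + 1)),
        default=0.0,
    )


print(round(word_similarity_sketch("word", "two words"), 6))         # 0.8
print(round(strict_word_similarity_sketch("word", "two words"), 6))  # 0.571429
print(round(similarity_sketch("word", "words"), 6))                  # 0.571429
```

In this model the strict variant picks the whole word 'words' as the best
extent, which is why its result equals the plain similarity of 'word' and
'words', while the non-strict variant can stop inside the word and score
higher.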

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

