Re: Improving docs for strict_word_similarity() - Mailing list pgsql-docs
From | Alexander Korotkov |
---|---|
Subject | Re: Improving docs for strict_word_similarity() |
Date | |
Msg-id | CAPpHfds38hGF9_Qs3Up4Dx4vuvVWJkqdCyAPYo7nZvo_5eebkA@mail.gmail.com |
In response to | Re: Improving docs for strict_word_similarity() (Alexander Korotkov <aekorotkov@gmail.com>) |
List | pgsql-docs |
On Fri, Jun 1, 2018 at 6:39 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Sat, May 26, 2018 at 7:56 PM Bruce Momjian <bruce@momjian.us> wrote:
>>
>> While creating the release notes, I was confused by the description for
>> strict_word_similarity(), particularly "extent boundaries". The
>> attached patch clarifies, at least for me, how word_similarity() and
>> strict_word_similarity() differ.
>
>
> Thank you for your efforts on improving documentation of pg_trgm.
> However, I don't find all of them correct. I've following notes regarding
> the edits you propose.
>
> --- 112,119 ----
> </entry>
> <entry><type>real</type></entry>
> <entry>
> ! Same as <function>word_similarity(text, text)</function>, but
> ! considers the set of trigrams to be of the same length.
> </entry>
> </row>
> <row>
>
> This doesn't look a correct description. In short, strict_word_similarity() is searching
> for extent of words in the second string, which is best match for the first string.
> So, this function takes care about using whole words from the second strings,
> not parts of words. However, this is not matter of length of trigrams sets.
>
> --- 164,182 ----
> This function returns a value that can be approximately understood as the
> greatest similarity between the first string and any substring of the second
> string. However, this function does not add padding to the boundaries of
> ! the extent. Thus, the number of additional characters present in the
> ! second string is not considered, except for the mismatched word boundry.
> </para>
>
> This looks correct for me.
>
> ! The function <function>strict_word_similarity(text, text)</function>
> ! does consider additional characters in the second string. In the
> ! example above, <function>strict_word_similarity(text, text)</function>
> ! would use the full trigram for the second string when computing
> ! similarity, not just the part of the trigram that matches the
> ! first string. For example, it would use the <literal>{"  w","
> ! wo","wor","ord","rds","ds "}</literal>, which corresponds to the whole
> ! word <literal>'words'</literal>.
>
> After your edits, it looks like strict_word_similarity() matches full
> set of first string trigrams to full set of second string trigrams. However,
> this is description of just similarity() function. Actually,
> strict_word_similarity() matches set of trigrams of first string to
> set of trigrams of conjuncted subset of second string words.
>
> --- 189,197 ----
>
> <para>
> Thus, the <function>strict_word_similarity(text, text)</function> function
> ! is useful for finding the similarity to whole words, while
> <function>word_similarity(text, text)</function> is more suitable for
> ! finding the similarity for parts of words.
> </para>
>
> This also looks correct to me.

I've edited the places that looked incorrect to me. I tried to do my best
to make them as clear as possible. Bruce, could you please take a look at them?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
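
To make the distinction discussed above concrete, here is a minimal Python sketch of the three measures. It is a brute-force illustration and not pg_trgm's actual implementation. The helper names (`split_words`, `word_trigrams`, `all_trigrams`, `jaccard`) are hypothetical, and the word splitting is simplified to ASCII letters and digits, which is an assumption. The inputs 'word' and 'two words' are chosen so that the whole word 'words' yields the trigram set quoted in the patch.

```python
import re

def split_words(text):
    # Assumption: a "word" is a run of ASCII letters/digits, lowercased.
    return re.findall(r"[a-z0-9]+", text.lower())

def word_trigrams(word):
    # Each word is padded with two leading spaces and one trailing space.
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def all_trigrams(text):
    # Ordered list of trigrams, word by word.
    return [t for w in split_words(text) for t in word_trigrams(w)]

def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(s1, s2):
    # Full trigram set of s1 against full trigram set of s2.
    return jaccard(all_trigrams(s1), all_trigrams(s2))

def word_similarity(s1, s2):
    # Best match over any contiguous extent of s2's ordered trigrams;
    # the extent may start or end in the middle of a word.
    query, ordered = all_trigrams(s1), all_trigrams(s2)
    return max((jaccard(query, ordered[i:j])
                for i in range(len(ordered))
                for j in range(i + 1, len(ordered) + 1)), default=0.0)

def strict_word_similarity(s1, s2):
    # Best match over any contiguous run of whole words of s2.
    query = all_trigrams(s1)
    per_word = [word_trigrams(w) for w in split_words(s2)]
    return max((jaccard(query, [t for w in per_word[i:j] for t in w])
                for i in range(len(per_word))
                for j in range(i + 1, len(per_word) + 1)), default=0.0)

print(round(similarity("word", "two words"), 6))              # 0.363636
print(round(word_similarity("word", "two words"), 6))         # 0.8
print(round(strict_word_similarity("word", "two words"), 6))  # 0.571429
```

In this sketch the only difference between the last two functions is which extents are allowed: any run of trigrams for `word_similarity`, and only runs of whole words for `strict_word_similarity`. The sizes of the trigram sets play no special role.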
Attachment