Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug - Mailing list pgsql-bugs

From Alexander Korotkov
Subject Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
Date
Msg-id CAPpHfdtXJp0xvi8QbcHWqnrk=XyyMux4FtbJCgZFPknPbLERVA@mail.gmail.com
In response to Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-bugs
On Fri, Dec 8, 2017 at 2:50 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Dec 7, 2017 at 8:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
> I'm afraid that creating a function that implements quite different
> algorithms depending on a global parameter seems very hacky and would lead
> to misunderstandings. I do understand the need of backward compatibility,
> but I'd opt for the lesser evil. Perhaps a good idea would be to change the
> name to 'substring_similarity()' and introduce the new function
> 'word_similarity()' later, for example in the next major version release.

That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now.  Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.

+1
Thank you for pointing this out.  Yes, it would be better not to change the existing names and behavior, but instead to adjust the documentation and add the alternative behavior under another name.
Therefore, I'm going to provide a patchset of two patches:
1) Improve word_similarity() documentation.
2) Add new function strict_word_similarity() (or whatever better name we invent).

Please find the patchset attached.

0001-pg-trgm-word-similarity-docs-improvement.patch – contains improvements to the documentation of word_similarity() and the related operators.  I decided to give the formal definition first (what exactly it does internally), followed by an example and a more human-readable description.  This patch also fixes two comments in which the lower and upper bounds were mixed up.

0002-pg-trgm-strict_word-similarity.patch – implementation of strict_word_similarity() with comments, docs and tests.
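
For readers without the patches at hand, the distinction between the two functions can be sketched roughly as follows. This is a brute-force illustration in Python, not the C code from the patches. It assumes the pg_trgm conventions that each word is padded with two leading spaces and one trailing space, that similarity is the shared-trigram count divided by the size of the union, and that strict_word_similarity() only considers extents made of whole words. The function names ending in _sketch and the ASCII-only word splitting are hypothetical simplifications.

```python
import re


def words(text):
    # Assumption: a word is a maximal run of ASCII letters/digits, lowercased.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_trigrams(word):
    # Pad with two leading spaces and one trailing space, then slide a
    # window of three characters over the padded word.
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def ordered_trigrams(text):
    # Trigrams of every word, kept in the order they occur in the text.
    return [t for w in words(text) for t in word_trigrams(w)]


def jaccard(a, b):
    # Shared trigrams divided by the size of the union of both sets.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_similarity_sketch(query, text):
    # Best similarity between the query's trigram set and ANY contiguous
    # extent of the text's ordered trigrams (extents may cut through words).
    q = ordered_trigrams(query)
    t = ordered_trigrams(text)
    best = 0.0
    for i in range(len(t)):
        for j in range(i + 1, len(t) + 1):
            best = max(best, jaccard(q, t[i:j]))
    return best


def strict_word_similarity_sketch(query, text):
    # Same idea, but an extent must consist of whole consecutive words.
    q = ordered_trigrams(query)
    ws = words(text)
    best = 0.0
    for i in range(len(ws)):
        for j in range(i + 1, len(ws) + 1):
            extent = [t for w in ws[i:j] for t in word_trigrams(w)]
            best = max(best, jaccard(q, extent))
    return best


if __name__ == "__main__":
    print(word_similarity_sketch("word", "two words"))
    print(strict_word_similarity_sketch("word", "two words"))
```

Under these assumptions, comparing 'word' against 'two words' gives 4/5 for the non-strict variant, because the extent can stop before the trigrams that cover the trailing "s". The strict variant has to take all of "words" and gives 4/7.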

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

 
Attachment
