My statement about the function's usefulness was probably too categorical, though I had the current name of the function in mind.
I'm afraid that a function that implements quite different algorithms depending on a global parameter seems very hacky and would lead to misunderstandings. I do understand the need for backward compatibility, but I'd opt for the lesser evil. Perhaps a good idea would be to rename the function to 'substring_similarity()' and introduce a new 'word_similarity()' function later, for example in the next major release.
Good point. I have no complaints about that. I'm going to propose a corresponding patch for the next commitfest.
I've written a draft patch to fix this inconsistency. Please find it attached. The patch doesn't contain proper documentation and comments yet.
I've called the existing behavior subset_similarity(). I didn't use the name substring_similarity(), because it doesn't really look for a substring with appropriate padding, but rather searches for a continuous subset of trigrams. For index search over subset similarity, the %>>, <<%, <->>>, and <<<-> operators are provided. I've added an extra arrow sign to denote that these operators look deeper into the string.
At the same time, word_similarity() now forces extent bounds to be word bounds. word_similarity() now behaves similarly to the my_word_similarity() proposed on Stack Overflow.
The only difference here is in the 'messsage s' row, because word_similarity() allows matching one word to two or more, while my_word_similarity() doesn't. In this case word_similarity() returns the similarity between 'sage' and 'message s'.
# select similarity('sage', 'message s');
similarity
------------
0.363636
(1 row)
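To make the number above easier to check by hand, here is a minimal Python sketch of trigram similarity. It is not the pg_trgm code. It assumes that words are runs of ASCII letters and digits, that each word is lowercased and padded with two leading spaces and one trailing space, and that similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. The helper names are made up for illustration.

```python
import re


def trigrams(text):
    """Distinct trigrams of text (assumed padding: two spaces before, one after each word)."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# 'sage' has 5 trigrams, 'message s' has 10, and 4 are shared: 4 / 11
print(round(similarity("sage", "message s"), 6))  # 0.363636
```

Under these assumptions the sketch reproduces the value from the query output.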
I think the behavior of word_similarity() is better here, because a typo can break a word into two.
I also wonder whether word_similarity() and subset_similarity() should share the same threshold value for indexed search. subset_similarity() typically returns higher values than word_similarity(), so it probably makes sense to give them separate thresholds.
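The relationship between the two functions can be illustrated with a brute-force Python sketch. This is only an illustration under stated assumptions, not the algorithm in the patch. It assumes ASCII alphanumeric words with the usual trigram padding. It takes the trigrams of the second string in text order, tries every continuous extent, and keeps the best similarity. With word_bounds=True, only extents that start and end on word boundaries are considered. The function names and the word_bounds flag are hypothetical.

```python
import re


def ordered_trigrams(text):
    """(trigram, word_index) pairs in text order; padding convention is assumed."""
    out = []
    for wi, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        padded = "  " + word + " "
        out.extend((padded[i:i + 3], wi) for i in range(len(padded) - 2))
    return out


def best_extent_similarity(query, text, word_bounds):
    """Best similarity between query and any continuous extent of text's trigrams."""
    q = {t for t, _ in ordered_trigrams(query)}
    seq = ordered_trigrams(text)
    best = 0.0
    for lo in range(len(seq)):
        for hi in range(lo, len(seq)):
            if word_bounds:
                # Keep only extents made of whole words.
                starts_word = lo == 0 or seq[lo - 1][1] != seq[lo][1]
                ends_word = hi == len(seq) - 1 or seq[hi + 1][1] != seq[hi][1]
                if not (starts_word and ends_word):
                    continue
            extent = {t for t, _ in seq[lo:hi + 1]}
            common = len(q & extent)
            best = max(best, common / (len(q) + len(extent) - common))
    return best


# Any continuous extent (subset-style) versus whole-word extents only.
print(round(best_extent_similarity("sage", "message s", False), 6))  # 0.8
print(round(best_extent_similarity("sage", "message s", True), 6))   # 0.363636
```

Since every word-bounded extent is also a continuous extent, the unrestricted maximum can never be lower than the word-bounded one in this sketch, which is consistent with the idea of using separate thresholds.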