Thread: [BUGS] pg_trgm word_similarity inconsistencies or bug
Hello all, this is related to postgres 9.6 (9.6.4) and a good description can be found here https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words
But in summary, word_similarity doesn’t seem to do exactly what the docs say, since it will match trigrams from multiple words rather tan doing a word by word comparison.
Below is a table with output and expected output, thanks to kiln from stackoverflow to provide it.
with data(t) as (
values
('message'),
('message s'),
('message sag'),
('message sag sag'),
('message sag sage')
)
select t, word_similarity('sage', t), my_word_similarity('sage', t)
from data;
t | word_similarity | my_word_similarity
------------------+-----------------+--------------------
message | 0.6 | 0.3
message s | 0.8 | 0.3
message sag | 1 | 0.5
message sag sag | 1 | 0.5
message sag sage | 1 | 1
On Fri, Oct 27, 2017 at 06:48:08PM +0000, Cristiano Coelho wrote: > Hello all, this is related to postgres 9.6 (9.6.4) and a good description can be found here https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words > > But in summary, word_similarity doesn’t seem to do exactly what the docs say, since it will match trigrams from multiplewords rather tan doing a word by word comparison. > > Below is a table with output and expected output, thanks to kiln from stackoverflow to provide it. > Interesting. An klin's answer from stackoverflow.com is right. The initial example can be reduced to the next: =# select word_similarity('sage', 'age sag');word_similarity ----------------- 1 It computes maximum similarity using closest trigrams not considering order of 'sage' trigrams. It determines that all trigrams from 'sage' match trigrams from 'age sag'. Initial order of 'age sag' trigrams: ' a', ' ag', 'age', 'ge ', ' s', ' sa', 'sag', 'ag ' ^ ^ |from |to Sorted 'sage' trigrams (all of them occured within 'age sag' trigrams continuously): ' s', ' sa', 'age', 'ge ', 'sag' Maybe the problem should be solved by considering 'sage' trigrams initial order. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Fri, Oct 27, 2017 at 06:48:08PM +0000, Cristiano Coelho wrote:
> Hello all, this is related to postgres 9.6 (9.6.4) and a good description can be found here https://stackoverflow.com/questions/46966360/postgres- word-similarity-not-comparing- words
>
> But in summary, word_similarity doesn’t seem to do exactly what the docs say, since it will match trigrams from multiple words rather tan doing a word by word comparison.
>
> Below is a table with output and expected output, thanks to kiln from stackoverflow to provide it.
>
Interesting. An klin's answer from stackoverflow.com is right.
The initial example can be reduced to the next:
=# select word_similarity('sage', 'age sag');
word_similarity
-----------------
1
It computes maximum similarity using closest trigrams not considering order of
'sage' trigrams. It determines that all
trigrams from 'sage' match trigrams from 'age sag'.
Initial order of 'age sag' trigrams:
' a', ' ag', 'age', 'ge ', ' s', ' sa', 'sag', 'ag '
^ ^
|from |to
Sorted 'sage' trigrams (all of them occured within 'age sag' trigrams
continuously):
' s', ' sa', 'age', 'ge ', 'sag'
Maybe the problem should be solved by considering 'sage' trigrams
initial order.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
2017-10-30 19:08 GMT+01:00 Alexander Korotkov <a.korotkov@postgrespro.ru>: > > On Sat, Oct 28, 2017 at 11:22 AM, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote: >> >> On Fri, Oct 27, 2017 at 06:48:08PM +0000, Cristiano Coelho wrote: >> > Hello all, this is related to postgres 9.6 (9.6.4) and a good description can be found here https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words >> > >> > But in summary, word_similarity doesn’t seem to do exactly what the docs say, since it will match trigrams from multiplewords rather tan doing a word by word comparison. >> > >> > Below is a table with output and expected output, thanks to kiln from stackoverflow to provide it. >> > >> >> Interesting. An klin's answer from stackoverflow.com is right. >> >> The initial example can be reduced to the next: >> >> =# select word_similarity('sage', 'age sag'); >> word_similarity >> ----------------- >> 1 >> >> It computes maximum similarity using closest trigrams not considering order of >> 'sage' trigrams. It determines that all >> trigrams from 'sage' match trigrams from 'age sag'. >> >> Initial order of 'age sag' trigrams: >> ' a', ' ag', 'age', 'ge ', ' s', ' sa', 'sag', 'ag ' >> ^ ^ >> |from |to >> Sorted 'sage' trigrams (all of them occured within 'age sag' trigrams >> continuously): >> ' s', ' sa', 'age', 'ge ', 'sag' >> >> Maybe the problem should be solved by considering 'sage' trigrams >> initial order. > > > We searching for continuous extent of second string trigrams (in original orders) which has best similarity with firststring trigrams. > Possible solution could be forcing this extent boundaries to be at word boundaries. However, it would become less convenientto search for *part* of word. And we already have users adopt this feature. > So, I see following solution: > 1) Define GUC variable which specifies whether word_similarity() should force extent boundaries to be at word boundaries, > 2) Document both cases of word_similarity() behavior. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > Look at the example: with data(word, string) as ( values ('sage', 'message'), ('sage', 'message s'), ('sage', 'message sa') ) select similarity(word, string), word_similarity (word, string) from data; similarity | word_similarity ------------+----------------- 0.3 | 0.6 0.363636 | 0.8 0.454545 | 1 (3 rows) When searching for a part of a word I would expect that the word similarity is the same in all three rows. It's really strange that the context of the second word (sa) makes the similarity equal to 1. From a user's point of view it's also hard to understand why there is such a big difference between similarity() and word_similarity(), especially when comparing just two words (the first row). I do not think the current function has any practical use. ------ Jan Przemysław Wójcik 2017-10-30 19:08 GMT+01:00 Alexander Korotkov <a.korotkov@postgrespro.ru>: > On Sat, Oct 28, 2017 at 11:22 AM, Arthur Zakirov <a.zakirov@postgrespro.ru> > wrote: >> >> On Fri, Oct 27, 2017 at 06:48:08PM +0000, Cristiano Coelho wrote: >> > Hello all, this is related to postgres 9.6 (9.6.4) and a good >> > description can be found here >> > https://stackoverflow.com/questions/46966360/postgres-word-similarity-not-comparing-words >> > >> > But in summary, word_similarity doesn’t seem to do exactly what the docs >> > say, since it will match trigrams from multiple words rather tan doing a >> > word by word comparison. >> > >> > Below is a table with output and expected output, thanks to kiln from >> > stackoverflow to provide it. >> > >> >> Interesting. An klin's answer from stackoverflow.com is right. >> >> The initial example can be reduced to the next: >> >> =# select word_similarity('sage', 'age sag'); >> word_similarity >> ----------------- >> 1 >> >> It computes maximum similarity using closest trigrams not considering >> order of >> 'sage' trigrams. It determines that all >> trigrams from 'sage' match trigrams from 'age sag'. >> >> Initial order of 'age sag' trigrams: >> ' a', ' ag', 'age', 'ge ', ' s', ' sa', 'sag', 'ag ' >> ^ ^ >> |from |to >> Sorted 'sage' trigrams (all of them occured within 'age sag' trigrams >> continuously): >> ' s', ' sa', 'age', 'ge ', 'sag' >> >> Maybe the problem should be solved by considering 'sage' trigrams >> initial order. > > > We searching for continuous extent of second string trigrams (in original > orders) which has best similarity with first string trigrams. > Possible solution could be forcing this extent boundaries to be at word > boundaries. However, it would become less convenient to search for *part* > of word. And we already have users adopt this feature. > So, I see following solution: > 1) Define GUC variable which specifies whether word_similarity() should > force extent boundaries to be at word boundaries, > 2) Document both cases of word_similarity() behavior. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
2017-10-30 19:08 GMT+01:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:
>> On Sat, Oct 28, 2017 at 11:22 AM, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>>
>> On Fri, Oct 27, 2017 at 06:48:08PM +0000, Cristiano Coelho wrote:
>> > Hello all, this is related to postgres 9.6 (9.6.4) and a good description can be found here https://stackoverflow.com/questions/46966360/postgres- word-similarity-not-comparing- words
>> >
>> > But in summary, word_similarity doesn’t seem to do exactly what the docs say, since it will match trigrams from multiple words rather tan doing a word by word comparison.
>> >
>> > Below is a table with output and expected output, thanks to kiln from stackoverflow to provide it.
>> >
>>
>> Interesting. An klin's answer from stackoverflow.com is right.
>>
>> The initial example can be reduced to the next:
>>
>> =# select word_similarity('sage', 'age sag');
>> word_similarity
>> -----------------
>> 1
>>
>> It computes maximum similarity using closest trigrams not considering order of
>> 'sage' trigrams. It determines that all
>> trigrams from 'sage' match trigrams from 'age sag'.
>>
>> Initial order of 'age sag' trigrams:
>> ' a', ' ag', 'age', 'ge ', ' s', ' sa', 'sag', 'ag '
>> ^ ^
>> |from |to
>> Sorted 'sage' trigrams (all of them occured within 'age sag' trigrams
>> continuously):
>> ' s', ' sa', 'age', 'ge ', 'sag'
>>
>> Maybe the problem should be solved by considering 'sage' trigrams
>> initial order.
>
>
> We searching for continuous extent of second string trigrams (in original orders) which has best similarity with first string trigrams.
> Possible solution could be forcing this extent boundaries to be at word boundaries. However, it would become less convenient to search for *part* of word. And we already have users adopt this feature.
> So, I see following solution:
> 1) Define GUC variable which specifies whether word_similarity() should force extent boundaries to be at word boundaries,
> 2) Document both cases of word_similarity() behavior.Look at the example:
with data(word, string) as (
values
('sage', 'message'),
('sage', 'message s'),
('sage', 'message sa')
)
select
similarity(word, string),
word_similarity (word, string)
from data;
similarity | word_similarity
------------+-----------------
0.3 | 0.6
0.363636 | 0.8
0.454545 | 1
(3 rows)
When searching for a part of a word I would expect that the word
similarity is the same in all three rows. It's really strange that the
context of the second word (sa) makes the similarity equal to 1.
From a user's point of view it's also hard to understand why there is
such a big difference between similarity() and word_similarity(),
especially when comparing just two words (the first row).
I do not think the current function has any practical use.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
I'd like to forward a feedback from our customer who uses word_similarity() function.
François finds current behavior of word_similarity() to be useful. Thus, I think we should preserve it. But documentation correction is needed and option for alternative behavior would be useful too.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From: François CHAHUNEAU <Francois.CHAHUNEAU@numen.fr>
Date: Wed, Nov 1, 2017 at 1:04 AM
Subject: RE: [BUGS] pg_trgm word_similarity inconsistencies or bug
To: Alexander Korotkov <a.korotkov@postgrespro.ru>
Cc: Thierry BOUDIERE <Thierry.BOUDIERE@numen.fr>, "foli@numen.mg" <foli@numen.mg>
Hello Alexander,
We agree that the current pg_trgm documentation does not correctly reflect the de facto behavior of word_similarity(), and that something has to be changed. But to us, it is more a documentation problem than anything else.
What is computed is still « substring_similarity » as was initially specified between us, but it is influenced by a strong word boundary bias caused by the way trigrams are padded at word boundaries. This bias was noticed by early reviewers and you explained that this motivated the name switch to « word_similarity ». As you will remember, at the time we discovered this, we were suprised because we considerd this as a slight misnomer. Indeed, what is currently described in the 9.6 pg_trgm documentation is inaccurate (although seemingly consistent with this new name) and has to be amended.
Now, word_similarity() has been out for more than a year and, of course, it is preferable to avoid any breaking changes… In our case, we consider the name « unfortunate » and the explanation buggy, not the function itself.
As you may remember from the initial discussion, some other users stressed the importance to be able to matchsub strings. We tend to agree with what Jeff Janes wrote in this discussion :
The reason I like the option of not treating word boundaries as
special in this case is that often in scientific vocabulary, and in
catalog part numbers, people are pretty inconsistent about whether
they included spaces. "HEK 293", "HEK293", and "HEK-293" could be all
the same thing. So I like to strip out spaces and punctuation on both
sides of operator. Of course I can't do that if there are invisible
un-removable spaces on the substring side.
But, It doesn't sound like I am going to win that debate. Given that,
I don't think we need a different name for the function. I'm fine with
explaining the word-boundary subtlety in the documentation, and
keeping the function name itself simple.
Now, considering your proposal :
As far as we are concerned, we use <% and %> everyday for efficient fuzzy matching on large databases. Our typical usage scenario is matching noisy OCRized text strings against reference databases. > 1) Define GUC variable which specifies whether word_similarity() should force extent boundaries to be at word boundaries,
Ok for us, iff default behavior remains the same as now, for backward compatibility reasons. We could take advantage, in some cases, of the new « word rounded » behavior controlled by the GUC variable, but this would not cover all scenarios currently in use.
> 2) Document both cases of word_similarity() behavior.
This is clearly needed anyway.
Best regards,
François CHAHUNEAU Directeur des technologies |
NUMEN DIGITAL| 24, rue Marc Seguin 75018 Paris France | www.numen.fr Tel +33 1 40 37 95 03 | Mob +33 6 07 85 21 79 | Fax +33 1 40 37 94 94 |
De : Alexander Korotkov [mailto:a.korotkov@
Envoyé : mardi 31 octobre 2017 16:18
À : Thierry BOUDIERE <Thierry.BOUDIERE@numen.fr>; François CHAHUNEAU <Francois.CHAHUNEAU@numen.fr>
Objet : Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
Dear, Thierry and François!
PostgreSQL users found inconsistency between documentation and implementation of word_similarity().
Possible solution proposed by the reporter is to alter the implementation.
But it's important for me that your interests are not affected but potential further change of implementation of word_similarity().
Could you please share your opinion on changes proposed by Jan in the pgsql-bugs mailing list?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi, my statement about the function usefulness was probably too categorical, though I had in mind the current name of the function. I'm afraid that creating a function that implements quite different algorithms depending on a global parameter seems very hacky and would lead to misunderstandings. I do understand the need of backward compatibility, but I'd opt for the lesser evil. Perhaps a good idea would be to change the name to 'substring_similarity()' and introduce the new function 'word_similarity()' later, for example in the next major version release. ------ Jan Przemysław Wójcik -- Sent from: http://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Hi,
my statement about the function usefulness was probably too categorical,
though I had in mind the current name of the function.
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Nov 7, 2017 at 3:51 PM, Jan Przemysław Wójcik <jan.przemyslaw.wojcik@gmail.com> wrote: my statement about the function usefulness was probably too categorical,
though I had in mind the current name of the function.
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.Good point. I've no complaints about that. I'm going to propose corresponding patch to the next commitfest.
Attachment
Sent: Thursday, December 7, 2017 4:38:59 PM
To: Jan Przemysław Wójcik; Cristiano Coelho
Cc: pgsql-bugs@postgresql.org; François CHAHUNEAU; Artur Zakirov; pgsql-hackers
Subject: Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug
François CHAHUNEAU Directeur des technologies |
NUMEN DIGITAL| 24, rue Marc Seguin 75018 Paris France | www.numen.fr Tel +33 1 40 37 95 03 | Mob +33 6 07 85 21 79 | Fax +33 1 40 37 94 94 |
On Tue, Nov 7, 2017 at 3:51 PM, Jan Przemysław Wójcik <jan.przemyslaw.wojcik@gmail.com> wrote: my statement about the function usefulness was probably too categorical,
though I had in mind the current name of the function.
I'm afraid that creating a function that implements quite different
algorithms depending on a global parameter seems very hacky and would lead
to misunderstandings. I do understand the need of backward compatibility,
but I'd opt for the lesser evil. Perhaps a good idea would be to change the
name to 'substring_similarity()' and introduce the new function
'word_similarity()' later, for example in the next major version release.Good point. I've no complaints about that. I'm going to propose corresponding patch to the next commitfest.
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik <jan.przemyslaw.wojcik@gmail.com> wrote: > I'm afraid that creating a function that implements quite different > algorithms depending on a global parameter seems very hacky and would lead > to misunderstandings. I do understand the need of backward compatibility, > but I'd opt for the lesser evil. Perhaps a good idea would be to change the > name to 'substring_similarity()' and introduce the new function > 'word_similarity()' later, for example in the next major version release. That breaks things for everybody using word_similarity() currently. If the previous discussion of this topic concluded that word_similarity() was an OK name despite being a slight misnomer, I don't think we should change our mind now. Instead the new function can be called something which makes the difference clear, e.g. strict_word_similarity(), and the old function can remain as it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
> I'm afraid that creating a function that implements quite different
> algorithms depending on a global parameter seems very hacky and would lead
> to misunderstandings. I do understand the need of backward compatibility,
> but I'd opt for the lesser evil. Perhaps a good idea would be to change the
> name to 'substring_similarity()' and introduce the new function
> 'word_similarity()' later, for example in the next major version release.
That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now. Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
> I'm afraid that creating a function that implements quite different
> algorithms depending on a global parameter seems very hacky and would lead
> to misunderstandings. I do understand the need of backward compatibility,
> but I'd opt for the lesser evil. Perhaps a good idea would be to change the
> name to 'substring_similarity()' and introduce the new function
> 'word_similarity()' later, for example in the next major version release.
That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now. Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Dec 7, 2017 at 8:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
> I'm afraid that creating a function that implements quite different
> algorithms depending on a global parameter seems very hacky and would lead
> to misunderstandings. I do understand the need of backward compatibility,
> but I'd opt for the lesser evil. Perhaps a good idea would be to change the
> name to 'substring_similarity()' and introduce the new function
> 'word_similarity()' later, for example in the next major version release.
That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now. Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.+1Thank you for pointing this. Yes, it would be better not to change existing names and behavior, but adjust documentation and add alternative behavior with another name.Therefore, I'm going to provide patchset of two patches:1) Improve word_similarity() documentation.2) Add new function strict_word_similarity() (or whatever better name we invent).
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
Attachment
On Thu, Dec 7, 2017 at 8:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
<jan.przemyslaw.wojcik@gmail.com> wrote:
> I'm afraid that creating a function that implements quite different
> algorithms depending on a global parameter seems very hacky and would lead
> to misunderstandings. I do understand the need of backward compatibility,
> but I'd opt for the lesser evil. Perhaps a good idea would be to change the
> name to 'substring_similarity()' and introduce the new function
> 'word_similarity()' later, for example in the next major version release.
That breaks things for everybody using word_similarity() currently.
If the previous discussion of this topic concluded that
word_similarity() was an OK name despite being a slight misnomer, I
don't think we should change our mind now. Instead the new function
can be called something which makes the difference clear, e.g.
strict_word_similarity(), and the old function can remain as it is.+1Thank you for pointing this. Yes, it would be better not to change existing names and behavior, but adjust documentation and add alternative behavior with another name.Therefore, I'm going to provide patchset of two patches:1) Improve word_similarity() documentation.2) Add new function strict_word_similarity() (or whatever better name we invent).
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
Attachment
> 0002-pg-trgm-strict_word-similarity.patch – implementation of > strict_word_similarity() with comments, docs and tests. The patch looks commmitable, but sometime I get *** ...pgsql/contrib/pg_trgm/expected/pg_strict_word_trgm.out 2017-12-12 14:16:55.190989000 +0300 --- .../pgsql/contrib/pg_trgm/results/pg_strict_word_trgm.out 2017-12-12 14:17:27.645639000 +0300 *************** *** 153,160 **** ----------+---------------------------------- 0 | Kabankala 0.25 | Kabankalan City Public Plaza - 0.416667 | Kabakala 0.416667 | Abankala 0.538462 | Kabikala 0.625 | Ntombankala School 0.642857 | Nehalla Bankalah Reserved Forest --- 153,160 ---- ----------+---------------------------------- 0 | Kabankala 0.25 | Kabankalan City Public Plaza 0.416667 | Abankala + 0.416667 | Kabakala 0.538462 | Kabikala 0.625 | Ntombankala School 0.642857 | Nehalla Bankalah Reserved Forest ====================================================================== Seems, some stability order should be added to tests -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> 0002-pg-trgm-strict_word-similarity.patch – implementation of > strict_word_similarity() with comments, docs and tests. The patch looks commmitable, but sometime I get *** ...pgsql/contrib/pg_trgm/expected/pg_strict_word_trgm.out 2017-12-12 14:16:55.190989000 +0300 --- .../pgsql/contrib/pg_trgm/results/pg_strict_word_trgm.out 2017-12-12 14:17:27.645639000 +0300 *************** *** 153,160 **** ----------+---------------------------------- 0 | Kabankala 0.25 | Kabankalan City Public Plaza - 0.416667 | Kabakala 0.416667 | Abankala 0.538462 | Kabikala 0.625 | Ntombankala School 0.642857 | Nehalla Bankalah Reserved Forest --- 153,160 ---- ----------+---------------------------------- 0 | Kabankala 0.25 | Kabankalan City Public Plaza 0.416667 | Abankala + 0.416667 | Kabakala 0.538462 | Kabikala 0.625 | Ntombankala School 0.642857 | Nehalla Bankalah Reserved Forest ====================================================================== Seems, some stability order should be added to tests -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> 0002-pg-trgm-strict_word-similarity.patch – implementation of > strict_word_similarity() with comments, docs and tests. After some looking in 1) repeated piece of code: + if (strategy == SimilarityStrategyNumber) + nlimit = similarity_threshold; + else if (strategy == WordSimilarityStrategyNumber) + nlimit = word_similarity_threshold; + else /* strategy == StrictWordSimilarityStrategyNumber */ + nlimit = strict_word_similarity_threshold; Isn't it better to move that piece to separate function? 2) calc_word_similarity(char *str1, int slen1, char *str2, int slen2, bool check_only, bool word_bounds) Seems, two bools args are replaceble to bitwise-ORed flag. It will simplify adding new options in future. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> 0002-pg-trgm-strict_word-similarity.patch – implementation of > strict_word_similarity() with comments, docs and tests. After some looking in 1) repeated piece of code: + if (strategy == SimilarityStrategyNumber) + nlimit = similarity_threshold; + else if (strategy == WordSimilarityStrategyNumber) + nlimit = word_similarity_threshold; + else /* strategy == StrictWordSimilarityStrategyNumber */ + nlimit = strict_word_similarity_threshold; Isn't it better to move that piece to separate function? 2) calc_word_similarity(char *str1, int slen1, char *str2, int slen2, bool check_only, bool word_bounds) Seems, two bools args are replaceble to bitwise-ORed flag. It will simplify adding new options in future. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> 0001-pg-trgm-word-similarity-docs-improvement.patch – contains improvement to > documentation of word_similarity() and related operators. I decided to give > formal definition first (what exactly it internally does), and then example and > some more human-understandable description. This patch also adjusts two > comments where lower and upper bounds mess up. I'm ready for commit that, but I'd like someone from native English speaker to check that. Thank you. And, suppose, this patch should be backpatched to 9.6 -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> 0001-pg-trgm-word-similarity-docs-improvement.patch – contains improvement to > documentation of word_similarity() and related operators. I decided to give > formal definition first (what exactly it internally does), and then example and > some more human-understandable description. This patch also adjusts two > comments where lower and upper bounds mess up. I'm ready for commit that, but I'd like someone from native English speaker to check that. Thank you. And, suppose, this patch should be backpatched to 9.6 -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
0002-pg-trgm-strict_word-similAfter some looking inarity.patch – implementation of strict_word_similarity() with comments, docs and tests.
1)
repeated piece of code:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to separate function?
2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
bool check_only, bool word_bounds)
Seems, two bools args are replaceble to bitwise-ORed flag. It will simplify adding new options in future.
------
Alexander Korotkov
Postgres Professional: http://www.
The Russian Postgres Company
Attachment
0002-pg-trgm-strict_word-similAfter some looking inarity.patch – implementation of strict_word_similarity() with comments, docs and tests.
1)
repeated piece of code:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to separate function?
2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
bool check_only, bool word_bounds)
Seems, two bools args are replaceble to bitwise-ORed flag. It will simplify adding new options in future.
------
Alexander Korotkov
Postgres Professional: http://www.
The Russian Postgres Company
Attachment
On Tue, Dec 12, 2017 at 2:33 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:0002-pg-trgm-strict_word-similAfter some looking inarity.patch – implementation of strict_word_similarity() with comments, docs and tests.
1)
repeated piece of code:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to separate function?Good point. Moved to separate function.2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
bool check_only, bool word_bounds)
Seems, two bools args are replaceble to bitwise-ORed flag. It will simplify adding new options in future.Yep. I've introduced flags.Also, I've adjusted tests to make them stable (found example where TOP-8 distances are unique).Please, find revised patch in attachment.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On Tue, Dec 12, 2017 at 2:33 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:0002-pg-trgm-strict_word-similAfter some looking inarity.patch – implementation of strict_word_similarity() with comments, docs and tests.
1)
repeated piece of code:
+ if (strategy == SimilarityStrategyNumber)
+ nlimit = similarity_threshold;
+ else if (strategy == WordSimilarityStrategyNumber)
+ nlimit = word_similarity_threshold;
+ else /* strategy == StrictWordSimilarityStrategyNumber */
+ nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to separate function?Good point. Moved to separate function.2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
bool check_only, bool word_bounds)
Seems, two bools args are replaceble to bitwise-ORed flag. It will simplify adding new options in future.Yep. I've introduced flags.Also, I've adjusted tests to make them stable (found example where TOP-8 distances are unique).Please, find revised patch in attachment.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
Hi Alexander, On 1/4/18 4:25 PM, Alexander Korotkov wrote: > > I just found that patch apply is failed according to > commitfest.cputube.org <http://commitfest.cputube.org>. I think it's > because I sent only second patch from patchset in last message. > Anyway I resend both patches rebased to current master. I agree with Teodor (upthread, not quoted here) that the documentation could use some editing. I started to do it myself, but quickly realized I have no knowledge of the content. I'm afraid I would destroy the meaning while updating the grammar. Anyone understand the subject matter well enough to review the documentation? Thanks, -- -David david@pgmasters.net
Hi Alexander, On 1/4/18 4:25 PM, Alexander Korotkov wrote: > > I just found that patch apply is failed according to > commitfest.cputube.org <http://commitfest.cputube.org>. I think it's > because I sent only second patch from patchset in last message. > Anyway I resend both patches rebased to current master. I agree with Teodor (upthread, not quoted here) that the documentation could use some editing. I started to do it myself, but quickly realized I have no knowledge of the content. I'm afraid I would destroy the meaning while updating the grammar. Anyone understand the subject matter well enough to review the documentation? Thanks, -- -David david@pgmasters.net
On 1/4/18 4:25 PM, Alexander Korotkov wrote:
>
> I just found that patch apply is failed according to
> commitfest.cputube.org <http://commitfest.cputube.org>. I think it's
> because I sent only second patch from patchset in last message.
> Anyway I resend both patches rebased to current master.
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing.
I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar.
Anyone understand the subject matter well enough to review the
documentation?
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 1/4/18 4:25 PM, Alexander Korotkov wrote:
>
> I just found that patch apply is failed according to
> commitfest.cputube.org <http://commitfest.cputube.org>. I think it's
> because I sent only second patch from patchset in last message.
> Anyway I resend both patches rebased to current master.
I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing.
I started to do it myself, but quickly realized I have no knowledge of
the content. I'm afraid I would destroy the meaning while updating the
grammar.
Anyone understand the subject matter well enough to review the
documentation?
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi Alexander, On 3/1/18 4:26 PM, Alexander Korotkov wrote: > On Thu, Mar 1, 2018 at 11:05 PM, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > I agree with Teodor (upthread, not quoted here) that the documentation > could use some editing. > > I started to do it myself, but quickly realized I have no knowledge of > the content. I'm afraid I would destroy the meaning while updating the > grammar. > > That's to problem. If you're willing to help you can edit the documentation > and let me review that it's correct. Also feel free to ask any > questions and > more explanation from me. Ultimately, we need to have a documentation > that any average user can understand, not to mention you :) OK, I'm the CFM so I have my plate full for the next few days but if nobody picks this up then I will give it a go. > Anyone understand the subject matter well enough to review the > documentation? > > I expect it would be hard to find anybody matching this criteria. You are probably right, but it never hurts to try. -- -David david@pgmasters.net
Hi Alexander, On 3/1/18 4:26 PM, Alexander Korotkov wrote: > On Thu, Mar 1, 2018 at 11:05 PM, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > I agree with Teodor (upthread, not quoted here) that the documentation > could use some editing. > > I started to do it myself, but quickly realized I have no knowledge of > the content. I'm afraid I would destroy the meaning while updating the > grammar. > > That's to problem. If you're willing to help you can edit the documentation > and let me review that it's correct. Also feel free to ask any > questions and > more explanation from me. Ultimately, we need to have a documentation > that any average user can understand, not to mention you :) OK, I'm the CFM so I have my plate full for the next few days but if nobody picks this up then I will give it a go. > Anyone understand the subject matter well enough to review the > documentation? > > I expect it would be hard to find anybody matching this criteria. You are probably right, but it never hurts to try. -- -David david@pgmasters.net
> I agree with Teodor (upthread, not quoted here) that the documentation > could use some editing. > > I started to do it myself, but quickly realized I have no knowledge of > the content. I'm afraid I would destroy the meaning while updating the > grammar. > > Anyone understand the subject matter well enough to review the > documentation? Liudmila tried to improve docs in Alexander's patchset. https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru BTW, adding Liudmila's message to commitfest task (https://commitfest.postgresql.org/17/1403/) doesn't work -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> I agree with Teodor (upthread, not quoted here) that the documentation > could use some editing. > > I started to do it myself, but quickly realized I have no knowledge of > the content. I'm afraid I would destroy the meaning while updating the > grammar. > > Anyone understand the subject matter well enough to review the > documentation? Liudmila tried to improve docs in Alexander's patchset. https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru BTW, adding Liudmila's message to commitfest task (https://commitfest.postgresql.org/17/1403/) doesn't work -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 3/6/18 7:04 AM, Teodor Sigaev wrote: >> I agree with Teodor (upthread, not quoted here) that the documentation >> could use some editing. >> >> I started to do it myself, but quickly realized I have no knowledge of >> the content. I'm afraid I would destroy the meaning while updating the >> grammar. >> >> Anyone understand the subject matter well enough to review the >> documentation? > > Liudmila tried to improve docs in Alexander's patchset. > > https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru This looks good to me with a few minor exceptions: + <function>word_similarity(text, text)</function> requires further + explanation. Consider the following example: Maybe too verbose? I think "<function>word_similarity(text, text)</function> requires further explanation." can be removed entirely. + string. However, this function does not add paddings to the "add padding" > BTW, adding Liudmila's message to commitfest task > (https://commitfest.postgresql.org/17/1403/) doesn't work Doesn't work for me either. Alexander, can you post the final patches to the thread so they show up in the CF app? Thanks, -- -David david@pgmasters.net
On 3/6/18 7:04 AM, Teodor Sigaev wrote: >> I agree with Teodor (upthread, not quoted here) that the documentation >> could use some editing. >> >> I started to do it myself, but quickly realized I have no knowledge of >> the content. I'm afraid I would destroy the meaning while updating the >> grammar. >> >> Anyone understand the subject matter well enough to review the >> documentation? > > Liudmila tried to improve docs in Alexander's patchset. > > https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru This looks good to me with a few minor exceptions: + <function>word_similarity(text, text)</function> requires further + explanation. Consider the following example: Maybe too verbose? I think "<function>word_similarity(text, text)</function> requires further explanation." can be removed entirely. + string. However, this function does not add paddings to the "add padding" > BTW, adding Liudmila's message to commitfest task > (https://commitfest.postgresql.org/17/1403/) doesn't work Doesn't work for me either. Alexander, can you post the final patches to the thread so they show up in the CF app? Thanks, -- -David david@pgmasters.net
On 3/6/18 7:04 AM, Teodor Sigaev wrote:
>> I agree with Teodor (upthread, not quoted here) that the documentation
>> could use some editing.
>>
>> I started to do it myself, but quickly realized I have no knowledge of
>> the content. I'm afraid I would destroy the meaning while updating the
>> grammar.
>>
>> Anyone understand the subject matter well enough to review the
>> documentation?
>
> Liudmila tried to improve docs in Alexander's patchset.
>
> https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8 b-d37e9752cd93@postgrespro.ru
This looks good to me with a few minor exceptions:
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider the following example:
Maybe too verbose? I think "<function>word_similarity(text,
text)</function> requires further explanation." can be removed entirely.
+ string. However, this function does not add paddings to the
"add padding"
> BTW, adding Liudmila's message to commitfest task
> (https://commitfest.postgresql.org/17/1403/) doesn't work
Doesn't work for me either.
Alexander, can you post the final patches to the thread so they show up
in the CF app?
Alexander Korotkov
Postgres Professional: http://www.
The Russian Postgres Company
On 3/6/18 7:04 AM, Teodor Sigaev wrote:
>> I agree with Teodor (upthread, not quoted here) that the documentation
>> could use some editing.
>>
>> I started to do it myself, but quickly realized I have no knowledge of
>> the content. I'm afraid I would destroy the meaning while updating the
>> grammar.
>>
>> Anyone understand the subject matter well enough to review the
>> documentation?
>
> Liudmila tried to improve docs in Alexander's patchset.
>
> https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8 b-d37e9752cd93@postgrespro.ru
This looks good to me with a few minor exceptions:
+ <function>word_similarity(text, text)</function> requires further
+ explanation. Consider the following example:
Maybe too verbose? I think "<function>word_similarity(text,
text)</function> requires further explanation." can be removed entirely.
+ string. However, this function does not add paddings to the
"add padding"
> BTW, adding Liudmila's message to commitfest task
> (https://commitfest.postgresql.org/17/1403/) doesn't work
Doesn't work for me either.
Alexander, can you post the final patches to the thread so they show up
in the CF app?
Alexander Korotkov
Postgres Professional: http://www.
The Russian Postgres Company
Thank you, pushed David Steele wrote: > On 3/6/18 7:04 AM, Teodor Sigaev wrote: >>> I agree with Teodor (upthread, not quoted here) that the documentation >>> could use some editing. >>> >>> I started to do it myself, but quickly realized I have no knowledge of >>> the content. I'm afraid I would destroy the meaning while updating the >>> grammar. >>> >>> Anyone understand the subject matter well enough to review the >>> documentation? >> >> Liudmila tried to improve docs in Alexander's patchset. >> >> https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru > > This looks good to me with a few minor exceptions: > > + <function>word_similarity(text, text)</function> requires further > + explanation. Consider the following example: > > Maybe too verbose? I think "<function>word_similarity(text, > text)</function> requires further explanation." can be removed entirely. > > + string. However, this function does not add paddings to the > > "add padding" > >> BTW, adding Liudmila's message to commitfest task >> (https://commitfest.postgresql.org/17/1403/) doesn't work > > Doesn't work for me either. > > Alexander, can you post the final patches to the thread so they show up > in the CF app? > > Thanks, > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Thank you, pushed David Steele wrote: > On 3/6/18 7:04 AM, Teodor Sigaev wrote: >>> I agree with Teodor (upthread, not quoted here) that the documentation >>> could use some editing. >>> >>> I started to do it myself, but quickly realized I have no knowledge of >>> the content. I'm afraid I would destroy the meaning while updating the >>> grammar. >>> >>> Anyone understand the subject matter well enough to review the >>> documentation? >> >> Liudmila tried to improve docs in Alexander's patchset. >> >> https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru > > This looks good to me with a few minor exceptions: > > + <function>word_similarity(text, text)</function> requires further > + explanation. Consider the following example: > > Maybe too verbose? I think "<function>word_similarity(text, > text)</function> requires further explanation." can be removed entirely. > > + string. However, this function does not add paddings to the > > "add padding" > >> BTW, adding Liudmila's message to commitfest task >> (https://commitfest.postgresql.org/17/1403/) doesn't work > > Doesn't work for me either. > > Alexander, can you post the final patches to the thread so they show up > in the CF app? > > Thanks, > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Hi everyone, When translating doc updates, Alexander Lakhin noticed that trigram examples were not quite accurate. A small patch fixing this issue is attached. On 03/21/2018 03:35 PM, Teodor Sigaev wrote: > Thank you, pushed > > David Steele wrote: >> On 3/6/18 7:04 AM, Teodor Sigaev wrote: >>>> I agree with Teodor (upthread, not quoted here) that the documentation >>>> could use some editing. >>>> >>>> I started to do it myself, but quickly realized I have no knowledge of >>>> the content. I'm afraid I would destroy the meaning while updating >>>> the >>>> grammar. >>>> >>>> Anyone understand the subject matter well enough to review the >>>> documentation? >>> >>> Liudmila tried to improve docs in Alexander's patchset. >>> >>> https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru >>> >> >> This looks good to me with a few minor exceptions: >> >> + <function>word_similarity(text, text)</function> requires further >> + explanation. Consider the following example: >> >> Maybe too verbose? I think "<function>word_similarity(text, >> text)</function> requires further explanation." can be removed entirely. >> >> + string. However, this function does not add paddings to the >> >> "add padding" >> >>> BTW, adding Liudmila's message to commitfest task >>> (https://commitfest.postgresql.org/17/1403/) doesn't work >> >> Doesn't work for me either. >> >> Alexander, can you post the final patches to the thread so they show up >> in the CF app? >> >> Thanks, >> > -- Liudmila Mantrova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi everyone, When translating doc updates, Alexander Lakhin noticed that trigram examples were not quite accurate. A small patch fixing this issue is attached. On 03/21/2018 03:35 PM, Teodor Sigaev wrote: > Thank you, pushed > > David Steele wrote: >> On 3/6/18 7:04 AM, Teodor Sigaev wrote: >>>> I agree with Teodor (upthread, not quoted here) that the documentation >>>> could use some editing. >>>> >>>> I started to do it myself, but quickly realized I have no knowledge of >>>> the content. I'm afraid I would destroy the meaning while updating >>>> the >>>> grammar. >>>> >>>> Anyone understand the subject matter well enough to review the >>>> documentation? >>> >>> Liudmila tried to improve docs in Alexander's patchset. >>> >>> https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru >>> >> >> This looks good to me with a few minor exceptions: >> >> + <function>word_similarity(text, text)</function> requires further >> + explanation. Consider the following example: >> >> Maybe too verbose? I think "<function>word_similarity(text, >> text)</function> requires further explanation." can be removed entirely. >> >> + string. However, this function does not add paddings to the >> >> "add padding" >> >>> BTW, adding Liudmila's message to commitfest task >>> (https://commitfest.postgresql.org/17/1403/) doesn't work >> >> Doesn't work for me either. >> >> Alexander, can you post the final patches to the thread so they show up >> in the CF app? >> >> Thanks, >> > -- Liudmila Mantrova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, Apr 16, 2018 at 07:48:47PM +0300, Liudmila Mantrova wrote: > Hi everyone, > > When translating doc updates, Alexander Lakhin noticed that trigram examples > were not quite accurate. > A small patch fixing this issue is attached. FYI, this has been applied by Teodor Sigaev: https://git.postgresql.org/pg/commitdiff/9975c128a1d1bd7e7366adf133b21540a2bc2450 --------------------------------------------------------------------------- > > > On 03/21/2018 03:35 PM, Teodor Sigaev wrote: > >Thank you, pushed > > > >David Steele wrote: > >>On 3/6/18 7:04 AM, Teodor Sigaev wrote: > >>>>I agree with Teodor (upthread, not quoted here) that the documentation > >>>>could use some editing. > >>>> > >>>>I started to do it myself, but quickly realized I have no knowledge of > >>>>the content. I'm afraid I would destroy the meaning while updating > >>>>the > >>>>grammar. > >>>> > >>>>Anyone understand the subject matter well enough to review the > >>>>documentation? > >>> > >>>Liudmila tried to improve docs in Alexander's patchset. > >>> > >>>https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru > >>> > >> > >>This looks good to me with a few minor exceptions: > >> > >>+ <function>word_similarity(text, text)</function> requires further > >>+ explanation. Consider the following example: > >> > >>Maybe too verbose? I think "<function>word_similarity(text, > >>text)</function> requires further explanation." can be removed entirely. > >> > >>+ string. However, this function does not add paddings to the > >> > >>"add padding" > >> > >>>BTW, adding Liudmila's message to commitfest task > >>>(https://commitfest.postgresql.org/17/1403/) doesn't work > >> > >>Doesn't work for me either. > >> > >>Alexander, can you post the final patches to the thread so they show up > >>in the CF app? > >> > >>Thanks, > >> > > > > -- > Liudmila Mantrova > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > > diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml > index 8f39529..be43cdf 100644 > --- a/doc/src/sgml/pgtrgm.sgml > +++ b/doc/src/sgml/pgtrgm.sgml > @@ -152,9 +152,9 @@ > </programlisting> > > In the first string, the set of trigrams is > - <literal>{" w"," wo","ord","wor","rd "}</literal>. > + <literal>{" w"," wo","wor","ord","rd "}</literal>. > In the second string, the ordered set of trigrams is > - <literal>{" t"," tw",two,"wo "," w"," wo","wor","ord","rds", ds "}</literal>. > + <literal>{" t"," tw","two","wo "," w"," wo","wor","ord","rds","ds "}</literal>. > The most similar extent of an ordered set of trigrams in the second string > is <literal>{" w"," wo","wor","ord"}</literal>, and the similarity is > <literal>0.8</literal>. > @@ -172,7 +172,7 @@ > At the same time, <function>strict_word_similarity(text, text)</function> > has to select an extent that matches word boundaries. In the example above, > <function>strict_word_similarity(text, text)</function> would select the > - extent <literal>{" w"," wo","wor","ord","rds", ds "}</literal>, which > + extent <literal>{" w"," wo","wor","ord","rds","ds "}</literal>, which > corresponds to the whole word <literal>'words'</literal>. > > <programlisting> -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Mon, Apr 16, 2018 at 07:48:47PM +0300, Liudmila Mantrova wrote: > Hi everyone, > > When translating doc updates, Alexander Lakhin noticed that trigram examples > were not quite accurate. > A small patch fixing this issue is attached. FYI, this has been applied by Teodor Sigaev: https://git.postgresql.org/pg/commitdiff/9975c128a1d1bd7e7366adf133b21540a2bc2450 --------------------------------------------------------------------------- > > > On 03/21/2018 03:35 PM, Teodor Sigaev wrote: > >Thank you, pushed > > > >David Steele wrote: > >>On 3/6/18 7:04 AM, Teodor Sigaev wrote: > >>>>I agree with Teodor (upthread, not quoted here) that the documentation > >>>>could use some editing. > >>>> > >>>>I started to do it myself, but quickly realized I have no knowledge of > >>>>the content. I'm afraid I would destroy the meaning while updating > >>>>the > >>>>grammar. > >>>> > >>>>Anyone understand the subject matter well enough to review the > >>>>documentation? > >>> > >>>Liudmila tried to improve docs in Alexander's patchset. > >>> > >>>https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752cd93@postgrespro.ru > >>> > >> > >>This looks good to me with a few minor exceptions: > >> > >>+ <function>word_similarity(text, text)</function> requires further > >>+ explanation. Consider the following example: > >> > >>Maybe too verbose? I think "<function>word_similarity(text, > >>text)</function> requires further explanation." can be removed entirely. > >> > >>+ string. However, this function does not add paddings to the > >> > >>"add padding" > >> > >>>BTW, adding Liudmila's message to commitfest task > >>>(https://commitfest.postgresql.org/17/1403/) doesn't work > >> > >>Doesn't work for me either. > >> > >>Alexander, can you post the final patches to the thread so they show up > >>in the CF app? > >> > >>Thanks, > >> > > > > -- > Liudmila Mantrova > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > > diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml > index 8f39529..be43cdf 100644 > --- a/doc/src/sgml/pgtrgm.sgml > +++ b/doc/src/sgml/pgtrgm.sgml > @@ -152,9 +152,9 @@ > </programlisting> > > In the first string, the set of trigrams is > - <literal>{" w"," wo","ord","wor","rd "}</literal>. > + <literal>{" w"," wo","wor","ord","rd "}</literal>. > In the second string, the ordered set of trigrams is > - <literal>{" t"," tw",two,"wo "," w"," wo","wor","ord","rds", ds "}</literal>. > + <literal>{" t"," tw","two","wo "," w"," wo","wor","ord","rds","ds "}</literal>. > The most similar extent of an ordered set of trigrams in the second string > is <literal>{" w"," wo","wor","ord"}</literal>, and the similarity is > <literal>0.8</literal>. > @@ -172,7 +172,7 @@ > At the same time, <function>strict_word_similarity(text, text)</function> > has to select an extent that matches word boundaries. In the example above, > <function>strict_word_similarity(text, text)</function> would select the > - extent <literal>{" w"," wo","wor","ord","rds", ds "}</literal>, which > + extent <literal>{" w"," wo","wor","ord","rds","ds "}</literal>, which > corresponds to the whole word <literal>'words'</literal>. > > <programlisting> -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +