Thread: Lower or Upper case for F.33. pg_trgm

Lower or Upper case for F.33. pg_trgm

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/pgtrgm.html
Description:

Hey guys,

I have a question regarding the trigram algorithm and I cannot find any
information about it in your documentation:

Do you distinguish between lowercase and uppercase? Or do you treat all words
as lowercase?

I would be happy to get some brief feedback from you,

Greetings, Marc

Re: Lower or Upper case for F.33. pg_trgm

From
Daniel Gustafsson
Date:
> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:

> I have a question regarding the trigram algorithm and I can not find any
> information about it in your documentation:

Maybe we should add something about this?

> Do you distinguish between lower and uppercase? Or do you consider all words
> in lowercase?

pg_trgm can be compiled to be case-sensitive, but by default it is
case-insensitive.

# SELECT word_similarity('word', 'WORD');
 word_similarity
-----------------
               1
(1 row)
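
To see where that result comes from, it may help to look at the trigrams themselves. The following is only a sketch: it assumes a standard build of pg_trgm and sufficient privileges to create the extension in the current database, and the column aliases are made up for illustration.

```sql
-- Assumption: default (standard) build of pg_trgm.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- show_trgm() lists the trigrams extracted from a string; in a default
-- build both calls should return the same lower-case set.
SELECT show_trgm('word') AS trgm_lower,
       show_trgm('WORD') AS trgm_upper;

-- similarity() compares those trigram sets, so two strings that differ
-- only in case should compare as identical.
SELECT similarity('word', 'WORD') AS sim;
```

A build compiled with the case-sensitive option would be expected to give different results here.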

> Happy to get a short feedback from you,

I would recommend the pgsql-general mailing list, as that will be a more
reliable way to get general questions answered.

--
Daniel Gustafsson        https://vmware.com/




Re: Lower or Upper case for F.33. pg_trgm

From
Erik Rijkers
Date:

On 16-08-2022 at 12:36, Daniel Gustafsson wrote:
>> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
> 
>> I have a question regarding the trigram algorithm and I can not find any
>> information about it in your documentation:
> 
> Maybe we should add something about this?

Yeah, it's a bit strange that none of the following strings yield any 
info on that page:  'case', 'sensitiv', 'upper', 'lower', and that there 
is no mention of the  ~  versus  ~*  difference.

Maybe it's worth giving this simple hint (already in pgtrgm.html):
   ~  is case-sensitive
   ~* is case-insensitive


In any case a link to  functions-matching.html  seems indicated.
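
For illustration, the difference between the two regular-expression operators can be shown without pg_trgm at all, since it is core PostgreSQL behaviour. The literals and column aliases below are just sample values:

```sql
-- ~  matches a POSIX regular expression case-sensitively,
-- ~* matches it case-insensitively.
SELECT 'WORD' ~  'word' AS case_sensitive_match,    -- false
       'WORD' ~* 'word' AS case_insensitive_match;  -- true
```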


Erik Rijkers





Re: Lower or Upper case for F.33. pg_trgm

From
Daniel Gustafsson
Date:
> On 16 Aug 2022, at 12:54, Erik Rijkers <er@xs4all.nl> wrote:
>
> On 16-08-2022 at 12:36, Daniel Gustafsson wrote:
>>> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
>>> I have a question regarding the trigram algorithm and I can not find any
>>> information about it in your documentation:
>> Maybe we should add something about this?
>
> Yeah, it's a bit strange that none of the following strings yield any info on that page:  'case', 'sensitiv',
> 'upper', 'lower', and that there is no mention of the  ~  versus  ~*  difference.
>
> Maybe worth to (already in pgtrgm.html) give the simple hint:
>  ~  is case-sensitive
>  ~* is case-insensitive
>
> In any case a link to  functions-matching.html  seems indicated.

Yeah, I think there is room for improvements here.  Are you up for drafting a
patch for this?

--
Daniel Gustafsson        https://vmware.com/




Re: Lower or Upper case for F.33. pg_trgm

From
"Marc M."
Date:
Thanks for your fast response.

Is this a question for me? I am fine with a short hint regarding the default.
A link to another documentation is also fine.


Re: Lower or Upper case for F.33. pg_trgm

From
Erik Rijkers
Date:
On 16-08-2022 at 13:46, Daniel Gustafsson wrote:
>> On 16 Aug 2022, at 12:54, Erik Rijkers <er@xs4all.nl> wrote:
>>
>> On 16-08-2022 at 12:36, Daniel Gustafsson wrote:
>>>> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
>>>> I have a question regarding the trigram algorithm and I can not find any
>>>> information about it in your documentation:
>>> Maybe we should add something about this?
>>
>> Yeah, it's a bit strange that none of the following strings yield any info on that page:  'case', 'sensitiv',
>> 'upper', 'lower', and that there is no mention of the  ~  versus  ~*  difference.
>>
>> Maybe worth to (already in pgtrgm.html) give the simple hint:
>>   ~  is case-sensitive
>>   ~* is case-insensitive
>>
>> In any case a link to  functions-matching.html  seems indicated.
> 
> Yeah, I think there is room for improvements here.  Are you up for drafting a
> patch for this?
> 

How is this?

(bluntly stating 'similarity comparisons are case-insensitive' - 
although I'm not really sure..)


Erik

Attachment

Re: Lower or Upper case for F.33. pg_trgm

From
Tom Lane
Date:
Erik Rijkers <er@xs4all.nl> writes:
> (bluntly stating 'similarity comparisons are case-insensitive' - 
> although I'm not really sure..)

Perhaps like "similarity comparisons are case-insensitive in a
standard build of pg_trgm", if you want to nod to the existence
of a compile option without going into detail.

            regards, tom lane



Re: Lower or Upper case for F.33. pg_trgm

From
"Marc M."
Date:
Sounds good to me. 


Re: Lower or Upper case for F.33. pg_trgm

From
Daniel Gustafsson
Date:
> On 16 Aug 2022, at 15:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Erik Rijkers <er@xs4all.nl> writes:
>> (bluntly stating 'similarity comparisons are case-insensitive' -
>> although I'm not really sure..)
>
> Perhaps like "similarity comparisons are case-insensitive in a
> standard build of pg_trgm", if you want to nod to the existence
> of a compile option without going into detail.

Looking at this, I'm leaning towards paring down the diff posted upthread to
pretty much this; I think that will provide value while avoiding confusion.

As a related side note, there are four instances of "case insensitive{ly}" in
the docs, with all other instances using "case-insensitive{ly}".  I'm inclined
to fix those four to use a hyphen while at it, to be consistent across all pages.

--
Daniel Gustafsson        https://vmware.com/


Attachment

Re: Lower or Upper case for F.33. pg_trgm

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> Looking at this I'm leaning towards paring down the diff posted upthread with
> pretty much this, I think that will provide value while avoid causing
> confusion.

WFM.

> As a related side note, there are four instances of "case insensitive{ly}" in
> the docs with all other instances using "case-insensitive{ly}".  I'm inclined
> to fix those four to use a dash while at it to be consistent across all pages.

+1

            regards, tom lane