Re: strange behavior of pg_trgm's similarity function - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: strange behavior of pg_trgm's similarity function
Date	October 10, 2013 12:12:56
Msg-id	525699BE.9040508@vmware.com
In response to	strange behavior of pg_trgm's similarity function (Fujii Masao <masao.fujii@gmail.com>)
Responses	Re: strange behavior of pg_trgm's similarity function
List	pgsql-hackers

On 10.10.2013 15:03, Fujii Masao wrote:
> Hi,
>
> The behavior of pg_trgm's similarity function seems strange. Is this
> intentional?
>
> I expected the following three calls to the similarity function to return
> the same number, because in every call the second argument is just three
> characters contained in the first argument.
>
> =# SELECT similarity('12345', '123');
> =# SELECT similarity('12345', '234');
> =# SELECT similarity('12345', '345');
>
> But that's not true. Each returns a different number.
>
> =# SELECT similarity('12345', '123');
>   similarity
> ------------
>     0.428571
> (1 row)
>
> =# SELECT similarity('12345', '234');
>   similarity
> ------------
>     0.111111
> (1 row)
>
> =# SELECT similarity('12345', '345');
>   similarity
> ------------
>         0.25
> (1 row)
>
> This happens because, for example, similarity('12345', '123') returns
> the similarity of '**12345*' and '**123*' (* means the blank character),
> NOT of '12345' and '123'. IOW, two blank characters are added to the
> beginning of each argument and one to the end. I wonder why pg_trgm's
> similarity function works this way. Should we change this so that no
> blank characters are added to the arguments?

Well, you could also argue that "111111" and "222222" are quite similar, 
even though pg_trgm's similarity will not think so. It comes down to the 
definition of similarity, and how well that definition matches your 
intuition.

FWIW, it feels right to me that a match in the beginning of a word is 
worth more than one in the middle of a string. -1 on changing that.

- Heikki
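
For reference, the numbers quoted above can be reproduced with a minimal sketch in Python. This is not pg_trgm's actual C code. It assumes a single word made only of alphanumeric characters, the two-leading/one-trailing blank padding described in the quoted message, and a score equal to the number of shared trigrams divided by the number of distinct trigrams across both strings. The function names trigrams and similarity are illustrative only.

```python
def trigrams(word):
    """Return the set of trigrams of one word, padded as described above.

    Assumption: two blanks are prepended and one is appended, and the
    input is a single alphanumeric word (no word splitting is done here).
    """
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams in both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


for other in ("123", "234", "345"):
    # Prints 0.428571, 0.111111 and 0.25, matching the psql output above.
    print(other, round(similarity("12345", other), 6))
```

Under these assumptions '12345' yields six trigrams and each three-character argument yields four. '123' shares three of them ("  1", " 12", "123"), so the score is 3/7. '234' shares only "234", giving 1/9. '345' shares "345" and "45 ", giving 2/8. The padded trigrams at the start of the word are what make a prefix match score highest.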
