Thread: pg_trgm module: no convertion into Trigrams on one side when comparing

pg_trgm module: no convertion into Trigrams on one side when comparing

From
"Janek Sendrowski"
Date:
Hi,
I'm using the pg_trgm module,
Is there a possibility not to convert one side of the string when comparing?
I need a kind of reference string containing only certain trigrams.
It's not possible to build every kind of string using the function.

example:
Code: Alles auswählen
SELECT show_trgm('abc');
show_trgm
-------------------------
{" a"," ab",abc,"bc "}
(1 row)

But I like to have only {'abc'} for example.

I would use this function: "similarity(text, text)"
Could it somehow work with explicit data-type definitions or maybe with putting the string in brackets or quoting?
Or do I have to change the source code?

Thanks for your help

Janek Sendrowski


Re: pg_trgm module: no convertion into Trigrams on one side when comparing

From
Amit Langote
Date:
On Thu, Nov 14, 2013 at 2:37 AM, Janek Sendrowski <janek12@web.de> wrote:
> Hi,
> I'm using the pg_trgm module,
> Is there a possibility not to convert one side of the string when comparing?
> I need a kind of reference string containing only certain trigrams.
> It's not possible to build every kind of string using the function.
>
> example:
> Code: Alles auswählen
> SELECT show_trgm('abc');
> show_trgm
> -------------------------
> {" a"," ab",abc,"bc "}
> (1 row)
>
> But I like to have only {'abc'} for example.
>
> I would use this function: "similarity(text, text)"
> Could it somehow work with explicit data-type definitions or maybe with putting the string in brackets or quoting?
> Or do I have to change the source code?
>

I guess you're looking for simple pattern matching; something like:

column LIKE '%abc%'

postgres=# create table foo(a text);
CREATE TABLE
postgres=# insert into foo values ('ab'), ('abc'), ('gabcd'), ('xabf');
INSERT 0 4

-- similarity() threshold
postgres=# select show_limit();
 show_limit
------------
        0.3
(1 row)

-- '%' is similarity operator which returns true if column value is
"sufficiently similar" to key (in this case 'abc'). This is determined
by the number of tri-grams two strings share.
postgres=# select *,similarity(a, 'abc') from foo where a % 'abc';
  a  | similarity
-----+------------
 ab  |        0.4
 abc |          1
(2 rows)

-- And finally, probably what you're looking for. Simple pattern matching.
postgres=# select * from foo where a LIKE '%abc%';
   a
-------
 abc
 gabcd
(2 rows)

-- You could go ahead and add more trigrams that you'd want result to contain.
postgres=# insert into foo values ('gabcddfg');
INSERT 0 4

postgres=# select * from foo where a LIKE '%abc%dfg%';
    a
----------
 gabcddfg
(1 row)

Is this what you want?

--
Amit


Re: pg_trgm module: no convertion into Trigrams on one side when comparing

From
"Janek Sendrowski"
Date:
Hi Amit,
 
Thanks for your answer.

My issue is that I still need a ranking like the similarity when comparing trigrams.
I'm working on a similarity search, which determindes similiar sentences from my table.

The usual similarity search with Trigrams and a gist bzw. gin index is to slow with my set of rows.

So I've built an Fixed Query Array. I took some sentences as pivots and calculated the distance between them and all my
sentences.
Now I can search for similiar sentences by calculating their distance to my pivots and comparing them with the
distancesI already got. 
I noticed that it's a good way to use trigrams as metric distance function.
 
So the issue is that I have to take certain trigrams as pivots and of cource I need to store an exact ranking when
comparing.
Your method detect the rows which includes all of the trigrams you are searching for, but I still need a ranking...

something like SELECT similarity('How are your', "{" ho","how","are","re "," hi","wha","hat","at ","  a"}"::trgm);
 
I hope you have an idea
 
Janek Sendrowski
 
 
 
 
 
 
 


Re: pg_trgm module: no convertion into Trigrams on one side when comparing

From
Amit Langote
Date:
On Thu, Nov 14, 2013 at 9:05 PM, Janek Sendrowski <janek12@web.de> wrote:
> Hi Amit,
>
> Thanks for your answer.
>
> My issue is that I still need a ranking like the similarity when comparing trigrams.
> I'm working on a similarity search, which determindes similiar sentences from my table.
>
> The usual similarity search with Trigrams and a gist bzw. gin index is to slow with my set of rows.
>
> So I've built an Fixed Query Array. I took some sentences as pivots and calculated the distance between them and all
mysentences. 
> Now I can search for similiar sentences by calculating their distance to my pivots and comparing them with the
distancesI already got. 
> I noticed that it's a good way to use trigrams as metric distance function.
>
> So the issue is that I have to take certain trigrams as pivots and of cource I need to store an exact ranking when
comparing.
> Your method detect the rows which includes all of the trigrams you are searching for, but I still need a ranking...
>
> something like SELECT similarity('How are your', "{" ho","how","are","re "," hi","wha","hat","at ","  a"}"::trgm);
>
> I hope you have an idea

If I understand you correctly, you want to find a similarity value
based NOT on set of tri-grams internally produced by pg_trgm but a
custom set of tri-grams you have chosen. So, you are looking for a way
to provide pg_trgm a custom set of tri-grams to be used for similarity
calculation.

If that is the case, I am not sure if there is any way/interface
currently to do that.

--
Amit


Re: pg_trgm module: no convertion into Trigrams on one side when comparing

From
"Janek Sendrowski"
Date:
Hi Amit,
Do you think it would by difficult to edit the source and wirte a new function, wich does it?
 
Janek Sendrowski