Thread: pg_trgm module: no convertion into Trigrams on one side when comparing
pg_trgm module: no convertion into Trigrams on one side when comparing
From
"Janek Sendrowski"
Date:
Hi, I'm using the pg_trgm module, Is there a possibility not to convert one side of the string when comparing? I need a kind of reference string containing only certain trigrams. It's not possible to build every kind of string using the function. example: Code: Alles auswählen SELECT show_trgm('abc'); show_trgm ------------------------- {" a"," ab",abc,"bc "} (1 row) But I like to have only {'abc'} for example. I would use this function: "similarity(text, text)" Could it somehow work with explicit data-type definitions or maybe with putting the string in brackets or quoting? Or do I have to change the source code? Thanks for your help Janek Sendrowski
On Thu, Nov 14, 2013 at 2:37 AM, Janek Sendrowski <janek12@web.de> wrote: > Hi, > I'm using the pg_trgm module, > Is there a possibility not to convert one side of the string when comparing? > I need a kind of reference string containing only certain trigrams. > It's not possible to build every kind of string using the function. > > example: > Code: Alles auswählen > SELECT show_trgm('abc'); > show_trgm > ------------------------- > {" a"," ab",abc,"bc "} > (1 row) > > But I like to have only {'abc'} for example. > > I would use this function: "similarity(text, text)" > Could it somehow work with explicit data-type definitions or maybe with putting the string in brackets or quoting? > Or do I have to change the source code? > I guess you're looking for simple pattern matching; something like: column LIKE '%abc%' postgres=# create table foo(a text); CREATE TABLE postgres=# insert into foo values ('ab'), ('abc'), ('gabcd'), ('xabf'); INSERT 0 4 -- similarity() threshold postgres=# select show_limit(); show_limit ------------ 0.3 (1 row) -- '%' is similarity operator which returns true if column value is "sufficiently similar" to key (in this case 'abc'). This is determined by the number of tri-grams two strings share. postgres=# select *,similarity(a, 'abc') from foo where a % 'abc'; a | similarity -----+------------ ab | 0.4 abc | 1 (2 rows) -- And finally, probably what you're looking for. Simple pattern matching. postgres=# select * from foo where a LIKE '%abc%'; a ------- abc gabcd (2 rows) -- You could go ahead and add more trigrams that you'd want result to contain. postgres=# insert into foo values ('gabcddfg'); INSERT 0 4 postgres=# select * from foo where a LIKE '%abc%dfg%'; a ---------- gabcddfg (1 row) Is this what you want? -- Amit
Re: pg_trgm module: no convertion into Trigrams on one side when comparing
From
"Janek Sendrowski"
Date:
Hi Amit, Thanks for your answer. My issue is that I still need a ranking like the similarity when comparing trigrams. I'm working on a similarity search, which determindes similiar sentences from my table. The usual similarity search with Trigrams and a gist bzw. gin index is to slow with my set of rows. So I've built an Fixed Query Array. I took some sentences as pivots and calculated the distance between them and all my sentences. Now I can search for similiar sentences by calculating their distance to my pivots and comparing them with the distancesI already got. I noticed that it's a good way to use trigrams as metric distance function. So the issue is that I have to take certain trigrams as pivots and of cource I need to store an exact ranking when comparing. Your method detect the rows which includes all of the trigrams you are searching for, but I still need a ranking... something like SELECT similarity('How are your', "{" ho","how","are","re "," hi","wha","hat","at "," a"}"::trgm); I hope you have an idea Janek Sendrowski
On Thu, Nov 14, 2013 at 9:05 PM, Janek Sendrowski <janek12@web.de> wrote: > Hi Amit, > > Thanks for your answer. > > My issue is that I still need a ranking like the similarity when comparing trigrams. > I'm working on a similarity search, which determindes similiar sentences from my table. > > The usual similarity search with Trigrams and a gist bzw. gin index is to slow with my set of rows. > > So I've built an Fixed Query Array. I took some sentences as pivots and calculated the distance between them and all mysentences. > Now I can search for similiar sentences by calculating their distance to my pivots and comparing them with the distancesI already got. > I noticed that it's a good way to use trigrams as metric distance function. > > So the issue is that I have to take certain trigrams as pivots and of cource I need to store an exact ranking when comparing. > Your method detect the rows which includes all of the trigrams you are searching for, but I still need a ranking... > > something like SELECT similarity('How are your', "{" ho","how","are","re "," hi","wha","hat","at "," a"}"::trgm); > > I hope you have an idea If I understand you correctly, you want to find a similarity value based NOT on set of tri-grams internally produced by pg_trgm but a custom set of tri-grams you have chosen. So, you are looking for a way to provide pg_trgm a custom set of tri-grams to be used for similarity calculation. If that is the case, I am not sure if there is any way/interface currently to do that. -- Amit
Re: pg_trgm module: no convertion into Trigrams on one side when comparing
From
"Janek Sendrowski"
Date:
Hi Amit, Do you think it would by difficult to edit the source and wirte a new function, wich does it? Janek Sendrowski