Thread: [GENERAL] Use full text to rank results higher if they are "closer hit"
Hello, I think it's easier to explain my question with example code: ------------------------ CREATE TABLE t ( s VARCHAR ); CREATE TABLE INSERT INTO t VALUES ('hello'), ('hello world'); INSERT 0 2 SELECT * FROM t; s ------------- hello hello world (2 rows) SELECT s, ts_rank(vector, query) AS rank FROM t, to_tsvector(s) vector, to_tsquery('hello') query WHERE query @@ vector; s | rank -------------+----------- hello | 0.0607927 hello world | 0.0607927 (2 rows) ------------------------ Here both 'hello' and 'hello world' are ranked equally highly when searching with 'hello'. What I'm wondering is, is there a way within postgres to have it match higher to just 'hello' than 'hello world'? I.e. something like it slightly down-weights extraneous terms? Of course in general I don't know the query or the field strings ahead of time. Thanks for any help! Cheers, Thomas
On 14.02.2017 18:35, Thomas Nyberg wrote: > > Here both 'hello' and 'hello world' are ranked equally highly when > searching with 'hello'. What I'm wondering is, is there a way within > postgres to have it match higher to just 'hello' than 'hello world'? > I.e. something like it slightly down-weights extraneous terms? Of course > in general I don't know the query or the field strings ahead of time. > > Thanks for any help! > > Cheers, > Thomas > > Hello, try the query: SELECT s, ts_rank(vector, query) AS rank FROM t, to_tsvector(s) vector, to_tsquery('hello') query WHERE query @@ vector; s | rank -------------+----------- hello | 0.0607927 hello world | 0.0303964 (2 rows) And read about **normalization** in [1] https://www.postgresql.org/docs/current/static/textsearch-controls.html -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 14.02.2017 18:57, Artur Zakirov wrote: > > Hello, > > try the query: > > SELECT s, ts_rank(vector, query) AS rank > FROM t, to_tsvector(s) vector, to_tsquery('hello') query > WHERE query @@ vector; > s | rank > -------------+----------- > hello | 0.0607927 > hello world | 0.0303964 > (2 rows) Sorry, the query is: SELECT s, ts_rank(vector, query, 2) AS rank FROM t, to_tsvector(s) vector, to_tsquery('hello') query WHERE query @@ vector; -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 02/14/2017 07:35 AM, Thomas Nyberg wrote: > Hello, > > I think it's easier to explain my question with example code: > > ------------------------ > CREATE TABLE t ( s VARCHAR ); > CREATE TABLE > > INSERT INTO t VALUES ('hello'), ('hello world'); > INSERT 0 2 > > SELECT * FROM t; > s > ------------- > hello > hello world > (2 rows) > > SELECT s, ts_rank(vector, query) AS rank > FROM t, to_tsvector(s) vector, to_tsquery('hello') query > WHERE query @@ vector; > s | rank > -------------+----------- > hello | 0.0607927 > hello world | 0.0607927 > (2 rows) > ------------------------ > > Here both 'hello' and 'hello world' are ranked equally highly when > searching with 'hello'. What I'm wondering is, is there a way within > postgres to have it match higher to just 'hello' than 'hello world'? > I.e. something like it slightly down-weights extraneous terms? Of course > in general I don't know the query or the field strings ahead of time. Some digging around found this: https://www.postgresql.org/docs/9.6/static/textsearch-controls.html#TEXTSEARCH-RANKING Setting a normalization of 1: test=# SELECT s, ts_rank(vector, query, 1) AS rank FROM t, to_tsvector(s) vector, to_tsquery('hello') query WHERE query @@ vector; s | rank -------------+----------- hello | 0.0607927 hello world | 0.0383559 > > Thanks for any help! > > Cheers, > Thomas > > -- Adrian Klaver adrian.klaver@aklaver.com
Thomas Nyberg <tomuxiong@gmx.com> writes: > Here both 'hello' and 'hello world' are ranked equally highly when > searching with 'hello'. What I'm wondering is, is there a way within > postgres to have it match higher to just 'hello' than 'hello world'? > I.e. something like it slightly down-weights extraneous terms? Of course > in general I don't know the query or the field strings ahead of time. Read the documentation for ts_rank --- there's a normalization option for that. I'd also suggest you might prefer using ts_rank_cd, which is supposed to penalize cases where the matching words aren't close together. regards, tom lane
Excellent great info! To save the extra mailing list pings, thanks to _everyone_ this is exactly what I was looking for. Cheers, Thomas