Re: Text search prefix matching and stop words - Mailing list pgsql-bugs

From Pavel Borisov
Subject Re: Text search prefix matching and stop words
Date
Msg-id CALT9ZEG-i0prBw5N7pMAPqL_Kj=g_xK-oKjumE6-q0TVvOfB4A@mail.gmail.com
In response to Text search prefix matching and stop words  ("Matthew Nelson" <mnelson@binarykeep.com>)
Responses Re: Text search prefix matching and stop words  (Pavel Borisov <pashkin.elfe@gmail.com>)
Re: Text search prefix matching and stop words  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Text search prefix matching and stop words  (Artur Zakirov <zaartur@gmail.com>)
List pgsql-bugs
> Prefix matching should not omit stop words, as matching lexemes may legitimately begin with stop words.
>
> # select to_tsquery('english', 'over:*') @@ to_tsvector('english', 'overhaul');
> NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
>  ?column?
> ----------
>  f
> (1 row)
>
> I noticed this after implementing interactive, incremental search in an application. As the user typed "overhaul," with each successive character executing a search, "ove" and "overh" matched a particular document, but "over" did not.

Many thanks for the report!

I am not sure that this is a bug. I think this is simply how the to_tsquery conversion works: stop words are filtered out first, and prefix (template) processing happens afterwards.

If you want to run a search on each successive character as it is typed, you can cast the input directly to the tsquery type while the input is still incomplete:

select 'over:*'::tsquery;

and, once the user has finished typing, process the complete input with to_tsquery so that the stop-word list is applied.
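A minimal sketch of that two-phase approach, reusing the 'overhaul' document and the 'english' configuration from the report. Note that the cast performs no normalization at all (no lowercasing, no stemming, no stop-word removal), so this sketch assumes the application lowercases the partial input itself before casting.

```sql
-- While the user is still typing: cast the partial word straight to tsquery.
-- The cast bypasses the text search configuration, so 'over' is not
-- discarded as a stop word and the prefix match succeeds.
SELECT 'over:*'::tsquery @@ to_tsvector('english', 'overhaul');   -- t

-- Once the input is complete: run it through to_tsquery so that the
-- 'english' configuration (stemming and stop words) is applied.
SELECT to_tsquery('english', 'overhaul') @@ to_tsvector('english', 'overhaul');   -- t
```

Because the cast also skips stemming, a partially typed word is compared exactly as entered. Here 'over' happens to be a prefix of the stemmed lexeme 'overhaul', but that is not guaranteed for every word (for example, 'theory' is stored as the lexeme 'theori', so 'theory:*'::tsquery would not match it).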

If to_tsquery worked the way you describe, I expect it would never apply the stop-word filter to prefix (templated) input at all, since a prefix cannot be compared to stop words.
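To illustrate that trade-off, here is a sketch that uses the plain cast (which never consults the stop-word list) as a stand-in for "prefix terms skip the stop-word filter". The document text 'theory' is a hypothetical example, not taken from the report.

```sql
-- 'the' is an English stop word, so to_tsquery drops the term entirely
-- (raising the same NOTICE as in the report) and nothing matches.
SELECT to_tsquery('english', 'the:*') @@ to_tsvector('english', 'theory');   -- f

-- Without the stop-word filter, the prefix 'the' matches any lexeme that
-- starts with it, here the stemmed form 'theori'.
SELECT 'the:*'::tsquery @@ to_tsvector('english', 'theory');   -- t
```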

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com
