Thread: Full text search - wildcard and a stop word

From: Allan Jardine
Date: 22 February 2022, 15:22:41

Hi all,

I'm venturing into full-text search in Postgres for the first time. I'd like to be able to search by the start of a word, so I used the `:*` operator. However, this doesn't behave as I'd expect when the prefix is a stop word. For example, my name is "Allan", so I often use it as a test string. Its first three letters spell `all`, which is a stop word, and that is how I noticed this issue.

To illustrate:

=> select to_tsquery('al:*');
 to_tsquery
------------
 'al':*
(1 row)

=> select to_tsquery('all:*');
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
 to_tsquery
------------
(1 row)

=> select to_tsquery('alla:*');
 to_tsquery
------------
 'alla':*
(1 row)

I get why that is happening (the notice basically explains it), but with the wildcard at the end, it seems to me that it should return `'all':*` in this case. Is this by design, or could it be considered a bug? I'm using Postgres 12.10.

Thanks,
Allan

Re: Full text search - wildcard and a stop word

From: Tom Lane
Date: 22 February 2022, 15:56:23

Allan Jardine <allan.jardine@sprymedia.co.uk> writes:
> => select to_tsquery('all:*');
> NOTICE:  text-search query contains only stop words or doesn't contain
> lexemes, ignored
>  to_tsquery
> ------------
> (1 row)

> I get why that is happening - the notification basically details it, but
> the wildcard at the end seems to me that it should return `'all':*` in this
> case? Is this by design or could it be considered a bug?

It's a hard problem.  If we don't normalize the presented word, we risk
not matching cases that users would expect to match (because the word
is going to be compared to data that probably *was* normalized).

In this particular case, you can skip the normalization by just not
using to_tsquery:

n=# select 'all:*'::tsquery;
 tsquery 
---------
 'all':*
(1 row)

but that might or might not be what you want in general.
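A minimal sketch of that trade-off follows. It assumes the built-in `english` configuration is used on the document side. The cast keeps the query term exactly as typed, with no lowercasing and no stemming, so it only matches when the typed text already looks like the stored lexeme.

```sql
-- The document side is normalized by to_tsvector: 'Allan' is stored as 'allan'.
select to_tsvector('english', 'Allan');

-- Cast with a lowercase prefix: matches, and the stop word is kept.
select to_tsvector('english', 'Allan') @@ 'all:*'::tsquery;                  -- true

-- Cast with a capitalized prefix: not lowercased, so it misses 'allan'.
select to_tsvector('english', 'Allan') @@ 'Allan:*'::tsquery;                -- false

-- to_tsquery normalizes the term with the same configuration, so this matches.
select to_tsvector('english', 'Allan') @@ to_tsquery('english', 'Allan:*');  -- true
```

The cast is convenient for the `all:*` case, but any term whose stored form differs from what the user typed (case or stemming) can silently fail to match.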

Perhaps the ideal behavior here would be "normalize, but don't throw away
stopwords", but unfortunately our dictionary APIs don't support that.
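One partial approximation works with the existing dictionaries, provided it is acceptable to give up stemming on the query side. The built-in `simple` configuration lowercases tokens but neither removes stop words nor stems. The sketch below assumes the stored text is still indexed with `english`.

```sql
-- 'simple' lowercases but keeps stop words, so the prefix survives.
select to_tsquery('simple', 'All:*');                                          -- 'all':*

-- Against text indexed with 'english', this prefix now matches.
select to_tsvector('english', 'Allan') @@ to_tsquery('simple', 'All:*');       -- true

-- But 'simple' does not stem: 'running' is stored as 'run' on the
-- document side, while the query keeps 'running', so they no longer line up.
select to_tsvector('english', 'running') @@ to_tsquery('simple', 'running');   -- false
```

This handles the `all:*` case and capitalization, but whole words whose stemmed form differs from what was typed can again fail to match.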

            regards, tom lane