Thread: Full text search - wildcard and a stop word
Hi all,
I'm venturing into full text search in Postgres for the first time, and I'd like to be able to do a search by the start of a word - so I used the `:*` operator. However, this doesn't operate as I'd expect with a stop word - for example, my name is "Allan" so I often use it as a test string. It contains `all` which is a stop word, which is how I noticed this issue.
To illustrate:
=> select to_tsquery('al:*');
to_tsquery
------------
'al':*
(1 row)
=> select to_tsquery('all:*');
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
to_tsquery
------------
(1 row)
=> select to_tsquery('alla:*');
to_tsquery
------------
'alla':*
(1 row)
I get why that is happening - the notification basically details it, but the wildcard at the end seems to me that it should return `'all':*` in this case? Is this by design or could it be considered a bug? I'm using Postgres 12.10.
Thanks,
Allan
Allan Jardine <allan.jardine@sprymedia.co.uk> writes:
> => select to_tsquery('all:*');
> NOTICE: text-search query contains only stop words or doesn't contain
> lexemes, ignored
> to_tsquery
> ------------
> (1 row)

> I get why that is happening - the notification basically details it, but
> the wildcard at the end seems to me that it should return `'all':*` in this
> case? Is this by design or could it be considered a bug?

It's a hard problem. If we don't normalize the presented word, we risk not matching cases that users would expect to match (because the word is going to be compared to data that probably *was* normalized).

In this particular case, you can skip the normalization by just not using to_tsquery:

n=# select 'all:*'::tsquery;
tsquery
---------
'all':*
(1 row)

but that might or might not be what you want in general.

Perhaps the ideal behavior here would be "normalize, but don't throw away stopwords", but unfortunately our dictionary APIs don't support that.

regards, tom lane
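As a rough sketch of the cast workaround in a prefix search: the sample text and the choice of the built-in `simple` configuration are assumptions for illustration, not something from the thread. Because the cast skips normalization entirely, the search term is lowercased by hand, and the document side uses `simple` so that its lexemes are lowercased but not stemmed or stop-word filtered.

```sql
-- Sketch only: the sample text and the 'simple' configuration are assumptions.
-- Casting to tsquery skips dictionary normalization, so the stop word 'all' survives.
-- The cast does not fold case either, hence the explicit lower().
SELECT to_tsvector('simple', 'Allan Jardine')
       @@ (lower('All') || ':*')::tsquery AS matches_prefix;
```

Here the prefix query `'all':*` would be expected to match the lexeme `allan`. If the indexed column was built with a stemming configuration such as `english`, an unnormalized term may fail to match stemmed lexemes, which is the caveat raised in the reply.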