Thread: Full text search randomly not working for short prefixes?
Something funny is going on with my full text search, and I have no idea what.

I have a receiver called "Ana". This is her tsv column:

'38651000000':4 'aceventura@mailinator.com':3B 'ana':1A 'novak':2A

These queries do not find her:

SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('a:*'));
SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('an:*'));

This one does:

SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('ana:*'));

Now to an even more interesting part: I have 3 people with the last name "Novak" and one with the name "Nov".

This query finds all 4:

SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('n:*'));

This finds NONE:

SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('no:*'));

This finds all 4 again:

SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('nov:*'));

...and this finds only the ones with the last name:

SELECT * FROM receivers r WHERE r.tsv @@ to_tsquery(unaccent('nova:*'));

These are the tsv columns of the people with the last name:

"'38651000000':4 'janez':1A 'janeznovak@mailinator.com':3B 'novak':2A"
"'38651000000':4 'aceventura@mailinator.com':3B 'ana':1A 'novak':2A"
"'38651000000':4 'novak':2A 'tine':1A 'tnovak@mailinator.com':3B"
"'21415000000':4 'alen.nova@gmailer.com':3B 'allan':1A 'novak':2A"

And the one with the first name:

"'38651604724':6 'brez':3A 'list':4A 'nov':1A 'novreceiver101@mailinator.com':5B 'receiv':2A"

What is going on here?
cen <imbacen@gmail.com> writes:
> Something funny going on with my full text search.. and I have no idea what.

The way to debug this sort of thing is generally to look at what tsquery
you're actually getting. I get

regression=# select to_tsquery(unaccent('a:*'));
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
 to_tsquery
------------
(1 row)

regression=# select to_tsquery(unaccent('an:*'));
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
 to_tsquery
------------
(1 row)

regression=# select to_tsquery(unaccent('ana:*'));
 to_tsquery
------------
 'ana':*
(1 row)

Of course, only the last is going to match 'ana'.

So you need to use a text search configuration in which a/an are
not stop words. Or possibly you could cast the unaccent result
directly to tsquery rather than passing it through to_tsquery(),
though likely that would just have a different set of failure modes
with queries where you do wish stemming would occur.

The problem with "no" seems to be the same.

regards, tom lane
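To make the explanation above concrete, here is a simplified Python model of the behavior. It is an illustration only, not PostgreSQL's tsquery parser: a prefix term that the configuration treats as a stop word is discarded, the resulting empty query matches nothing, and any other prefix is compared against the stored lexemes. The stop-word set is a small hand-picked subset assumed for illustration; the real list comes from the configuration's dictionary, presumably the english configuration here (the stemmed 'receiv' lexeme in the data suggests an English stemmer). Stemming, weights and unaccent() are ignored.

```python
# Simplified model for illustration only; this is NOT how PostgreSQL parses
# tsquery input. Stemming, weights and unaccent() are ignored.

# Assumed, hand-picked subset of an English stop-word list.
STOP_WORDS = {"a", "an", "no", "the"}


def parse_prefix_term(term, stop_words):
    """Return the prefix a term like 'ana:*' asks for, or None if it is dropped."""
    word = term.lower()
    if word.endswith(":*"):
        word = word[:-2]
    if word in stop_words:
        # Mirrors the NOTICE "...contains only stop words..., ignored":
        # the query becomes empty.
        return None
    return word


def matches(lexemes, prefix):
    """An empty query (None) matches nothing; otherwise do a prefix match."""
    return prefix is not None and any(lex.startswith(prefix) for lex in lexemes)


# Lexemes from Ana's tsv column in the question (positions/weights omitted).
ANA = ["38651000000", "aceventura@mailinator.com", "ana", "novak"]

for term in ["a:*", "an:*", "ana:*", "n:*", "no:*", "nov:*"]:
    prefix = parse_prefix_term(term, STOP_WORDS)
    print(term, "->", repr(prefix), matches(ANA, prefix))
```

In this model, 'a:*', 'an:*' and 'no:*' become empty queries that match nothing, while 'n:*' and 'nov:*' still match 'novak', which is the pattern reported in the question. Passing an empty set instead of STOP_WORDS models a configuration without stop words, such as 'simple'.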
Thanks, that makes sense. I think I'll go with the cast approach; I don't really need stemming anywhere.

On 02. 12. 2016 at 16:33, Tom Lane wrote:
> So you need to use a text search configuration in which a/an are
> not stop words. Or possibly you could cast the unaccent result
> directly to tsquery rather than passing it through to_tsquery(),
> though likely that would just have a different set of failure modes
> with queries where you do wish stemming would occur.
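With the cast approach, the input text is parsed as tsquery syntax as-is, so normalization that to_tsquery() would have done is left to the caller. Below is a minimal sketch of a hypothetical helper (build_prefix_tsquery is not from the thread or any library) that lowercases each word, since to_tsvector stores lowercased lexemes but the cast does not change case, and wraps each word as a quoted prefix term. It relies on the documented tsvector/tsquery quoting rule that single quotes and backslashes inside a quoted lexeme are doubled.

```python
# Hypothetical helper (not from the thread or any library): turn raw search
# input into text for a direct ::tsquery cast, where every word becomes a
# quoted prefix term. The cast does no normalization, so this does a little.

def build_prefix_tsquery(user_input: str) -> str:
    """'Ana Nov' -> "'ana':* & 'nov':*"; empty or blank input -> ''."""
    terms = []
    for word in user_input.split():
        # to_tsvector lowercases lexemes, but a direct cast keeps case as-is.
        word = word.lower()
        # Inside a quoted lexeme, backslashes and single quotes must be doubled.
        word = word.replace("\\", "\\\\").replace("'", "''")
        # Quoting keeps characters such as & | ! ( ) : from being parsed as operators.
        terms.append(f"'{word}':*")
    return " & ".join(terms)


print(build_prefix_tsquery("Ana"))     # 'ana':*
print(build_prefix_tsquery("an nov"))  # 'an':* & 'nov':*
print(build_prefix_tsquery("o'neil"))  # 'o''neil':*
```

The result would then be sent as a bound parameter, for example `WHERE r.tsv @@ unaccent(%s)::tsquery` with a driver such as psycopg2 (an assumption here), so SQL-level quoting stays the driver's job; blank input yields an empty string, which the caller may prefer to reject up front. As warned above, this trades one failure mode for another: the stored tsv was built with stemming (the question's data contains 'receiv'), so a prefix such as 'receiver:*' would not match that lexeme.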
On Friday 02 December 2016 at 16:33:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So you need to use a text search configuration in which a/an are
> not stop words. Or possibly you could cast the unaccent result
> directly to tsquery rather than passing it through to_tsquery(),
> though likely that would just have a different set of failure modes
> with queries where you do wish stemming would occur.
> The problem with "no" seems to be the same.
One can always specify 'simple' as the config, eliminating any "stop-word smartness":
andreak=> select to_tsquery('simple', 'a:*');
to_tsquery
------------
'a':*
(1 row)
--
Andreas Joseph Krogh