Inconsistency with stemming/stop words in Tsearch2 - Mailing list pgsql-general

From Yishai Lerner
Subject Inconsistency with stemming/stop words in Tsearch2
Msg-id E18D9633-0692-4D9D-A11D-0C0303FAC508@alum.mit.edu
List pgsql-general
Hi, I'm having an issue with Tsearch2: stop-word lexemes are sometimes dropped and sometimes not. I would expect to_tsquery to behave consistently for the three variations "what", "what's" and "whats" (using 'en_stem'), and to ignore all of them, since each one reduces to the stop word "what". That is not the case: to_tsquery('whats') returns the stop word 'what' as a result. Even more confusing, the lexize results below are inconsistent with the to_tsquery results below: lexize returns {what} for both "what's" and "whats", yet to_tsquery ignores "what's" and keeps "whats". This seems like a bug to me.

goodrec_2=# select lexize('en_stem', 'what''s');
 lexize 
--------
 {what}

goodrec_2=# select lexize('en_stem', 'whats');
 lexize 
--------
 {what}

goodrec_2=# select lexize('en_stem', 'what');
 lexize 
--------
 {}

goodrec_2=# select to_tsquery('what''s');
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s), ignored
 to_tsquery 


goodrec_2=# select to_tsquery('whats');
 to_tsquery 
------------
 'what'

goodrec_2=# select to_tsquery('what');
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s), ignored
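
One way to reason about the six results above is the toy model below. It is only a sketch under two assumptions that nothing in this message confirms: that the dictionary tests a word against the stop list before stemming it, and that the parser used by to_tsquery splits on the apostrophe while lexize receives the string unparsed. The stop list, the stemmer and every function name here are hypothetical stand-ins, not Tsearch2 code.

```python
# Toy model only: the stop list, stemmer and tokenizer below are
# hypothetical stand-ins, not the real Tsearch2 implementation.
STOP_WORDS = {"what", "s"}  # assumed excerpt of the English stop list


def toy_stem(word):
    """Crude stand-in for the stemmer: strip a trailing 's or s."""
    if word.endswith("'s"):
        return word[:-2]
    if word.endswith("s") and len(word) > 1:
        return word[:-1]
    return word


def toy_lexize(word):
    """Assumed order: stop-word test on the word as given, then stemming."""
    word = word.lower()
    if word in STOP_WORDS:
        return []
    return [toy_stem(word)]


def toy_parse(text):
    """Assumed parser behaviour: an apostrophe separates tokens."""
    return [tok for tok in text.replace("'", " ").split() if tok]


def toy_to_tsquery(text):
    """Pass every parsed token through the dictionary, keep the lexemes."""
    lexemes = []
    for token in toy_parse(text):
        lexemes.extend(toy_lexize(token))
    return lexemes


if __name__ == "__main__":
    for word in ("what's", "whats", "what"):
        print(word, toy_lexize(word), toy_to_tsquery(word))
```

Under those assumptions the model gives the same pattern as the session above: only "whats" survives in the query, because it is not itself on the stop list and is stemmed to "what" after the stop-word test has already passed.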
