Thread: BUG #4306: TSearch2 stemming, stop words and lexize behaviour inconsistent

From: "Yishai Lerner"
Date:
The following bug has been logged online:

Bug reference:      4306
Logged by:          Yishai Lerner
Email address:      yish@alum.mit.edu
PostgreSQL version: 8.3.1
Operating system:   RHEL5 and MacOSX 10.4
Description:        TSearch2 stemming, stop words and lexize behaviour
inconsistent
Details:

I would expect to_tsquery to treat the three variations "what", "what's" and
"whats" consistently, and to ignore all of them, since each one reduces to the
stop word "what".  However, this is not the case: to_tsquery('whats') returns
the stop word 'what' as a lexeme.  Even more confusing, the lexize results
below are inconsistent with the to_tsquery results that follow them.
This seems like a bug to me.
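
One way to narrow this down is to look at each stage separately: lexize hands
the whole string to a single dictionary, whereas to_tsquery first runs the
parser and then passes each token through the dictionaries of the active
configuration.  The queries below are only a diagnostic sketch.  They assume
the built-in 8.3 names (ts_debug, ts_lexize, the 'english' configuration and
the 'english_stem' dictionary) rather than the tsearch2 compatibility names
used in this report, so adjust them to whatever is installed locally.

```sql
-- Assumption: the built-in 'english' configuration is available; substitute
-- the configuration actually in use (see: SHOW default_text_search_config;).

-- How the parser splits each input, and which dictionary produces which lexemes.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'what''s');

SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'whats');

SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'what');

-- The dictionary on its own, bypassing the parser (the built-in counterpart
-- of the lexize() calls in this report).
SELECT ts_lexize('english_stem', 'whats');
```

Comparing the token column across the three ts_debug calls shows whether the
inputs reach the dictionary as the same tokens that were passed to lexize by
hand.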

goodrec_2=# select lexize('en_stem', 'what''s');
 lexize
--------
 {what}

goodrec_2=# select lexize('en_stem', 'whats');
 lexize
--------
 {what}

goodrec_2=# select lexize('en_stem', 'what');
 lexize
--------
 {}

goodrec_2=# select to_tsquery('what''s');
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s),
ignored
 to_tsquery


goodrec_2=# select to_tsquery('whats');
 to_tsquery
------------
 'what'

goodrec_2=# select to_tsquery('what');
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s),
ignored