BUG #4306: TSearch2 stemming, stop words and lexize behaviour inconsistent - Mailing list pgsql-bugs

From Yishai Lerner
Subject BUG #4306: TSearch2 stemming, stop words and lexize behaviour inconsistent
Date
Msg-id 200807142104.m6EL4fcq051121@wwwmaster.postgresql.org
List pgsql-bugs
The following bug has been logged online:

Bug reference:      4306
Logged by:          Yishai Lerner
Email address:      yish@alum.mit.edu
PostgreSQL version: 8.3.1
Operating system:   RHEL5 and MacOSX 10.4
Description:        TSearch2 stemming, stop words and lexize behaviour inconsistent
Details:

I would expect to_tsquery to behave consistently for the three variations
"what", "what's" and "whats", and to ignore all of them, since each one
reduces to the stop word "what".  However, this is not the case:
to_tsquery('whats') returns the stop word 'what' as a result.  Even more
confusing, the lexize results below are inconsistent with the to_tsquery
results below.  This seems like a bug to me.

goodrec_2=# select lexize('en_stem', 'what''s');
 lexize
--------
 {what}

goodrec_2=# select lexize('en_stem', 'whats');
 lexize
--------
 {what}

goodrec_2=# select lexize('en_stem', 'what');
 lexize
--------
 {}

goodrec_2=# select to_tsquery('what''s');
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s), ignored
 to_tsquery


goodrec_2=# select to_tsquery('whats');
 to_tsquery
------------
 'what'

goodrec_2=# select to_tsquery('what');
NOTICE:  query contains only stopword(s) or doesn't contain lexeme(s), ignored
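
One way to narrow this down is to check whether lexize and to_tsquery
actually see the same input.  lexize hands the whole string to a single
dictionary, while to_tsquery first runs the parser and then passes each
token through the dictionaries of the text search configuration.  The
queries below are only a sketch: they assume the built-in 8.3 names
ts_debug, ts_lexize, the 'english' configuration and the 'english_stem'
dictionary, which may not match the tsearch2 compatibility names (lexize,
'en_stem') used in the session above.

```sql
-- Sketch only: 'english' and 'english_stem' are assumed names; substitute
-- the configuration and dictionary actually in use.

-- Show how the parser splits each variant into tokens, and which
-- dictionary produces which lexemes for every token.
SELECT alias, token, dictionary, lexemes FROM ts_debug('english', 'what''s');
SELECT alias, token, dictionary, lexemes FROM ts_debug('english', 'whats');
SELECT alias, token, dictionary, lexemes FROM ts_debug('english', 'what');

-- Test the dictionary on its own, bypassing the parser.
SELECT ts_lexize('english_stem', 'whats');

-- Name the configuration explicitly so the result does not depend on
-- default_text_search_config.
SELECT to_tsquery('english', 'whats');
```

Comparing the token column of ts_debug across the three inputs shows
whether the differences come from the parser or from the dictionary.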
