Thread: bug in ts_rank_cd

bug in ts_rank_cd

From
Sushant Sinha
Date:
MY PREV EMAIL HAD A PROBLEM. Please reply to this one
======================================================

There is a bug in ts_rank_cd. It does not correctly give rank when the
query lexeme is the first one in the tsvector.

Example:

select ts_rank_cd(to_tsvector('english', 'abc sdd'),
plainto_tsquery('english', 'abc'));
 ts_rank_cd
------------
          0

select ts_rank_cd(to_tsvector('english', 'bcg abc sdd'),
plainto_tsquery('english', 'abc'));
 ts_rank_cd
------------
        0.1

The problem is that the Cover finding algorithm ignores the lexeme at
the 0th position. I have attached a patch which fixes it. With the
patch applied, the result is correct:

select ts_rank_cd(to_tsvector('english', 'abc sdd'), plainto_tsquery(
'english', 'abc'));
 ts_rank_cd
------------
        0.1



Re: bug in ts_rank_cd

From
Tom Lane
Date:
Sushant Sinha <sushant354@gmail.com> writes:
> There is a bug in ts_rank_cd. It does not correctly give rank when the
> query lexeme is the first one in the tsvector.

Hmm ... I cannot reproduce the behavior you're complaining of.
You say

> select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> plainto_tsquery('english', 'abc'));   
>  ts_rank_cd 
> ------------
>           0

but I get

regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
regression(# plainto_tsquery('english', 'abc'));
 ts_rank_cd 
------------
        0.1
(1 row)

> The problem is that the Cover finding algorithm ignores the lexeme at
> the 0th position,

As far as I can tell, there is no "0th position" --- tsvector counts
positions from one.  The only way to see pos == 0 in the input to
Cover() is if the tsvector has been stripped of position information.
ts_rank_cd is documented to return 0 in that situation.  Your patch
would have the effect of causing it to return some nonzero, but quite
bogus, ranking.
        regards, tom lane
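
The two points in the reply above can be checked directly in psql. This is
a minimal sketch, assuming a stock PostgreSQL build with the built-in
'english' configuration. The use of strip() to produce a tsvector without
position information is an illustration added here, not something taken
from the original messages.

```sql
-- Assumes a stock PostgreSQL build with the built-in 'english' configuration.

-- 1. Inspect the positions assigned by to_tsvector.
--    On a stock build they start at 1: 'abc':1 'sdd':2
select to_tsvector('english', 'abc sdd');

-- 2. strip() removes position information from the tsvector.
--    ts_rank_cd is documented to return 0 in that case, because it
--    needs positions to find covers.
select ts_rank_cd(strip(to_tsvector('english', 'abc sdd')),
                  plainto_tsquery('english', 'abc'));
```

Comparing the second query with the unstripped call from the original
report shows whether a zero rank comes from missing positions rather than
from the lexeme being first in the document.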


Re: bug in ts_rank_cd

From
Sushant Sinha
Date:
Sorry for raising a false alarm. I was not running vanilla postgres,
and that is why I was seeing the problem. I should have checked against
the vanilla build first.

-Sushant

On Tue, 2010-12-21 at 23:03 -0500, Tom Lane wrote:
> Sushant Sinha <sushant354@gmail.com> writes:
> > There is a bug in ts_rank_cd. It does not correctly give rank when the
> > query lexeme is the first one in the tsvector.
> 
> Hmm ... I cannot reproduce the behavior you're complaining of.
> You say
> 
> > select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> > plainto_tsquery('english', 'abc'));   
> >  ts_rank_cd 
> > ------------
> >           0
> 
> but I get
> 
> regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> regression(# plainto_tsquery('english', 'abc'));   
>  ts_rank_cd 
> ------------
>         0.1
> (1 row)
> 
> > The problem is that the Cover finding algorithm ignores the lexeme at
> > the 0th position,
> 
> As far as I can tell, there is no "0th position" --- tsvector counts
> positions from one.  The only way to see pos == 0 in the input to
> Cover() is if the tsvector has been stripped of position information.
> ts_rank_cd is documented to return 0 in that situation.  Your patch
> would have the effect of causing it to return some nonzero, but quite
> bogus, ranking.
> 
>             regards, tom lane