Thread: bug in ts_rank_cd
MY PREVIOUS EMAIL HAD A PROBLEM. Please reply to this one.

There is a bug in ts_rank_cd. It does not give the correct rank when the query lexeme is the first one in the tsvector. Example:

    select ts_rank_cd(to_tsvector('english', 'abc sdd'),
                      plainto_tsquery('english', 'abc'));
     ts_rank_cd
    ------------
              0

    select ts_rank_cd(to_tsvector('english', 'bcg abc sdd'),
                      plainto_tsquery('english', 'abc'));
     ts_rank_cd
    ------------
            0.1

The problem is that the Cover-finding algorithm ignores the lexeme at the 0th position. I have attached a patch which fixes it. After the patch the result is fine:

    select ts_rank_cd(to_tsvector('english', 'abc sdd'),
                      plainto_tsquery('english', 'abc'));
     ts_rank_cd
    ------------
            0.1
Sushant Sinha <sushant354@gmail.com> writes:
> There is a bug in ts_rank_cd. It does not correctly give rank when the
> query lexeme is the first one in the tsvector.

Hmm ... I cannot reproduce the behavior you're complaining of.
You say

> select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> plainto_tsquery('english', 'abc'));
> ts_rank_cd
> ------------
> 0

but I get

    regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
    regression(# plainto_tsquery('english', 'abc'));
     ts_rank_cd
    ------------
            0.1
    (1 row)

> The problem is that the Cover finding algorithm ignores the lexeme at
> the 0th position,

As far as I can tell, there is no "0th position" --- tsvector counts
positions from one. The only way to see pos == 0 in the input to
Cover() is if the tsvector has been stripped of position information.
ts_rank_cd is documented to return 0 in that situation. Your patch
would have the effect of causing it to return some nonzero, but quite
bogus, ranking.

regards, tom lane
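A minimal psql sketch of the point made in the reply above, assuming a PostgreSQL version whose documentation states that ts_rank_cd returns zero when the tsvector carries no position information. It shows that to_tsvector numbers positions from 1, and that strip() removes positions entirely, which per the reply is the only case in which Cover() would see pos == 0.

```sql
-- Positions in a tsvector start at 1, not 0.
select to_tsvector('english', 'abc sdd');
-- 'abc':1 'sdd':2

-- strip() removes the positional information from the tsvector.
select strip(to_tsvector('english', 'abc sdd'));
-- 'abc' 'sdd'

-- With positions, a query lexeme in the first position is ranked normally
-- (0.1 in the output quoted above).
select ts_rank_cd(to_tsvector('english', 'abc sdd'),
                  plainto_tsquery('english', 'abc'));

-- Without positions, ts_rank_cd is documented to return 0.
select ts_rank_cd(strip(to_tsvector('english', 'abc sdd')),
                  plainto_tsquery('english', 'abc'));
```

Comparing the last two queries separates the two cases the reply distinguishes: a lexeme at position 1 of a normal tsvector, and a stripped tsvector with no positions at all.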
Sorry for raising a false alarm. I was not running vanilla PostgreSQL, and that is why I was seeing that problem. I should have checked with the vanilla version.

-Sushant

On Tue, 2010-12-21 at 23:03 -0500, Tom Lane wrote:
> Sushant Sinha <sushant354@gmail.com> writes:
> > There is a bug in ts_rank_cd. It does not correctly give rank when the
> > query lexeme is the first one in the tsvector.
>
> Hmm ... I cannot reproduce the behavior you're complaining of.
> [...]