Re: possible bug in cover density ranking? - Mailing list pgsql-hackers

From Sushant Sinha
Subject Re: possible bug in cover density ranking?
Date
Msg-id 9fb559330901291054l13164cf8ia45caa1ea52cdc96@mail.gmail.com
In response to Re: possible bug in cover density ranking?  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: possible bug in cover density ranking?  (Sushant Sinha <sushant354@gmail.com>)
List pgsql-hackers


On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Is this what is desired? It seems to me that Wdoc is getting a high
ranking even when we are not sure of the position information.
0.1 is not a very high rank, and we could not suggest any reasonable rank in this case. This document may be good or it may be bad. rank_cd is not limited to 1.

 
For a cover of 2 query items, 0.1 is actually the maximum rank. This is only possible when both query items are adjacent to each other.

0.1 may not seem too high when we look at its absolute value. But the problem is that we are ranking a document for which we have no positional information higher than a document for which we do have positional information with, let us suppose, a cover length of 3. I think we should rank the document with cover length 3 higher than the document for which we have no positional information (and for which we assume a cover length of 2, as we are doing now).
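To make the numbers concrete, here is a minimal sketch of the per-cover contribution to Wdoc as it is described in this thread. It is a simplified model, not the actual C code: the function name cover_rank and its parameters are made up for illustration, and it assumes every query item carries the same weight of 0.1.

```python
def cover_rank(p, q, n_items, weight=0.1):
    """Contribution of one cover to Wdoc in a simplified model.

    p, q     -- first and last word position of the cover (ext.p, ext.q)
    n_items  -- number of query items in the cover (ext.end - ext.begin + 1)
    weight   -- per-item weight; 0.1 for every item is an assumption
    """
    inv_sum = n_items * (1.0 / weight)
    cpos = n_items / inv_sum
    # Noise words = positions spanned minus the query items themselves.
    n_noise = (q - p) - (n_items - 1)
    if n_noise < 0:
        # Positions are unreliable (e.g. p == q): current fallback,
        # integer division as in C.
        n_noise = (n_items - 1) // 2
    return cpos / (1 + n_noise)


adjacent = cover_rank(p=10, q=11, n_items=2)  # cover length 2
one_gap = cover_rank(p=10, q=12, n_items=2)   # cover length 3
unknown = cover_rank(p=10, q=10, n_items=2)   # no usable positions

print(adjacent, one_gap, unknown)
```

Under these assumptions the cover with no usable positions gets the same contribution as the adjacent cover, and twice the contribution of the cover of length 3.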

I feel that if ext.p=ext.q for query items > 1, then we should not count that cover for ranking at all. Another option would be to significantly inflate nNoise in this scenario to, say, 100. Putting nNoise=(ext.end-ext.begin)/2 is way too low for covers that we have no idea about (it is 0 for query items = 2).
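A rough sketch of the two options, again as a simplified model rather than a patch: the name cover_rank_proposed, the policy argument and the uniform weight of 0.1 are all hypothetical and only serve to show the effect on the numbers.

```python
def cover_rank_proposed(p, q, n_items, weight=0.1,
                        policy="skip", penalty_noise=100):
    """Per-cover contribution under the two proposed alternatives.

    policy="skip"    -- ignore covers whose positions are unreliable
    policy="inflate" -- keep them, but assume penalty_noise noise words
    All names and defaults here are illustrative assumptions.
    """
    inv_sum = n_items * (1.0 / weight)
    cpos = n_items / inv_sum
    if p == q and n_items > 1:
        # Several query items cannot share one position, so the
        # positional information must have been lost.
        if policy == "skip":
            return 0.0
        n_noise = penalty_noise
    else:
        # Simplification: clamp any other negative value to zero.
        n_noise = max((q - p) - (n_items - 1), 0)
    return cpos / (1 + n_noise)


skipped = cover_rank_proposed(10, 10, 2, policy="skip")
inflated = cover_rank_proposed(10, 10, 2, policy="inflate")
one_gap = cover_rank_proposed(10, 12, 2)  # cover length 3

print(skipped, inflated, one_gap)
```

With either option, the cover of length 3 ranks above the cover with no positional information, which is the ordering I am arguing for.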

I am not assuming or suggesting that rank_cd is bounded by one. Of course its rank increases as more and more covers are added.

Thanks,
Sushant.



The comment above says that "In this case we approximate number of
noise word as half cover's length". But we do not know the cover's
length in this case as ext.p and ext.q are both unreliable. And ext.end
-ext.begin is not the "cover's length". It is the number of query items
found in the cover.

Yeah, but if there is no information then the information is absent :), but I agree with you that the comment should be changed.
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                  WWW: http://www.sigaev.ru/
