Re: possible bug in cover density ranking? - Mailing list pgsql-hackers

From Sushant Sinha
Subject Re: possible bug in cover density ranking?
Msg-id 1241227234.4633.1.camel@dragflick
In response to Re: possible bug in cover density ranking?  (Sushant Sinha <sushant354@gmail.com>)
List pgsql-hackers
I see this listed as an open item here:

http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items

Any interest in fixing this?

-Sushant.

On Thu, 2009-01-29 at 13:54 -0500, Sushant Sinha wrote:
> 
> 
> On Thu, Jan 29, 2009 at 12:38 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>                 Is this what is desired? It seems to me that Wdoc is getting
>                 a high ranking even when we are not sure of the position
>                 information.
>         0.1 is not a very high rank, and we cannot suggest any
>         reasonable rank in this case. This document may be good or
>         bad. rank_cd is not bounded by 1.
> 
>  
> For a cover of 2 query items, 0.1 is actually the maximum rank. This
> is only possible when both query items are adjacent to each other.
> 
> 0.1 may not seem too high when we look at its absolute value. But the
> problem is that we are ranking a document for which no positional
> information is available higher than a document for which positional
> information is available with, say, a cover length of 3. I think we
> should rank the document with cover length 3 higher than the document
> for which we have no positional information (and for which we assume a
> cover length of 2, as we do now).
> 
> I feel that if ext.p == ext.q when there is more than one query item,
> then we should not count that cover for ranking at all. Another option
> would be to significantly inflate nNoise in this scenario, to say 100.
> Setting nNoise = (ext.end - ext.begin)/2 is far too low for covers we
> know nothing about (it is 0 when there are 2 query items).
> 
> I am not assuming or suggesting that rank_cd is bounded by one. Of
> course, its rank increases as more and more covers are added.
> 
> Thanks,
> Sushant.
>         
>         
>                 
>                 The comment above says that "In this case we approximate
>                 number of noise word as half cover's length". But we do
>                 not know the cover's length in this case, as ext.p and
>                 ext.q are both unreliable. And ext.end - ext.begin is not
>                 the "cover's length"; it is the number of query items
>                 found in the cover.
>         
>         
>         Yeah, but if there is no information, then the information is
>         absent :). But I agree with you about changing the comment.
>         -- 
>         Teodor Sigaev                          E-mail: teodor@sigaev.ru
>                                                   WWW: http://www.sigaev.ru/
> 


