Re: Ts_rank internals - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: Ts_rank internals
Msg-id Pine.LNX.4.64.0709111118150.2767@sn.sai.msu.ru
In response to Re: Ts_rank internals  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
On Tue, 11 Sep 2007, Teodor Sigaev wrote:

>> I tried to understand how ts_rank works, but I failed. What does Cover
>> function do? How does it work? What is the DocRepresentation data
>> structure like? I can see the definition of the struct, and the
>> get_docrep function to convert to that format, but by reading those I
>> can't figure out what the resulting DocRepresentation looks like.
>> I wonder if we could get rid of the istrue flag in QueryOperand, and use
>> a local BitmapSet variable instead? It seems wrong to have a temporary
>> flag that's only used in one function, in a struct that's used everywhere.
> It's a variation on the CDR (Cover Density Ranking) algorithm.
>
> It is based on the paper by Clarke et al., "Relevance Ranking for One to
> Three Term Queries" (http://citeseer.ist.psu.edu/clarke00relevance.html).
> Sorry, I lost the article itself, but maybe Oleg has it. A short, simple
> description is at http://www2002.org/CDROM/refereed/643/node7.html.
>
> We changed the original algorithm to support lexeme weights; details are on
> Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking

Actually, we used two papers:
http://citeseer.ist.psu.edu/clarke00relevance.html
and
http://portal.acm.org/ft_gateway.cfm?id=333137&type=pdf&dl=GUIDE&dl=ACM
I can send you the latter if you don't have access to the ACM.
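For readers without the papers at hand, here is a rough sketch of the basic cover density idea as I understand it from Clarke et al.: a cover is a minimal stretch of the document that contains every query term, and each cover contributes 1 to the score if it is at most K words long, otherwise K divided by its length. This is an illustration under those assumptions only (an AND of distinct terms, at most 32 of them; the names Occurrence, next_cover, cover_density and the constant K are hypothetical). It is not the ts_rank_cd code, which, as described above, modifies the algorithm to support lexeme weights.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one occurrence of a query term, sorted by pos. */
typedef struct {
    int pos;   /* word position in the document */
    int term;  /* index of the query term, 0..nterms-1 */
} Occurrence;

/* Find the next minimal cover starting at index *start. On success sets
 * *p and *q to the cover's first and last positions, advances *start past
 * the cover's first occurrence, and returns 1. Returns 0 if none is left. */
static int next_cover(const Occurrence *occ, size_t n, int nterms,
                      size_t *start, int *p, int *q)
{
    uint32_t all, seen = 0;
    size_t i, j;

    assert(nterms > 0 && nterms <= 32);
    all = (nterms == 32) ? 0xFFFFFFFFu : ((1u << nterms) - 1u);

    /* Forward scan: smallest j such that occ[*start..j] has every term. */
    for (j = *start; j < n; j++) {
        seen |= 1u << occ[j].term;
        if (seen == all)
            break;
    }
    if (j >= n)
        return 0;

    /* Backward scan: largest i such that occ[i..j] still has every term. */
    seen = 0;
    for (i = j + 1; i-- > *start;) {
        seen |= 1u << occ[i].term;
        if (seen == all)
            break;
    }
    *p = occ[i].pos;
    *q = occ[j].pos;
    *start = i + 1;
    return 1;
}

/* Clarke-style cover density: a cover of length L = q - p + 1 adds 1 if
 * L <= K, otherwise K / L. Unweighted, unlike ts_rank_cd. */
static double cover_density(const Occurrence *occ, size_t n, int nterms, int K)
{
    size_t start = 0;
    int p, q;
    double score = 0.0;

    while (next_cover(occ, n, nterms, &start, &p, &q)) {
        int len = q - p + 1;
        score += (len <= K) ? 1.0 : (double) K / len;
    }
    return score;
}
```

For a document "a b x x x x a b" and the query a & b, this finds the covers (1,2), (2,7) and (7,8); with K = 4 the middle cover is penalized for its length of 6 words, while the two tight covers each count fully.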


>
> The array of DocRepresentation is a representation of the document: it
> contains only the lexemes that occur in both the tsvector and the tsquery,
> ordered by position, as in the original document. Each DocRepresentation
> has links to the corresponding QueryOperand entries to speed up query
> execution during the extent search. When we enlarge the current extent by
> one word, we set the istrue flag on the corresponding QueryOperand, so
> executing the tsquery on the cover becomes a very simple task.
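A minimal sketch of the mechanism just described, using hypothetical, simplified stand-ins (Operand, QNode, DocWord, grow_extent) rather than the real DocRepresentation and QueryOperand definitions. Document words, sorted by position, each point at their query operand; enlarging the extent by one word only sets that operand's istrue flag, so checking the query is a walk over the query tree that reads flags. Only AND and OR are handled, to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real tsearch structs. */
typedef struct {
    const char *lexeme;
    bool istrue;   /* scratch flag: operand seen in the current extent */
} Operand;

typedef enum { Q_VAL, Q_AND, Q_OR } QType;

typedef struct QNode {
    QType type;
    Operand *operand;          /* used by Q_VAL */
    const struct QNode *left;  /* used by Q_AND / Q_OR */
    const struct QNode *right;
} QNode;

typedef struct {
    int pos;        /* position in the original document */
    Operand *item;  /* link to the matching query operand */
} DocWord;

/* Evaluating the query only reads the flags; no tsvector lookups. */
static bool eval(const QNode *n)
{
    switch (n->type) {
    case Q_VAL: return n->operand->istrue;
    case Q_AND: return eval(n->left) && eval(n->right);
    case Q_OR:  return eval(n->left) || eval(n->right);
    }
    return false;
}

/* Starting at doc[begin], enlarge the extent one word at a time, marking the
 * linked operand, until the query is satisfied. Returns the index of the
 * extent's last word, or -1 if the rest of the document never matches. */
static long grow_extent(const DocWord *doc, size_t n, size_t begin,
                        Operand *ops, size_t nops, const QNode *query)
{
    assert(begin <= n);
    for (size_t i = 0; i < nops; i++)
        ops[i].istrue = false;
    for (size_t e = begin; e < n; e++) {
        doc[e].item->istrue = true;   /* extent now includes doc[e] */
        if (eval(query))
            return (long) e;
    }
    return -1;
}
```

Note that the flags live in the operand structs themselves, so two extent searches over the same query cannot run interleaved; that is the cost of the simplicity.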
>
> It's possible to eliminate the istrue flag, but that requires an algorithm
> that executes a tsquery over a contiguous part of the document rather than
> over the whole document.
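For comparison, a sketch of the alternative raised in the original question, under stated assumptions: instead of a flag stored in the shared operand structs, the set of operands present in a contiguous slice of the document is collected into a local bitmask, and the query is evaluated against that set. A plain uint64_t stands in for PostgreSQL's Bitmapset (so at most 64 operands), and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: operands are referenced by index (< 64) rather than
 * by pointer, so the shared query structure is never written to. */
typedef enum { Q_VAL, Q_AND, Q_OR } QType;

typedef struct QNode {
    QType type;
    int operand;               /* used by Q_VAL: operand index */
    const struct QNode *left;  /* used by Q_AND / Q_OR */
    const struct QNode *right;
} QNode;

typedef struct {
    int pos;      /* position in the original document */
    int operand;  /* index of the matching query operand */
} DocWord;

/* Evaluate the query against a local set of present operands. */
static bool eval_mask(const QNode *n, uint64_t present)
{
    switch (n->type) {
    case Q_VAL: return (present >> n->operand) & 1u;
    case Q_AND: return eval_mask(n->left, present) && eval_mask(n->right, present);
    case Q_OR:  return eval_mask(n->left, present) || eval_mask(n->right, present);
    }
    return false;
}

/* Execute the query over the contiguous slice doc[b..e] (inclusive) only. */
static bool slice_matches(const DocWord *doc, size_t b, size_t e,
                          const QNode *query)
{
    uint64_t present = 0;

    assert(b <= e);
    for (size_t i = b; i <= e; i++) {
        assert(doc[i].operand >= 0 && doc[i].operand < 64);
        present |= UINT64_C(1) << doc[i].operand;
    }
    return eval_mask(query, present);
}
```

As written, slice_matches rebuilds the mask for every slice; a cover search could instead grow the mask one word at a time, just as the istrue flags are set when the extent is enlarged.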
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

