Re: TS: Limited cover density ranking - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: TS: Limited cover density ranking
Date
Msg-id Pine.LNX.4.64.1201282302520.12612@sn.sai.msu.ru
In response to TS: Limited cover density ranking  (karavelov@mail.bg)
List pgsql-hackers
I suggest you work on a more general approach; see
http://www.sai.msu.su/~megera/wiki/2009-08-12 for an example.

By the way, I don't like that you changed the ts_rank_cd arguments.

Oleg
On Fri, 27 Jan 2012, karavelov@mail.bg wrote:

> Hello,
>
> I have developed a variation of the cover density ranking functions that counts only covers that are shorter than a specified limit. It is useful for finding combinations of terms that appear near one another. Here is an example of usage:
>
> -- normal cover density ranking : not changed
> luben=> select ts_rank_cd(to_tsvector('a b c d e g h i j k'), to_tsquery('a&d'));
> ts_rank_cd
> ------------
>  0.0333333
> (1 row)
>
> -- limited to 2
> luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k'), to_tsquery('a&d'));
> ts_rank_cd
> ------------
>          0
> (1 row)
>
> luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k a d'), to_tsquery('a&d'));
> ts_rank_cd
> ------------
>        0.1
> (1 row)
>
> -- limited to 3
> luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k'), to_tsquery('a&d'));
> ts_rank_cd
> ------------
>  0.0333333
> (1 row)
>
> luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k a d'), to_tsquery('a&d'));
> ts_rank_cd
> ------------
>   0.133333
> (1 row)
>
> Attached is a patch against the 9.1.2 sources. I preferred to make a patch rather than a separate extension because it is only a one-statement change in the calc_rank_cd function. If I had to make an extension, a lot of code would be duplicated between backend/utils/adt/tsrank.c and the extension.
>
> I have some questions:
>
> 1. Is it interesting to develop it further (documentation, cleanup, etc.) for inclusion in one of the next versions? If this is the case, there are some further questions:
>
> - Should I overload ts_rank_cd (as in the examples above and the patch), or should I define a new set of functions, for example ts_rank_lcd?
> - Should I define these new SQL-level functions in core, or should I go with only this two-line change in calc_rank_cd() and define the new functions as an extension? If we prefer the latter, could I overload core functions with functions defined in extensions?
> - and finally there is always the possibility to duplicate the code and make an independent extension.
>
> 2. If I run the patched version on a cluster that was initialized with an unpatched server, is there a way to register the new functions in the system catalog without reinitializing the cluster?
>
> Best regards
> luben
>
> --
> Luben Karavelov
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
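
The limited cover density idea in the quoted message can be illustrated with a small model. This is only a sketch under stated assumptions, not the code from the patch or from tsrank.c: it handles AND queries over plain word lists, treats every lexeme as having the default weight 0.1, and assumes that "limited to N" means a cover is skipped when the distance between its first and last position exceeds N. That cutoff rule is a guess, chosen because it is consistent with the five results shown in the quoted examples; the function names are hypothetical.

```python
# Simplified model of cover density ranking with an optional cover-length limit.
# Assumptions (not taken from the patch): AND-only queries, every lexeme has
# the default weight 0.1, and a cover is skipped when q - p > limit.

DEFAULT_WEIGHT = 0.1


def minimal_covers(tokens, terms):
    """Return (p, q, hits) for each minimal span containing all query terms.

    p and q are 1-based positions, hits is the number of query-term
    occurrences inside the span.
    """
    terms = set(terms)
    # Occurrences of query terms only, in position order.
    occ = [(i + 1, t) for i, t in enumerate(tokens) if t in terms]

    # For every starting occurrence, find the nearest end that completes the set.
    ends = []
    for start in range(len(occ)):
        seen, found = set(), None
        for end in range(start, len(occ)):
            seen.add(occ[end][1])
            if seen == terms:
                found = end
                break
        ends.append(found)

    covers = []
    for s, e in enumerate(ends):
        if e is None:
            continue
        # A span is minimal only if starting one occurrence later needs a later end.
        if s + 1 < len(ends) and ends[s + 1] == e:
            continue
        covers.append((occ[s][0], occ[e][0], e - s + 1))
    return covers


def rank_cd(tokens, terms, limit=None):
    """Sum weight / (1 + noise) over covers, optionally skipping long covers."""
    score = 0.0
    for p, q, hits in minimal_covers(tokens, terms):
        if limit is not None and q - p > limit:
            continue  # assumed cutoff rule, see the lead-in
        noise = (q - p) - (hits - 1)  # positions in the span not matching the query
        score += DEFAULT_WEIGHT / (1 + noise)
    return score


if __name__ == "__main__":
    short_doc = "a b c d e g h i j k".split()
    long_doc = "a b c d e g h i j k a d".split()
    for doc, limit in [(short_doc, None), (short_doc, 2), (long_doc, 2),
                       (short_doc, 3), (long_doc, 3)]:
        print(limit, round(rank_cd(doc, ["a", "d"], limit), 7))
```

Run as a script, this prints the same five values as the quoted psql session (0.0333333, 0, 0.1, 0.0333333, 0.1333333 after rounding). In the longer document the cover from d at position 4 to a at position 11 is the one that both limits exclude.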

