Re: TS: Limited cover density ranking - Mailing list pgsql-hackers

From Sushant Sinha
Subject Re: TS: Limited cover density ranking
Date
Msg-id 1327681933.1922.1.camel@dragflick
In response to TS: Limited cover density ranking  (karavelov@mail.bg)
List pgsql-hackers
The rank counts 1/coversize, so bigger covers will not have much impact
anyway. What is the need for the patch?
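A simplified model of the per-cover score may make the 1/coversize point concrete. This is a sketch, not the tsrank.c source. It assumes every lexeme carries the default weight D = 0.1, and the helper name cover_contribution is hypothetical. In this model each cover adds its weight divided by 1 + the number of non-query words inside it. The covers used below are the three that 'a&d' produces on 'a b c d e g h i j k a d'.

```python
# Simplified model of how ts_rank_cd scores a single cover (not the PostgreSQL source).
# Assumption: every lexeme has the default weight D = 0.1.
DEFAULT_WEIGHT = 0.1


def cover_contribution(p: int, q: int, nlex: int, weight: float = DEFAULT_WEIGHT) -> float:
    """Score of a cover spanning word positions p..q that contains nlex query lexemes."""
    # Words inside the cover that are not query lexemes ("noise").
    noise = (q - p) - (nlex - 1)
    # With uniform weights the cover's base weight is just `weight`;
    # it is damped by 1 / (1 + noise), so longer covers count for less.
    return weight / (1 + noise)


if __name__ == "__main__":
    # Covers of 'a&d' in 'a b c d e g h i j k a d' (1-based word positions).
    for p, q in [(1, 4), (4, 11), (11, 12)]:
        print(f"cover {p}..{q}: {cover_contribution(p, q, 2):.6f}")
```

In this model the long 4..11 cover adds about 0.014, against 0.1 for the adjacent pair at 11..12: long covers shrink, but they never drop to zero.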

-Sushant.

On Fri, 2012-01-27 at 18:06 +0200, karavelov@mail.bg wrote:
> Hello, 
> 
> I have developed a variation of the cover density ranking functions
> that counts only covers shorter than a specified limit. It is useful
> for finding combinations of terms that appear near one another. Here
> is an example of usage:
> 
> -- normal cover density ranking: unchanged
> luben=> select ts_rank_cd(to_tsvector('a b c d e g h i j k'),
> to_tsquery('a&d')); 
> ts_rank_cd 
> ------------ 
> 0.0333333 
> (1 row) 
> 
> -- limited to 2 
> luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k'),
> to_tsquery('a&d')); 
> ts_rank_cd 
> ------------ 
> 0 
> (1 row) 
> 
> luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k a d'),
> to_tsquery('a&d')); 
> ts_rank_cd 
> ------------ 
> 0.1 
> (1 row) 
> 
> -- limited to 3 
> luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k'),
> to_tsquery('a&d')); 
> ts_rank_cd 
> ------------ 
> 0.0333333 
> (1 row) 
> 
> luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k a d'),
> to_tsquery('a&d')); 
> ts_rank_cd 
> ------------ 
> 0.133333 
> (1 row) 
> 
> Attached is a patch against the 9.1.2 sources. I preferred to make a
> patch rather than a separate extension because it is only a
> one-statement change in the calc_rank_cd() function. If I made an
> extension instead, a lot of code would be duplicated between
> backend/utils/adt/tsrank.c and the extension.
> 
> I have some questions: 
> 
> 1. Is it worth developing this further (documentation, cleanup,
> etc.) for inclusion in one of the next versions? If so, there are
> some further questions:
> 
> - should I overload ts_rank_cd (as in the examples above and the
> patch), or should I define a new set of functions, for example
> ts_rank_lcd?
> - should I define these new SQL-level functions in core, or should I
> go only with this two-line change in calc_rank_cd() and define the
> new functions in an extension? If we prefer the latter, can I
> overload core functions with functions defined in extensions?
> - finally, there is always the option of duplicating the code and
> making an independent extension.
> 
> 2. If I run the patched version on a cluster that was initialized
> with an unpatched server, is there a way to register the new
> functions in the system catalog without reinitializing the cluster?
> 
> Best regards 
> luben 
> 
> -- 
> Luben Karavelov
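The patch itself is not reproduced in this message. The following simplified Python model of cover density ranking was written only to check the numbers in the examples above; it is not the tsrank.c code. It assumes:

- every word keeps its position (as with the `simple` configuration);
- every lexeme has the default weight D = 0.1;
- no rank normalization is applied;
- a cover is skipped when its span q - p exceeds the limit.

The last point is a guess at the patch's rule. It reproduces all five results shown, but other rules could as well. `find_covers` and `rank_cd` are hypothetical names.

```python
# Simplified model of ts_rank_cd with a hypothetical cover-length limit.
# Not the PostgreSQL implementation: AND queries only, uniform weight D = 0.1.
from typing import List, Optional, Sequence, Tuple

DEFAULT_WEIGHT = 0.1


def find_covers(words: Sequence[str], terms: Sequence[str]) -> List[Tuple[int, int, int]]:
    """Return minimal covers (p, q, nlex) of the AND of `terms`; positions are 1-based."""
    wanted = set(terms)
    # Only occurrences of query terms matter, in document order.
    hits = [(pos, w) for pos, w in enumerate(words, start=1) if w in wanted]
    covers = []
    start = 0
    while start < len(hits):
        # Forward scan: earliest index at which every term has been seen.
        seen, end = set(), None
        for i in range(start, len(hits)):
            seen.add(hits[i][1])
            if seen == wanted:
                end = i
                break
        if end is None:
            break
        # Backward scan from `end`: latest index that still yields all terms.
        seen, begin = set(), end
        for j in range(end, start - 1, -1):
            seen.add(hits[j][1])
            if seen == wanted:
                begin = j
                break
        covers.append((hits[begin][0], hits[end][0], end - begin + 1))
        # The next search starts just after the beginning of this cover.
        start = begin + 1
    return covers


def rank_cd(words: Sequence[str], terms: Sequence[str], limit: Optional[int] = None) -> float:
    total = 0.0
    for p, q, nlex in find_covers(words, terms):
        if limit is not None and q - p > limit:
            continue  # hypothetical limit rule: ignore covers spanning more than `limit`
        noise = (q - p) - (nlex - 1)  # non-query words inside the cover
        total += DEFAULT_WEIGHT / (1 + noise)
    return total


if __name__ == "__main__":
    doc1 = "a b c d e g h i j k".split()
    doc2 = "a b c d e g h i j k a d".split()
    print(round(rank_cd(doc1, ["a", "d"]), 6))           # 0.033333
    print(round(rank_cd(doc1, ["a", "d"], limit=2), 6))  # 0.0
    print(round(rank_cd(doc2, ["a", "d"], limit=2), 6))  # 0.1
    print(round(rank_cd(doc1, ["a", "d"], limit=3), 6))  # 0.033333
    print(round(rank_cd(doc2, ["a", "d"], limit=3), 6))  # 0.133333
    print(round(rank_cd(doc2, ["a", "d"]), 6))           # 0.147619 (no limit)
```

In this model 'a b c d e g h i j k a d' has three covers: 1..4, 4..11 and 11..12. A limit of 2 keeps only 11..12. A limit of 3 also keeps 1..4. The 4..11 cover, which plain ranking still counts with about 0.014, is dropped by both limits.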



