Thread: Normalized Ranking example incorrect in text search

Normalized Ranking example incorrect in text search

From
Simon Riggs
Date:

http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
Ranking Search Results

shows an example which says

"This is the same example using normalized ranking"

and then gives a query which calculates normalization in an incorrect
manner, yet without using the normalization parameter. A correct example
would be something like this:

SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

I can't rerun the query because I don't have the example data set used.
Is that available?

This section also describes the two ranking functions supplied and
suggests that you can also write your own.
- Can we say what the differences are between the two ranking functions?
Why do we have two?
- Can we supply or link to an example ranking function to allow people
to write their own?

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


Re: Normalized Ranking example incorrect in text search

From
Tom Lane
Date:

Simon Riggs <simon@2ndquadrant.com> writes:
> http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
> Ranking Search Results
> shows an example which says
> "This is the same example using normalized ranking"
> and then gives a query which calculates normalization in an incorrect
> manner,

On what basis do you claim that's an incorrect manner?  It's exactly
what is described in the paragraph just before the examples.

> A correct example
> would be something like this:

> SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank

Why is that correct (or more correct than other ways)?

> - Can we say what the differences are between the two ranking functions?
> Why do we have two?

We already say that: the _cd function doesn't work without positional
info in the input tsvector.

            regards, tom lane

Re: Normalized Ranking example incorrect in text search

From
Tom Lane
Date:

I wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> and then gives a query which calculates normalization in an incorrect
>> manner,

> On what basis do you claim that's an incorrect manner?  It's exactly
> what is described in the paragraph just before the examples.

... although on reflection, it seems pretty stupid to be recommending
a method that requires two evaluations at each row of an admittedly
expensive function.
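
To make the cost concrete, here is a minimal Python sketch (not
PostgreSQL code). CountingRank is a hypothetical stand-in for an
expensive ranking function such as ts_rank_cd, and the row values are
made up, since the example data set is not at hand. Writing the call
twice in one expression evaluates it twice per row; computing it once
and reusing the value halves the number of calls.

```python
class CountingRank:
    """Hypothetical stand-in for an expensive ranking function; counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, score: float) -> float:
        self.calls += 1
        return score


rows = [0.2, 1.0, 4.0]  # made-up raw ranks, one per row

# Inline form: the ranking function appears twice in the expression,
# so it is evaluated twice for every row.
inline = CountingRank()
inline_result = [inline(s) / (inline(s) + 1) for s in rows]

# Single-evaluation form: compute the rank once, then normalize it.
single = CountingRank()
single_result = []
for s in rows:
    r = single(s)
    single_result.append(r / (r + 1))

print(inline.calls, single.calls)  # prints "6 3"
```

Both forms produce the same normalized values; only the number of
evaluations differs.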

Seems like we should add one more normalization flag bit:

    32 --- replace computed rank by rank / (rank + 1)

and then the second example would be

SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE  query @@ textsearch
ORDER BY rank DESC LIMIT 10;

with no change in the example output.
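
The arithmetic behind the proposed flag can be sketched in Python. This
is only an illustration under stated assumptions: normalize_rank is a
hypothetical helper name and the raw ranks are made up rather than taken
from the apod data. For non-negative ranks, rank / (rank + 1) is strictly
increasing and stays within [0, 1), so sorting by the normalized value
gives the same order as sorting by the raw rank.

```python
def normalize_rank(rank: float) -> float:
    """Map a non-negative raw rank into [0, 1) via rank / (rank + 1)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return rank / (rank + 1.0)


# Made-up raw ranks, not taken from the apod data set.
raw_ranks = [0.5, 9.0, 0.0, 3.0, 1.0]
normalized = [normalize_rank(r) for r in raw_ranks]

# Row indexes sorted by descending rank, before and after normalization.
order_raw = sorted(range(len(raw_ranks)),
                   key=lambda i: raw_ranks[i], reverse=True)
order_norm = sorted(range(len(normalized)),
                    key=lambda i: normalized[i], reverse=True)

# The transform is monotonic, so the two orderings are identical.
print(order_raw == order_norm)  # prints "True"
```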

            regards, tom lane