Re: BUG #16235: ts_rank ignores match and only considers lower weighted vector - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: BUG #16235: ts_rank ignores match and only considers lower weighted vector
Date	January 27, 2020 22:34:58
Msg-id	23961.1580164498@sss.pgh.pa.us
In response to	BUG #16235: ts_rank ignores match and only considers lower weighted vector (PG Bug reporting form <noreply@postgresql.org>)
Responses	Re: BUG #16235: ts_rank ignores match and only considers lower weighted vector
List	pgsql-bugs

PG Bug reporting form <noreply@postgresql.org> writes:
> The following query shows the problem:

> select ts_rank(doc1, query) as rank_wrong,
>        ts_rank(doc2, query) as rank_correct
> from (select setweight(to_tsvector('simple', 'foo something'), 'A') ||
>              setweight(to_tsvector('simple', 'foobar'), 'C') as doc1,
>              setweight(to_tsvector('simple', 'foo something'), 'A') as doc2,
>              to_tsquery('simple', 'foo:* & something') as query) as subquery;

> ts_rank on doc1 is only half of the rank of doc2. ts_rank seems to only
> consider the 'foobar' term with lower weight when calculating the rank. The
> foo:1A is only considered in doc2.

No, that's not correct.  What it actually is doing is taking some sort of
average of the weights of the occurrences, as you can see if you play
around with a few more examples besides these two.  That could be better
documented, perhaps, but I don't think it's obviously broken.
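
For instance, something along these lines would let you watch how the
rank moves as only the weight of the extra 'foobar' occurrence changes.
This is just a sketch: the labels and the additional weight combinations
are made up for illustration and are not from the original report, and no
particular output is claimed here.

```sql
-- Same query as in the report; the documents differ only in the weight
-- assigned to the extra 'foobar' lexeme (hypothetical variations).
select label, ts_rank(doc, query) as rank
from (values
        ('A only',
         setweight(to_tsvector('simple', 'foo something'), 'A')),
        ('A plus A foobar',
         setweight(to_tsvector('simple', 'foo something'), 'A') ||
         setweight(to_tsvector('simple', 'foobar'), 'A')),
        ('A plus C foobar',
         setweight(to_tsvector('simple', 'foo something'), 'A') ||
         setweight(to_tsvector('simple', 'foobar'), 'C')),
        ('A plus D foobar',
         setweight(to_tsvector('simple', 'foo something'), 'A') ||
         setweight(to_tsvector('simple', 'foobar'), 'D'))
     ) as docs(label, doc),
     to_tsquery('simple', 'foo:* & something') as query
order by rank desc;
```

If the rank of the combined documents tracks the weight of the 'foobar'
occurrence rather than staying pinned at the 'A only' value, that is
consistent with the weights being combined rather than the higher-weighted
match being ignored.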

I can see that there might be a use for taking the max or even the sum
of the weights rather than an average --- in many situations it wouldn't
be desirable to rank doc1 of your example lower than doc2.  But really
that'd be a different ranking algorithm, not a bug fix for this one.

The manual claims you can write your own ranking algorithm ... but
AFAICS you'd have to code it in C, because we aren't exposing anything
at SQL level that would let you get at the raw match data :-(.
So there's room for improvement there.

Also, you might try using ts_rank_cd() instead, as that uses a different
algorithm for combining the weights.  At least on this example, doc1
gets a higher score than doc2.
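
A side-by-side comparison could look like the following. It reuses doc1,
doc2 and the query from the report unchanged; only the output column
names are invented here.

```sql
-- Compare ts_rank and ts_rank_cd on the two documents from the report.
select ts_rank(doc1, query)    as rank_doc1,
       ts_rank(doc2, query)    as rank_doc2,
       ts_rank_cd(doc1, query) as rank_cd_doc1,
       ts_rank_cd(doc2, query) as rank_cd_doc2
from (select setweight(to_tsvector('simple', 'foo something'), 'A') ||
             setweight(to_tsvector('simple', 'foobar'), 'C') as doc1,
             setweight(to_tsvector('simple', 'foo something'), 'A') as doc2,
             to_tsquery('simple', 'foo:* & something') as query) as subquery;
```

Both functions also accept an optional leading weights array and a
trailing normalization argument, which are left at their defaults here.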

            regards, tom lane
