I've tried both ranking functions. I've tried a variety of the normalization settings. I'm using the standard English
language configuration. Postgres 13.
I do understand your FTS philosophy - I suppose I'm looking for guidance about how best to approximate the search
capability in Solr using the FTS pieces you have. One concrete question, I suppose, is: the classic TF/IDF search
strategy relies on inverse document frequency, which looks across the corpus. I can't tell whether that corpus-wide
frequency information is taken into account in either ranking function.
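
To make concrete what I mean by corpus-wide frequency information, here is a minimal sketch of textbook TF/IDF in
Python. The function names and the toy corpus are made up for illustration, and this is the plain log(N / df)
formulation, not a claim about what Solr or either Postgres ranking function actually computes.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """Map each term to log(N / df), where df counts the documents containing it."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Use a set so a term counts once per document, however often it repeats.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf_score(query_terms, doc_tokens, idf):
    """Sum of (term count in this document) * (corpus-wide IDF) over the query terms."""
    tf = Counter(doc_tokens)
    return sum(tf[term] * idf.get(term, 0.0) for term in query_terms)


# Hypothetical three-document corpus, already tokenized.
corpus = [
    ["postgres", "full", "text", "search"],
    ["solr", "full", "text", "search"],
    ["postgres", "ranking", "function"],
]
idf = inverse_document_frequency(corpus)
```

The point is that the weight of a term depends on how many documents in the whole corpus contain it: here "solr"
(one document) outweighs "full" (two documents), which a function that only sees one tsvector and one tsquery
could not know.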
I don't know if Solr weights earlier tokens more heavily, but I wouldn't be surprised if it does.
On 3/4/22 11:09 AM, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> On Fri, Mar 4, 2022 at 10:41:16AM -0500, Bayer, Samuel wrote:
>>> I apologize for not being able to be more specific.
>
>> I know it is hard to quantify. Is it possible that Postgres is treating
>> all the terms equally, while Solr is prioritizing terms that are earlier
>> in the document?
>
> A few basic questions:
>
> * which ranking function are you using?
>
> * with what options?
>
> * which PG version exactly?
>
> As far as I can see from a quick look at the docs, neither
> ts_rank() nor ts_rank_cd() consider "earlier in the document"
> to be an interesting consideration. They do have the ability
> to prefer terms that have been marked as having a higher weight,
> but you'd need to do some setup work to make that useful ---
> basically, you'd have to separate out the title or other metadata
> and apply setweight() to it while building the tsvectors.
>
> I wouldn't be surprised if Solr has some well-tuned default
> heuristics that mean that you don't have to work hard to get
> good results from it. The current state of our FTS features
> is more like "here's all the parts, but you have to build the
> behavior you want".
>
> ISTM that our FTS features have basically been on autopilot
> since they went in. I'd sort of hoped that we'd see more
> parsers, more ranking functions, etc, over time ... but nothing
> like that has happened. I'm not sure if that's just lack of
> interest, or if people find the code too difficult to work with.
>
> regards, tom lane