Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres - Mailing list pgsql-general

From	Bayer, Samuel
Subject	Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres
Date	March 4, 2022 16:39:39
Msg-id	7ee2afc2-dcf7-2bc9-3092-8ca58ed2b880@mitre.org
In response to	Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres
List	pgsql-general

I've tried both ranking functions. I've tried a variety of the normalization settings. I'm using the standard English
language configuration. Postgres 13.
 

I do understand your FTS philosophy - I suppose I'm looking for guidance about how best to approximate the search
capability in Solr using the FTS pieces you have. One concrete question, I suppose, is: the classic TF/IDF search
strategy relies on inverse document frequency, which looks across the corpus. I can't tell whether that corpus-wide
frequency information is taken into account in either ranking function.
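
To make the corpus-wide part concrete, here is a minimal sketch of textbook TF/IDF in Python. It is only an illustration of the statistic being asked about, not what Postgres or Solr actually implement; the function name tf_idf_scores, the whitespace tokenizer, and the sample documents are all made up for this example.

```python
import math
from collections import Counter

def tf_idf_scores(docs, query_terms):
    """Score each document against the query terms with plain TF-IDF.

    Hypothetical helper for illustration only: naive whitespace tokenizing,
    no stemming, no stop words.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    # This is the corpus-wide information that IDF depends on.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term occurs nowhere in the corpus
            idf = math.log(n_docs / df[term])  # rarer terms get a larger IDF
            score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores

# Made-up corpus: "postgres" is common, "solr" is rare.
docs = [
    "postgres full text search",
    "postgres ranking functions",
    "solr search ranking",
]
scores = tf_idf_scores(docs, ["postgres", "solr"])
```

The point of the sketch is that the df table is built from every document before any single document can be scored, so a ranking function that only sees one document's tokens and the query cannot compute it on its own.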
 

I don't know if Solr weights earlier tokens more heavily, but I wouldn't be surprised if it does.

On 3/4/22 11:09 AM, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> On Fri, Mar 4, 2022 at 10:41:16AM -0500, Bayer, Samuel wrote:
>>> I apologize for not being able to be more specific.
> 
>> I know it is hard to quantify.  Is it possible that Postgres is treating
>> all the terms equally, while Solr is prioritizing terms that are earlier
>> in the document?
> 
> A few basic questions:
> 
> * which ranking function are you using?
> 
> * with what options?
> 
> * which PG version exactly?
> 
> As far as I can see from a quick look at the docs, neither
> ts_rank() nor ts_rank_cd() consider "earlier in the document"
> to be an interesting consideration.  They do have the ability
> to prefer terms that have been marked as having a higher weight,
> but you'd need to do some setup work to make that useful ---
> basically, you'd have to separate out the title or other metadata
> and apply setweight() to it while building the tsvectors.
> 
> I wouldn't be surprised if Solr has some well-tuned default
> heuristics that mean that you don't have to work hard to get
> good results from it.  The current state of our FTS features
> is more like "here's all the parts, but you have to build the
> behavior you want".
> 
> ISTM that our FTS features have basically been on autopilot
> since they went in.  I'd sort of hoped that we'd see more
> parsers, more ranking functions, etc, over time ... but nothing
> like that has happened.  I'm not sure if that's just lack of
> interest, or if people find the code too difficult to work with.
> 
>             regards, tom lane
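
The setweight() idea in the quoted message can be sketched roughly as follows. This is a toy Python analogue, not the formula ts_rank() uses: build_weighted_vector and toy_rank are hypothetical names, and the only detail taken from Postgres is the documented default weight array {0.1, 0.2, 0.4, 1.0} for labels D, C, B, A.

```python
# Default label weights documented for ts_rank(): D, C, B, A.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}

def build_weighted_vector(title, body):
    """Toy analogue of labelling title tokens 'A' and body tokens 'D'.

    Hypothetical helper: maps each token to the list of labels it occurs with.
    """
    vector = {}
    for label, text in (("A", title), ("D", body)):
        for token in text.lower().split():
            vector.setdefault(token, []).append(label)
    return vector

def toy_rank(vector, query_terms, weights=DEFAULT_WEIGHTS):
    """Sum the label weights of every occurrence of every query term.

    A deliberately simple stand-in for a weight-aware ranking function.
    """
    return sum(
        weights[label]
        for term in query_terms
        for label in vector.get(term, [])
    )

# Made-up documents: the same term appears in a title vs. only in a body.
in_title = build_weighted_vector("search tips", "improving quality")
in_body = build_weighted_vector("release notes", "search fixes")

title_score = toy_rank(in_title, ["search"])
body_score = toy_rank(in_body, ["search"])
```

With the title separated out and labelled, a match there outweighs the same match in the body, which is the setup work the quoted message describes.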
