Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres - Mailing list pgsql-general

From Bayer, Samuel
Subject Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres
Date
Msg-id 4b5afb83-3e90-17c7-2650-9bd521e958a7@mitre.org
In response to Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres  (Atri Sharma <atri.jiit@gmail.com>)
Responses Re: [EXT] Re: Looking for tips on improving full-text search quality in Postgres
List pgsql-general
Fair question. Not worried so much about speed. Looking, essentially, at precision by rank (i.e., average precision and
variants). I have not explored the contrasts between the default English language configuration in Postgres and the one
in Solr - I have no reason to believe that there's anything odd going on there. My problem is that I can't provide
specific performance numbers, or the corpus in question, but my overall impression is that the top N (10, 20) results
from Postgres, no matter how I configure the ranking, aren't as relevant to the query, as a group, as the ones from
Solr.
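For anyone comparing engines the same way, here is a minimal Python sketch of precision@k and uninterpolated average precision. It assumes binary relevance judgments and one ranked list of document IDs per query. The function names and document IDs are hypothetical illustrations, not part of Postgres or Solr. The AP@k convention shown (dividing by the total number of relevant documents) is only one of several variants in use.

```python
from typing import List, Optional, Sequence, Set, Tuple


def precision_at_k(relevant: Set[str], ranked: Sequence[str], k: int) -> float:
    """Fraction of the top-k ranked document IDs that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(relevant: Set[str], ranked: Sequence[str],
                      k: Optional[int] = None) -> float:
    """Uninterpolated AP: sum of precision@i at each rank i holding a relevant
    document, divided by the total number of relevant documents.
    If k is given, only the top-k results are considered (one AP@k variant).
    Assumes the ranked list contains no duplicate document IDs."""
    if not relevant:
        return 0.0
    considered = ranked if k is None else ranked[:k]
    hits = 0
    total = 0.0
    for i, doc in enumerate(considered, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Set[str], Sequence[str]]],
                           k: Optional[int] = None) -> float:
    """MAP over queries; runs is a list of (relevant_set, ranked_list) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(rel, ranked, k) for rel, ranked in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical judgments and rankings for a single query.
    relevant = {"d1", "d3", "d4"}
    engine_a = ["d1", "d2", "d3", "d5", "d4"]
    engine_b = ["d2", "d5", "d1", "d3", "d4"]
    print(round(average_precision(relevant, engine_a), 4))  # 0.7556
    print(round(average_precision(relevant, engine_b), 4))  # 0.4778
```

Both engines retrieve the same three relevant documents here, but engine A ranks them higher, so its AP is higher. Averaging AP over a set of judged queries (MAP) gives a single number per engine or ranking configuration.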

Example anecdote: the documents I'm searching come with metadata (e.g., title), which I'm not indexing specially (not a
separate field, just part of the raw text of the document). When I search even for single terms, and look at the titles
of the results, the titles in the Solr results more frequently contain that term than the titles in the Postgres
results. I also FEEL like I've noticed that the problem is more apparent in "OR" queries; if I search for a disjunction
of terms, the documents that contain all the terms are more likely to be high in the Solr rankings than in the Postgres
rankings.

I apologize for not being able to be more specific.

Thanks in advance, again.

On 3/4/22 10:30 AM, Atri Sharma wrote:
> Can you define what "high quality" is?
> 
> Are you referring to precision? Or recall? Or speed? Or query dialect?
> 
> On Fri, Mar 4, 2022 at 8:59 PM Bayer, Samuel <sam@mitre.org> wrote:
>>
>> Thanks for replying. My problem is that I can't provide enough guidance on what isn't working, because (a) I don't
>> have good enough intuitions about how the normalization options are expected to affect the results, and (b) I can't
>> identify a specific missing function - I'm just observing that I can't make the results as high-quality as Solr.
>>
>> My apologies.
>>
>> Sam
>>
>> On 3/4/22 10:25 AM, Bruce Momjian wrote:
>>> On Fri, Mar 4, 2022 at 08:10:48AM -0500, Bayer, Samuel wrote:
>>>> Hi all -
>>>>
>>>> When I have a need for both sophisticated database querying and
>>>> full-text search, I'd rather not stand up a technology stack with
>>>> multiple tools (e.g., Postgres and Apache Solr, or Postgres and
>>>> ElasticSearch with a zomboDB bridge). So I've been looking at the
>>>> Postgres full-text search capability, and comparing it to Apache
>>>> Solr. My experience so far - which has not been entirely anecdotal,
>>>> but hasn't amounted to a formal TREC-style evaluation - is that
>>>> Postgres full-text search, in any ranking/normalization configuration
>>>> I can create, is reliably worse than Solr. Now, I understand that the
>>>> whole point of Solr is search, and this is a sideline for Postgres,
>>>> but I'd like to figure out how close Postgres can get, and while I'm
>>>> knowledgeable about search technologies, I'm not an expert. And I've
>>>> looked for information on the Web about comparing Postgres search
>>>> to other search capabilities, and everything I've found so far is
>>>> extremely basic.
>>>>
>>>> Does anybody have any pointers to resources (people, sites, journal
>>>> articles, blogs, etc.) which are deeply knowledgeable about this
>>>> comparison?
>>>
>>> Uh, most of our full text search is done by Russian developers, who are
>>> obviously very good at it.  It would be helpful if you could list
>>> exactly what is missing and then we can have a discussion on the hackers
>>> list to see what is possible.  I think it would be helpful if we just
>>> document what we _don't_ have.
>>>
>>
>>
> 
> 
