Re: Mailing list search engine: surprising missing results? - Mailing list pgsql-www

From James Addison
Subject Re: Mailing list search engine: surprising missing results?
Msg-id CALDQ5NwjHE6jjmxVPSq00FbTiVVKcb9+fX7nMnrRXtHNZGt+2g@mail.gmail.com
In response to Re: Mailing list search engine: surprising missing results?  (Ivan Panchenko <i.panchenko@postgrespro.ru>)
List pgsql-www
On Tue, 25 Jan 2022 at 21:23, Ivan Panchenko <i.panchenko@postgrespro.ru> wrote:
>
> On 25.01.2022 23:48, James Addison wrote:
> > I'm uncertain why parsing hyphenated query text produces compound tokens?
>
> Because in some cases user wants to search the full hyphenated words,
> not parts of them.

That makes sense, although to refer back to a previous suggestion of
yours, we could allow matching on the full hyphenated words by
emitting an 'OR' condition from the parsed query, instead of 'AND'
(perhaps using an argument?).

In other words:

# expected query to achieve a match (from your previous post in this thread)
'boyers-moore' | ('boyers' & 'moore')

# actual query that does not result in a match today
# (plainto_tsquery for 'boyer-moore')
'boyer-moore' & 'boyer' & 'moore'
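
To make the difference concrete, here is a rough sketch in Python (not PostgreSQL itself) that evaluates both query shapes against a made-up set of document lexemes. The lexeme set and the has() helper are assumptions for illustration only; the real lexemes depend on the parser and text search configuration in use. I've used the 'boyer' spelling from the plainto_tsquery output for both forms.

```python
# Hypothetical lexemes for one indexed message; NOT taken from the real
# archive. Assumes the document contained a longer hyphenated word, so the
# compound token 'boyer-moore' itself was never indexed.
doc_lexemes = {"boyer", "moore", "horspool", "boyer-moore-horspool"}


def has(lexeme):
    """Illustrative helper: True if the lexeme is present in the document."""
    return lexeme in doc_lexemes


# AND form: 'boyer-moore' & 'boyer' & 'moore'
and_form = has("boyer-moore") and has("boyer") and has("moore")

# OR form: 'boyer-moore' | ('boyer' & 'moore')
or_form = has("boyer-moore") or (has("boyer") and has("moore"))

print("AND form matches:", and_form)
print("OR form matches:", or_form)
```

Under that assumption the AND form fails because the compound token is absent, while the OR form still matches on the two parts.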

> >> It seems to me that in both cases we'd be better off generating
> >> "'boyers' <-> 'moore'", without the compound token at all.
> >> Maybe there's a case for the weaker 'boyers' & 'moore' translation,
> >> but I think if people wanted that they'd just enter separate words.
>
> Matching the compound token might be significant for ranking. (?)

Yes, that does seem likely.  The knowledge that there is an exact-match
token in the results could be important for various use cases
(including relevance scoring).

> Probably, there is no universal *to_tsquery function and no universal
> parser to fit all users.

That seems possible too, yep.


