Thread: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators
BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators
From
PG Bug reporting form
Date:
The following bug has been logged on the website:
Bug reference: 16744
Logged by: Stas Obydionnov
Email address: stas@hellofyllo.com
PostgreSQL version: 12.3
Operating system: runs on AWS RDS
Description:
When running the following code
select ts_headline('Alpha Beta Gama', phraseto_tsquery ('alpha gama'))
or
select ts_headline('Alpha Beta Gama', to_tsquery ('alpha <-> gama'))
I would expect the result be not to be highlighted, however the result looks like:
<b>Alpha</b> Beta <b>Gama</b>
The same behavior is found for the following operator:
select ts_headline('Alpha Beta Gama Delta', phraseto_tsquery ('alpha <3> gama'))
Re: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> When running the following code
> select ts_headline('Alpha Beta Gama', phraseto_tsquery ('alpha gama'))
> or
> select ts_headline('Alpha Beta Gama', to_tsquery ('alpha <-> gama'))
> I would expect the result be not to be highlighted,
That's operating as designed, I think. Per the code comment:
* If we found nothing acceptable, select min_words words starting at
* the beginning.
The expectation really is that it's on you to not select documents that
don't match your search query. Once you've selected a document to
display, ts_headline() is just going to do the best it can to produce
something useful. "Not highlight anything" wasn't deemed particularly
useful, and I agree with that judgment.
Also, once it's selected a document fragment to display, it will highlight
all words within that fragment that appear in the search query, whether or
not the particular occurrence is part of the match-if-any. Thus
regression=# select ts_headline('Alpha Beta Gama foo bar alpha gama', phraseto_tsquery ('alpha gama'));
ts_headline
----------------------------------------------------------------
<b>Alpha</b> Beta <b>Gama</b> foo bar <b>alpha</b> <b>gama</b>
(1 row)
Again, this is a value judgment about what's useful.
regards, tom lane
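For illustration, the advice above (only call ts_headline() on documents that already match the query) can be sketched roughly as follows. This is a minimal sketch and not something taken from the thread: the inline VALUES list and the column name doc are made up for the example, and the 'english' configuration is an assumption.

```sql
-- Hypothetical two-row input; "doc" is an illustrative column name.
-- Only rows whose tsvector matches the phrase query reach ts_headline().
SELECT doc,
       ts_headline('english', doc, q) AS headline
FROM (VALUES ('Alpha Beta Gama'),
             ('Alpha Gama Beta')) AS t(doc),
     phraseto_tsquery('english', 'alpha gama') AS q
WHERE to_tsvector('english', doc) @@ q;
```

With this filter, 'Alpha Beta Gama' is never passed to ts_headline(), because 'alpha' and 'gama' are not adjacent in it; only the second row is.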
Re: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators
From
Stas Obydionnov
Date:
Thanks Tom,
Probably I provided a bad example.
Here is another one from a similar bug that was opened a couple of years ago and was not answered.
Assuming the following query:
SELECT ts_headline('English',
'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
to_tsquery('English','European <-> Commercial <-> Bank')::tsquery);
The returned result is:
This <b>Commercial</b> <b>Bank</b> does not have any Equity in Europe but <b>European</b> <b>Commercial</b> <b>Bank</b> does
This highlights the words Commercial & Bank separately in addition to European Commercial Bank.
However, the correct output expected should be:
This Commercial Bank does not have any Equity in Europe but <b>European</b> <b>Commercial</b> <b>Bank</b> does
Which only highlights *European Commercial Bank* due to the <-> operator in
phraseto_tsquery.
Regards,
Stas.
On Tue, Nov 24, 2020 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > When running the following code
> > select ts_headline('Alpha Beta Gama', phraseto_tsquery ('alpha gama'))
> > or
> > select ts_headline('Alpha Beta Gama', to_tsquery ('alpha <-> gama'))
> > I would expect the result be not to be highlighted,
> That's operating as designed, I think. Per the code comment:
> * If we found nothing acceptable, select min_words words starting at
> * the beginning.
> The expectation really is that it's on you to not select documents that
> don't match your search query. Once you've selected a document to
> display, ts_headline() is just going to do the best it can to produce
> something useful. "Not highlight anything" wasn't deemed particularly
> useful, and I agree with that judgment.
> Also, once it's selected a document fragment to display, it will highlight
> all words within that fragment that appear in the search query, whether or
> not the particular occurrence is part of the match-if-any. Thus
> regression=# select ts_headline('Alpha Beta Gama foo bar alpha gama', phraseto_tsquery ('alpha gama'));
> ts_headline
> ----------------------------------------------------------------
> <b>Alpha</b> Beta <b>Gama</b> foo bar <b>alpha</b> <b>gama</b>
> (1 row)
> Again, this is a value judgment about what's useful.
> regards, tom lane
Re: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators
From
Tom Lane
Date:
Stas Obydionnov <stas@hellofyllo.com> writes:
> Probably I provided a bad example.
> Here is another one from a similar bug that was opened a couple of years
> ago and was not answered.
> Assuming the following query:
> SELECT ts_headline('English',
> 'This Commercial Bank does not have any Equity in Europe but European
> Commercial Bank does',
> to_tsquery('English','European <-> Commercial <-> Bank')::tsquery);
> The returned result is:
> This <b>Commercial</b> <b>Bank</b> does not have any Equity in Europe but
> <b>European</b> <b>Commercial</b> <b>Bank</b> does
> This highlights the words Commercial & Bank separately in addition
> to European Commercial Bank.
> However, the correct output expected should be:
> This Commercial Bank does not have any Equity in Europe but <b>European</b>
> <b>Commercial</b> <b>Bank</b> does
[ shrug... ] Whether that's more correct than the current behavior is
a matter of opinion. As I said, the ts_headline code highlights all
matching words within whatever fragment it selects. It does make an
effort to locate a fragment that satisfies the query as written, but
that doesn't mean there won't be additional word matches within the
fragment. (In fact, if I'm reading the code correctly, it actually
gives preference to fragments having more matching words, which is why
you don't just get "<b>European</b> <b>Commercial</b> <b>Bank</b>" here.)
I think it's reasonable to consider that highlighting the additional
matches is a useful thing to do, so I'm disinclined to change this
longstanding behavior.
regards, tom lane
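As a hedged illustration of the fragment selection described above: ts_headline() accepts an options string, where a MaxFragments value above zero selects fragment-based headline generation and MaxWords/MinWords bound the fragment length. The option values below are arbitrary choices for this sketch, not a recommendation from the thread, and no output is shown because none is asserted here.

```sql
-- Sketch only: option values are arbitrary, chosen to keep the fragment short.
SELECT ts_headline('english',
  'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
  to_tsquery('english', 'European <-> Commercial <-> Bank'),
  'MaxFragments=1, MaxWords=5, MinWords=3');
```

Shortening the fragment only changes which part of the document is displayed; any query word that falls inside the chosen fragment is still highlighted, as explained above.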