Thread: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators

BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16744
Logged by:          Stas Obydionnov
Email address:      stas@hellofyllo.com
PostgreSQL version: 12.3
Operating system:   runs on AWS RDS
Description:

When running the following code
    select ts_headline('Alpha Beta Gama', phraseto_tsquery ('alpha gama'))

or 
    select ts_headline('Alpha Beta Gama', to_tsquery ('alpha <-> gama'))
I would expect nothing in the result to be highlighted; however, the result
looks like:
    <b>Alpha</b> Beta <b>Gama</b>

The same behavior is found for the following operator:
    select ts_headline('Alpha Beta Gama Delta', phraseto_tsquery ('alpha <3> gama'))
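
For reference, a minimal sketch of how the distance operator treats this input, assuming the 'simple' configuration so that no stemming or stop-word removal is involved, and using to_tsquery so that <N> is parsed as an operator. <N> requires the two lexemes to be exactly N positions apart, so the first check is false and the second is true:

```sql
-- 'Alpha Beta Gama Delta' yields positions alpha:1, beta:2, gama:3, delta:4.
SELECT to_tsvector('simple', 'Alpha Beta Gama Delta')
           @@ to_tsquery('simple', 'alpha <3> gama') AS matches_distance_3,  -- false: gama is 2 positions after alpha
       to_tsvector('simple', 'Alpha Beta Gama Delta')
           @@ to_tsquery('simple', 'alpha <2> gama') AS matches_distance_2;  -- true
```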


PG Bug reporting form <noreply@postgresql.org> writes:
> When running the following code
>     select ts_headline('Alpha Beta Gama', phraseto_tsquery ('alpha gama'))
> or
>     select ts_headline('Alpha Beta Gama', to_tsquery ('alpha <-> gama'))
> I would expect the result be not to be highlighted,

That's operating as designed, I think.  Per the code comment:

         * If we found nothing acceptable, select min_words words starting at
         * the beginning.

The expectation really is that it's on you to not select documents that
don't match your search query.  Once you've selected a document to
display, ts_headline() is just going to do the best it can to produce
something useful.  "Not highlight anything" wasn't deemed particularly
useful, and I agree with that judgment.
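
A minimal sketch of that division of labor, assuming a hypothetical in-line table docs(id, body) and the 'simple' configuration (both are illustrative choices, not part of the report). The @@ test decides which documents qualify; ts_headline() is only called for rows that pass it:

```sql
-- "docs", "id", and "body" are hypothetical names used only for illustration.
WITH docs(id, body) AS (
    VALUES (1, 'Alpha Beta Gama'),   -- alpha and gama are not adjacent
           (2, 'Alpha Gama Beta')    -- alpha is immediately followed by gama
)
SELECT id,
       ts_headline('simple', body, q) AS headline
FROM docs,
     phraseto_tsquery('simple', 'alpha gama') AS q
WHERE to_tsvector('simple', body) @@ q;   -- keep only documents that satisfy the phrase query
```

Only row 2 satisfies the phrase query here, so row 1 never reaches ts_headline() at all.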

Also, once it's selected a document fragment to display, it will highlight
all words within that fragment that appear in the search query, whether or
not the particular occurrence is part of the match-if-any.  Thus

regression=# select ts_headline('Alpha Beta Gama foo bar alpha gama', phraseto_tsquery ('alpha gama'));
                          ts_headline
----------------------------------------------------------------
 <b>Alpha</b> Beta <b>Gama</b> foo bar <b>alpha</b> <b>gama</b>
(1 row)

Again, this is a value judgment about what's useful.

            regards, tom lane



Re: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators

From
Stas Obydionnov
Date:
Thanks Tom,

Probably I provided a bad example.
Here is another one from a similar bug that was opened a couple of years ago and was not answered.

Assuming the following query:

SELECT ts_headline('English',
    'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
    to_tsquery('English','European <-> Commercial <-> Bank')::tsquery);

The returned result is:
This <b>Commercial</b> <b>Bank</b> does not have any Equity in Europe but <b>European</b> <b>Commercial</b> <b>Bank</b> does

This highlights the words Commercial & Bank separately in addition to European Commercial Bank.

However, the output I would expect is:
This Commercial Bank does not have any Equity in Europe but <b>European</b> <b>Commercial</b> <b>Bank</b> does

This highlights only *European Commercial Bank*, because of the <-> operators
in the to_tsquery call.

Regards,
Stas.



On Tue, Nov 24, 2020 at 8:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > When running the following code
> >     select ts_headline('Alpha Beta Gama', phraseto_tsquery ('alpha gama'))
> > or
> >     select ts_headline('Alpha Beta Gama', to_tsquery ('alpha <-> gama'))
> > I would expect the result be not to be highlighted,
>
> That's operating as designed, I think.  Per the code comment:
>
>          * If we found nothing acceptable, select min_words words starting at
>          * the beginning.
>
> The expectation really is that it's on you to not select documents that
> don't match your search query.  Once you've selected a document to
> display, ts_headline() is just going to do the best it can to produce
> something useful.  "Not highlight anything" wasn't deemed particularly
> useful, and I agree with that judgment.
>
> Also, once it's selected a document fragment to display, it will highlight
> all words within that fragment that appear in the search query, whether or
> not the particular occurrence is part of the match-if-any.  Thus
>
> regression=# select ts_headline('Alpha Beta Gama foo bar alpha gama', phraseto_tsquery ('alpha gama'));
>                           ts_headline
> ----------------------------------------------------------------
>  <b>Alpha</b> Beta <b>Gama</b> foo bar <b>alpha</b> <b>gama</b>
> (1 row)
>
> Again, this is a value judgment about what's useful.
>
>                         regards, tom lane



Re: BUG #16744: ts_headline behaves incorrectly with <-> and proximity operators

From
Tom Lane
Date:
Stas Obydionnov <stas@hellofyllo.com> writes:
> Probably I provided a bad example.
> Here is another one from a similar bug that was opened a couple of years
> ago and was not answered.

> Assuming the following query:

> SELECT ts_headline('English',
>     'This Commercial Bank does not have any Equity in Europe but European
> Commercial Bank does',
>     to_tsquery('English','European <-> Commercial <-> Bank')::tsquery);

> The returned result is:
> This <b>Commercial</b> <b>Bank</b> does not have any Equity in Europe but
> <b>European</b> <b>Commercial</b> <b>Bank</b> does

> This highlights the words Commercial & Bank separately in addition
> to European Commercial Bank.

> However, the correct output expected should be:
> This Commercial Bank does not have any Equity in Europe but <b>European</b>
> <b>Commercial</b> <b>Bank</b> does

[ shrug... ]  Whether that's more correct than the current behavior
is a matter of opinion.  As I said, the ts_headline code highlights
all matching words within whatever fragment it selects.  It does
make an effort to locate a fragment that satisfies the query as
written, but that doesn't mean there won't be additional word
matches within the fragment.  (In fact, if I'm reading the code
correctly, it actually gives preference to fragments having more
matching words, which is why you don't just get "<b>European</b>
<b>Commercial</b> <b>Bank</b>" here.)  I think it's reasonable to
consider that highlighting the additional matches is a useful thing
to do, so I'm disinclined to change this longstanding behavior.
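
One knob that bears on this: ts_headline() takes an optional options string, and MaxWords/MinWords bound the length of the headline. A minimal sketch, assuming a shorter headline is acceptable and using illustrative option values. Which words end up inside the headline, and therefore which get highlighted, still depends on the selection logic described above, so no particular output is claimed here:

```sql
-- Option values are illustrative, not recommendations.
SELECT ts_headline('english',
    'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
    to_tsquery('english', 'European <-> Commercial <-> Bank'),
    'MaxWords=5, MinWords=3');
```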

            regards, tom lane