Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly

From Bruce Momjian
Subject Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly
Msg-id ZT1kGIbiALGclTUA@momjian.us
In response to Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly  (Alex Malek <magicagent@gmail.com>)
Responses Re: BUG #15172: Postgresql ts_headline with <-> operator does not highlight text properly
List pgsql-bugs
On Wed, Aug  3, 2022 at 02:02:51PM -0400, Alex Malek wrote:
> On Wed, Aug 3, 2022 at 1:58 PM PG Bug reporting form <noreply@postgresql.org>
> wrote:
>     I have noticed a likely bug when using ts_headline with the <-> operator
> 
>     Assuming the following query:
> 
>     SELECT ts_headline('English','This Commercial Bank does not have any Equity
>     in Europe but European Commercial Bank does',
>                         phraseto_tsquery('English','European Commercial
>     Bank')::tsquery);
> 
>     The returned result is:
>     This <b>Commercial</b> <b>Bank</b> does not have any Equity in Europe but
>     <b>European</b> <b>Commercial</b> <b>Bank</b> does
> 
>     This highlights the words Commercial & Bank separately in addition to
>     European Commercial Bank.
> 
>     However, the correct output expected should be:
>     This Commercial Bank does not have any Equity in Europe but <b>European</b>
>     <b>Commercial</b> <b>Bank</b> does
> 
>     Which only highlights *European Commercial Bank* due to the <-> operator in
>     phraseto_tsquery.
> 
>     SELECT phraseto_tsquery('English','European Commercial Bank');
>     returns 'european' <-> 'commerci' <-> 'bank' as expected indicating the
>     problem is with ts_headline function.

I tested this against Postgres 11 and master (and you tested on PG 10
and 14) and I found the same behavior, plus I found something even
worse:

    SELECT ts_headline('English',
    'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
    ('''equiti'' <-> ''bank''')::tsquery);
                                                      ts_headline
    ----------------------------------------------------------------------------------------------------------------
     This Commercial <b>Bank</b> does not have any <b>Equity</b> in Europe but European Commercial <b>Bank</b> does

Notice that "Bank" and "Equity" are not next to each other, but they
are still highlighted.  In fact, each word appears to be checked
independently of the others:

    SELECT ts_headline('English',
    'This Commercial Bank does not have any Equity in Europe but European Commercial Bank does',
    ('''XXX'' <-> ''bank''')::tsquery);
                                                   ts_headline
    ---------------------------------------------------------------------------------------------------------
     This Commercial <b>Bank</b> does not have any Equity in Europe but European Commercial <b>Bank</b> does
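
As a cross-check, here is a minimal sketch (it assumes the default
'english' configuration, and the CTE and column names are only
illustrative) that evaluates the same two phrase queries with the @@
match operator.  Neither phrase occurs in the text, so I would expect
both columns to come back false, even though ts_headline highlights
words for both queries:

```sql
-- Sketch: does the document actually match the phrase queries used above?
-- "doc", "body", and the output column names are made up for illustration.
WITH doc(body) AS (
    VALUES ('This Commercial Bank does not have any Equity in Europe but European Commercial Bank does')
)
SELECT to_tsvector('english', body) @@ ('''equiti'' <-> ''bank''')::tsquery AS equiti_bank_matches,
       to_tsvector('english', body) @@ ('''XXX'' <-> ''bank''')::tsquery    AS xxx_bank_matches
FROM doc;
```

If that holds, a caller could guard ts_headline with the same @@ test as
a stopgap.  That would only suppress highlighting for documents that do
not match at all; it would not stop extra words from being highlighted in
a document that does match, which is the case in the original report.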

Is this documented somewhere?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.


