BUG #17556: ts_headline does not correctly find matches when separated by 4,999 words - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17556: ts_headline does not correctly find matches when separated by 4,999 words
Date
Msg-id 17556-70b0479170b83b81@postgresql.org
Responses Re: BUG #17556: ts_headline does not correctly find matches when separated by 4,999 words  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17556
Logged by:          Alex Malek
Email address:      magicagent@gmail.com
PostgreSQL version: 14.4
Operating system:   Red Hat
Description:

Correct results when 4,998 words separate search terms:

# select ts_headline('baz baz baz ipsum ' || repeat(' foo ',4998) || ' labor',
           $$'ipsum' & 'labor'$$::tsquery,
           'StartSel=>, StopSel=<, MaxFragments=100, MaxWords=7, MinWords=3') ;
     ts_headline
---------------------
 >ipsum< ... >labor<
(1 row)

Add one more word between the search terms, for a total of 4,999, and the
terms are no longer found:

# select ts_headline('baz baz baz ipsum ' || repeat(' foo ',4999) || ' labor',
           $$'ipsum' & 'labor'$$::tsquery,
           'StartSel=>, StopSel=<, MaxFragments=100, MaxWords=7, MinWords=3') ;
 ts_headline
-------------
 baz baz baz
(1 row)

It works correctly if "&" (AND) is replaced by "|" (OR):

# select ts_headline('baz baz baz ipsum ' || repeat(' foo ',4999) || ' labor',
           $$'ipsum' | 'labor'$$::tsquery,
           'StartSel=>, StopSel=<, MaxFragments=100, MaxWords=7, MinWords=3') ;
     ts_headline
---------------------
 >ipsum< ... >labor<
(1 row)

The "MinWords" argument and the number of words before the first search term
both alter the results.
After removing one word before the first search term, ts_headline matches the
first term:

# select ts_headline('baz baz ipsum ' || repeat(' foo ',4999) || ' labor',
           $$'ipsum' & 'labor'$$::tsquery,
           'StartSel=>, StopSel=<, MaxFragments=100, MaxWords=7, MinWords=3') ;
   ts_headline
-----------------
 baz baz >ipsum<
(1 row)

Reducing MinWords from 3 to 2 then causes the terms to once again not be found:

# select ts_headline('baz baz ipsum ' || repeat(' foo ',4999) || ' labor',
           $$'ipsum' & 'labor'$$::tsquery,
           'StartSel=>, StopSel=<, MaxFragments=100, MaxWords=7, MinWords=2') ;
 ts_headline
-------------
 baz baz
(1 row)

