BUG #17557: ts_headline will error with "invalid memory alloc request size" for large documents - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17557: ts_headline will error with "invalid memory alloc request size" for large documents
Date
Msg-id 17557-6ddc074c8b1bd6df@postgresql.org
Responses Re: BUG #17557: ts_headline will error with "invalid memory alloc request size" for large documents  (Japin Li <japinli@hotmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17557
Logged by:          Alex Malek
Email address:      magicagent@gmail.com
PostgreSQL version: 14.4
Operating system:   Red Hat
Description:

ts_headline, when given a document over a certain size or number of words,
fails with "ERROR:  invalid memory alloc request size XXXXXX":

# select ts_headline('b ' || repeat('1 ',16777215), $$'b'$$::tsquery,
'MaxWords=4, MinWords=3') ;
ERROR:  invalid memory alloc request size 1610612736

The failure depends not just on document size but also on the number of
"words" in the document.

One fewer "word" works:

# select ts_headline('b ' || repeat('1 ',16777214), $$'b'$$::tsquery,
'MaxWords=4, MinWords=3') ;
  ts_headline
----------------
 <b>b</b> 1 1 1
(1 row)
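
As a quick arithmetic check on the two runs above (a sketch only): 1073741823
bytes is PostgreSQL's 1 GB - 1 cap on a single ordinary memory request, and
the per-word figure below is merely the ratio of the reported numbers, not
something taken from the ts_headline source.

```sql
-- The failing document has 1 + 16777215 words, which is exactly 2^24.
-- request_bytes_per_word is just (reported request size) / (word count);
-- reading it as a real per-word cost is an assumption.
select 1 + 16777215            as words_in_failing_doc,
       1 << 24                 as two_to_the_24,
       1610612736 / (1 << 24)  as request_bytes_per_word,
       1610612736 > 1073741823 as over_1gb_cap;
```

This should report 16777216 for both word counts, 96 bytes per word, and "t"
for over_1gb_cap, i.e. the failing request is 1.5 GB and lands exactly on a
power-of-two word count.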

Memory is not an issue for longer "words", up to a point:

# select ts_headline('b ' || repeat('123456789012345 ',16777214),
$$'b'$$::tsquery, 'MaxWords=4, MinWords=3') ;
                       ts_headline
----------------------------------------------------------
 <b>b</b> 123456789012345 123456789012345 123456789012345
(1 row)

# select ts_headline('b ' || repeat('1234567890123456 ',16777214),
$$'b'$$::tsquery, 'MaxWords=4, MinWords=3') ;
ERROR:  invalid memory alloc request size 1140850564

The memory error appears to be triggered by a combination of the total number
of words and the word length:

# select ts_headline('b ' || repeat('1234567890123456 ',15790000),
$$'b'$$::tsquery, 'MaxWords=4, MinWords=3') ;
                         ts_headline
-------------------------------------------------------------
 <b>b</b> 1234567890123456 1234567890123456 1234567890123456
(1 row)

# select ts_headline('b ' || repeat('1234567890123456 ',15795000),
$$'b'$$::tsquery, 'MaxWords=4, MinWords=3') ;
ERROR:  invalid memory alloc request size 1074060012
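
The four longer-word runs above fit a simple size model. This is a sketch
inferred from the error messages only, not from the source code: it assumes
the request is 4 bytes for every byte of the document plus one extra slot,
which would fit a buffer of 4-byte wide characters, though that is a guess.
The column names are made up for illustration.

```sql
-- Assumed model (inferred from the reported numbers, not from the source):
--   request_bytes = 4 * (document_bytes + 1)
-- where document_bytes = 2 (for 'b ') + word_bytes * repeats, and
-- word_bytes counts the trailing space (15 digits + space = 16, and so on).
-- 1073741823 is the 1 GB - 1 cap on a single ordinary allocation.
select word_bytes, repeats,
       4 * (2 + word_bytes::bigint * repeats + 1) as request_bytes,
       4 * (2 + word_bytes::bigint * repeats + 1) > 1073741823 as over_1gb_cap
from (values (16, 16777214),
             (17, 16777214),
             (17, 15790000),
             (17, 15795000)) as t(word_bytes, repeats);
```

Under that model the request sizes come out as 1073741708 (under the cap),
1140850564 (over), 1073720012 (under) and 1074060012 (over). The two "over"
values are exactly the sizes in the errors above, and the two "under" rows
are the runs that succeeded. If the cap is what is being hit, it would also
explain why raising memory-related settings makes no difference, since the
cap is not governed by them.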


I get the same results even after increasing PostgreSQL GUCs, including
work_mem, shared_buffers, and effective_cache_size, and also on machines with
significantly more RAM, with and without HugePages enabled.

