Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB - Mailing list pgsql-bugs

From James Inform
Subject Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB
Date
Msg-id 89216301-1ce5-6696-636c-456d6a53bd1a@pharmapp.de
In response to Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-bugs
Well, I understand. But in my scenario I am using ANSI characters, which
are just 1-byte UTF-8. The system should be capable of handling that.
Maybe we could implement a different approach for text column content >
256 MB, something like a "streaming" read where memory is allocated not
at a fixed 4 bytes per character, but based on the true length of the
characters found in the text column or returned by a function like
lower().

But as a user I would expect to be able to store as many characters as
possible in a text field without getting errors of any kind.

The following example gives an error although the amount of text
easily fits into the column, because 'x' is ANSI and takes 1 byte:

create temp table mytest(mydata text);
create index on mytest(lower(mydata));
insert into mytest
select repeat('x',300*1024*1024) as mydata;

Without the index I could easily store 800 MB in the text column:
select repeat('x',800*1024*1024) as mydata;

So from what I see we should at least extend the documentation and tell
people that although a text field can hold up to 1 GB of string data,
only 256 MB can safely be used if one wants to use function-based
indexes or functions on the column inside a WHERE clause.

So for everyone using text columns in more than a basic way, we should
simply say not to store more than 256 * 1024 * 1024 characters in a
text column.

Julien Rouhaud wrote:
> On Mon, Aug 16, 2021 at 4:46 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> When I try to search a text field in a where clause that holds more than 250
>> MB of text, PostgreSQL runs out of memory for the request.
>>
>> You can reproduce this behaviour with the following sql statement:
>>
>> with
>>          q1 as
>>          (
>>                  -- 260*1024*1024 = 260MB
>>                  select repeat('x',260*1024*1024) as mydata
>>          )
>> select count(*) from q1 where lower(mydata) like '%a%'
>> ;
>>
>> ERROR:  invalid memory alloc request size 1090519044
>>
>> Using just a 250MB string:
>>
>> with
>>          q1 as
>>          (
>>                  -- 250*1024*1024 = 250MB
>>                  select repeat('x',250*1024*1024) as mydata
>>          )
>> select count(*) from q1 where lower(mydata) like '%a%'
>> ;
>>
>> Everything is fine!
>>
>> The alloc request size seems to be 4 times the length of the text field.
>> [...]
>> Is this expected behaviour?
>> Why is PostgreSQL allocating 4 times the column's memory when doing a
>> search like the above?
> This is unfortunately the expected behavior, assuming that you're not
> dealing with the C/POSIX encoding.
>
> This is because in a multibyte encoding each character can occupy up
> to 4 bytes.  Postgres needs to allocate a single chunk of memory to
> hold the resulting text, and it has no way to know how many multibyte
> characters are present in the input string or how many characters will
> have a different size when down-cased, so it has to allocate the
> maximum size that may be needed, which is 4 times the size of the
> input string.  And there's a strict 1 GB limit on the size of a single
> field, thus the ~256 MB limit.
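For what it's worth, the figure in the error message lines up with that explanation. A quick sketch of the arithmetic, assuming PostgreSQL's MaxAllocSize of just under 1 GB and a 4-byte varlena length header (both internals, not stated in this thread):

```python
# Worst-case allocation that lower() must request for a given input size,
# under the assumptions named above.
MAX_ALLOC = 0x3FFFFFFF       # assumed MaxAllocSize: 1 GB minus 1 byte
MAX_BYTES_PER_CHAR = 4       # UTF-8 worst case per character
VARHDRSZ = 4                 # assumed 4-byte varlena length header

def worst_case_alloc(input_bytes):
    # One character may grow to 4 bytes when down-cased in a multibyte
    # encoding, so the result buffer is sized for 4 bytes per input byte.
    return input_bytes * MAX_BYTES_PER_CHAR + VARHDRSZ

# 260 MB input: matches the "invalid memory alloc request size" figure
print(worst_case_alloc(260 * 1024 * 1024))               # 1090519044
print(worst_case_alloc(260 * 1024 * 1024) > MAX_ALLOC)   # True  -> rejected
# 250 MB input stays under the limit, so that query succeeds
print(worst_case_alloc(250 * 1024 * 1024) > MAX_ALLOC)   # False -> works
```

This also shows where the ~256 MB threshold comes from: MAX_ALLOC / 4 is 256 MiB, so any input above that makes the worst-case request exceed the limit.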



