Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB - Mailing list pgsql-bugs

From Julien Rouhaud
Subject Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB
Date
Msg-id CAOBaU_ay6r15Ef5BrNT=7Ba3ZUA1ROuw0EmZwfy4aJ5ee7wDsA@mail.gmail.com
Whole thread Raw
In response to BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB  (James Inform <james.inform@pharmapp.de>)
List pgsql-bugs
On Mon, Aug 16, 2021 at 4:46 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> When I try to search a text field in a where clause that holds more than 250
> MB of text, PostgreSQL runs out of memory for the request.
>
> You can reproduce this behaviour with the following sql statement:
>
> with
>         q1 as
>         (
>                 -- 260*1024*1024 = 260MB
>                 select repeat('x',260*1024*1024) as mydata
>         )
> select count(*) from q1 where lower(mydata) like '%a%'
> ;
>
> ERROR:  invalid memory alloc request size 1090519044
>
> Using just a 250MB string:
>
> with
>         q1 as
>         (
>                 -- 250*1024*1024 = 250MB
>                 select repeat('x',250*1024*1024) as mydata
>         )
> select count(*) from q1 where lower(mydata) like '%a%'
> ;
>
> Everything is fine!
>
> The alloc request size seems to be 4 times the length of the text field.
> [...]
> Is this an expected behaviour?
> Why ist PostgreSQL allocating 4 times the column's memory when doing a
> search like above?

This is unfortunately the expected behavior, assuming that you're not
dealing with C/POSIX  encoding.

This is because in multibyte encoding each character can occupy up to
4B.  Postgres needs to allocate a single chunk of memory to hold the
resulting text, and it has no way to know how many multibyte
characters are present in the input string or how many character will
have a different size when down-cased, so it has to allocate the
maximum size that may be needed, which is 4 times the size of the
input string.  And there's a strict 1GB limitation for a single field
size, thus the ~256MB limit.



pgsql-bugs by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: BUG #17146: pg_dump statements are going into pg_stat_statements in 13.4
Next
From: James Inform
Date:
Subject: Re: BUG #17146: pg_dump statements are going into pg_stat_statements in 13.4