[DOCS] Let's document a bytea bug

From: Anna Akenteva
Date: 23 May 2018 20:33

Hello!

Some time ago I encountered a problem with the bytea type: we can't SELECT
bytea strings whose textual representation is too big to fit into
StringInfoData. As a side effect, pg_dump refuses to dump tables containing
such big bytea strings.

It's a confusing bug, but so far there seems to be no clean way to fix it.
Here's a link to a recent discussion of the issue:

https://www.postgresql.org/message-id/flat/c8bdf802d41ec37003ec3b726db79428@postgrespro.ru#c8bdf802d41ec37003ec3b726db79428@postgrespro.ru

Since it won't be fixed anytime soon, I thought it would be worth documenting.
Attaching a patch for the documentation: I added some text to the "Binary
Data Types" section, describing the issue and explaining how to deal with it.

My patch in plain text (for convenience):

It is not recommended to use bytea strings whose textual representation
exceeds 1GB, as it may not be possible to SELECT them due to output size
limitations. Consequently, a table containing such big strings cannot be
properly processed by pg_dump, which will try to SELECT these values from
the table and fail. The exact size limit advised for bytea strings depends
on their content, the external format and encoding in use, and the context
in which they will be selected. The general rule is that the tuple returned
by a SELECT should not exceed 1GB. However, even if SELECT does not work,
big bytea strings can still be retrieved using COPY in binary format.
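
For example (a sketch; the table and column names here are made up for
illustration), a value whose text form is too big to SELECT can still be
exported with binary-format COPY, which writes the raw bytes rather than
the roughly doubled-in-size hex encoding:

COPY big_table (payload) TO '/tmp/payload.bin' (FORMAT binary);
-- or client-side, from psql:
-- \copy big_table (payload) TO 'payload.bin' WITH (FORMAT binary)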

-- 
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [DOCS] Let's document a bytea bug

From: "Andrey M. Borodin"
Date: Fri, 31 Jul 2020 10:13:48 +0500

Hi Anna!

> On 23 May 2018, at 20:33, Anna Akenteva <a.akenteva@postgrespro.ru> wrote:
>
> Some time ago I encountered a problem with the bytea type: we can't SELECT
> bytea strings whose textual representation is too big to fit into
> StringInfoData. As a side effect, pg_dump refuses to dump tables containing
> such big bytea strings.
> [...]

Thanks for this message. It took me a while to figure out what the problem was.
+1 for documenting this, maybe even with the exact error, like:
[ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR:  invalid memory alloc request size 1472599557
It's really scary; my first thought was that it was TOAST corruption.

Best regards, Andrey Borodin.


Re: [DOCS] Let's document a bytea bug

From: Bruce Momjian
Date: 3 Aug 2020 16:20

On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:
> Hi Anna!
> 
> > On 23 May 2018, at 20:33, Anna Akenteva <a.akenteva@postgrespro.ru> wrote:
> > [...]
> 
> Thanks for this message. It took me a while to figure out what the problem was.
> +1 for documenting this, maybe even with the exact error, like:
> [ 2020-07-30 01:20:32.248 MSK pg_dump - 10.3.3.30,XX000 ]:ERROR:  invalid memory alloc request size 1472599557
> It's really scary; my first thought was that it was TOAST corruption.

I still have Anna Akenteva's 2018 email in my mailbox because I wanted
to research this further.  Now that you have re-reported the problem, I
am on it!  ;-)

I looked for a clearer reproducible test case, and found this one:

    # one output line: "\\x" followed by 600 million hex digits; COPY's text
    # format unescapes "\\x" to "\x", giving a 300MB bytea value per row
    $ awk 'BEGIN { printf "\\\\x"; for (i=0; i < 30000000; i++) \
        printf "7f7f7f7f7f7f7f7f7f7f"; print}' > /tmp/x
    
    $ psql test
    
    CREATE TABLE test(x BYTEA);
    
    COPY test FROM '/tmp/x';
    
    COPY test FROM '/tmp/x';
    
    CREATE TABLE big_data AS SELECT (string_agg(x,'')) AS x FROM test;
    
    SELECT length(x) FROM big_data;
      length
    -----------
     600000000
    
    SELECT octet_length(x) FROM big_data;
     octet_length
    --------------
        600000000

    SELECT x FROM big_data;
    ERROR:  invalid memory alloc request size 1200000003

    \q

    $ pg_dump -d test > /tmp/a
    pg_dump: Dumping the contents of table "big_data" failed: PQgetResult() failed.
    pg_dump: Error message from server: ERROR:  invalid memory alloc request size 1200000003
    pg_dump: The command was: COPY public.big_data (x) TO stdout;

So the character and octet length is 600 million, but on output each byte
is expanded to two hex digits, plus a leading "\x" and a trailing
terminator: a 2 * 600000000 + 3 = 1200000003-byte allocation request,
which exceeds the 1GB limit, so both SELECT and pg_dump fail.  I also
can't see how to improve the error message since it happens so low in the
stack.
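
A possible workaround sketch (untested here): read the value back in
chunks over the text protocol, since each chunk's textual form stays
under the limit, and reassemble the pieces on the client:

    SELECT substring(x FROM 1 FOR 100000000) FROM big_data;          -- bytes 1..100M
    SELECT substring(x FROM 100000001 FOR 100000000) FROM big_data;  -- next 100M
    -- ...and so on, six chunks in total for this table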

From reading the previous posts, I understand that expanding the length
limit would be very hard, and it is difficult to see how to know at input
time whether the output will error.  I am in favor of the 2018 proposed
patch, attached, perhaps with some small adjustments.
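
Given that rule of thumb, at-risk rows can at least be spotted ahead of a
dump with a query along these lines (a sketch; hex output needs two bytes
per stored byte, plus the "\x" prefix, against the 1073741823-byte limit):

    SELECT ctid FROM big_data WHERE octet_length(x) * 2 + 3 > 1073741823;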

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Re: [DOCS] Let's document a bytea bug

From: Joe Conway
Date:
On 8/3/20 4:20 PM, Bruce Momjian wrote:
> On Fri, Jul 31, 2020 at 10:13:48AM +0500, Andrey M. Borodin wrote:
>> [...]
>
> I still have Anna Akenteva's 2018 email in my mailbox because I wanted
> to research this further.  Now that you have re-reported the problem, I
> am on it!  ;-)
>
> I looked for a clearer reproducible test case, and found this one:

It is easier to reproduce than that:

select repeat('x',600000000)::bytea;
ERROR:  invalid memory alloc request size 1200000003

select octet_length(repeat('x',600000000)::bytea);
 octet_length
--------------
    600000000
(1 row)

CREATE TABLE big_data AS
 select repeat('x',600000000)::bytea;
SELECT 1

-- CREATE TABLE AS named the column after the function, hence "repeat"
SELECT repeat FROM big_data;
ERROR:  invalid memory alloc request size 1200000003

\q

pg_dump -d postgres > /tmp/a
pg_dump: error: Dumping the contents of table "big_data" failed: PQgetResult()
failed.
pg_dump: error: Error message from server: ERROR:  invalid memory alloc request
size 1200000003
pg_dump: error: The command was: COPY public.big_data (repeat) TO stdout;


There are other cases that also hit the StringInfo limit (i.e. MaxAllocSize)
with the TEXT data type (and I guess others, like JSON or XML, too):

select repeat('x',600000000), repeat('y',600000000);
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 600000010 bytes by 600000000
more bytes.
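
Each of those 600000000-byte values stays under the limit on its own; it
is the combined 1.2GB row that overflows the output buffer. A quick sketch:

select repeat('x',600000000);  -- a single 600MB text column per row: works
select repeat('x',600000000), repeat('y',600000000);  -- 1.2GB row: fails as above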


Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

