Thread: fetching bytea (blob) data of 850 MB from psql client failed

fetching bytea (blob) data of 850 MB from psql client failed

From
jitesh tiwari
Date:
Hi all, 

I have a use case in postgresql where I have inserted 500 rows in a table with 2 columns as described below. 
create table xyz (
id citext not null primary key, 
col1 bytea
);

The table has 500 rows and each row has around 850 MB of data. The bytea column data in the row has around 830 to 840 MB of data while citext column data has 10 to to 15 bytes approximately.
When I try to fetch all rows with a select query or try to fetch  a single row  with a select statement  using the where clause  for this table, the Database throws the below error -

VERROR; invalid memory alloc request size 1764311653(File mcxt.c; Line 959;
Routine palloc; )


The error above pointing the PostgreSQL database backend code.
May I know why I am getting this error? If I use array fetch using the ODBC driver with fetchsize=1 (Assuming the application will fetch 1 record at a time from the PostgreSQL database server) then also I get that error. Is there any Server configuration which can control this memory allocation error and allow me to fetch one record at time from the PostgreSQL database table?

Please suggest if it is a known issue or limitation in postgresql backend code. If so please point to the documentation link for the same. If there is any workaround as well please update me.

Regards, 
Jitesh

Re: fetching bytea (blob) data of 850 MB from psql client failed

From
John Naylor
Date:
On Mon, Aug 22, 2022 at 4:35 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>
> Hi all,
>
> I have a use case in postgresql where I have inserted 500 rows in a table with 2 columns as described below.
> create table xyz (
> id citext not null primary key,
> col1 bytea
> );
>
> The table has 500 rows and each row has around 850 MB of data. The bytea column data in the row has around 830 to 840
MBof data while citext column data has 10 to to 15 bytes approximately.
 
> When I try to fetch all rows with a select query or try to fetch  a single row  with a select statement  using the
whereclause  for this table, the Database throws the below error -
 
>
> VERROR; invalid memory alloc request size 1764311653(File mcxt.c; Line 959;
> Routine palloc; )

The request size is about 2x bigger than 880MB, which is what we would
expect when outputting using the default setting for "bytea_output",
which is "hex". Hex uses two bytes of text to represent each byte:

https://www.postgresql.org/docs/devel/datatype-binary.html

Unfortunately this retrieval problem has been encountered before, but
is not documented that I can see:

https://www.postgresql.org/message-id/flat/13120.1518793109%40sss.pgh.pa.us#81d92e9e7e15975e606bdb349ec573d4

If the data is largely printable ASCII, then one thing to try is
setting "bytea_output" to "escape", if your driver and client can
handle that:

https://www.postgresql.org/docs/devel/runtime-config-client.html#GUC-BYTEA-OUTPUT

If the data is binary and not printable ASCII, then there is no easy
workaround. For this, possibly the "large object" facility could be
useful, or storing the data in a regular file with the path stored in
the database.

--
John Naylor
EDB: http://www.enterprisedb.com



Re: fetching bytea (blob) data of 850 MB from psql client failed

From
jitesh tiwari
Date:
Hi John, 
Thanks for those details. 
The database table for bytea columns has binary data data and some are not text. Hence, when it tried to fetch by changing the bytea_output to 'escape' it failed again with the below error -
VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
BTW, is there any plan to fix this issue in the PostgreSQL backend code in near future?

Regards, 
Jitesh




On Mon, Aug 22, 2022 at 4:22 PM John Naylor <john.naylor@enterprisedb.com> wrote:
On Mon, Aug 22, 2022 at 4:35 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>
> Hi all,
>
> I have a use case in postgresql where I have inserted 500 rows in a table with 2 columns as described below.
> create table xyz (
> id citext not null primary key,
> col1 bytea
> );
>
> The table has 500 rows and each row has around 850 MB of data. The bytea column data in the row has around 830 to 840 MB of data while citext column data has 10 to to 15 bytes approximately.
> When I try to fetch all rows with a select query or try to fetch  a single row  with a select statement  using the where clause  for this table, the Database throws the below error -
>
> VERROR; invalid memory alloc request size 1764311653(File mcxt.c; Line 959;
> Routine palloc; )

The request size is about 2x bigger than 880MB, which is what we would
expect when outputting using the default setting for "bytea_output",
which is "hex". Hex uses two bytes of text to represent each byte:

https://www.postgresql.org/docs/devel/datatype-binary.html

Unfortunately this retrieval problem has been encountered before, but
is not documented that I can see:

https://www.postgresql.org/message-id/flat/13120.1518793109%40sss.pgh.pa.us#81d92e9e7e15975e606bdb349ec573d4

If the data is largely printable ASCII, then one thing to try is
setting "bytea_output" to "escape", if your driver and client can
handle that:

https://www.postgresql.org/docs/devel/runtime-config-client.html#GUC-BYTEA-OUTPUT

If the data is binary and not printable ASCII, then there is no easy
workaround. For this, possibly the "large object" facility could be
useful, or storing the data in a regular file with the path stored in
the database.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: fetching bytea (blob) data of 850 MB from psql client failed

From
John Naylor
Date:
On Mon, Aug 29, 2022 at 1:02 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>
> Hi John,
> Thanks for those details.
> The database table for bytea columns has binary data data and some are not text. Hence, when it tried to fetch by
changingthe bytea_output to 'escape' it failed again with the below error -
 
> VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
> BTW, is there any plan to fix this issue in the PostgreSQL backend code in near future?

Unfortunately, there is no plan as there is no consensus on how to
approach the issue. It seems the possible actions are (easiest to
hardest):

1. Document that bytea is unsafe  -- you can insert anything you like,
but you may be unable to retrieve it again.
2. Determine some maximum size that guarantees accessibility and
invent a GUC that by default prevents inserting larger values than
that.
3. Fix the issue properly.

#2 and #3 are challenging for reasons given in the thread I linked to above.



--
John Naylor
EDB: http://www.enterprisedb.com



Re: fetching bytea (blob) data of 850 MB from psql client failed

From
Tomas Vondra
Date:

On 8/31/22 09:27, John Naylor wrote:
> On Mon, Aug 29, 2022 at 1:02 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>>
>> Hi John,
>> Thanks for those details.
>> The database table for bytea columns has binary data data and some are not text. Hence, when it tried to fetch by
changingthe bytea_output to 'escape' it failed again with the below error -
 
>> VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
>> BTW, is there any plan to fix this issue in the PostgreSQL backend code in near future?
> 
> Unfortunately, there is no plan as there is no consensus on how to
> approach the issue. It seems the possible actions are (easiest to
> hardest):
> 
> 1. Document that bytea is unsafe  -- you can insert anything you like,
> but you may be unable to retrieve it again.
> 2. Determine some maximum size that guarantees accessibility and
> invent a GUC that by default prevents inserting larger values than
> that.
> 3. Fix the issue properly.
> 
> #2 and #3 are challenging for reasons given in the thread I linked to above.
> 

I haven't tried, but wouldn't it be enough to fetch the data in smaller
chunks? The application would have to re-assemble that, of course.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: fetching bytea (blob) data of 850 MB from psql client failed

From
Pavel Stehule
Date:


st 31. 8. 2022 v 15:20 odesílatel Tomas Vondra <tomas.vondra@enterprisedb.com> napsal:


On 8/31/22 09:27, John Naylor wrote:
> On Mon, Aug 29, 2022 at 1:02 PM jitesh tiwari <jitesh120@gmail.com> wrote:
>>
>> Hi John,
>> Thanks for those details.
>> The database table for bytea columns has binary data data and some are not text. Hence, when it tried to fetch by changing the bytea_output to 'escape' it failed again with the below error -
>> VERROR; result of bytea output conversion is too large(File varlena.c; Line 415; Routine byteaout; ).
>> BTW, is there any plan to fix this issue in the PostgreSQL backend code in near future?
>
> Unfortunately, there is no plan as there is no consensus on how to
> approach the issue. It seems the possible actions are (easiest to
> hardest):
>
> 1. Document that bytea is unsafe  -- you can insert anything you like,
> but you may be unable to retrieve it again.
> 2. Determine some maximum size that guarantees accessibility and
> invent a GUC that by default prevents inserting larger values than
> that.
> 3. Fix the issue properly.
>
> #2 and #3 are challenging for reasons given in the thread I linked to above.
>

I haven't tried, but wouldn't it be enough to fetch the data in smaller
chunks? The application would have to re-assemble that, of course.


yes - there are possible both direction conversions to LO. So anybody can convert any bytea to temp LO, and then it can download to any client

Regards

Pavel
 

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: fetching bytea (blob) data of 850 MB from psql client failed

From
Noah Misch
Date:
On Mon, Aug 22, 2022 at 05:52:29PM +0700, John Naylor wrote:
> If the data is binary and not printable ASCII, then there is no easy
> workaround.

I would say the chief solution is to request binary transmission in your
application.  This cuts your network traffic in half for retrieving a bytea,
so it's a good change even if the server improves someday.  If you're using
libpq, search for show_binary_results in
https://www.postgresql.org/docs/devel/libpq-example.html to see an example of
doing this.  I don't know of an equivalent option in ODBC; if there is none,
one could modify the ODBC driver to use the same protocol feature that libpq
uses.