Thread: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Hello!

If I create a big bytea value and try to select it from a table, I get an
error, something like: "ERROR: invalid memory alloc request size ...".
So basically we can insert data into a table but then we can't even work
with it. Sounds like a bug. Attaching a patch that fixes it (applies to
2a41507dab0f293ff241fe8ae326065998668af8). And as it seems like quite a
serious issue, would it be possible to backport a fix for it to earlier
versions?

HOW TO RECREATE:

1) generate some random data (in this case, 600 MB):
   dd if=/dev/urandom of=rand.dat bs=1M count=600

2) postgres=# select lo_import('/PATH/TO/rand.dat');
    lo_import
   -----------
        16397     [USE THIS ID FOR THE NEXT STEP]
   (1 row)

3) postgres=# create table big_data as
       select (string_agg(data,'')) as data
       from pg_largeobject where loid = 16397;
   SELECT 1

4) postgres=# select * from big_data;
   ERROR:  invalid memory alloc request size 1468006403

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
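[For context on why step 4 fails while the insert succeeds: bytea values are rendered in hex text format on output, which needs roughly two bytes of text per byte of data plus the "\x" prefix, while a single palloc request is capped at just under 1 GB (MaxAllocSize). A back-of-the-envelope sketch, assuming byteaout allocates about 2*len + 3 bytes for the hex output (my reading of the source; the exact constant is an assumption and does not account for string_agg/TOAST overhead, so it illustrates the mechanism rather than reproducing the exact 1468006403 figure):]

```python
# Sketch: why can a ~600 MB bytea be inserted but not selected?
# Assumption: hex output format needs "\x" + two hex digits per data
# byte + a terminator, i.e. about 2*len + 3 bytes in one allocation.

MAX_ALLOC_SIZE = 0x3FFFFFFF          # palloc's cap: 1 GB - 1 byte
data_len = 600 * 1024 * 1024         # 600 MB of raw bytea data

hex_output_len = 2 * data_len + 3    # textual (hex) representation

print(f"raw data:   {data_len:,} bytes  (fits under the cap: {data_len < MAX_ALLOC_SIZE})")
print(f"hex output: {hex_output_len:,} bytes (exceeds the cap: {hex_output_len > MAX_ALLOC_SIZE})")
```

So the binary value itself fits comfortably under the allocation limit, but its textual form does not, which is exactly the insert-works-but-select-fails asymmetry described above.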
Anna Akenteva <a.akenteva@postgrespro.ru> writes:
> [ widen StringInfoData max length to size_t ]

I find this scary as heck.  Have you spent any time looking at the
side effects?  There are probably hundreds of places that expect that
stringinfos won't get larger than 1GB.

Also, I don't entirely see how this fixes your stated goal of being
able to select a bytea value whose textual representation exceeds 1GB.
The wire protocol can't support that either, and even if it did,
I wonder how many client programs could cope.  Extremely wide tuple
values create pain points in many places.

> And as it seems like quite a serious issue, would it be possible to
> backport a fix for it to earlier versions?

Since this is an ABI break with very widely visible effects, there is
no chance whatsoever that it would be back-patched.

			regards, tom lane
Hi,

On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
> Anna Akenteva <a.akenteva@postgrespro.ru> writes:
> > [ widen StringInfoData max length to size_t ]
>
> I find this scary as heck.  Have you spent any time looking at the
> side effects?  There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

FWIW, I think we're going to have to bite that bullet sooner rather
than later.  I do agree it's not going to fix this issue for real, and
that we're not going to backpatch it.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
>> Anna Akenteva <a.akenteva@postgrespro.ru> writes:
>>> [ widen StringInfoData max length to size_t ]

>> I find this scary as heck.  Have you spent any time looking at the
>> side effects?  There are probably hundreds of places that expect that
>> stringinfos won't get larger than 1GB.

> FWIW, I think we're going to have to bite that bullet sooner rather
> than later.  I do agree it's not going to fix this issue for real, and
> that we're not going to backpatch it.

I'm not necessarily saying we shouldn't consider widening this.  I'm
just saying it's going to take a good deal of cross-checking for
consequences, in particular that nothing is at risk of integer overflow
if it's presented with a very long StringInfo.

One way to limit the side effects would be to have StringInfos default
to only allowing 1GB of content as before, and you have to do something
extra at creation time to let one get bigger.

There's still the problem that the wire protocol will limit us to 2GB
(or maybe 4GB if you want to be brave^Wfoolhardy and assume clients
think the width fields are unsigned).  I can't get hugely excited about
moving the goalposts only from 1GB to 2GB ...

			regards, tom lane
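[To make the wire-protocol ceiling concrete: per-field lengths in the v3 frontend/backend protocol travel in a 32-bit field, so a signed reading tops out just below 2 GB and even a deliberately unsigned reading only reaches just below 4 GB. A small sketch of that arithmetic (the struct format strings here are just Python's way of modeling a big-endian Int32 on the wire):]

```python
import struct

# The v3 protocol encodes per-column data lengths as Int32.
# Signed interpretation (the documented one) caps a field at 2 GiB - 1;
# pretending it is unsigned only buys 4 GiB - 1.

SIGNED_MAX = 2**31 - 1    # 2,147,483,647
UNSIGNED_MAX = 2**32 - 1  # 4,294,967,295

# A 2 GB field length cannot even be represented as a signed Int32:
too_big = 2 * 1024**3
try:
    struct.pack(">i", too_big)   # big-endian signed 32-bit
    overflowed = False
except struct.error:
    overflowed = True

print(f"signed ceiling:   {SIGNED_MAX:,} bytes")
print(f"unsigned ceiling: {UNSIGNED_MAX:,} bytes")
print(f"2 GB length overflows signed Int32: {overflowed}")
```

Which is the "moving the goalposts only from 1GB to 2GB" point above: widening StringInfo alone leaves the protocol's length field as the next, barely higher, wall.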
Tom Lane wrote:
> Anna Akenteva <a.akenteva@postgrespro.ru> writes:
> > [ widen StringInfoData max length to size_t ]
>
> I find this scary as heck.  Have you spent any time looking at the
> side effects?  There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

See these commits:

fa2fa9955280
42f50cb8fa98
b66adb7b0c83

and the discussion threads linked in the commit messages.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom Lane wrote on 2018-02-16 17:58:
> Also, I don't entirely see how this fixes your stated goal of being
> able to select a bytea value whose textual representation exceeds 1GB.

It's not necessarily my goal.  My goal is to avoid the confusing
situation where you insert something into a table and suddenly
everything seems to break for no reason, with no information on what to
do next.  As I see it, it could be solved by:

a) allowing big bytea values to be inserted, but making sure that they
   don't cause problems (which I tried to do with my patch)
b) prohibiting insertion of the kind of data that will cause problems
c) informing the user about the issue (maybe by documenting this
   behaviour or giving a more informative error message)

So far, the weird behaviour of big bytea values that I see boils down to:

1) We can't SELECT a value after INSERTing it, and there's no clear
explanation as to why.  It does make sense that we can insert a 900MB
value into a table and then can't select it, since its textual
representation takes up more than 1GB.  It's confusing for whoever uses
Postgres, though.  It doesn't seem to be documented anywhere (correct
me if I'm wrong), and you don't get any hints like "don't worry, you
can still retrieve the data, but use COPY in binary format for that".

2) We can't use pg_dump on a database that has a big bytea value: it
just shows the same error as when we try to select the value.  Again,
the error message doesn't explain anything, and I couldn't find the
behaviour documented anywhere.  It's weird that Postgres would let me
insert a value that makes pg_dump unusable (although maybe there is a
good enough way to work around it that I'm not aware of).

> The wire protocol can't support that either, and even if it did,
> I wonder how many client programs could cope.  Extremely wide tuple
> values create pain points in many places.

I see how it can create a lot of problems.  I do agree that making the
max length bigger doesn't really seem to be a good solution, and I see
now how it's hard to implement properly.  I don't see any other way to
make it work so far, though.  If it can't be fixed anytime soon, do you
think it would be worth documenting this behaviour?

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Feb 16, 2018 at 2:00 PM, Anna Akenteva
<a.akenteva@postgrespro.ru> wrote:
> It's not necessarily my goal.  My goal is to avoid the confusing
> situation where you insert something into a table and suddenly
> everything seems to break for no reason and you don't get any
> information on what to do next.  As I see it, it could be solved with:
> a) allowing including big bytea values but making sure that it doesn't
>    cause problems (which I tried to do with my patch)
> b) prohibiting inserting the kind of data that will cause problems
> c) informing the user about the issue (maybe documenting this
>    behaviour or giving a more informative error message)

+1.  We don't have to support everything, but things that don't work
should fail on insertion, not retrieval.  Otherwise what we have is
less a database and more a data black hole.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> +1.  We don't have to support everything, but things that don't work
> should fail on insertion, not retrieval.  Otherwise what we have is
> less a database and more a data black hole.

That sounds nice as a principle but I'm not sure how workable it really
is.  Do you want to reject text strings that fit fine in, say, LATIN1
encoding, but might be overlength if some client tries to read them in
UTF8 encoding?  (bytea would have a comparable problem with escape vs
hex representation, for instance.)  Should the limit vary depending on
how many columns are in the table?  Should we account for client-side
tuple length restrictions?

Anyway, as Alvaro pointed out upthread, we've been down this particular
path before and it didn't work out.  We need to learn something from
that failure and decide how to move forward.

			regards, tom lane
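[The encoding objection is easy to see concretely: a string made of non-ASCII Latin-1 characters takes one byte per character in LATIN1 but two bytes per character in UTF-8, so a size check performed at insert time under the server encoding says nothing about the size the same value has when a client reads it under another encoding. A minimal illustration:]

```python
# The same text doubles in size when transcoded from LATIN1 to UTF-8:
# every character in Latin-1's upper half is 1 byte in LATIN1 but
# 2 bytes in UTF-8.  (A short string stands in for a near-limit one.)

text = "é" * 10
latin1_len = len(text.encode("latin-1"))
utf8_len = len(text.encode("utf-8"))

print(f"LATIN1: {latin1_len} bytes, UTF-8: {utf8_len} bytes")
```

So a value that passes a 1GB check stored in LATIN1 could still be unretrievable by a UTF8 client, which is why an insert-time limit alone can't guarantee retrievability.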
On Tue, Feb 27, 2018 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> +1.  We don't have to support everything, but things that don't work
>> should fail on insertion, not retrieval.  Otherwise what we have is
>> less a database and more a data black hole.
>
> That sounds nice as a principle but I'm not sure how workable it really
> is.  Do you want to reject text strings that fit fine in, say, LATIN1
> encoding, but might be overlength if some client tries to read them in
> UTF8 encoding?  (bytea would have a comparable problem with escape vs
> hex representation, for instance.)  Should the limit vary depending on
> how many columns are in the table?  Should we account for client-side
> tuple length restrictions?

I suppose what I really want is to have a limit that's large enough for
how big the retrieved data can be that people stop hitting it.

> Anyway, as Alvaro pointed out upthread, we've been down this particular
> path before and it didn't work out.  We need to learn something from
> that failure and decide how to move forward.

Yep.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company