Thread: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Hello!

If I create a big bytea value and try to select it from a table, I get an
error, something like: "ERROR: invalid memory alloc request size ...".
So basically we can insert data into a table but then we can't even work
with it. Sounds like a bug. Attaching a patch that fixes it (applies to
2a41507dab0f293ff241fe8ae326065998668af8). And as it seems like quite a
serious issue, would it be possible to backport a fix for it to earlier
versions?

HOW TO RECREATE:

1) generate some random data (in this case, 600 MB):
   dd if=/dev/urandom of=rand.dat bs=1M count=600

2) postgres=# select lo_import('/PATH/TO/rand.dat');
    lo_import
   -----------
        16397     [USE THIS ID FOR THE NEXT STEP]
   (1 row)

3) postgres=# create table big_data as
       select (string_agg(data,'')) as data
       from pg_largeobject where loid = 16397;
   SELECT 1

4) postgres=# select * from big_data;
   ERROR:  invalid memory alloc request size 1468006403

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
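[For context on why step 4 fails while the insert succeeds: bytea values are rendered in hex text format on output, which needs roughly two bytes of text per byte of data plus the "\x" prefix, while a single palloc request is capped at just under 1 GB (MaxAllocSize). A back-of-the-envelope sketch, assuming byteaout allocates about 2*len + 3 bytes for the hex output (my reading of the source; the exact constant is an assumption and does not account for string_agg/TOAST overhead, so it illustrates the mechanism rather than reproducing the exact 1468006403 figure):]

```python
# Sketch: why can a ~600 MB bytea be inserted but not selected?
# Assumption: hex output format needs "\x" + two hex digits per data
# byte + a terminator, i.e. about 2*len + 3 bytes in one allocation.

MAX_ALLOC_SIZE = 0x3FFFFFFF          # palloc's cap: 1 GB - 1 byte
data_len = 600 * 1024 * 1024         # 600 MB of raw bytea data

hex_output_len = 2 * data_len + 3    # textual (hex) representation

print(f"raw data:   {data_len:,} bytes  (fits under the cap: {data_len < MAX_ALLOC_SIZE})")
print(f"hex output: {hex_output_len:,} bytes (exceeds the cap: {hex_output_len > MAX_ALLOC_SIZE})")
```

So the binary value itself fits comfortably under the allocation limit, but its textual form does not, which is exactly the insert-works-but-select-fails asymmetry described above.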
Anna Akenteva <a.akenteva@postgrespro.ru> writes:
> [ widen StringInfoData max length to size_t ]

I find this scary as heck.  Have you spent any time looking at the
side effects?  There are probably hundreds of places that expect that
stringinfos won't get larger than 1GB.

Also, I don't entirely see how this fixes your stated goal of being
able to select a bytea value whose textual representation exceeds 1GB.
The wire protocol can't support that either, and even if it did,
I wonder how many client programs could cope.  Extremely wide tuple
values create pain points in many places.

> And as it seems like quite a serious issue, would it be possible to
> backport a fix for it to earlier versions?

Since this is an ABI break with very widely visible effects, there is
no chance whatsoever that it would be back-patched.

			regards, tom lane
Hi,

On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
> Anna Akenteva <a.akenteva@postgrespro.ru> writes:
> > [ widen StringInfoData max length to size_t ]
>
> I find this scary as heck.  Have you spent any time looking at the
> side effects?  There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

FWIW, I think we're going to have to bite that bullet sooner rather
than later.  I do agree it's not going to fix this issue for real, and
that we're not going to backpatch it.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
>> Anna Akenteva <a.akenteva@postgrespro.ru> writes:
>>> [ widen StringInfoData max length to size_t ]

>> I find this scary as heck.  Have you spent any time looking at the
>> side effects?  There are probably hundreds of places that expect that
>> stringinfos won't get larger than 1GB.

> FWIW, I think we're going to have to bite that bullet sooner rather
> than later.  I do agree it's not going to fix this issue for real, and
> that we're not going to backpatch it.

I'm not necessarily saying we shouldn't consider widening this.  I'm
just saying it's going to take a good deal of cross-checking for
consequences, in particular that nothing is at risk of integer overflow
if it's presented with a very long StringInfo.

One way to limit the side effects would be to have StringInfos default
to only allowing 1GB of content as before, and you have to do something
extra at creation time to let one get bigger.

There's still the problem that the wire protocol will limit us to 2GB
(or maybe 4GB if you want to be brave^Wfoolhardy and assume clients
think the width fields are unsigned).  I can't get hugely excited about
moving the goalposts only from 1GB to 2GB ...

			regards, tom lane
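[To make the wire-protocol ceiling concrete: per-field lengths in the v3 frontend/backend protocol travel in a 32-bit field, so a signed reading tops out just below 2 GB and even a deliberately unsigned reading only reaches just below 4 GB. A small sketch of that arithmetic (the struct format strings here are just Python's way of modeling a big-endian Int32 on the wire):]

```python
import struct

# The v3 protocol encodes per-column data lengths as Int32.
# Signed interpretation (the documented one) caps a field at 2 GiB - 1;
# pretending it is unsigned only buys 4 GiB - 1.

SIGNED_MAX = 2**31 - 1    # 2,147,483,647
UNSIGNED_MAX = 2**32 - 1  # 4,294,967,295

# A 2 GB field length cannot even be represented as a signed Int32:
too_big = 2 * 1024**3
try:
    struct.pack(">i", too_big)   # big-endian signed 32-bit
    overflowed = False
except struct.error:
    overflowed = True

print(f"signed ceiling:   {SIGNED_MAX:,} bytes")
print(f"unsigned ceiling: {UNSIGNED_MAX:,} bytes")
print(f"2 GB length overflows signed Int32: {overflowed}")
```

Which is the "moving the goalposts only from 1GB to 2GB" point above: widening StringInfo alone leaves the protocol's length field as the next, barely higher, wall.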
Tom Lane wrote:
> Anna Akenteva <a.akenteva@postgrespro.ru> writes:
> > [ widen StringInfoData max length to size_t ]
>
> I find this scary as heck.  Have you spent any time looking at the
> side effects?  There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

See these commits:

fa2fa9955280
42f50cb8fa98
b66adb7b0c83

and the discussion threads linked in the commit messages.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom Lane wrote on 2018-02-16 17:58:
> Also, I don't entirely see how this fixes your stated goal of being
> able to select a bytea value whose textual representation exceeds 1GB.

It's not necessarily my goal.  My goal is to avoid the confusing
situation where you insert something into a table and suddenly
everything seems to break for no reason, with no information on what to
do next.  As I see it, it could be solved by:

a) allowing big bytea values to be inserted, but making sure that they
   don't cause problems (which I tried to do with my patch)
b) prohibiting insertion of the kind of data that will cause problems
c) informing the user about the issue (maybe by documenting this
   behaviour or giving a more informative error message)

So far, the weird behaviour of big bytea values that I see boils down to:

1) We can't SELECT a value after INSERTing it, and there's no clear
explanation as to why.  It does make sense that we can insert a 900MB
value into a table and then can't select it, since its textual
representation takes up more than 1GB.  It's confusing for whoever uses
Postgres, though.  It doesn't seem to be documented anywhere (correct
me if I'm wrong), and you don't get any hints like "don't worry, you
can still retrieve the data, but use COPY in binary format for that".

2) We can't use pg_dump on a database that has a big bytea value: it
just shows the same error as when we try to select the value.  Again,
the error message doesn't explain anything, and I couldn't find the
behaviour documented anywhere.  It's weird that Postgres would let me
insert a value that makes pg_dump unusable (although maybe there is a
good enough way to work around it that I'm not aware of).

> The wire protocol can't support that either, and even if it did,
> I wonder how many client programs could cope.  Extremely wide tuple
> values create pain points in many places.

I see how it can create a lot of problems.  I do agree that making the
max length bigger doesn't really seem to be a good solution, and I see
now how it's hard to implement properly.  I don't see any other way to
make it work so far, though.  If it can't be fixed anytime soon, do you
think it would be worth documenting this behaviour?

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, Feb 16, 2018 at 2:00 PM, Anna Akenteva
<a.akenteva@postgrespro.ru> wrote:
> It's not necessarily my goal.  My goal is to avoid the confusing
> situation where you insert something into a table and suddenly
> everything seems to break for no reason and you don't get any
> information on what to do next.  As I see it, it could be solved with:
> a) allowing including big bytea values but making sure that it doesn't
>    cause problems (which I tried to do with my patch)
> b) prohibiting inserting the kind of data that will cause problems
> c) informing the user about the issue (maybe documenting this
>    behaviour or giving a more informative error message)

+1.  We don't have to support everything, but things that don't work
should fail on insertion, not retrieval.  Otherwise what we have is
less a database and more a data black hole.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> +1.  We don't have to support everything, but things that don't work
> should fail on insertion, not retrieval.  Otherwise what we have is
> less a database and more a data black hole.

That sounds nice as a principle but I'm not sure how workable it really
is.  Do you want to reject text strings that fit fine in, say, LATIN1
encoding, but might be overlength if some client tries to read them in
UTF8 encoding?  (bytea would have a comparable problem with escape vs
hex representation, for instance.)  Should the limit vary depending on
how many columns are in the table?  Should we account for client-side
tuple length restrictions?

Anyway, as Alvaro pointed out upthread, we've been down this particular
path before and it didn't work out.  We need to learn something from
that failure and decide how to move forward.

			regards, tom lane
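[The encoding objection is easy to see concretely: a string made of non-ASCII Latin-1 characters takes one byte per character in LATIN1 but two bytes per character in UTF-8, so a size check performed at insert time under the server encoding says nothing about the size the same value has when a client reads it under another encoding. A minimal illustration:]

```python
# The same text doubles in size when transcoded from LATIN1 to UTF-8:
# every character in Latin-1's upper half is 1 byte in LATIN1 but
# 2 bytes in UTF-8.  (A short string stands in for a near-limit one.)

text = "é" * 10
latin1_len = len(text.encode("latin-1"))
utf8_len = len(text.encode("utf-8"))

print(f"LATIN1: {latin1_len} bytes, UTF-8: {utf8_len} bytes")
```

So a value that passes a 1GB check stored in LATIN1 could still be unretrievable by a UTF8 client, which is why an insert-time limit alone can't guarantee retrievability.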
On Tue, Feb 27, 2018 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> +1.  We don't have to support everything, but things that don't work
>> should fail on insertion, not retrieval.  Otherwise what we have is
>> less a database and more a data black hole.
>
> That sounds nice as a principle but I'm not sure how workable it really
> is.  Do you want to reject text strings that fit fine in, say, LATIN1
> encoding, but might be overlength if some client tries to read them in
> UTF8 encoding?  (bytea would have a comparable problem with escape vs
> hex representation, for instance.)  Should the limit vary depending on
> how many columns are in the table?  Should we account for client-side
> tuple length restrictions?

I suppose what I really want is to have a limit that's large enough for
how big the retrieved data can be that people stop hitting it.

> Anyway, as Alvaro pointed out upthread, we've been down this particular
> path before and it didn't work out.  We need to learn something from
> that failure and decide how to move forward.

Yep.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company