Thread: A bug when using the get_bit() function for a long bytea string
Hello hackers,
I found an issue with the get_bit() and set_bit() functions; here it is:
################################
postgres=# select get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
2020-03-12 10:05:23.296 CST [10549] ERROR: index 0 out of valid range, 0..-1
2020-03-12 10:05:23.296 CST [10549] STATEMENT: select get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0);
ERROR: index 0 out of valid range, 0..-1
postgres=# select set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
2020-03-12 10:05:27.959 CST [10549] ERROR: index 0 out of valid range, 0..-1
2020-03-12 10:05:27.959 CST [10549] STATEMENT: select set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1);
ERROR: index 0 out of valid range, 0..-1
postgres=#
################################
PostgreSQL can handle bytea values of nearly 1GB, but here it reports an
error at 512MB. I researched it and found the cause in byteaSetBit() and
byteaGetBit(): they use an 'int32 len' to hold the bit count of a long
bytea value, and 512MB * 8 bits obviously overflows an int32.
So I fixed it, and the test now passes, as below.
################################
postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1),0);
 get_bit
---------
       1
(1 row)

postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,0),0);
 get_bit
---------
       0
(1 row)

postgres=#
################################
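For reference, here is a minimal standalone sketch of the overflow being
described (the int32/int64 typedefs stand in for PostgreSQL's; the wraparound
shown assumes the usual two's-complement behavior, since signed overflow is
undefined in C):

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t int32;
    typedef int64_t int64;

    int main(void)
    {
        int32 len = 512 * 1024 * 1024;   /* length of a 512MB bytea, in bytes */

        /* Buggy: the multiplication runs in 32-bit arithmetic, so
           2^29 * 8 = 2^32 wraps to 0 and the "valid range" becomes 0..-1. */
        int32 bits_broken = len * 8;

        /* Fixed: widen before multiplying, so the bit count fits. */
        int64 bits_ok = (int64) len * 8;

        printf("broken: %d, ok: %lld\n", bits_broken, (long long) bits_ok);
        return 0;
    }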
I also checked whether anything else related to bytea has this issue; several places do:

1. byteaout(): when formatting a bytea as escape text, the 'len' variable
should be int64, or it may hold an overflowed number.

2. esc_enc_len(): same as above, the 'len' variable should be int64, and the
return type should change to int64. Because esc_enc_len() has the same call
structure as pg_base64_enc_len() and hex_enc_len(), I want to change the
return values of those two functions as well. The opposite function,
esc_dec_len(), seems fine.

3. binary_encode() and binary_decode(): here an 'int32 resultlen' accepts an
'unsigned int' function return, which seems unsafe.
I have fixed everything mentioned above; patch attached.
What do you think about that?
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
Hi

On Thu, Mar 12, 2020 at 9:21 AM movead.li@highgo.ca <movead.li@highgo.ca> wrote:
> Hello hackers,
>
> I found an issue with the get_bit() and set_bit() functions; here it is:
> [psql session as above, showing "ERROR: index 0 out of valid range, 0..-1"
> for both functions]
>
> PostgreSQL can handle bytea values of nearly 1GB, but here it reports an
> error at 512MB. I researched it and found the cause in byteaSetBit() and
> byteaGetBit(): they use an 'int32 len' to hold the bit count of a long
> bytea value, and 512MB * 8 bits obviously overflows an int32.
> So I fixed it, and the test now passes, as below.

Thanks for the bug report and the analysis. The analysis looks correct.

> [psql session as above, showing get_bit()/set_bit() succeeding on the
> 512MB file]
>
> I also checked whether anything else related to bytea has this issue;
> byteaout(), esc_enc_len() (together with pg_base64_enc_len() and
> hex_enc_len()), and binary_encode()/binary_decode() have the same problem.
> I have fixed everything mentioned above; patch attached.
> What do you think about that?

Why have you used Size? Shouldn't we use int64? Also in the change

@@ -3458,15 +3458,15 @@ byteaGetBit(PG_FUNCTION_ARGS)
     int32       n = PG_GETARG_INT32(1);
     int         byteNo,
                 bitNo;
-    int         len;
+    Size        len;

If get_bit()/set_bit() accept the second argument as int32, it cannot be
used to set bits whose number does not fit in 32 bits. I think we need to
change the type of the second argument as well.

Also, I think declaring len to be int is fine since 1G would fit an int;
what does not fit is len * 8. When performing that calculation we have to
widen the result, so instead of changing the datatype of len, it might be
better to perform the calculation as (int64) len * 8. If we use int64, we
could also use INT64_FORMAT instead of %ld.

Since this is a bug it shouldn't wait another commitfest, but still add
this patch to the commitfest so that it's not forgotten.

--
Best Wishes,
Ashutosh Bapat
Thanks for the reply.
>Why have you used Size? Shouldn't we use int64?
Yes, thanks for pointing that out; I have changed the patch.
>If get_bit()/set_bit() accept the second argument as int32, it can not
>be used to set bits whose number does not fit 32 bits. I think we need
>to change the type of the second argument as well.
Because int32 can cover the length of any bytea that PostgreSQL supports,
and I have decided to follow your next point (not using a 64-bit int for
len), I think the second argument can stay int32.
>Also, I think declaring len to be int is fine since 1G would fit an
>int, but what does not fit is len * 8, when performing that
>calculation, we have to widen the result. So, instead of changing the
>datatype of len, it might be better to perform the calculation as
>(int64)len * 8. If we use int64, we could also use INT64_FORMAT
>instead of using %ld.
Have followed and changed the patch.
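For illustration, a minimal sketch of the widened comparison as suggested
(PG_GETARG_*, VARSIZE_ANY_EXHDR, ereport, and INT64_FORMAT are PostgreSQL's;
the exact committed code may differ):

    bytea      *v = PG_GETARG_BYTEA_PP(0);
    int32       n = PG_GETARG_INT32(1);
    int         len = VARSIZE_ANY_EXHDR(v);

    /* Widen only the calculation: len fits in an int, but len * 8 may not. */
    if (n < 0 || (int64) n >= (int64) len * 8)
        ereport(ERROR,
                (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
                 errmsg("index %d out of valid range, 0.." INT64_FORMAT,
                        n, (int64) len * 8 - 1)));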
>Since this is a bug it shouldn't wait another commitfest, but still
>add this patch to the commitfest so that it's not forgotten.
Will do.
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
> Thanks for the reply.
>
> >Why have you used Size? Shouldn't we use int64?
>
> Yes, thanks for pointing that out; I have changed the patch.
Thanks for the patch.
> >If get_bit()/set_bit() accept the second argument as int32, it can not
> >be used to set bits whose number does not fit 32 bits. I think we need
> >to change the type of the second argument as well.
>
> Because int32 can cover the length of any bytea that PostgreSQL supports, ...
I think the second argument indicates the bit position, whose maximum would be max bytea length * 8. If the max bytea length covers the whole int32 range, the second argument needs to be wider, i.e. int64.
Some more comments on the patch
struct pg_encoding
{
- unsigned (*encode_len) (const char *data, unsigned dlen);
+ int64 (*encode_len) (const char *data, unsigned dlen);
unsigned (*decode_len) (const char *data, unsigned dlen);
unsigned (*encode) (const char *data, unsigned dlen, char *res);
unsigned (*decode) (const char *data, unsigned dlen, char *res);
Why not use a return type of int64 for the rest of the functions here as well?
res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
/* Make this FATAL 'cause we've trodden on memory ... */
- if (res > resultlen)
+ if ((int64)res > resultlen)
If we change the return type of all those functions to int64, we won't need this cast.
Right now we are using int64 because bytea can be 1GB, but what if we increase
that limit tomorrow, will int64 be sufficient? That may be unlikely in the near
future, but nevertheless a possibility. Should we then define a new datatype
which resolves to int64 right now but really depends upon the bytea length. I
am not suggesting that we have to do it right now, but may be something to
think about.
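A sketch of what such a purely hypothetical type could look like (the name is
invented here, only to make the idea concrete):

    /* Hypothetical: a type for bit positions in a bytea, wide enough for
       8 * the maximum bytea size. It resolves to int64 today, but the
       typedef would localize that assumption in one place. */
    typedef int64 ByteaBitPos;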
hex_enc_len(const char *src, unsigned srclen)
{
- return srclen << 1;
+ return (int64)(srclen << 1);
Why not convert srclen to int64 as well? That will also change the pg_encoding
member definitions. But if the encoded length requires int64 to fit the
possible values, the same would be true for the source lengths. Why can't the
source length also be int64?
If we still want the cast, I think it should be ((int64)srclen << 1) rather
than casting the result.
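A standalone illustration of why the cast placement matters (the 2GB length is
hypothetical, beyond today's 1GB bytea limit, chosen only to make the 32-bit
wraparound visible):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned srclen = 0x80000000u;             /* hypothetical 2GB input */

        /* Cast after the shift: the shift runs in 32-bit unsigned
           arithmetic and wraps to 0 before the widening happens. */
        int64_t cast_after = (int64_t) (srclen << 1);

        /* Cast before the shift: the shift runs in 64 bits and yields 2^32. */
        int64_t cast_before = (int64_t) srclen << 1;

        printf("%lld vs %lld\n", (long long) cast_after, (long long) cast_before);
        return 0;
    }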
/* 3 bytes will be converted to 4, linefeed after 76 chars */
- return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
+ return (int64)((srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4));
similar comments as above.
SELECT set_bit(B'0101011000100100', 16, 1); -- fail
ERROR: bit index 16 out of valid range (0..15)
+SELECT get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 0)
+ ,0);
+ get_bit
+---------
+ 0
+(1 row)
It might help to add a test where we could pass the second argument something
greater than 1G. But it may be difficult to write such a test case.
Best Wishes,
Ashutosh
Hello, thanks for the detailed review.

>I think the second argument indicates the bit position, which would be
>max bytea length * 8. If max bytea length covers whole int32, the second
>argument needs to be wider i.e. int64.

Yes, it makes sense; followed.

> Some more comments on the patch

> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
> unsigned (*decode_len) (const char *data, unsigned dlen);
> unsigned (*encode) (const char *data, unsigned dlen, char *res);
> unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
> res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
> /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this cast.

I changed the 'encode' function, which needs an int64 return type, but kept
the other functions in 'pg_encoding', because I saw no necessary reason to
change them.
>Right now we are using int64 because bytea can be 1GB, but what if we increase
>that limit tomorrow, will int64 be sufficient? That may be unlikely in the near
>future, but nevertheless a possibility. Should we then define a new datatype
>which resolves to int64 right now but really depends upon the bytea length. I
>am not suggesting that we have to do it right now, but may be something to
>think about.

I decided to use int64 because, if we really want to increase the limit, the
same change would apply to 'text' and 'varchar', which have the same limit,
so it may need a more general approach. Hence I gave up that idea.
> hex_enc_len(const char *src, unsigned srclen)
> {
>- return srclen << 1;
>+ return (int64)(srclen << 1);
>
>why not to convert srclen also to int64. That will also change the pg_encoding
>member definitions. But if encoded length requires int64 to fit the possibly
>values, same would be true for the source lengths. Why can't the source length
>also be int64?

>If we still want the cast, I think it should be ((int64)srclen << 1) rather
>than casting the result.
I prefer the '((int64)srclen << 1)' way.
> /* 3 bytes will be converted to 4, linefeed after 76 chars */
>- return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
>+ return (int64)((srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4));
>similar comments as above.
Followed.
>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.

Added two test cases.

Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
> Hello, thanks for the detailed review.
>
> >I think the second argument indicates the bit position, which would be
> >max bytea length * 8. If max bytea length covers whole int32, the second
> >argument needs to be wider i.e. int64.
>
> Yes, it makes sense; followed.
I think we need a similar change in byteaGetBit() and byteaSetBit() as well.
> Some more comments on the patch

> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
> unsigned (*decode_len) (const char *data, unsigned dlen);
> unsigned (*encode) (const char *data, unsigned dlen, char *res);
> unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
> res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
> /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this cast.

> I changed the 'encode' function, which needs an int64 return type, but kept
> the other functions in 'pg_encoding', because I saw no necessary reason to
> change them.
Ok, let's leave it for a committer to decide.
>Right now we are using int64 because bytea can be 1GB, but what if we increase
>that limit tomorrow, will int64 be sufficient? That may be unlikely in the near
>future, but nevertheless a possibility. Should we then define a new datatype
>which resolves to int64 right now but really depends upon the bytea length. I
>am not suggesting that we have to do it right now, but may be something to
>think about.

> I decided to use int64 because, if we really want to increase the limit,
> the same change would apply to 'text' and 'varchar', which have the same
> limit, so it may need a more general approach. Hence I gave up that idea.
Hmm, let's see what a committer says.
Some more review comments.
+ int64 res,resultlen;
We need those on separate lines, possibly.
+ byteNo = (int32)(n / BITS_PER_BYTE);
Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
add a comment explaining the reason for the cast. The comment applies at other
places where this change appears.
- int len;
+ int64 len;
Why do we need this change?
int i;
>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.

> Added two test cases.
+
+select get_bit(
+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024 * 1024 * 1024 + 1, 0)
+ ,1024 * 1024 * 1024 + 1);
This bit position is still within int4.
postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
pg_column_size
----------------
4
(1 row)
You want something like
postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
pg_column_size
----------------
8
(1 row)
Best Wishes,
Ashutosh
Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> writes:
> On Wed, 18 Mar 2020 at 08:18, movead.li@highgo.ca <movead.li@highgo.ca> wrote:
>>> if we change return type of all those functions to int64, we won't need
>>> this cast.
>> I changed the 'encode' function, which needs an int64 return type, but
>> kept the other functions in 'pg_encoding', because I saw no necessary
>> reason to change them.

> Ok, let's leave it for a committer to decide.

If I'm grasping the purpose of these correctly, wouldn't Size or size_t
be a more appropriate type?

And I definitely agree with changing all of these APIs at once, if
they're all dealing with the same kind of value.

            regards, tom lane
On Thu, 26 Mar 2020 at 19:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> writes:
> On Wed, 18 Mar 2020 at 08:18, movead.li@highgo.ca <movead.li@highgo.ca>
> wrote:
>> if we change return type of all those functions to int64, we won't need
>> this cast.
>> I change the 'encode' function, it needs an int64 return type, but keep
>> other
>> functions in 'pg_encoding', because I think it of no necessary reason.
> Ok, let's leave it for a committer to decide.
If I'm grasping the purpose of these correctly, wouldn't Size or size_t
be a more appropriate type?
Andy had used Size in his earlier patch, but I didn't understand the reason behind it and Andy didn't give any. From the patch and the code around the changes, some kind of int (so int64) looked better. But if there's a valid reason for using Size, I am fine with it too. Do we have a SQL datatype corresponding to Size?
Best Wishes,
Ashutosh
    Ashutosh Bapat wrote:

> I think we need a similar change in byteaGetBit() and byteaSetBit()
> as well.

get_bit() and set_bit() as SQL functions take an int4 as the "offset"
argument representing the bit number, meaning that the maximum value
that can be passed is 2^31-1.
But the maximum theoretical size of a bytea value being 1 gigabyte or
2^30 bytes, the real maximum bit number in a bytea equals 2^33-1
(2^33 = 8*2^30), which doesn't fit into an "int4". As a result, the part
of a bytea beyond the first 256MB is inaccessible to get_bit() and
set_bit().

So aside from the integer overflow bug, isn't there the issue that the
"offset" argument of get_bit() and set_bit() should have been an int8
in the first place?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
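A quick standalone check of that arithmetic, just to make the numbers
concrete:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int64_t max_bytea_bytes = (int64_t) 1 << 30;             /* 1GB */
        int64_t max_bit_number  = max_bytea_bytes * 8 - 1;       /* 2^33 - 1 */
        /* Highest byte reachable through an int4 bit number (max 2^31 - 1): */
        int64_t reachable_bytes = ((int64_t) INT32_MAX + 1) / 8; /* 2^28 = 256MB */

        printf("max bit number: %lld, reachable bytes: %lld (%lldMB)\n",
               (long long) max_bit_number,
               (long long) reachable_bytes,
               (long long) (reachable_bytes / (1024 * 1024)));
        return 0;
    }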
"Daniel Verite" <daniel@manitou-mail.org> writes: > So aside from the integer overflow bug, isn't there the issue that the > "offset" argument of get_bit() and set_bit() should have been an > int8 in the first place? Good point, but a fix for that wouldn't be back-patchable. It does suggest that we should just make all the internal logic use int8 for these values (as the solution to the overflow issue), and then in HEAD only, adjust the function signatures so that int8 can be passed in. regards, tom lane
>I think the second argument indicates the bit position, which would be
>max bytea length * 8. If max bytea length covers whole int32, the second
>argument needs to be wider i.e. int64.

Yes, it makes sense; followed.

>I think we need a similar change in byteaGetBit() and byteaSetBit() as well.

Sorry, I think it was my mistake; it is the two functions above that should
be changed.

> Some more comments on the patch

> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
> unsigned (*decode_len) (const char *data, unsigned dlen);
> unsigned (*encode) (const char *data, unsigned dlen, char *res);
> unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
> res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
> /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this cast.

>I changed the 'encode' function, which needs an int64 return type, but kept
>the other functions in 'pg_encoding', because I saw no necessary reason to
>change them.

>Ok, let's leave it for a committer to decide.

Well, I changed all of them this time, because Tom Lane supported that in his
next mail.

>Some more review comments.

>+ int64 res,resultlen;
>We need those on separate lines, possibly.

Done
>+ byteNo = (int32)(n / BITS_PER_BYTE);
>Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
>add a comment explaining the reason for the cast. The comment applies at other
>places where this change appears.
>- int len;
>+ int64 len;
>Why do we need this change?
> int i;

It was my mistake, as described above; 'bitgetbit()'/'bitsetbit()' should not
have been changed.
>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.

Added two test cases.
>+
>+select get_bit(
>+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024 * 1024 * 1024 + 1, 0)
>+ ,1024 * 1024 * 1024 + 1);
>This bit position is still within int4.
>postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
> pg_column_size
>----------------
> 4
>(1 row)
>You want something like
>postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
> pg_column_size
>----------------
> 8
>(1 row)

I intended to test sizes larger than 1G, and I think you gave a better idea;
followed.

Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
I want to resend the mail, because the last one was in the wrong format and
hard to read. In addition, I think 'Size' and 'size_t' are platform-dependent
and may not work on a 32-bit system, so I chose 'int64' after Ashutosh's
review.

>>I think the second argument indicates the bit position, which would be
>>max bytea length * 8. If max bytea length covers whole int32, the second
>>argument needs to be wider i.e. int64.

>Yes, it makes sense; followed.
>>I think we need a similar change in byteaGetBit() and byteaSetBit() as well.
Sorry, I think it was my mistake; it is the two functions above that should be changed.
>>Some more comments on the patch

>> struct pg_encoding
>> {
>>- unsigned (*encode_len) (const char *data, unsigned dlen);
>>+ int64 (*encode_len) (const char *data, unsigned dlen);
>> unsigned (*decode_len) (const char *data, unsigned dlen);
>> unsigned (*encode) (const char *data, unsigned dlen, char *res);
>> unsigned (*decode) (const char *data, unsigned dlen, char *res);

>> Why not use return type of int64 for rest of the functions here as well?

>> res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>> /* Make this FATAL 'cause we've trodden on memory ... */
>>- if (res > resultlen)
>>+ if ((int64)res > resultlen)

>>if we change return type of all those functions to int64, we won't need this cast.

>I changed the 'encode' function, which needs an int64 return type, but kept
>the other functions in 'pg_encoding', because I saw no necessary reason to
>change them.

>>Ok, let's leave it for a committer to decide.

Well, I changed all of them this time, because Tom Lane supported that in his
next mail.
>Some more review comments.
>+ int64 res,resultlen;
>We need those on separate lines, possibly.
Done
>+ byteNo = (int32)(n / BITS_PER_BYTE);
>Does it hurt to have byteNo as int64 so as to avoid a cast. Otherwise, please
>add a comment explaining the reason for the cast. The comment applies at other
>places where this change appears.
>- int len;
>+ int64 len;
>Why do we need this change?
> int i;
It was my mistake, as described above; 'bitgetbit()'/'bitsetbit()' should not have been changed.
>>It might help to add a test where we could pass the second argument something
>>greater than 1G. But it may be difficult to write such a test case.>Add two test cases.
>+
>+select get_bit(
>+ set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 1024 * 1024 * 1024 + 1, 0)
>+ ,1024 * 1024 * 1024 + 1);
>This bit position is still within int4.
>postgres=# select pg_column_size(1024 * 1024 * 1024 + 1);
> pg_column_size
>----------------
> 4
>(1 row)
>You want something like
>postgres=# select pg_column_size(512::bigint * 1024 * 1024 * 8);
> pg_column_size
>----------------
> 8
>(1 row)
I intended to test sizes larger than 1G, and I think you gave a better idea; followed.
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
Thanks for the changes,
+ int64 res,resultlen;
It's better to have them on separate lines.
-unsigned
+int64
hex_decode(const char *src, unsigned len, char *dst)
Do we want to explicitly cast the return value to int64? Will the build crib on some platform if it's not done? I don't know of such a platform, but my knowledge in this area is not great.
+ byteNo = (int)(n / 8);
+ bitNo = (int)(n % 8);
some comment explaining why this downcasting is safe here?
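Something along these lines (a sketch of the requested comment, not the actual
patch) might do:

    /*
     * The bit number n has already been range-checked against
     * (int64) len * 8, and len is at most 1GB, so n / 8 and n % 8 are
     * guaranteed to fit in an int even though n itself is int64.
     */
    byteNo = (int) (n / 8);
    bitNo = (int) (n % 8);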
- proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int4',
+ proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
prosrc => 'byteaGetBit' },
{ oid => '724', descr => 'set bit',
- proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int4 int4',
+ proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int8 int4',
prosrc => 'byteaSetBit' },
Shouldn't we have similar changes for following entries as well?
{ oid => '3032', descr => 'get bit',
proname => 'get_bit', prorettype => 'int4', proargtypes => 'bit int4',
prosrc => 'bitgetbit' },
{ oid => '3033', descr => 'set bit',
proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
prosrc => 'bitsetbit' },
The tests you have added are for the bytea variants, which ultimately call byteaGetBit()/byteaSetBit(). But I think we also need tests for the bit variants, which ultimately call the bitgetbit and bitsetbit functions.
Once you address these comments, I think the patch is good for a committer. So please mark the commitfest entry as such when you post the next version of patch.
Best Wishes,
Ashutosh
>+ int64 res,resultlen;
>It's better to have them on separate lines.
Sorry for that, done.
>-unsigned
>+int64
> hex_decode(const char *src, unsigned len, char *dst)
>Do we want to explicitly cast the return value to int64? Will build on some platform crib if not done so?
>I don't know of such a platform but my knowledge in this area is not great.
I think the current change makes sure nothing goes wrong.
>+ byteNo = (int)(n / 8);
>+ bitNo = (int)(n % 8);
>some comment explaining why this downcasting is safe here?
Done
>- proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int4',
>+ proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
> prosrc => 'byteaGetBit' },
> { oid => '724', descr => 'set bit',
>- proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int4 int4',
>+ proname => 'set_bit', prorettype => 'bytea', proargtypes => 'bytea int8 int4',
> prosrc => 'byteaSetBit' },
>Shouldn't we have similar changes for following entries as well?
>{ oid => '3032', descr => 'get bit',
> proname => 'get_bit', prorettype => 'int4', proargtypes => 'bit int4',
> prosrc => 'bitgetbit' },
>{ oid => '3033', descr => 'set bit',
> proname => 'set_bit', prorettype => 'bit', proargtypes => 'bit int4 int4',
> prosrc => 'bitsetbit' },
Because 'bitsetbit' and 'bitgetbit' do not have to calculate a bit count by
multiplying by 8, I think there is no need to change them.
>The tests you have added are for the bytea variants, which ultimately call byteaGetBit()/byteaSetBit().
>But I think we also need tests for the bit variants, which ultimately call the bitgetbit and bitsetbit functions.
As above, there is no need to touch 'bitgetbit' and 'bitsetbit'.
>Once you address these comments, I think the patch is good for a committer.
>So please mark the commitfest entry as such when you post the next version of patch.
Thanks a lot for the detailed review again; the changed patch is attached.
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
"movead.li@highgo.ca" <movead.li@highgo.ca> writes: > [ long_bytea_string_bug_fix_ver5.patch ] I don't think this has really solved the overflow hazards. For example, in binary_encode we've got resultlen = enc->encode_len(VARDATA_ANY(data), datalen); result = palloc(VARHDRSZ + resultlen); and all you've done about that is changed resultlen from int to int64. On a 64-bit machine, sure, palloc will be able to detect if the result exceeds what can be allocated --- but on a 32-bit machine it'd be possible for the size_t argument to overflow altogether. (Or if you want to argue that it can't overflow because no encoder expands the data more than 4X, then we don't need to be making this change at all.) I don't think there's really any way to do that safely without an explicit check before we call palloc. I also still find the proposed signatures for the encoding-specific functions to be just plain weird: - unsigned (*encode_len) (const char *data, unsigned dlen); - unsigned (*decode_len) (const char *data, unsigned dlen); - unsigned (*encode) (const char *data, unsigned dlen, char *res); - unsigned (*decode) (const char *data, unsigned dlen, char *res); + int64 (*encode_len) (const char *data, unsigned dlen); + int64 (*decode_len) (const char *data, unsigned dlen); + int64 (*encode) (const char *data, unsigned dlen, char *res); + int64 (*decode) (const char *data, unsigned dlen, char *res); Why did you change the outputs from unsigned to signed? Why didn't you change the dlen inputs? I grant that we know that the input can't exceed 1GB in Postgres' usage, but these signatures are just randomly inconsistent, and you didn't even add a comment explaining why. Personally I think I'd make them like uint64 (*encode_len) (const char *data, size_t dlen); which makes it explicit that the dlen argument describes the length of a chunk of allocated memory, while the result might exceed that. Lastly, there is a component of this that can be back-patched and a component that can't --- we do not change system catalog contents in released branches. Cramming both parts into the same patch is forcing the committer to pull them apart, which is kind of unfriendly. regards, tom lane
>I don't think this has really solved the overflow hazards. For example,
>in binary_encode we've got
>resultlen = enc->encode_len(VARDATA_ANY(data), datalen);
>result = palloc(VARHDRSZ + resultlen);
>and all you've done about that is changed resultlen from int to int64.
>On a 64-bit machine, sure, palloc will be able to detect if the
>result exceeds what can be allocated --- but on a 32-bit machine
>it'd be possible for the size_t argument to overflow altogether.
>(Or if you want to argue that it can't overflow because no encoder
>expands the data more than 4X, then we don't need to be making this
>change at all.)
>I don't think there's really any way to do that safely without an
>explicit check before we call palloc.
I am sorry, I do not quite understand these words, especially what is
meant by 'size_t'.
Here I changed resultlen from int to int64 because that way we can report
a correct value in the error message, rather than '-1' or another strange
number.
>Why did you change the outputs from unsigned to signed? Why didn't
>you change the dlen inputs? I grant that we know that the input
>can't exceed 1GB in Postgres' usage, but these signatures are just
>randomly inconsistent, and you didn't even add a comment explaining
>why. Personally I think I'd make them like
>uint64 (*encode_len) (const char *data, size_t dlen);
>which makes it explicit that the dlen argument describes the length
>of a chunk of allocated memory, while the result might exceed that.
I think it makes sense; followed.
>Lastly, there is a component of this that can be back-patched and
>a component that can't --- we do not change system catalog contents
>in released branches. Cramming both parts into the same patch
>is forcing the committer to pull them apart, which is kind of
>unfriendly.
Sorry about that. Attached is the changed patch for PG13; the one for the
older branches will be sent soon.
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
>Sorry about that. Attached is the changed patch for PG13; the one for the
>older branches will be sent soon.
A little update to the patch, and patches for all stable branches are now available.
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
    movead.li@highgo.ca wrote:

> A little update to the patch, and patches for all stable branches are
> now available.

Some comments about the set_bit/get_bit parts.
I'm reading long_bytea_string_bug_fix_ver6_pg10.patch, but that applies
probably to the other files meant for the existing releases (I think you
could get away with only one patch for backpatching and one patch for
v13, and committers will sort out how to deal with release branches).

 byteaSetBit(PG_FUNCTION_ARGS)
 {
     bytea       *res = PG_GETARG_BYTEA_P_COPY(0);
-    int32        n = PG_GETARG_INT32(1);
+    int64        n = PG_GETARG_INT64(1);
     int32        newBit = PG_GETARG_INT32(2);

The 2nd argument is 32-bit, not 64. PG_GETARG_INT32(1) must be used.

+                 errmsg("index "INT64_FORMAT" out of valid range, 0.."INT64_FORMAT,
+                        n, (int64)len * 8 - 1)));

The cast to int64 avoids the overflow, but it may also produce a result
that does not reflect the actual range, which is limited to 2^31-1,
again because the bit number is a signed 32-bit value.
I believe the formula for the upper limit in the 32-bit case is:

    (len <= PG_INT32_MAX / 8) ? (len*8 - 1) : PG_INT32_MAX;

These functions could benefit from a comment mentioning that they cannot
reach the full extent of a bytea, because of the size limit on the bit
number.

--- a/src/test/regress/expected/bit.out
+++ b/src/test/regress/expected/bit.out
@@ -656,6 +656,40 @@ SELECT set_bit(B'0101011000100100', 15, 1);
 SELECT set_bit(B'0101011000100100', 16, 1);    -- fail
 ERROR:  bit index 16 out of valid range (0..15)
+SELECT get_bit(
+        set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 0)
+        ,0);
+ get_bit
+---------
+       0
+(1 row)
+
+SELECT get_bit(
+        set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea, 0, 1)
+        ,0);
+ get_bit
+---------
+       1
+(1 row)

These 2 tests need to allocate big chunks of contiguous memory, so they
might fail for lack of memory on tiny machines, and even when not
failing, they're pretty slow to run. Are they worth the trouble?

+select get_bit(
+       set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
+               512::bigint * 1024 * 1024 * 8 - 1, 0)
+       ,512::bigint * 1024 * 1024 * 8 - 1);
+ get_bit
+---------
+       0
+(1 row)
+
+select get_bit(
+       set_bit((repeat('Postgres', 512 * 1024 * 1024 / 8))::bytea,
+               512::bigint * 1024 * 1024 * 8 - 1, 1)
+       ,512::bigint * 1024 * 1024 * 8 - 1);
+ get_bit
+---------
+       1
+(1 row)

These 2 tests are supposed to fail in existing releases because
set_bit() and get_bit() don't take a bigint as the 2nd argument.
Also, the same comment as above on how much they allocate.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
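For the back branches, then, a sketch of that clamped upper bound for the
error message (PG_INT32_MAX is PostgreSQL's; the exact message wording is up
to the committer):

    int32       maxbit;

    /* len * 8 - 1 would overflow int32 once len > PG_INT32_MAX / 8, so clamp
       the reported upper bound to what an int4 bit number can express. */
    maxbit = (len <= PG_INT32_MAX / 8) ? (int32) (len * 8 - 1) : PG_INT32_MAX;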
"Daniel Verite" <daniel@manitou-mail.org> writes: > These 2 tests need to allocate big chunks of contiguous memory, so they > might fail for lack of memory on tiny machines, and even when not failing, > they're pretty slow to run. Are they worth the trouble? Yeah, I'd noticed those on previous readings of the patch. They'd almost certainly fail on some of our older/smaller buildfarm members, so they're not getting committed, even if they didn't require multiple seconds apiece to run (even on a machine with plenty of memory). It's useful to have them for initial testing though. It'd be great if there was a way to test get_bit/set_bit on large indexes without materializing a couple of multi-hundred-MB objects. Can't think of one offhand though. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> "Daniel Verite" <daniel@manitou-mail.org> writes:
>> These 2 tests need to allocate big chunks of contiguous memory, so they
>> might fail for lack of memory on tiny machines, and even when not failing,
>> they're pretty slow to run. Are they worth the trouble?
>
> Yeah, I'd noticed those on previous readings of the patch. They'd almost
> certainly fail on some of our older/smaller buildfarm members, so they're
> not getting committed, even if they didn't require multiple seconds apiece
> to run (even on a machine with plenty of memory). It's useful to have
> them for initial testing though.

Perl's test suite has a similar issue with tests for handling of huge
strings, hashes, arrays, regexes etc. We've taken the approach of
checking the environment variable PERL_TEST_MEMORY and skipping tests
that need more than that many gigabytes. We currently have tests that
check for values from 1 all the way up to 96 GiB.

This would be trivial to do in the Postgres TAP tests, but something
similar might be feasible in pg_regress too?

> It'd be great if there was a way to test get_bit/set_bit on large
> indexes without materializing a couple of multi-hundred-MB objects.
> Can't think of one offhand though.

For this use case it might make sense to express the limit in megabytes,
and have a policy for how much memory tests can assume without explicit
opt-in from the developer or buildfarm animal.

- ilmari
--
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
the consequences of." -- Skud's Meta-Law
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Yeah, I'd noticed those on previous readings of the patch. They'd almost
>> certainly fail on some of our older/smaller buildfarm members, so they're
>> not getting committed, even if they didn't require multiple seconds apiece
>> to run (even on a machine with plenty of memory). It's useful to have
>> them for initial testing though.

> Perl's test suite has a similar issue with tests for handling of huge
> strings, hashes, arrays, regexes etc. We've taken the approach of
> checking the environment variable PERL_TEST_MEMORY and skipping tests
> that need more than that many gigabytes. We currently have tests that
> check for values from 1 all the way up to 96 GiB.

> This would be trivial to do in the Postgres TAP tests, but something
> similar might be feasible in pg_regress too?

Meh. The memory is only part of it; the other problem is that multiple
seconds expended in every future run of the regression tests is a price
that's many orders of magnitude higher than the potential value of this
test case.

            regards, tom lane
>Some comments about the set_bit/get_bit parts.
>I'm reading long_bytea_string_bug_fix_ver6_pg10.patch, but that
>applies probably to the other files meant for the existing releases
>(I think you could get away with only one patch for backpatching
>and one patch for v13, and committers will sort out how
>to deal with release branches).
Thanks for teaching me.
>byteaSetBit(PG_FUNCTION_ARGS)
>{
>bytea *res = PG_GETARG_BYTEA_P_COPY(0);
>- int32 n = PG_GETARG_INT32(1);
>+ int64 n = PG_GETARG_INT64(1);
>int32 newBit = PG_GETARG_INT32(2);
>The 2nd argument is 32-bit, not 64. PG_GETARG_INT32(1) must be used.
>+ errmsg("index "INT64_FORMAT" out of valid range, 0.."INT64_FORMAT,
>+ n, (int64)len * 8 - 1)));
>The cast to int64 avoids the overflow, but it may also produce a
>result that does not reflect the actual range, which is limited to
>2^31-1, again because the bit number is a signed 32-bit value.
>I believe the formula for the upper limit in the 32-bit case is:
> (len <= PG_INT32_MAX / 8) ? (len*8 - 1) : PG_INT32_MAX;
>These functions could benefit from a comment mentioning that
>they cannot reach the full extent of a bytea, because of the size limit
>on the bit number.
Because the 2nd argument describes a 'bit' position over the whole bytea
string, an int32 is not enough for that. So I think the change is not wrong,
or have I not caught your meaning?
>These 2 tests need to allocate big chunks of contiguous memory, so they
>might fail for lack of memory on tiny machines, and even when not failing,
>they're pretty slow to run. Are they worth the trouble?
>These 2 tests are supposed to fail in existing releases because set_bit()
>and get_bit() don't take a bigint as the 2nd argument.
>Also, the same comment as above on how much they allocate.
I have deleted the four test cases because they are not worth the memory and
time, and no new test cases are added because generating that much data takes
time.
New patch attached.
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
    movead.li@highgo.ca wrote:

> >I believe the formula for the upper limit in the 32-bit case is:
> > (len <= PG_INT32_MAX / 8) ? (len*8 - 1) : PG_INT32_MAX;
>
> >These functions could benefit from a comment mentioning that
> >they cannot reach the full extent of a bytea, because of the size limit
> >on the bit number.
>
> Because the 2nd argument describes a 'bit' position over the whole bytea
> string, an int32 is not enough for that. So I think the change is not
> wrong, or have I not caught your meaning?

In existing releases, the SQL definitions are set_bit(bytea,int4,int4) and
get_bit(bytea,int4), and they cannot be changed without breaking the API.
So the patch meant for existing releases has to deal with a too-narrow
int32 bit number.
Internally in the C functions, you may convert that number to int64 if you
think it's necessary, but you may not use PG_GETARG_INT64 to pick up a
32-bit argument.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
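In other words, for the back branches the C function keeps fetching an int4
and may widen it internally; a minimal sketch (the variable names follow the
existing byteaSetBit code):

    bytea      *res = PG_GETARG_BYTEA_P_COPY(0);
    int64       n = (int64) PG_GETARG_INT32(1); /* int4 at the SQL level */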
>In existing releases, the SQL definitions are set_bit(bytea,int4,int4)
>and get_bit(bytea,int4) and cannot be changed to not break the API.
>So the patch meant for existing releases has to deal with a too-narrow
>int32 bit number.
>Internally in the C functions, you may convert that number to int64
>if you think it's necessary, but you may not use PG_GETARG_INT64
>to pick a 32-bit argument.
The input parameters of the 'set_bit()' function backed by 'byteaSetBit'
have changed to 'bytea int8 int4', but the other 'set_bit()', backed by
'bitsetbit', need not change. The same goes for 'get_bit()'.
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
Hello hackers,
After several patch revisions following the hackers' proposals, I think it
is ready to be committed. Can we commit it before the code freeze for PG 13?
Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca
"movead.li@highgo.ca" <movead.li@highgo.ca> writes: > After several patch change by hacker's proposal, I think it's ready to > commit, can we commit it before doing the code freeze for pg-13? It would be easier to get this done if you had addressed any of the objections to the patch as given. Integer-overflow handling is still missing, and you still are assuming that it's okay to change catalog entries in released branches. regards, tom lane
I wrote:
> It would be easier to get this done if you had addressed any of the
> objections to the patch as given. Integer-overflow handling is still
> missing, and you still are assuming that it's okay to change catalog
> entries in released branches.

Since we are hard upon the feature freeze deadline, I took it on myself
to split this apart.

As far as I can see, the only part we really want to back-patch is the
adjustment of the range-limit comparisons in byteaGetBit and byteaSetBit
to use int64 arithmetic, so they don't go wacko when the input bytea
exceeds 256MB.

The other changes are not live bugs because in current usage the
estimated result size of an encoding or decoding transform couldn't
exceed 4 times 1GB. Hence it won't overflow size_t even on 32-bit
machines, thus the check in palloc() is sufficient to deal with
overlength values. But it's worth making those changes going forward,
I suppose, in case somebody wants to deal with longer strings someday.

There were some other minor problems too, but I think I fixed
everything.

            regards, tom lane