Re: A bug when use get_bit() function for a long bytea string - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: A bug when use get_bit() function for a long bytea string |
Date | |
Msg-id | CAExHW5vUSi4-oPvgMdFWvzuyaTmMtbY=JgdZ+u3T7ZdzQmubig@mail.gmail.com Whole thread Raw |
In response to | A bug when use get_bit() function for a long bytea string ("movead.li@highgo.ca" <movead.li@highgo.ca>) |
List | pgsql-hackers |
Hi On Thu, Mar 12, 2020 at 9:21 AM movead.li@highgo.ca <movead.li@highgo.ca> wrote: > > Hello hackers, > > I found an issue about get_bit() and set_bit() function,here it is: > ################################ > postgres=# select get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0); > 2020-03-12 10:05:23.296 CST [10549] ERROR: index 0 out of valid range, 0..-1 > 2020-03-12 10:05:23.296 CST [10549] STATEMENT: select get_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'),0); > ERROR: index 0 out of valid range, 0..-1 > postgres=# select set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1); > 2020-03-12 10:05:27.959 CST [10549] ERROR: index 0 out of valid range, 0..-1 > 2020-03-12 10:05:27.959 CST [10549] STATEMENT: select set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'),0,1); > ERROR: index 0 out of valid range, 0..-1 > postgres=# > ################################ > PostgreSQL can handle bytea size nearby 1G, but now it reports an > error when 512M. And I research it and found it is byteaSetBit() and > byteaGetBit(), it uses an 'int32 len' to hold bit numbers for the long > bytea data, and obvious 512M * 8bit is an overflow for an int32. > So I fix it and test ok, as below. > ################################ Thanks for the bug report and the analysis. The analysis looks correct. > postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,1),0); get_bit ---------1 (1 row) postgres=# select get_bit(set_bit(pg_read_binary_file('/home/movead/temp/file_seek/f_512M'), 0,0),0);get_bit --------- 0 (1 row) postgres=# > ################################ > > > And I do a check about if anything else related bytea has this issue, several codes have the same issue: > 1. byteaout() When formatting bytea as an escape, the 'len' variable should be int64, or > it may use an overflowing number. 2. esc_enc_len() Same as above, the 'len' variable should be int64, and the return type > should change as int64. Due to esc_enc_len() has same call struct with pg_base64_enc_len() and hex_enc_len(), so I wantto change the return value of the two function. And the opposite function esc_dec_len() seem nothing wrong. 3. binary_encode()and binary_decode() Here use an 'int32 resultlen' to accept an 'unsigned int' function return, which seemunfortable. > I fix all mentioned above, and patch attachments. > How do you think about that? Why have you used size? Shouldn't we use int64? Also in the change @@ -3458,15 +3458,15 @@ byteaGetBit(PG_FUNCTION_ARGS) int32 n = PG_GETARG_INT32(1); int byteNo, bitNo; - int len; + Size len; If get_bit()/set_bit() accept the second argument as int32, it can not be used to set bits whose number does not fit 32 bits. I think we need to change the type of the second argument as well. Also, I think declaring len to be int is fine since 1G would fit an int, but what does not fit is len * 8, when performing that calculation, we have to widen the result. So, instead of changing the datatype of len, it might be better to perform the calculation as (int64)len * 8. If we use int64, we could also use INT64_FORMAT instead of using %ld. Since this is a bug it shouldn't wait another commitfest, but still add this patch to the commitfest so that it's not forgotten. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: