Thread: BUG #5237: strange int->bit and bit->int conversions
The following bug has been logged online: Bug reference: 5237 Logged by: Roman Kononov Email address: kononov@ftml.net PostgreSQL version: 8.4.1 Operating system: GNU/Linux x86_64 Description: strange int->bit and bit->int conversions Details: test=# select (11::int4<<23 | 11::int4)::bit(32); 00000101100000000000000000001011 test=# select (11::int4<<23 | 11::int4)::bit(33); 000001011100000000000000000001011 test=# select (11::int4<<23 | 11::int4)::bit(39); 000001010000101100000000000000000001011 test=# select (11::int4<<23 | 11::int4)::bit(40); 0000000000000101100000000000000000001011 The ::bit(33) and ::bit(39) conversions seem wrong. test-std=# select 1::int4::bit(32)::int4; 1 test-std=# select 1::int4::bit(33)::int4; ERROR: integer out of range Why is it out of range?
The bitfromint8() and bitfromint4() are hosed. They produce wrong results when the BIT size is more than 64 and 32 respectively, and the BIT size is not multiple of 8, and the most significant byte of the integer value is not 0x00 or 0xff. For example: test=# select (11::int4<<23 | 11::int4)::bit(32); 00000101100000000000000000001011 test=# select (11::int4<<23 | 11::int4)::bit(33); 000001011100000000000000000001011 test=# select (11::int4<<23 | 11::int4)::bit(39); 000001010000101100000000000000000001011 test=# select (11::int4<<23 | 11::int4)::bit(40); 0000000000000101100000000000000000001011 The ::bit(33) and ::bit(39) conversions are wrong. The patch re-implements the conversion functions. diff -urp -x '*.o' -x '*.txt' -x '*.so' postgresql-8.4.1-orig/src/backend/utils/adt/varbit.c postgresql-8.4.1/src/backend/utils/adt/varbit.c --- postgresql-8.4.1-orig/src/backend/utils/adt/varbit.c 2009-12-12 09:19:13.000000000 -0600 +++ postgresql-8.4.1/src/backend/utils/adt/varbit.c 2009-12-12 10:29:59.000000000 -0600 @@ -1321,8 +1321,8 @@ bitfromint4(PG_FUNCTION_ARGS) VarBit *result; bits8 *r; int rlen; - int destbitsleft, - srcbitsleft; + int const srcbits=sizeof(a)*BITS_PER_BYTE; + int i; if (typmod <= 0) typmod = 1; /* default bit length */ @@ -1333,32 +1333,21 @@ bitfromint4(PG_FUNCTION_ARGS) VARBITLEN(result) = typmod; r = VARBITS(result); - destbitsleft = typmod; - srcbitsleft = 32; - /* drop any input bits that don't fit */ - srcbitsleft = Min(srcbitsleft, destbitsleft); - /* sign-fill any excess bytes in output */ - while (destbitsleft >= srcbitsleft + 8) - { - *r++ = (bits8) ((a < 0) ? BITMASK : 0); - destbitsleft -= 8; - } - /* store first fractional byte */ - if (destbitsleft > srcbitsleft) - { - *r++ = (bits8) ((a >> (srcbitsleft - 8)) & BITMASK); - destbitsleft -= 8; - } - /* Now srcbitsleft and destbitsleft are the same, need not track both */ - /* store whole bytes */ - while (destbitsleft >= 8) - { - *r++ = (bits8) ((a >> (destbitsleft - 8)) & BITMASK); - destbitsleft -= 8; + rlen=(typmod+BITS_PER_BYTE-1)/BITS_PER_BYTE; + + if (srcbits>=typmod) { + a<<=srcbits-typmod; + for (i=0; i!=rlen; ++i, a<<=BITS_PER_BYTE) r[i]=a>>(srcbits-BITS_PER_BYTE); + } else { + int sh=typmod%BITS_PER_BYTE; + int32 h=a>>sh; + int32 l=a<<(srcbits-sh); + size_t const zsize=rlen-sizeof(a)-(sh!=0); + bits8 sx=(a>=0)-1; + memset(r,sx,zsize); + for (i=0; i!=sizeof(a); ++i, h<<=8) r[zsize+i]=h>>(srcbits-BITS_PER_BYTE); + if (sh!=0) r[zsize+sizeof(a)]=l>>(srcbits-BITS_PER_BYTE); } - /* store last fractional byte */ - if (destbitsleft > 0) - *r = (bits8) ((a << (8 - destbitsleft)) & BITMASK); PG_RETURN_VARBIT_P(result); } @@ -1396,8 +1385,8 @@ bitfromint8(PG_FUNCTION_ARGS) VarBit *result; bits8 *r; int rlen; - int destbitsleft, - srcbitsleft; + int const srcbits=sizeof(a)*BITS_PER_BYTE; + int i; if (typmod <= 0) typmod = 1; /* default bit length */ @@ -1408,36 +1397,21 @@ bitfromint8(PG_FUNCTION_ARGS) VARBITLEN(result) = typmod; r = VARBITS(result); - destbitsleft = typmod; -#ifndef INT64_IS_BUSTED - srcbitsleft = 64; -#else - srcbitsleft = 32; /* don't try to shift more than 32 */ -#endif - /* drop any input bits that don't fit */ - srcbitsleft = Min(srcbitsleft, destbitsleft); - /* sign-fill any excess bytes in output */ - while (destbitsleft >= srcbitsleft + 8) - { - *r++ = (bits8) ((a < 0) ? BITMASK : 0); - destbitsleft -= 8; - } - /* store first fractional byte */ - if (destbitsleft > srcbitsleft) - { - *r++ = (bits8) ((a >> (srcbitsleft - 8)) & BITMASK); - destbitsleft -= 8; - } - /* Now srcbitsleft and destbitsleft are the same, need not track both */ - /* store whole bytes */ - while (destbitsleft >= 8) - { - *r++ = (bits8) ((a >> (destbitsleft - 8)) & BITMASK); - destbitsleft -= 8; + rlen=(typmod+BITS_PER_BYTE-1)/BITS_PER_BYTE; + + if (srcbits>=typmod) { + a<<=srcbits-typmod; + for (i=0; i!=rlen; ++i, a<<=BITS_PER_BYTE) r[i]=a>>(srcbits-BITS_PER_BYTE); + } else { + int sh=typmod%BITS_PER_BYTE; + int64 h=a>>sh; + int64 l=a<<(srcbits-sh); + size_t const zsize=rlen-sizeof(a)-(sh!=0); + bits8 sx=(a>=0)-1; + memset(r,sx,zsize); + for (i=0; i!=sizeof(a); ++i, h<<=8) r[zsize+i]=h>>(srcbits-BITS_PER_BYTE); + if (sh!=0) r[zsize+sizeof(a)]=l>>(srcbits-BITS_PER_BYTE); } - /* store last fractional byte */ - if (destbitsleft > 0) - *r = (bits8) ((a << (8 - destbitsleft)) & BITMASK); PG_RETURN_VARBIT_P(result); }
Roman Kononov <kononov@ftml.net> writes: > The bitfromint8() and bitfromint4() are hosed. They produce wrong > results when the BIT size is more than 64 and 32 respectively, and the > BIT size is not multiple of 8, and the most significant byte of the > integer value is not 0x00 or 0xff. Hm, you're right, it's definitely busted ... > The patch re-implements the conversion functions. ... but I don't much care for this patch. It's unreadable, uncommented, and doesn't even try to follow Postgres coding conventions. It looks to me like the actual bug is that the "first fractional byte" case ought to shift by destbitsleft - 8 not srcbitsleft - 8. A secondary problem (which your patch fails to fix) is that if the compiler chooses to implement signed >> as zero-fill rather than sign-bit-fill (which it is allowed to do per C99) then we'll get wrong, or at least not the desired, results. So I arrive at the attached patch. Unfortunately we've just missed the window for 8.4.2 etc, but I'll get this committed for the next updates. regards, tom lane Index: varbit.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/varbit.c,v retrieving revision 1.58 diff -c -r1.58 varbit.c *** varbit.c 1 Jan 2009 17:23:50 -0000 1.58 --- varbit.c 12 Dec 2009 18:53:43 -0000 *************** *** 1346,1352 **** /* store first fractional byte */ if (destbitsleft > srcbitsleft) { ! *r++ = (bits8) ((a >> (srcbitsleft - 8)) & BITMASK); destbitsleft -= 8; } /* Now srcbitsleft and destbitsleft are the same, need not track both */ --- 1346,1357 ---- /* store first fractional byte */ if (destbitsleft > srcbitsleft) { ! int val = (int) (a >> (destbitsleft - 8)); ! ! /* Force sign-fill in case the compiler implements >> as zero-fill */ ! if (a < 0) ! val |= (-1) << (srcbitsleft + 8 - destbitsleft); ! *r++ = (bits8) (val & BITMASK); destbitsleft -= 8; } /* Now srcbitsleft and destbitsleft are the same, need not track both */ *************** *** 1425,1431 **** /* store first fractional byte */ if (destbitsleft > srcbitsleft) { ! *r++ = (bits8) ((a >> (srcbitsleft - 8)) & BITMASK); destbitsleft -= 8; } /* Now srcbitsleft and destbitsleft are the same, need not track both */ --- 1430,1441 ---- /* store first fractional byte */ if (destbitsleft > srcbitsleft) { ! int val = (int) (a >> (destbitsleft - 8)); ! ! /* Force sign-fill in case the compiler implements >> as zero-fill */ ! if (a < 0) ! val |= (-1) << (srcbitsleft + 8 - destbitsleft); ! *r++ = (bits8) (val & BITMASK); destbitsleft -= 8; } /* Now srcbitsleft and destbitsleft are the same, need not track both */
Woops, forgot to copy the list. On Sat, Dec 12, 2009 at 2:15 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Dec 12, 2009 at 2:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Roman Kononov <kononov@ftml.net> writes: >>> The bitfromint8() and bitfromint4() are hosed. They produce wrong >>> results when the BIT size is more than 64 and 32 respectively, and the >>> BIT size is not multiple of 8, and the most significant byte of the >>> integer value is not 0x00 or 0xff. >> >> Hm, you're right, it's definitely busted ... >> >>> The patch re-implements the conversion functions. >> >> ... but I don't much care for this patch. =A0It's unreadable, uncommente= d, >> and doesn't even try to follow Postgres coding conventions. >> >> It looks to me like the actual bug is that the "first fractional byte" >> case ought to shift by destbitsleft - 8 not srcbitsleft - 8. =A0A >> secondary problem (which your patch fails to fix) is that if the >> compiler chooses to implement signed >> as zero-fill rather than >> sign-bit-fill (which it is allowed to do per C99) then we'll get >> wrong, or at least not the desired, results. =A0So I arrive at >> the attached patch. > > I'm not sure this fixes it, although I haven't tested. =A0When we take > the /* store first fractional byte */ branch, destbitsleft is between > 1 and 7 bits greater than srcbitsleft. =A0We then subtract 8 from > destbitsleft, and the comment on the next line now asserts that the > two are equal. =A0That doesn't seem right. > > Also, I thought about the sign extension problem, but aren't we > chopping those bits off anyway on the next line? > > ...Robert >
Robert Haas <robertmhaas@gmail.com> writes: >> I'm not sure this fixes it, although I haven't tested. When we take >> the /* store first fractional byte */ branch, destbitsleft is between >> 1 and 7 bits greater than srcbitsleft. We then subtract 8 from >> destbitsleft, and the comment on the next line now asserts that the >> two are equal. That doesn't seem right. The comment's a bit bogus. What it should say, probably, is that the *correct* value of srcbitsleft is now equal to destbitsleft so we aren't bothering to track it anymore. If we were being fully anal we'd have reduced srcbitsleft by some number less than 8 inside the if-branch. >> Also, I thought about the sign extension problem, but aren't we >> chopping those bits off anyway on the next line? Nope, we'd have already stored the wrong bits into the output byte. regards, tom lane