Thread: BUG #16403: set_bit function does not have expected effect
The following bug has been logged on the website:

Bug reference:      16403
Logged by:          Alex Movitz
Email address:      amovitz@bncpu.com
PostgreSQL version: 11.0
Operating system:   Linux
Description:

Input:
SELECT set_bit('\x00000000'::bytea, 0, 1);
Expected Output:
'\x00000001' or '\x80000000'
Actual Output:
'\x01000000'

Input:
SELECT set_bit('\x00000000'::bytea, 8, 1);
Expected Output:
'\x00000100' or '\x00800000'
Actual Output:
'\x00010000'

Issue:
The set_bit function changes the right-most bit of the byte, but with little-endian byte order. This is confusing for any use case that sets a bit in a BYTEA at a specific position. To iterate through the bits within the BYTEA, one must have nested loops which set bits within byte boundaries.
On Thu, Apr 30, 2020 at 9:47 AM PG Bug reporting form <noreply@postgresql.org> wrote:
> Input:
> SELECT set_bit('\x00000000'::bytea, 0, 1);
> Expected Output:
> '\x00000001' or '\x80000000'
> Actual Output:
> '\x01000000'
>
> Input:
> SELECT set_bit('\x00000000'::bytea, 8, 1);
> Expected Output:
> '\x00000100' or '\x00800000'
> Actual Output:
> '\x00010000'
>
> Issue:
> set_bit function changes the right-most bit of the byte, but with
> little-endian byte order. This is confusing to any use case where setting a
> bit in a BYTEA in a specific position. To iterate through the bits within
> the BYTEA, one must have nested loops which set bits within byte boundaries.

I think this is a display expectations issue. Looking at addresses, it is changing the least significant bit of the first byte and of the second byte (in C-speak, it is setting bit n%8 of byte n/8). The problem is that the hex display of a byte list (file, C array, bytea) is customarily done by printing the bytes left to right, and the single-byte display prints the most significant nibble first in hex. It would be even weirder if the bytes were printed in binary, but it has been done this way in many programs for ages, so I think your expectation is faulty here. As a matter of fact, if it printed any of your expected outputs I, and I suspect many more people, would be unpleasantly surprised.

Also, if you had done set_bit(0,2) it would print "02000000", which according to your description should be to the left of "01000000"; how do you order "02000000" and "01000000", left or right? The problem is that right/left does not apply well here. The bits in registers/memory/disks are not aligned left/right, top/down, or front/back; the most you can do is talk about addresses. Left/right only applies when you draw a picture of the thing, or print it on a screen.
If you print the "bits" one by one using a get_bit loop you'll get your desired order (in any order, in fact, depending on whether you sweep from hi-to-lo or lo-to-hi offsets, and whether your terminal prints left to right, right to left, or top to bottom, or...).

Francisco Olarte.
Francisco Olarte <folarte@peoplecall.com> writes:
> On Thu, Apr 30, 2020 at 9:47 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> set_bit function changes the right-most bit of the byte, but with
>> little-endian byte order. This is confusing to any use case where setting a
>> bit in a BYTEA in a specific position. To iterate through the bits within
>> the BYTEA, one must have nested loops which set bits within byte boundaries.

> I think this is a display expectations issue, when using addresses it
> seems it is changing the least significant bit in the first byte and
> the second byte ( in C-speak it is setting bit n%8 byte n/8 ),

Yeah, the documentation is quite clear about this:

    Functions get_byte and set_byte number the first byte of a binary
    string as byte 0. Functions get_bit and set_bit number bits from the
    right within each byte; for example bit 0 is the least significant bit
    of the first byte, and bit 15 is the most significant bit of the
    second byte.

so we're not likely to change it.

You might consider using type "bit" instead of bytea, since the bit display order is more closely tied to how set_bit() numbers the bits. Another option is to write your own functions that number the bits however you want.

			regards, tom lane
I see, so the get/set_bit functions act on bits within individual bytes, not as a bit stream. Some other languages will act directly on the bits, rather than iterating over bytes.
A good example of this is pure C: when setting a bit, it is easy to do programmatically with shifting. This is how I've performed these operations in the past, specifically when looping over bits.
E.g.:

    unsigned int x = 13;
    unsigned int i = 0;
    i |= 1 << x;
This will set the bit in the bytes relative to the right-most position. In this case, it would produce an integer with a value of 8192, whose bytes have the hex representation 0x00002000 (when printed MSB-first with printf).
Now that I understand the implementation reasoning, I also understand that this will probably not change. If there were some bit functions implemented for the BYTEA type similar to the C above, however, I would definitely expect them to perform the same way.
On Thu, Apr 30, 2020, 07:57 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Francisco Olarte <folarte@peoplecall.com> writes:
> > On Thu, Apr 30, 2020 at 9:47 AM PG Bug reporting form
> > <noreply@postgresql.org> wrote:
> >> set_bit function changes the right-most bit of the byte, but with
> >> little-endian byte order. This is confusing to any use case where setting a
> >> bit in a BYTEA in a specific position. To iterate through the bits within
> >> the BYTEA, one must have nested loops which set bits within byte boundaries.
>
> > I think this is a display expectations issue, when using addresses it
> > seems it is changing the least significant bit in the first byte and
> > the second byte ( in C-speak it is setting bit n%8 byte n/8 ),
>
> Yeah, the documentation is quite clear about this:
>
>     Functions get_byte and set_byte number the first byte of a binary
>     string as byte 0. Functions get_bit and set_bit number bits from the
>     right within each byte; for example bit 0 is the least significant bit
>     of the first byte, and bit 15 is the most significant bit of the
>     second byte.
>
> so we're not likely to change it.
>
> You might consider using type "bit" instead of bytea, since the bit
> display order is more closely tied to how set_bit() numbers the bits.
> Another option is to write your own functions that number the bits
> however you want.
>
> 			regards, tom lane
On Thu, Apr 30, 2020 at 6:58 PM Alex Movitz <amovitz@bncpu.com> wrote:
> I see, so the get/set_bit functions act on bits within individual bytes,
> not as a bit stream. Some other languages will act directly on the bits,
> rather than iterating over bytes.

I doubt it; there is no such thing as a bit stream in normal memory (bubble or acoustic memories would be a different thing). You have some chip-implementation-defined arrays which the CPU sees as arrays of bytes.

> A good example of this is in pure C when setting a bit, it is easy to do
> programmatically with shifting. This is how I've performed these operations
> in the past, specifically when looping over bits.

The example works exactly as C does. The problem is that when you do it in PURE C you typically do it with a long int, whereas here you are doing it in a BYTE ARRAY. In C, typically, you would use something like an unsigned char array and do something like

    void set_bit(unsigned char *bytea, int bit) { bytea[bit/8] |= 1 << (bit%8); }

and, if you print a byte array in hex with

    for (int i = 0; i < nbytes; ++i) printf("%02x", (unsigned) bytea[i]);

you would get that exact result.

If you want to manipulate long integers, just use an integer type; INTEGER seems to be the right one for your case (32 bits). Then set the bit using logical ops ( update t set f = f | (1 << n) ) and print it in hex ( select to_hex(f) ), which is the same as in C (same for setting, printf("%08x", f)). bytea is for arrays of bytes and works the same as any similar C package; it has the advantage of being variable length, like a C char array. What you are trying to do is get C-integer behaviour, so use postgres integers, which are similar.

> unsigned int x = 13;
> unsigned int i = 0;
> i |= 1 << x;

You should use unsigned long or uint32_t for 32 bits; int is only guaranteed to have 16 bits (IIRC, short >= 16, long >= 32, and short <= int <= long is the only guarantee you get in C).

> This will set the bit in the bytes relative to the right-most position.
> In this case, it would return an integer with a value of 8192, having bytes
> with the hex representation 0x00002000 (when using printf, MSB).

This will work when using %08x for printf, but, as you are using bytea, what you are doing is the equivalent of

    unsigned char *bytea = (unsigned char *) &i;
    printf("0x%02x%02x%02x%02x", bytea[0], bytea[1], bytea[2], bytea[3]);

which, IIRC, will give you "0x00200000" on Intel.

> Now that I understand the implementation reasoning, I also understand that
> this will probably not change. If there are some bit functions implemented
> with the BYTEA type similar to the C above, however, I would definitely
> expect it to perform the same way.

Now I'm really convinced it is an expectation problem, not a display convention problem. You want a long and are using a bytea(4). This has the same problems as using a char[4] as an int32_t in C. Want ints? Use them. And I suspect they will also be faster.

And I suspect this may be due to you using bytea because it prints in hex by default. Because, in your C examples, how would you do the long and bit-shifting stuff with a bytea(2345) equivalent? The unsigned char stuff will translate directly. I located int bit manipulation easily under "9.3. Mathematical Functions and Operators", but, not finding hex formatting in "9.8. Data Type Formatting Functions", I had to do a quick grep for hex in the index to locate it under "9.4. String Functions and Operators", and this is after having used postgres since before it got the "ql" tail. I may be too used to *printf for these conversions. That's why I suspect bytea was chosen for its hex display default.

Francisco Olarte.