BUG #16403: set_bit function does not have expected effect

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16403
Logged by:          Alex Movitz
Email address:      amovitz@bncpu.com
PostgreSQL version: 11.0
Operating system:   Linux
Description:

Input:
SELECT set_bit('\x00000000'::bytea, 0, 1);
Expected Output:
'\x00000001' or '\x80000000'
Actual Output:
'\x01000000'

Input:
SELECT set_bit('\x00000000'::bytea, 8, 1);
Expected Output:
'\x00000100' or '\x00800000'
Actual Output:
'\x00010000'

Issue:
The set_bit function changes the right-most bit of a byte, but with
little-endian byte ordering. This is confusing for any use case that sets a
bit at a specific position in a BYTEA. To iterate through the bits of the
BYTEA in display order, one must use nested loops that set bits within byte
boundaries.


Re: BUG #16403: set_bit function does not have expected effect

From: Francisco Olarte
On Thu, Apr 30, 2020 at 9:47 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> Input:
> SELECT set_bit('\x00000000'::bytea, 0, 1);
> Expected Output:
> '\x00000001' or '\x80000000'
> Actual Output:
> '\x01000000'
>
> Input:
> SELECT set_bit('\x00000000'::bytea, 8, 1);
> Expected Output:
> '\x00000100' or '\x00800000'
> Actual Output:
> '\x00010000'
>
> Issue:
> set_bit function changes the right-most bit of the byte, but with
> little-endian byte order. This is confusing to any use case where setting a
> bit in a BYTEA in a specific position. To iterate through the bits within
> the BYTEA, one must have nested loops which set bits within byte boundaries.

I think this is a display-expectations issue. Going by addresses, it
seems it is changing the least significant bit of the first byte and
of the second byte (in C-speak, it is setting bit n%8 of byte n/8).
The problem is that the hex display of a byte list (file, C array,
bytea) is customarily done by printing the bytes left to right, and
the single-byte display prints the most significant nibble first. It
would look even weirder if the bytes were printed in binary, but it's
been done this way in many programs for ages, so I think your
expectation is the faulty part here. As a matter of fact, if it
printed either of your expected outputs I, and I suspect many more
people, would be unpleasantly surprised.

Also, if you had set bit 1, it would print "02000000", which
according to your description should be to the left of "01000000";
how do you order "02000000" and "01000000", left or right? The
problem is that left/right does not apply well here. The bits in
registers/memory/disks are not aligned left/right, top/down, or
front/back; the most you can do is talk about addresses. Left/right
only applies when you draw a picture of the thing, or print it on a
screen. If you print the "bits" one by one using a get_bit loop
you'll get your desired order (in whichever way you like, depending
on whether you sweep hi-to-lo or lo-to-hi offsets and whether your
terminal prints left to right, right to left, top to bottom, or...).

Francisco Olarte.



Re: BUG #16403: set_bit function does not have expected effect

From: Tom Lane
Francisco Olarte <folarte@peoplecall.com> writes:
> On Thu, Apr 30, 2020 at 9:47 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> set_bit function changes the right-most bit of the byte, but with
>> little-endian byte order. This is confusing to any use case where setting a
>> bit in a BYTEA in a specific position. To iterate through the bits within
>> the BYTEA, one must have nested loops which set bits within byte boundaries.

> I think this is a display expectations issue, when using addresses it
> seems it is changing the least significant bit in the first byte and
> the second byte ( in C-speak it is setting bit n%8 byte n/8 ),

Yeah, the documentation is quite clear about this:

    Functions get_byte and set_byte number the first byte of a binary
    string as byte 0. Functions get_bit and set_bit number bits from the
    right within each byte; for example bit 0 is the least significant bit
    of the first byte, and bit 15 is the most significant bit of the
    second byte.

so we're not likely to change it.

You might consider using type "bit" instead of bytea, since the bit
display order is more closely tied to how set_bit() numbers the bits.

Another option is to write your own functions that number the bits
however you want.

            regards, tom lane



Re: BUG #16403: set_bit function does not have expected effect

From: Alex Movitz
I see, so the get/set_bit functions act on bits within individual bytes, not as a bit stream. Some other languages will act directly on the bits, rather than iterating over bytes. 

A good example of this is in pure C when setting a bit, it is easy to do programmatically with shifting. This is how I've performed these operations in the past, specifically when looping over bits. 

Eg. 
unsigned int x = 13;
unsigned int i = 0;
i |= 1 << x;

This will set the bit in the bytes relative to the right-most position. In this case, it would return an integer with a value of 8192, having bytes with the hex representation 0x00002000 (when printed with printf, most significant byte first).


Now that I understand the implementation reasoning, I also understand that this will probably not change. If some bit functions were implemented for the BYTEA type similar to the C above, however, I would definitely expect them to behave the same way.

On Thu, Apr 30, 2020, 07:57 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Francisco Olarte <folarte@peoplecall.com> writes:
> On Thu, Apr 30, 2020 at 9:47 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> set_bit function changes the right-most bit of the byte, but with
>> little-endian byte order. This is confusing to any use case where setting a
>> bit in a BYTEA in a specific position. To iterate through the bits within
>> the BYTEA, one must have nested loops which set bits within byte boundaries.

> I think this is a display expectations issue, when using addresses it
> seems it is changing the least significant bit in the first byte and
> the second byte ( in C-speak it is setting bit n%8 byte n/8 ),

Yeah, the documentation is quite clear about this:

    Functions get_byte and set_byte number the first byte of a binary
    string as byte 0. Functions get_bit and set_bit number bits from the
    right within each byte; for example bit 0 is the least significant bit
    of the first byte, and bit 15 is the most significant bit of the
    second byte.

so we're not likely to change it.

You might consider using type "bit" instead of bytea, since the bit
display order is more closely tied to how set_bit() numbers the bits.

Another option is to write your own functions that number the bits
however you want.

                        regards, tom lane

Re: BUG #16403: set_bit function does not have expected effect

From: Francisco Olarte
On Thu, Apr 30, 2020 at 6:58 PM Alex Movitz <amovitz@bncpu.com> wrote:
> I see, so the get/set_bit functions act on bits within individual bytes, not as a bit stream. Some other languages
> will act directly on the bits, rather than iterating over bytes.

I doubt it; there is no such thing as a bit stream in normal memory
(bubble or acoustic memories would be a different matter). You have
some chip-implementation-defined arrays which the CPU sees as arrays
of bytes.

> A good example of this is in pure C when setting a bit, it is easy to do programmatically with shifting. This is how
> I've performed these operations in the past, specifically when looping over bits.

The example works exactly as in C. The problem is that when you do it
in PURE C you typically do it with a long int.

The problem here is you are doing it on a BYTE ARRAY. In C you would
typically use something like an unsigned char array and do something
like "void set_bit(unsigned char *bytea, int bit) { bytea[bit/8] |=
(1 << (bit % 8)); }",
and if you print the byte array in hex with "for (int i = 0; i < nbytes; i++)
printf("%02x", (unsigned) bytea[i]);" you would get that exact result.

If you want to manipulate long integers, just use an integer type;
INTEGER seems to be the right one for your case (32 bits). Then set
the bit using logical ops (update t set f = f | (1 << n)) and print
it in hex (select to_hex(f)), which is the same as in C
(printf("%08x", f) for printing, the same | and << for setting).

bytea is for arrays of bytes and works the same as any similar C
package. It has the advantage of being variable length, like a C char
array. What you are trying to get is C-integer behaviour, so use the
postgres integer types, which are similar.

> unsigned int x = 13;
> unsigned int i = 0;
> i |= 1 << x;

You should use unsigned long or uint32_t for 32 bits; int is only
guaranteed to have 16 bits (IIRC, short >= 16, long >= 32, and
short <= int <= long is the only guarantee you get in C).

> This will set the bit in the bytes relative to the right-most position. In this case, it would return an integer with
> a value of 8192, having bytes with the hex representation 0x00002000 (when printed with printf, most significant byte first).

This will work when using %08x for printf, but, as you are using
bytea, what you are doing is the equivalent of
unsigned char *bytea = (unsigned char *) &i;
printf("0x%02x%02x%02x%02x", bytea[0], bytea[1], bytea[2], bytea[3]);
which, IIRC, will give you "0x00200000" on Intel.

> Now that I understand the implementation reasoning, I also understand that this will probably not change. If there
> are some bit functions implemented with the BYTEA type similar to the C above, however, I would definitely expect it to
> perform the same way.

Now I'm really convinced it is an expectation problem, not a
display-convention problem. You want a long and are using a bytea(4).
This has the same problems as using a char[4] as an int32_t in C.
Want ints? Use them. And I suspect they will also be faster.

And I suspect this may be due to you using bytea because it prints in
hex by default. Because, in your C examples, how would you do the
long-int bit-shifting stuff with the equivalent of a bytea(2345)? The
unsigned char stuff translates directly.

I located int bit manipulation easily under "9.3. Mathematical
Functions and Operators", but, not finding hex formatting in "9.8. Data
Type Formatting Functions", I had to do a quick grep for hex in the
index to locate it under "9.4. String Functions and Operators", and
this is after having used postgres since before it got the "ql" tail. I
may be too used to *printf for these conversions. That's why I suspect
bytea was chosen for its hex display default.

Francisco Olarte.