Thread: bit strings - anyone working on them?

bit strings - anyone working on them?

From
Philip Warner
Date:
Having just used bit strings for the first time, I am now aware of some 
features I consider a little odd, and was wondering if (a) they are 
actively being worked on, (b) if they are not considered the way to go, and 
(c) if there would be any interest in the additions/changes indicated below.

1. Length enforcement:
----------------------
    select B'10' | B'1';
   is currently illegal.

ISTM we should return B'11' for this. ie. build the smallest varbit that 
contains the result, and return it.


2. test_bit, set_bit, or get_bit
--------------------------------

These are not implemented for the varbit data type


3. extract_bits
---------------
    extract_bits(varbit bits, int from, int to) returns varbit

would seem useful.


4. Conversion routines
----------------------

Doesn't seem too unreasonable to support varbit<->numeric conversions.




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
Christopher Kings-Lynne
Date:
> Having just used bit strings for the first time, I am now aware of some
> features I consider a little odd, and was wondering if (a) they are
> actively being worked on, (b) if they are not considered the way to go, and
> (c) if there would be any interest in the additions/changes indicated below.
>
> 1. Length enforcement:
> ----------------------
>
>      select B'10' | B'1';
>
>     is currently illegal.

This one has slightly annoyed me more than once. That said, none of our
apps use bitsets anymore...

Chris



Re: bit strings - anyone working on them?

From
Stephan Szabo
Date:
On Tue, 22 Apr 2003, Philip Warner wrote:

> Having just used bit strings for the first time, I am now aware of some
> features I consider a little odd, and was wondering if (a) they are
> actively being worked on, (b) if they are not considered the way to go, and
> (c) if there would be any interest in the additions/changes indicated below.
>
> 1. Length enforcement:
> ----------------------
>
>      select B'10' | B'1';
>
>     is currently illegal.
>
> ISTM we should return B'11' for this. ie. build the smallest varbit that
> contains the result, and return it.

ISTM that the answer would be closer to B'10' according to the spec.  The
spec asks for right extension by 0 when casting to a larger size fixed bit
string.

> 3. extract_bits
> ---------------
>
>      extract_bits(varbit bits, int from, int to) returns varbit
>
> would seem useful.

I think substring is the way to do this.



Re: bit strings - anyone working on them?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>      select B'10' | B'1';
>     is currently illegal.

> ISTM we should return B'11' for this. ie. build the smallest varbit that 
> contains the result, and return it.

Why would that be the correct answer and not B'10' --- ie, which end of
the shorter string should get padded?  If anything I'd expect to see
padding added on the right.

> Doesn't seem too unreasonable to support varbit<->numeric conversions.

Aren't those there already for conversions to/from int4 and int8?  What
else would you want?
        regards, tom lane



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 11:31 AM 22/04/2003 -0400, Tom Lane wrote:
>If anything I'd expect to see
>padding added on the right.

Yep. Put it down to a typo/late night. Padding to right is the norm with 
varbit.


> > Doesn't seem too unreasonable to support varbit<->numeric conversions.

At least in 7.3.2 we can't cast int to varbit. Worse still (and maybe I am 
missing something):

# select cast(8 as bit(10));    bit
------------ 0000000000





----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At least in 7.3.2 we can't cast int to varbit. Worse still (and maybe I am 
> missing something):

> # select cast(8 as bit(10));
>      bit
> ------------
>   0000000000

Not sure we have a lot of choice there.  The basic cast is to bit(32).
We have

regression=# select cast(8 as bit(32));              bit
----------------------------------00000000000000000000000000001000
(1 row)

regression=# select cast(8 as bit(32)) :: bit(10);   bit
------------0000000000
(1 row)

If you make the direct cast do something different, it'll be
inconsistent.

What is perhaps needed is a bitstring length adjustment operation that
allows padding or truncation to occur at the left instead of the right.
The semantics of cast are already spoken for, but we could provide this
as a function.

As for int->varbit, you can get there easily enough:

regression=# select cast(8 as bit(32)) :: varbit;             varbit
----------------------------------00000000000000000000000000001000
(1 row)

        regards, tom lane



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 12:08 PM 22/04/2003 -0400, Tom Lane wrote:
>regression=# select cast(8 as bit(32));
>                bit
>----------------------------------
>  00000000000000000000000000001000
>(1 row)

That's also a little scary. "select cast(8 as bit(64))" is not what one 
would expect either, except if you view it as a cast to bit(32) followed by 
bit(64).

It's also sad that substring (which is I think the only get_bit-like 
function) starts at the left. I would have expected the low order bits in 
this model to be at the left, so Cast(X as bit(N)) would work consistently.

We still seem to be lacking > 64 bit numeric support (which is where I was 
coming from with the varbit<->numeric suggestion).

Is there a view as to which bit should be considered low order?

Or is there a better data type for arbitrary length bit masks?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
"Christopher Kings-Lynne"
Date:
> > ISTM we should return B'11' for this. ie. build the smallest varbit that
> > contains the result, and return it.
>
> ISTM that the answer would be closer to B'10' according to the spec.  The
> spec asks for right extension by 0 when casting to a larger size fixed bit
> string.

Huh??? That makes no sense at all!!!

Chris



Re: bit strings - anyone working on them?

From
"Christopher Kings-Lynne"
Date:
> Yep. Put it down to a typo/late night. Padding to right is the norm with
> varbit.

Serious?  How is that in anyway useful - it just means that you have to pad
all your bit strings like you did before - it doesn't save anyone any work!

If you are in first year CS and you have to add binary numbers:

10001
+ 101

Do you right pad that 101? No way!!! You left pad it!

Chris



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 09:15 AM 23/04/2003 +0800, Christopher Kings-Lynne wrote:
>Do you right pad that 101? No way!!! You left pad it!

See my earlier post in this thread questioning which was the low order bit. 
The current implementation is schizophrenic about this: it seems to treat 
the left-hand bits as low order sometimes, and the right-hand bits as low 
order when converting integer. It can't make up it's mind if it is a string 
or a number.

If we treat left as low order, then:

1 | 10 => 10

Cast(8 as varbit(10)) => 0001000000
Cast(8 as varbit(32)) => 00010000000000000000000000000000
etc

Which, on the whole, makes more sense. It's just hard to get around the 
notion  of '10' being decimal 1, not 2.

Also, I think substring is the only way to extract bits, and I would expect:
    Substring(Cast(8 as bit(X)), 4,1)

should produce 1, which it does not. But if we adopt the left-bits-are-low 
philosophy, it works.

Don't get me wrong, I don't mind if we adopt left-bits-are-high, then we 
just need to change the way substring & padding works. No idea which is better.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
Stephan Szabo
Date:
On Wed, 23 Apr 2003, Christopher Kings-Lynne wrote:

> > > ISTM we should return B'11' for this. ie. build the smallest varbit that
> > > contains the result, and return it.
> >
> > ISTM that the answer would be closer to B'10' according to the spec.  The
> > spec asks for right extension by 0 when casting to a larger size fixed bit
> > string.
>
> Huh??? That makes no sense at all!!!

Yeah, as Philip says, I think the spec writers were assuming that left
most bit is LSB. It makes more sense with the way the casting and
substring are defined (but would mean that our int->bit casts are wierd
then)



Re: bit strings - anyone working on them?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> See my earlier post in this thread questioning which was the low order bit. 

That seems to be the crux of the matter.  Given the assumption that
padding or truncation occurs on the right of the bitstring, ISTM that
you will only get sensible behavior for integer<->bitstring mappings
if the integer's LSB maps to the leftmost bit of the bitstring.

Yet it would seem a tad odd to make 8 map to B'0001'.  I sure don't
find that natural.

Is there a way out of this?  I'm not sure.  I don't think that the SQL
spec's bitstring definitions are really intended to allow sensible
mappings to integer values --- so maybe there is no way to satisfy the
spec and have natural integer mappings at the same time.
        regards, tom lane



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 12:14 AM 23/04/2003 -0400, Tom Lane wrote:
>Is there a way out of this?  I'm not sure.

The problem will be avoiding breaking existing apps that have adapted to 
current functionality. I have no preference to endianness, but taking the 
current LSB-on-left approach, I would like to see:

Cast(8 as bit(10)) => '0001000000'
Cast(8 as bit(4)) => '0001'
B'1' # Cast(1 as bit(32)) => '00000000000000000000000000000000'
B'101' | B'1001' => '1011'

as well as implement get_bit, set_bit, and numeric<=>varbit conversions.

Can anyone see a way around breaking existing apps?



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 12:14 AM 23/04/2003 -0400, Tom Lane wrote:
>> Is there a way out of this?  I'm not sure.

> The problem will be avoiding breaking existing apps that have adapted to 
> current functionality. I have no preference to endianness, but taking the 
> current LSB-on-left approach, I would like to see:

> Cast(8 as bit(10)) => '0001000000'
> Cast(8 as bit(4)) => '0001'

That part we could do, I think, if you are willing to abandon the
assumption that the result of casting int to bit(n) and then to bit(m)
should agree with the result of casting int directly to bit(m).  While
it seems a little weird, I doubt that many existing apps would actually
be affected.  Casting to bit(32) (or bit(64) for int8) is really the
only case with useful behavior at the moment, so it's the only one
anyone will be using.

> B'1' # Cast(1 as bit(32)) => '00000000000000000000000000000000'

Eh?  What is the # operator supposed to be doing here?

A more general point: while automatic padding of the narrower input
is clearly sensible for varbit, I'm not convinced that it's per-spec
for bit.  You're effectively treating bit the same as varbit if you are
willing to change the length at the drop of a hat.  Perhaps these
operators should only exist for varbit, or should demand inputs already
the same length for bit.
        regards, tom lane



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 10:36 AM 23/04/2003 -0400, Tom Lane wrote:

> > Cast(8 as bit(10)) => '0001000000'
> > Cast(8 as bit(4)) => '0001'
>
>That part we could do, I think, if you are willing to abandon the
>assumption that the result of casting int to bit(n) and then to bit(m)
>should agree with the result of casting int directly to bit(m).

I'm not sure I follow the implication; in fact I would hope that:
    Cast(B'0001' as bit(10))

would yield '0001000000'.



> > B'1' # Cast(1 as bit(32)) => '00000000000000000000000000000000'
>
>Eh?  What is the # operator supposed to be doing here?

It's the XOR operator.  Current implementation has:
    Cast(B'1' as bit(32)) XOR Cast(1 as bit(32))
giving    10000000000000000000000000000001

Another thing I have forgotten to mention is that there are no 
text<->varbit conversions.


>Perhaps these
>operators should only exist for varbit, or should demand inputs already
>the same length for bit.

That seems like a good idea. I'd prefer to make these changes for varbit only.





----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 10:36 AM 23/04/2003 -0400, Tom Lane wrote:
>> assumption that the result of casting int to bit(n) and then to bit(m)
>> should agree with the result of casting int directly to bit(m).

> I'm not sure I follow the implication; in fact I would hope that:
>      Cast(B'0001' as bit(10))
> would yield '0001000000'.

Yes, and what will    Cast( Cast(42 as bit(32)) as bit(10))
yield?  Unless you make the second cast truncate at the left, you'll get
ten zero bits out of this.
        regards, tom lane



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 11:29 AM 23/04/2003 -0400, Tom Lane wrote:
>      Cast( Cast(42 as bit(32)) as bit(10))
>yield?  Unless you make the second cast truncate at the left, you'll get
>ten zero bits out of this.

I'm still missing the point, I think.

Cast(42 as bit(32)) => 01010100000000000000000000000000

Cast(B'01010100000000000000000000000000' as bit(10)) => '0101010000'





----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 11:51 AM 23/04/2003 -0400, Tom Lane wrote:
>> The only way to get the same answer both ways would be to legislate that
>> int-to-bitstring conversion puts the integer's LSB at the left

> That's what I think we need to do. Alternatively, we put the LSB on the 
> right, change the way substring works on varbit, and pad them on the left. 
> I suspect this will also break existing apps.

I doubt that we can change the padding rule without violating spec.

How about leaving the cast alone (okay, we can add the cast-directly-to-
a-different-width cases), so as not to break existing apps, and instead
add a separate function that maps an integer to an LSB-first bitstring?
        regards, tom lane



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 12:02 PM 23/04/2003 -0400, Tom Lane wrote:
>and instead
>add a separate function that maps an integer to an LSB-first bitstring?

This would work, but it's not going to be very friendly without automatic 
cast support. Does the spec mention LSB and/or numeric conversions? Also, I 
just noticed that B'0' is not equal to B'00', which also seems unfortunate.

Perhaps we could also add an LSB-last function, and output some kind of 
warning when using numeric casts to the effect that the application should 
use the LSB-last function because the casts will start using LSB-first in a 
future version.

Alternatively, we could consider creating a new type that works as one 
would expect:
 - Automatic casting (sizes and numeric) - Automatic padding - numeric & text conversion - extract_bits using LSB as
position1.
 

and one of: - LSB on right - pad on left
or - LSB on left - pad on right

Another option: can we use some aspect of the type definition to specify 
the LSB position to store in attyypmod?



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: bit strings - anyone working on them?

From
Peter Eisentraut
Date:
Tom Lane writes:

> Is there a way out of this?  I'm not sure.  I don't think that the SQL
> spec's bitstring definitions are really intended to allow sensible
> mappings to integer values --- so maybe there is no way to satisfy the
> spec and have natural integer mappings at the same time.

They are "bit strings" after all, not binary numbers.  If you need a
binary number, use any numeric type and format the input and output
accordingly.

As for converting between bit strings and numbers, maybe we should make
pairs of explicit functions that assume one bit order or another, and
remove the casts.

All things considered, it was probably a good idea that they removed bit
strings in SQL200x.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: bit strings - anyone working on them?

From
Hannu Krosing
Date:
Philip Warner kirjutas K, 23.04.2003 kell 19:38:
> At 12:02 PM 23/04/2003 -0400, Tom Lane wrote:
> >and instead
> >add a separate function that maps an integer to an LSB-first bitstring?
> 
> This would work, but it's not going to be very friendly without automatic 
> cast support. Does the spec mention LSB and/or numeric conversions? Also, I 
> just noticed that B'0' is not equal to B'00', which also seems unfortunate.

The same is true for real strings: '0' != '00' .

-----------------
Hannu



Re: bit strings - anyone working on them?

From
Philip Warner
Date:
At 07:55 PM 23/04/2003 +0200, Peter Eisentraut wrote:
>All things considered, it was probably a good idea that they removed bit
>strings in SQL200x.

Did they replace them with any kind of open-ended bit vector?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/