Thread: bit strings - anyone working on them?
Having just used bit strings for the first time, I am now aware of some features I consider a little odd, and was wondering if (a) they are actively being worked on, (b) if they are not considered the way to go, and (c) if there would be any interest in the additions/changes indicated below. 1. Length enforcement: ---------------------- select B'10' | B'1'; is currently illegal. ISTM we should return B'11' for this. ie. build the smallest varbit that contains the result, and return it. 2. test_bit, set_bit, or get_bit -------------------------------- These are not implemented for the varbit data type 3. extract_bits --------------- extract_bits(varbit bits, int from, int to) returns varbit would seem useful. 4. Conversion routines ---------------------- Doesn't seem too unreasonable to support varbit<->numeric conversions. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
> Having just used bit strings for the first time, I am now aware of some > features I consider a little odd, and was wondering if (a) they are > actively being worked on, (b) if they are not considered the way to go, and > (c) if there would be any interest in the additions/changes indicated below. > > 1. Length enforcement: > ---------------------- > > select B'10' | B'1'; > > is currently illegal. This one has slightly annoyed me more than once. That said, none of our apps use bitsets anymore... Chris
On Tue, 22 Apr 2003, Philip Warner wrote: > Having just used bit strings for the first time, I am now aware of some > features I consider a little odd, and was wondering if (a) they are > actively being worked on, (b) if they are not considered the way to go, and > (c) if there would be any interest in the additions/changes indicated below. > > 1. Length enforcement: > ---------------------- > > select B'10' | B'1'; > > is currently illegal. > > ISTM we should return B'11' for this. ie. build the smallest varbit that > contains the result, and return it. ISTM that the answer would be closer to B'10' according to the spec. The spec asks for right extension by 0 when casting to a larger size fixed bit string. > 3. extract_bits > --------------- > > extract_bits(varbit bits, int from, int to) returns varbit > > would seem useful. I think substring is the way to do this.
Philip Warner <pjw@rhyme.com.au> writes: > select B'10' | B'1'; > is currently illegal. > ISTM we should return B'11' for this. ie. build the smallest varbit that > contains the result, and return it. Why would that be the correct answer and not B'10' --- ie, which end of the shorter string should get padded? If anything I'd expect to see padding added on the right. > Doesn't seem too unreasonable to support varbit<->numeric conversions. Aren't those there already for conversions to/from int4 and int8? What else would you want? regards, tom lane
At 11:31 AM 22/04/2003 -0400, Tom Lane wrote: >If anything I'd expect to see >padding added on the right. Yep. Put it down to a typo/late night. Padding to right is the norm with varbit. > > Doesn't seem too unreasonable to support varbit<->numeric conversions. At least in 7.3.2 we can't cast int to varbit. Worse still (and maybe I am missing something): # select cast(8 as bit(10)); bit ------------ 0000000000 ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At least in 7.3.2 we can't cast int to varbit. Worse still (and maybe I am > missing something): > # select cast(8 as bit(10)); > bit > ------------ > 0000000000 Not sure we have a lot of choice there. The basic cast is to bit(32). We have regression=# select cast(8 as bit(32)); bit ----------------------------------00000000000000000000000000001000 (1 row) regression=# select cast(8 as bit(32)) :: bit(10); bit ------------0000000000 (1 row) If you make the direct cast do something different, it'll be inconsistent. What is perhaps needed is a bitstring length adjustment operation that allows padding or truncation to occur at the left instead of the right. The semantics of cast are already spoken for, but we could provide this as a function. As for int->varbit, you can get there easily enough: regression=# select cast(8 as bit(32)) :: varbit; varbit ----------------------------------00000000000000000000000000001000 (1 row) regards, tom lane
At 12:08 PM 22/04/2003 -0400, Tom Lane wrote: >regression=# select cast(8 as bit(32)); > bit >---------------------------------- > 00000000000000000000000000001000 >(1 row) That's also a little scary. "select cast(8 as bit(64))" is not what one would expect either, except if you view it as a cast to bit(32) followed by bit(64). It's also sad that substring (which is I think the only get_bit-like function) starts at the left. I would have expected the low order bits in this model to be at the left, so Cast(X as bit(N)) would work consistently. We still seem to be lacking > 64 bit numeric support (which is where I was coming from with the varbit<->numeric suggestion). Is there a view as to which bit should be considered low order? Or is there a better data type for arbitrary length bit masks? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
> > ISTM we should return B'11' for this. ie. build the smallest varbit that > > contains the result, and return it. > > ISTM that the answer would be closer to B'10' according to the spec. The > spec asks for right extension by 0 when casting to a larger size fixed bit > string. Huh??? That makes no sense at all!!! Chris
> Yep. Put it down to a typo/late night. Padding to right is the norm with > varbit. Serious? How is that in anyway useful - it just means that you have to pad all your bit strings like you did before - it doesn't save anyone any work! If you are in first year CS and you have to add binary numbers: 10001 + 101 Do you right pad that 101? No way!!! You left pad it! Chris
At 09:15 AM 23/04/2003 +0800, Christopher Kings-Lynne wrote: >Do you right pad that 101? No way!!! You left pad it! See my earlier post in this thread questioning which was the low order bit. The current implementation is schizophrenic about this: it seems to treat the left-hand bits as low order sometimes, and the right-hand bits as low order when converting integer. It can't make up it's mind if it is a string or a number. If we treat left as low order, then: 1 | 10 => 10 Cast(8 as varbit(10)) => 0001000000 Cast(8 as varbit(32)) => 00010000000000000000000000000000 etc Which, on the whole, makes more sense. It's just hard to get around the notion of '10' being decimal 1, not 2. Also, I think substring is the only way to extract bits, and I would expect: Substring(Cast(8 as bit(X)), 4,1) should produce 1, which it does not. But if we adopt the left-bits-are-low philosophy, it works. Don't get me wrong, I don't mind if we adopt left-bits-are-high, then we just need to change the way substring & padding works. No idea which is better. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
On Wed, 23 Apr 2003, Christopher Kings-Lynne wrote: > > > ISTM we should return B'11' for this. ie. build the smallest varbit that > > > contains the result, and return it. > > > > ISTM that the answer would be closer to B'10' according to the spec. The > > spec asks for right extension by 0 when casting to a larger size fixed bit > > string. > > Huh??? That makes no sense at all!!! Yeah, as Philip says, I think the spec writers were assuming that left most bit is LSB. It makes more sense with the way the casting and substring are defined (but would mean that our int->bit casts are wierd then)
Philip Warner <pjw@rhyme.com.au> writes: > See my earlier post in this thread questioning which was the low order bit. That seems to be the crux of the matter. Given the assumption that padding or truncation occurs on the right of the bitstring, ISTM that you will only get sensible behavior for integer<->bitstring mappings if the integer's LSB maps to the leftmost bit of the bitstring. Yet it would seem a tad odd to make 8 map to B'0001'. I sure don't find that natural. Is there a way out of this? I'm not sure. I don't think that the SQL spec's bitstring definitions are really intended to allow sensible mappings to integer values --- so maybe there is no way to satisfy the spec and have natural integer mappings at the same time. regards, tom lane
At 12:14 AM 23/04/2003 -0400, Tom Lane wrote: >Is there a way out of this? I'm not sure. The problem will be avoiding breaking existing apps that have adapted to current functionality. I have no preference to endianness, but taking the current LSB-on-left approach, I would like to see: Cast(8 as bit(10)) => '0001000000' Cast(8 as bit(4)) => '0001' B'1' # Cast(1 as bit(32)) => '00000000000000000000000000000000' B'101' | B'1001' => '1011' as well as implement get_bit, set_bit, and numeric<=>varbit conversions. Can anyone see a way around breaking existing apps? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 12:14 AM 23/04/2003 -0400, Tom Lane wrote: >> Is there a way out of this? I'm not sure. > The problem will be avoiding breaking existing apps that have adapted to > current functionality. I have no preference to endianness, but taking the > current LSB-on-left approach, I would like to see: > Cast(8 as bit(10)) => '0001000000' > Cast(8 as bit(4)) => '0001' That part we could do, I think, if you are willing to abandon the assumption that the result of casting int to bit(n) and then to bit(m) should agree with the result of casting int directly to bit(m). While it seems a little weird, I doubt that many existing apps would actually be affected. Casting to bit(32) (or bit(64) for int8) is really the only case with useful behavior at the moment, so it's the only one anyone will be using. > B'1' # Cast(1 as bit(32)) => '00000000000000000000000000000000' Eh? What is the # operator supposed to be doing here? A more general point: while automatic padding of the narrower input is clearly sensible for varbit, I'm not convinced that it's per-spec for bit. You're effectively treating bit the same as varbit if you are willing to change the length at the drop of a hat. Perhaps these operators should only exist for varbit, or should demand inputs already the same length for bit. regards, tom lane
At 10:36 AM 23/04/2003 -0400, Tom Lane wrote: > > Cast(8 as bit(10)) => '0001000000' > > Cast(8 as bit(4)) => '0001' > >That part we could do, I think, if you are willing to abandon the >assumption that the result of casting int to bit(n) and then to bit(m) >should agree with the result of casting int directly to bit(m). I'm not sure I follow the implication; in fact I would hope that: Cast(B'0001' as bit(10)) would yield '0001000000'. > > B'1' # Cast(1 as bit(32)) => '00000000000000000000000000000000' > >Eh? What is the # operator supposed to be doing here? It's the XOR operator. Current implementation has: Cast(B'1' as bit(32)) XOR Cast(1 as bit(32)) giving 10000000000000000000000000000001 Another thing I have forgotten to mention is that there are no text<->varbit conversions. >Perhaps these >operators should only exist for varbit, or should demand inputs already >the same length for bit. That seems like a good idea. I'd prefer to make these changes for varbit only. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 10:36 AM 23/04/2003 -0400, Tom Lane wrote: >> assumption that the result of casting int to bit(n) and then to bit(m) >> should agree with the result of casting int directly to bit(m). > I'm not sure I follow the implication; in fact I would hope that: > Cast(B'0001' as bit(10)) > would yield '0001000000'. Yes, and what will Cast( Cast(42 as bit(32)) as bit(10)) yield? Unless you make the second cast truncate at the left, you'll get ten zero bits out of this. regards, tom lane
At 11:29 AM 23/04/2003 -0400, Tom Lane wrote: > Cast( Cast(42 as bit(32)) as bit(10)) >yield? Unless you make the second cast truncate at the left, you'll get >ten zero bits out of this. I'm still missing the point, I think. Cast(42 as bit(32)) => 01010100000000000000000000000000 Cast(B'01010100000000000000000000000000' as bit(10)) => '0101010000' ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 11:51 AM 23/04/2003 -0400, Tom Lane wrote: >> The only way to get the same answer both ways would be to legislate that >> int-to-bitstring conversion puts the integer's LSB at the left > That's what I think we need to do. Alternatively, we put the LSB on the > right, change the way substring works on varbit, and pad them on the left. > I suspect this will also break existing apps. I doubt that we can change the padding rule without violating spec. How about leaving the cast alone (okay, we can add the cast-directly-to- a-different-width cases), so as not to break existing apps, and instead add a separate function that maps an integer to an LSB-first bitstring? regards, tom lane
At 12:02 PM 23/04/2003 -0400, Tom Lane wrote: >and instead >add a separate function that maps an integer to an LSB-first bitstring? This would work, but it's not going to be very friendly without automatic cast support. Does the spec mention LSB and/or numeric conversions? Also, I just noticed that B'0' is not equal to B'00', which also seems unfortunate. Perhaps we could also add an LSB-last function, and output some kind of warning when using numeric casts to the effect that the application should use the LSB-last function because the casts will start using LSB-first in a future version. Alternatively, we could consider creating a new type that works as one would expect: - Automatic casting (sizes and numeric) - Automatic padding - numeric & text conversion - extract_bits using LSB as position1. and one of: - LSB on right - pad on left or - LSB on left - pad on right Another option: can we use some aspect of the type definition to specify the LSB position to store in attyypmod? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Tom Lane writes: > Is there a way out of this? I'm not sure. I don't think that the SQL > spec's bitstring definitions are really intended to allow sensible > mappings to integer values --- so maybe there is no way to satisfy the > spec and have natural integer mappings at the same time. They are "bit strings" after all, not binary numbers. If you need a binary number, use any numeric type and format the input and output accordingly. As for converting between bit strings and numbers, maybe we should make pairs of explicit functions that assume one bit order or another, and remove the casts. All things considered, it was probably a good idea that they removed bit strings in SQL200x. -- Peter Eisentraut peter_e@gmx.net
Philip Warner kirjutas K, 23.04.2003 kell 19:38: > At 12:02 PM 23/04/2003 -0400, Tom Lane wrote: > >and instead > >add a separate function that maps an integer to an LSB-first bitstring? > > This would work, but it's not going to be very friendly without automatic > cast support. Does the spec mention LSB and/or numeric conversions? Also, I > just noticed that B'0' is not equal to B'00', which also seems unfortunate. The same is true for real strings: '0' != '00' . ----------------- Hannu
At 07:55 PM 23/04/2003 +0200, Peter Eisentraut wrote: >All things considered, it was probably a good idea that they removed bit >strings in SQL200x. Did they replace them with any kind of open-ended bit vector? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/