Thread: Bit strings
Hi Bruce, I think there has been some confusion about the bit/bit varying stuff I did. I initially sent in a bit type that was non-SQL compliant, with the suggestion that this may be useful to stick into contrib. Then several people remarked that it would be better to have an SQL compliant type, so I wrote the code to do all the manipulations, roughly based on src/backend/utils/adt/varchar.c This can evidently not be used as a standalone routine, but needs to be properly integrated into the database as a type. Unfortunately I don't really know an awful lot about the internals of postgres, and got a bit stuck trying to figure out how to integrate a new type correctly. Now I see that this second set of routines has ended up in the contrib directory, where they are quite useless. In my opinion it would be best to have no bit-type in the contrib directory and have the proper SQL compliant one properly integrated into postgres, but I'm going to need some hints to do that correctly. So my question is how we should proceed on this. Am I right in assuming that there should be no major objections to adding an SQL type to postgres, or should I go through another iteration on the mailing list? Thanks for any suggestions, Adriaan
Can we get the BIT type working now that 7.1 is branched? > Hi Bruce, > > I think there has been some confusion about the bit/bit varying > stuff I did. I initially > sent in a bit type that was non-SQL compliant, with the suggestion that > this may be > useful to stick into contrib. > > Then several people remarked that it would be better to have an SQL > compliant type, so > I wrote the code to do all the manipulations, roughly based on > > src/backend/utils/adt/varchar.c > > This can evidently not be used as a standalone routine, but needs to be > properly > integrated into the database as a type. Unfortunately I don't really > know an awful > lot about the internals of postgres, and got a bit stuck trying to > figure out how to > integrate a new type correctly. > > Now I see that this second set of routines has ended up in the contrib > directory, > where they are quite useless. In my opinion it would be best to have no > bit-type > in the contrib directory and have the proper SQL compliant one properly > integrated > into postgres, but I'm going to need some hints to do that correctly. > > So my question is how we should proceed on this. Am I right in assuming > that there > should be no major objections to adding an SQL type to postgres, or > should I go > through another iteration on the mailing list? > > Thanks for any suggestions, > > Adriaan > > > > > ************ > -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > Can we get the BIT type working now that 7.1 is branched? > Yep. We never quite finished the discussions before the 7.0 release. The outstanding issues were 1) should bit operations be allowed between bit strings of different sizes? They are at the moment, but the longest string is truncated, as you otherwise get illogical behaviour (imagine padding with zeros and doing an XOR). Truncation can be somewhat unlogical as well. My current feeling is that perhaps it would be better to just disallow bit operations on bitstrings that haven't got the same length completely. Makes the code easier anyway ;-) 2) A second point brought up by Peter is the conversion of bit-strings to integers and vice-versa. Postgres does not have unsigned ints, so it can lead to somewhat surprising behaviour. Implicit conversion is probably out of the question. So what type of conversion functions would people like to see, if any? 3) With TOAST bit strings could be used to store large amounts of binary data. Don mentioned MIME-encoded strings for I/O. Is this useful? I guess and advantage is that BIT is an SQL type, but I'm not sure that any such application would be particularly portable. It doesn't look like a lot of extra work though, (provided somebody has got some MIME en-/decoding code lying about) so we could add it. Am i right that with TOAST and no restrictions on the query-buffer lengths in psql one could then upload mime objects straight into postgres? So, what do people think? Adriaan
Adriaan Joubert writes: > My current feeling is that perhaps it would be better to just disallow > bit operations on bitstrings that haven't got the same length > completely. I feel with you. :) > 3) With TOAST bit strings could be used to store large amounts of binary > data. This is not really the proper application for bit strings. There are BLOBs for that. Which leads to the question, what are bit strings really useful for other than genuine bit masks? -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Can someone comment on this? > Bruce Momjian wrote: > > > Can we get the BIT type working now that 7.1 is branched? > > > > Yep. We never quite finished the discussions before the 7.0 release. The > outstanding issues were > > 1) should bit operations be allowed between bit strings of different sizes? > They are at the moment, but the longest string is truncated, as you > otherwise get illogical behaviour (imagine padding with zeros and doing an > XOR). Truncation can be somewhat unlogical as well. My current feeling is > that perhaps it would be better to just disallow bit operations on > bitstrings that haven't got the same length completely. Makes the code > easier anyway ;-) > > 2) A second point brought up by Peter is the conversion of bit-strings to > integers and vice-versa. Postgres does not have unsigned ints, so it can > lead to somewhat surprising behaviour. Implicit conversion is probably out > of the question. So what type of conversion functions would people like to > see, if any? > > 3) With TOAST bit strings could be used to store large amounts of binary > data. Don mentioned MIME-encoded strings for I/O. Is this useful? I guess > and advantage is that BIT is an SQL type, but I'm not sure that any such > application would be particularly portable. It doesn't look like a lot of > extra work though, (provided somebody has got some MIME en-/decoding code > lying about) so we could add it. Am i right that with TOAST and no > restrictions on the query-buffer lengths in psql one could then upload mime > objects straight into postgres? > > So, what do people think? > > Adriaan > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
>>>> Can we get the BIT type working now that 7.1 is branched? I did some work on the BIT types a couple months ago. According to my notes, the following issues are still outstanding before they can be said to work at all: Bit and hexstring literals are not handled in a reasonable fashion; the scanner converts them to integer constants which is bogus. Probably they need to be converted to some generic 'UNKNOWNBITSTRING' pseudo-type that can later be coerced to a specific bitstring type. I didn't touch this because it seems to open up the Pandora's box of unknown-constant handling, for which we do not have a good general solution. SQL92 sez we need a position() function for bitstrings. Need a regression test for bit types. scalarltsel() and friends need to cope with bit/varbit types in order to make good use of indexes on bitstrings. pg_dump does not handle BIT/VARBIT lengths properly (pjw may have fixed this by now). regards, tom lane
Any idea where we are on this? > >>>> Can we get the BIT type working now that 7.1 is branched? > > I did some work on the BIT types a couple months ago. According to > my notes, the following issues are still outstanding before they > can be said to work at all: > > Bit and hexstring literals are not handled in a reasonable fashion; > the scanner converts them to integer constants which is bogus. > Probably they need to be converted to some generic 'UNKNOWNBITSTRING' > pseudo-type that can later be coerced to a specific bitstring type. > I didn't touch this because it seems to open up the Pandora's box > of unknown-constant handling, for which we do not have a good > general solution. > > SQL92 sez we need a position() function for bitstrings. > > Need a regression test for bit types. > > scalarltsel() and friends need to cope with bit/varbit types in > order to make good use of indexes on bitstrings. > > pg_dump does not handle BIT/VARBIT lengths properly (pjw may have > fixed this by now). > > regards, tom lane > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian writes: > Any idea where we are on this? > > Bit and hexstring literals are not handled in a reasonable fashion; Bit string literals are handled correctly, although it occurred to me that perhaps #define IsA_Value(t) \ (IsA(t, Integer) || IsA(t, Float) || IsA(t, String)) should be augmented with BitString. Can someone advise? Hex literals are still not handled correctly. > > SQL92 sez we need a position() function for bitstrings. We have one now. > > Need a regression test for bit types. We have one now. > > scalarltsel() and friends need to cope with bit/varbit types in > > order to make good use of indexes on bitstrings. Not done. > > pg_dump does not handle BIT/VARBIT lengths properly (pjw may have > > fixed this by now). Works fine for me. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: > perhaps > #define IsA_Value(t) \ > (IsA(t, Integer) || IsA(t, Float) || IsA(t, String)) > should be augmented with BitString. Can someone advise? Yes, definitely --- see the sole use of that macro, in outfuncs.c. This is a bug. regards, tom lane
OK, I will delete my copy of the email. If there is something to be added to the TODO list, let me know. [ People are welcome to modify it themselves too.] > Bruce Momjian writes: > > > Any idea where we are on this? > > > > Bit and hexstring literals are not handled in a reasonable fashion; > > Bit string literals are handled correctly, although it occurred to me that > perhaps > > #define IsA_Value(t) \ > (IsA(t, Integer) || IsA(t, Float) || IsA(t, String)) > > should be augmented with BitString. Can someone advise? > > Hex literals are still not handled correctly. > > > > SQL92 sez we need a position() function for bitstrings. > > We have one now. > > > > Need a regression test for bit types. > > We have one now. > > > > scalarltsel() and friends need to cope with bit/varbit types in > > > order to make good use of indexes on bitstrings. > > Not done. > > > > pg_dump does not handle BIT/VARBIT lengths properly (pjw may have > > > fixed this by now). > > Works fine for me. > > -- > Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/ > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane writes: > Peter Eisentraut <peter_e@gmx.net> writes: > > perhaps > > #define IsA_Value(t) \ > > (IsA(t, Integer) || IsA(t, Float) || IsA(t, String)) > > should be augmented with BitString. Can someone advise? > > Yes, definitely --- see the sole use of that macro, in outfuncs.c. Maybe this macro could be removed and expanded in place. That would make things more transparent. Or is there any other potential use? -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: >> Yes, definitely --- see the sole use of that macro, in outfuncs.c. > Maybe this macro could be removed and expanded in place. That would make > things more transparent. Or is there any other potential use? No objection from me. The macro clearly failed to make things better in that usage, and offhand I think it would be just as likely to be a time bomb in any other usage. regards, tom lane