Thread: Re: [GENERAL] Update of bitmask type
Can I get comments on this? Is a bit type something we want installed by default, or in contrib? Seems to me it should be in the main tree. > Hi, > > here is a new version of the bitmask type. It supports hash-indices as > well now, and fixes a bug in the definition of the <> operator. > > I would appreciate it if somebody more knowledgable than myself would > look over the index definitions. They seem to work and are used by > postgres, so I guess they can't be all wrong. The hashing function is > the same as that for char's and comes straight out of the postgres > source code. > > BTW, chapter 36 of the documentation could do with some additions, but I > don't feel knowledgable enough to attempt it. E.g. it shows how to put > an entry for the hashing into pg_amop, but never explains how to define > the entry in pg_amproc and doesn't tell you that you need to define a > separate hashing function. It took me a while of looking through the > other definitions and digging through the source code to come up with a > best guess. > > Perhaps this could go into the contrib area if it passes muster, as it > is an example of a user-defined type with indices. > > Cheers, > > Adriaan [application/x-gzip is not supported, skipping...] -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Tue, 21 Sep 1999, Bruce Momjian wrote: > Can I get comments on this? Is a bit type something we want installed > by default, or in contrib? Seems to me it should be in the main tree. first...what is a bitmask type? :) > > Hi, > > > > here is a new version of the bitmask type. It supports hash-indices as > > well now, and fixes a bug in the definition of the <> operator. > > > > I would appreciate it if somebody more knowledgable than myself would > > look over the index definitions. They seem to work and are used by > > postgres, so I guess they can't be all wrong. The hashing function is > > the same as that for char's and comes straight out of the postgres > > source code. > > > > BTW, chapter 36 of the documentation could do with some additions, but I > > don't feel knowledgable enough to attempt it. E.g. it shows how to put > > an entry for the hashing into pg_amop, but never explains how to define > > the entry in pg_amproc and doesn't tell you that you need to define a > > separate hashing function. It took me a while of looking through the > > other definitions and digging through the source code to come up with a > > best guess. > > > > Perhaps this could go into the contrib area if it passes muster, as it > > is an example of a user-defined type with indices. > > > > Cheers, > > > > Adriaan > > [application/x-gzip is not supported, skipping...] > > > -- > Bruce Momjian | http://www.op.net/~candle > maillist@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > ************ > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker wrote: > > On Tue, 21 Sep 1999, Bruce Momjian wrote: > > > Can I get comments on this? Is a bit type something we want installed > > by default, or in contrib? Seems to me it should be in the main tree. > > first...what is a bitmask type? :) In this case just a single byte in which you can store states quite easily. It supports the C-style bit operations & (and) ,| (or, couldn't get it defined as a single bar though, because the parser didn't like it) ,^ (xor),! (not). For some applications it is just easier to check whether certain bits are set/not set. If somebody tells me what needs doing, I could try to get it all into a more usable format. And I have no clue what SQL3 says about bit-types (varying bits or something or other?) At the moment it is just a single byte, and perhaps it needs extension to 2 byte, 4-byte types. Adriaan
> > > Can I get comments on this? Is a bit type something we want installed > > > by default, or in contrib? Seems to me it should be in the main tree. > In this case just a single byte in which you can store states quite > easily. It supports the C-style bit operations & (and) ,| (or, couldn't > get it defined as a single bar though, because the parser didn't like > it) ,^ (xor),! (not). For some applications it is just easier to check > whether certain bits are set/not set. As long as it is limited to a single byte, perhaps it should prove itself in contrib. However, SQL92 has bit types, and it would be nice to get full support for them (and beyond, as this already is doing :) > If somebody tells me what needs doing, I could try to get it all into a > more usable format. And I have no clue what SQL3 says about bit-types > (varying bits or something or other?) At the moment it is just a single > byte, and perhaps it needs extension to 2 byte, 4-byte types. I don't have time right now to type up a short summary, but can do that later if you like. But the data entry for an SQL92 bit type looks like B'10111' X'17' The underlying data type is BIT(n), a fixed-length type where n is the exact number of bits. BIT VARYING (n) allows a variable number of bits (duh!) up to n bits. We can support these SQL92 constructs in the parser, folding them into an internal type as we do for character strings currently. It could be implemented just like the character types, having a header on the internal representation which holds the length. It can't re-use the character type support functions as-is, since they currently consider a zero byte in the string as end-of-string. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
> > I don't have time right now to type up a short summary, but can do > that later if you like. But the data entry for an SQL92 bit type looks > like > > B'10111' > X'17' > > The underlying data type is BIT(n), a fixed-length type where n is the > exact number of bits. BIT VARYING (n) allows a variable number of bits > (duh!) up to n bits. We can support these SQL92 constructs in the > parser, folding them into an internal type as we do for character > strings currently. > > It could be implemented just like the character types, having a header > on the internal representation which holds the length. It can't re-use > the character type support functions as-is, since they currently > consider a zero byte in the string as end-of-string. OK, I'll have a go at this as I get a chance. If somebody has the SQL standard on line and could send me the appropriate sections I would appreciate it. As I know very little about the postgres internals I would also appreciate a short roadmap as to what needs to be done where, i.e. does the parser need to be changed, and where the files are /new files hsould go that I need to update. If this is somewhere in the docs please point me to it. What I've found upto now is backend/utils/adt/varlena.c backend/utils/adt/varchar.c which I will use as starting point? I found the file src/backend/lib/bit.c (Bruce's according to the log message). Has that got anything to do with bit arrays? Cheers, Adriaan
> OK, I'll have a go at this as I get a chance. If somebody has the SQL > standard on line and could send me the appropriate sections I would > appreciate it. I have a text version of the SQL92 draft standard. Let me know if you want the whole thing. > As I know very little about the postgres internals I would also > appreciate a short roadmap as to what needs to be done where, i.e. does > the parser need to be changed, and where the files are /new files hsould > go that I need to update. If this is somewhere in the docs please point > me to it. > What I've found upto now is > backend/utils/adt/varlena.c > backend/utils/adt/varchar.c > which I will use as starting point? That's probably the right place to look. I'll help with the parser issues; the first thing to do is to figure out the appropriate behavior and implement the underlying types. Then we can modify the parser (backend/parser/gram.y) to support SQL92->Postgres internal type syntax, just as is done for char and numeric types. > I found the file src/backend/lib/bit.c (Bruce's according to the log > message). Has that got anything to do with bit arrays? Yes it does, but not as a user-accessible type. btw, if you go by the cvs logs, Bruce owns *every* file in the tree since he does wholesale reformatting on files; in this case the code has been there since the beginning. Looks like it might be a good start at some underlying utilities for what you want though, and it is OK to reuse them. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas asked how I was going to implement bitstring comparisons. > > How do you handle the length and ordering issues? Is x'01' greater > than x'1' since it is longer? And if you have a 16-bit bit column, how > does it look internally if you assign x'01' rather than x'0001'? I had a look in my freshly down-loaded draft standard. On page 336 it says: 7) The comparison of two bit string values, X and Y, is determined by comparison of their bits with the same ordinal position. If Xi and Yi are the values ofthe i-th bits of X and Y, respectively, and if LX is the length in bits of X and LY is the length inbits of Y, then: a) X is equal to Y if and only if X = LY and Xi = Yi for all i. ? I presume this should be 'LX=LY' ?? Anyway, this means that b'01' <> b'0010'. b) X is less than Y if and only if: i) LX < LY and Xi = Yi for all i less than or equal to LX; or ii) Xi = Yi for all i < n and Xn = 0 and Yn = 1 for some n less than or equal to the minimum of LX and LY. b) seems to imply, rather bizarrely in my opinion, that B'001100' < B'10' as the second bit in B'10' is 1 and in B'001100' it is 0. Surely I must be reading this wrong? On the other hand, this would be a type of lexicographical ordering, so perhaps it is not so dumb. Comments? Adriaan
Adriaan Joubert wrote: > > b) seems to imply, rather bizarrely in my opinion, that > > B'001100' < B'10' > Maybe you start counting from the wrong end ? Just use them as you use char() 'AABBAA' < 'BA' Does it say something in the standard about direction, is it left-> right or right->left ? ------------ Hannu
Hannu Krosing wrote: > > Adriaan Joubert wrote: > > > > b) seems to imply, rather bizarrely in my opinion, that > > > > B'001100' < B'10' > > > Maybe you start counting from the wrong end ? > > Just use them as you use char() > > 'AABBAA' < 'BA' > > Does it say something in the standard about direction, > is it left-> right or right->left ? No, not that I could find. But in the above example B'001100' < B'10' whichever end you start counting from, as 1>0. I have no particularly strong opinion on which way round it should be done -- perhaps we should just try to be consistent with other databases? Could somebody who has access to Oracle or Sybase please do a few tests and let me know? A second problem I encountered last night is that the postgres variable length types only allow for the length of an array to be stored in bytes. This means that the number of bits will automatically always be rounded up to the nearest factor of 8, i.e. you want tp store 3 bits and you get 8. For ordering and output this is not always going to produce the correct output, as the bitstrings will get zero-padded. Is there anywhere else where one could store the exact length of a bit string? I haven't quite understood what the variable attypmod is. In varchar.c it looks as if it is the length of the record, but if it is just an integer identifier, then I could store the exact length in there. In that case I could handle the difference between 3 and 5 bit strings correctly. My main worry was that this might be used in other routines to determine the length of a record. Adriaan
Adriaan Joubert ha scritto: > Hannu Krosing wrote: > > > > Adriaan Joubert wrote: > > > > > > b) seems to imply, rather bizarrely in my opinion, that > > > > > > B'001100' < B'10' > > > > > Maybe you start counting from the wrong end ? > > > > Just use them as you use char() > > > > 'AABBAA' < 'BA' > > > > Does it say something in the standard about direction, > > is it left-> right or right->left ? > > No, not that I could find. But in the above example B'001100' < B'10' > whichever end you start counting from, as 1>0. I have no particularly > strong opinion on which way round it should be done -- perhaps we should > just try to be consistent with other databases? Could somebody who has > access to Oracle or Sybase please do a few tests and let me know? > Oracle doesn't have this data type neither Informix. I think it is hard to find this data type in any database. I found this feature in the OCELOT database You can download it from: http://ourworld.compuserve.com/homepages/OCELOTSQL/ As they say: "Ocelot makes the only Database Management System (DBMS) that supports the full ANSI / ISO SQL Standard (1992), and an always-growing checklist of SQL3 features (also known as SQL-99)." A second problem I encountered last night is that the postgres variable > length types only allow for the length of an array to be stored in > bytes. This means that the number of bits will automatically always be > rounded up to the nearest factor of 8, i.e. you want tp store 3 bits and > you get 8. For ordering and output this is not always going to produce > the correct output, as the bitstrings will get zero-padded. Is there > anywhere else where one could store the exact length of a bit string? > > I haven't quite understood what the variable attypmod is. In varchar.c > it looks as if it is the length of the record, but if it is just an > integer identifier, then I could store the exact length in there. In > that case I could handle the difference between 3 and 5 bit strings > correctly. My main worry was that this might be used in other routines > to determine the length of a record. > > Adriaan > > ************
> A second problem I encountered last night is that the postgres variable > length types only allow for the length of an array to be stored in > bytes. This means that the number of bits will automatically always be > rounded up to the nearest factor of 8, i.e. you want tp store 3 bits and > you get 8. For ordering and output this is not always going to produce > the correct output, as the bitstrings will get zero-padded. Is there > anywhere else where one could store the exact length of a bit string? attypmod has been modified recently to contain two fields (each of 16 bits) in a backward-compatible way. It can hold the size *and* precision of the numeric data types, and presumably should be used in a similar manner for your bit type. The problem is that you need another field which contains a length in bit units. Assuming that the second field in attypmod can't be used for this purpose, then istm that you will want to add a field to the data type itself. The character types have: length - total size of data, in bytes (4 bytes) data - body and you might have length - total size of data, in bytes (4 bytes) blen - total size of data, in bits (4 bytes) data - body - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Adriaan Joubert <a.joubert@albourne.com> writes: > A second problem I encountered last night is that the postgres variable > length types only allow for the length of an array to be stored in > bytes. This means that the number of bits will automatically always be > rounded up to the nearest factor of 8, i.e. you want tp store 3 bits and > you get 8. For ordering and output this is not always going to produce > the correct output, as the bitstrings will get zero-padded. Is there > anywhere else where one could store the exact length of a bit string? You will need to put it right in the string, I think. You could dedicate the first byte of the value of a bitstring (after the required varlena length word) to indicating how many bits in the last byte are wasted padding (0-7). That would leave a few spare bits in this header byte that might or might not have any good use. > I haven't quite understood what the variable attypmod is. In varchar.c > it looks as if it is the length of the record, but if it is just an > integer identifier, then I could store the exact length in there. In > that case I could handle the difference between 3 and 5 bit strings > correctly. My main worry was that this might be used in other routines > to determine the length of a record. atttypmod is a type-specific modifier: if you are developing a new data type then you can define it any way you darn please. However, it's not quite as useful as it first appears, because it is only stored in connection with a column of a table --- there is no atttypmod associated with the result of a function, for example. It is primarily useful if you want to be able to coerce values into a common subformat when they are stored into a column. For example, fixed-length char(n) types use atttypmod as the column width so that they can pad or truncate a supplied string to the right length just before storing. But a free-standing string value does not have an atttypmod, only a length. Similar remarks apply to NUMERIC, which uses atttypmod to store the desired precision for a column, but not to figure out the actual precision of a value in memory. In short, your datatype representation needs to be self-identifying without help from atttypmod. regards, tom lane
Thomas Lockhart wrote: > > attypmod has been modified recently to contain two fields (each of 16 > bits) in a backward-compatible way. It can hold the size *and* > precision of the numeric data types, and presumably should be used in > a similar manner for your bit type. > > The problem is that you need another field which contains a length in > bit units. Assuming that the second field in attypmod can't be used > for this purpose, then istm that you will want to add a field to the > data type itself. The character types have: > > length - total size of data, in bytes (4 bytes) > data - body > > and you might have > > length - total size of data, in bytes (4 bytes) > blen - total size of data, in bits (4 bytes) > data - body OK, I just saw th email from Tom Lane as well. So I will use attypmod as the length of the bit string in bits, and use an additional byte, as suggested here, for the actual length. Jose recommended looking at the Ocelot database and I got it down. Turns out they have a real big problem with the output of bit strings, but at least I could figure out how they do the ordering. Looks as if it is lexicographically from the least significant bit, i.e. B'1' > B'10' > B'1100' the only surprising thing was that they then have B'1000' > B'01000', and my reading of the SQL standard says that it should be the other way round. So I will just do the comparison from the least significant bit. Cheers, Adriaan
> attypmod has been modified recently to contain two fields (each of 16 > bits) in a backward-compatible way. It can hold the size *and* > precision of the numeric data types, and presumably should be used in > a similar manner for your bit type. You can use a union to split atttypmod up into two 8-bit fields and on 16-bit field. Let me know if you need help. > > The problem is that you need another field which contains a length in > bit units. Assuming that the second field in attypmod can't be used > for this purpose, then istm that you will want to add a field to the > data type itself. The character types have: > > length - total size of data, in bytes (4 bytes) > data - body > > and you might have > > length - total size of data, in bytes (4 bytes) > blen - total size of data, in bits (4 bytes) > data - body > > - Thomas > > -- > Thomas Lockhart lockhart@alumni.caltech.edu > South Pasadena, California > > ************ > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Hi, OK, I've finally gotten round to coding most of the functions for bit strings. I've got I/O, concatenation, substring and all bit operations (and, or, xor, not, shift) working on the bitstring data structures. A few (probably pretty daft) questions (if there is documentation on this somewhere, please point me to it): 1. In the varchar file there are some functions which I believe are for the conversion of char(n) to char(m). They take as argument a pointer to a char() and a len which is the length of the total data structure. I haven't figured out how conversions are implemented within postgres, but I would need to transfer the equivalent of an atttypmod value, which would contain the length of the bit string to do the conversions. Does this fit in with the way the rest of the system works? char * zpbit (char * arg, int32 bitlen) 2. there is a function _bpchar, which has something to do with arrays, but I can't see how it fits in with everything else. 3. I need to write a hash function for bitstrings. I know nothing about hash functions, except that they are hard to do well. I looked at the function for text hashes and that is some weird code (i.e. it took me a while to figure out what it did). Does anybody have any suggestions off-hand for a decent hash function for bit strings? Could I just use the text hash function? (Seems to me text should be different as it usually draws from a more restricted range than a bit string). 4. Now that I've got the functionality, can somebody give me a rough roadmap to what I need to change to turn this into a proper postgres type? As far as I can see I need to assign oid's to it in pg_type.h and I'll have to have a look at the parser to get it to recognise the types. It would be a big help though if somebody could tell me what else needs to change. Thanks, Adriaan
> 4. Now that I've got the functionality, can somebody give me a rough > roadmap to what I need to change to turn this into a proper postgres > type? As far as I can see I need to assign oid's to it in pg_type.h and > I'll have to have a look at the parser to get it to recognise the types. > It would be a big help though if somebody could tell me what else needs > to change. I can integrate the type for you into the include/catalog files if everyone agrees they want it as a standard type and not an contrib type. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Adriaan Joubert <a.joubert@albourne.com> writes: > 1. In the varchar file there are some functions which I believe are for > the conversion of char(n) to char(m). They take as argument a pointer to > a char() and a len which is the length of the total data structure. I > haven't figured out how conversions are implemented within postgres, but > I would need to transfer the equivalent of an atttypmod value, which > would contain the length of the bit string to do the conversions. bpchar(), for example, is actually a user-callable SQL function; it takes a char(n) value and an atttypmod value and coerces the string to the right length for that atttypmod. Although there are no *direct* references to bpchar() anywhere except in pg_proc, the parser's SizeTargetExpr routine nonetheless generates calls to it as part of INSERT and UPDATE queries: /** SizeTargetExpr()** If the target column type possesses a function named for the type* and having parameter signature(columntype, int4), we assume that* the type requires coercion to its own length and that the said* function shouldbe invoked to do that.** Currently, "bpchar" (ie, char(N)) is the only such type, but try* to be more general thana hard-wired test...*/ So, if you want to implement a fixed-length BIT(N) type, the only real difference between that and an any-width bitstring is the existence of a coercion function matching SizeTargetExpr's criteria. BTW, the last line of that comment is in error --- "varchar" also has a function matching SizeTargetExpr's criteria. Its function behaves a little differently, since it only truncates and never pads, but the interface to the system is the same. > 2. there is a function _bpchar, which has something to do with arrays, > but I can't see how it fits in with everything else. Looks like it is the equivalent of bpchar() for arrays of char(N). > 3. I need to write a hash function for bitstrings. I know nothing about > hash functions, except that they are hard to do well. I looked at the > function for text hashes and that is some weird code (i.e. it took me a > while to figure out what it did). If you're looking at the type-specific hash functions in hashfunc.c, I think they are mostly junk. They could all be replaced by two functions, one for pass-by-val types and one for pass-by-ref types, a la the type-independent hashFunc() in nodeHash.c. The only situation where you really need a type-specific hasher is with datatypes that have garbage bits in them (such as padding between struct elements that might contain uninitialized bits). If you're careful to make sure that all unused bits are zeroes, so that logically equivalent values of your type will always have the same bit contents, then you should be able to just use hashtext(). Actually, unless you feel a compelling need to support hash indexes on your datatype, you don't need a hash routine at all. Certainly getting btree index support should be a higher-priority item. regards, tom lane
Sounds great to me... On Sat, 9 Oct 1999, Bruce Momjian wrote: > > 4. Now that I've got the functionality, can somebody give me a rough > > roadmap to what I need to change to turn this into a proper postgres > > type? As far as I can see I need to assign oid's to it in pg_type.h and > > I'll have to have a look at the parser to get it to recognise the types. > > It would be a big help though if somebody could tell me what else needs > > to change. > > I can integrate the type for you into the include/catalog files if > everyone agrees they want it as a standard type and not an contrib type. > > > -- > Bruce Momjian | http://www.op.net/~candle > maillist@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > ************ > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Bruce Momjian wrote: > > > I can integrate the type for you into the include/catalog files if > everyone agrees they want it as a standard type and not an contrib type. Hi, Attached are the C-routines that implement a BIT and BIT VARYING type. I know Bruce said he would integrate them, but he is writing a book at the moment as well, so if somebody can explain to me how to go about integrating it, or would like to have a go, go ahead. If any functions are missing, let me know and I will add them. This should implement concatenation and substr as defined in the SQL standard, as well as comparison operators. I've also added all the normal bit operators. I developed the C routines outside the postgres source tree, only using postgres.h and copying bits from ctype.h. I hope it will be fairly easy to integrate. Any comments welcome. Adriaan
> Bruce Momjian wrote: > > > > > > I can integrate the type for you into the include/catalog files if > > everyone agrees they want it as a standard type and not an contrib type. > > Hi, > > Attached are the C-routines that implement a BIT and BIT VARYING type. > I know Bruce said he would integrate them, but he is writing a book at > the moment as well, so if somebody can explain to me how to go about > integrating it, or would like to have a go, go ahead. Applied. I am�embarrassed to say I had a copy from June still in my mailbox. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > > Bruce Momjian wrote: > > > > > > > > > I can integrate the type for you into the include/catalog files if > > > everyone agrees they want it as a standard type and not an contrib type. > > > > Hi, > > > > Attached are the C-routines that implement a BIT and BIT VARYING type. > > I know Bruce said he would integrate them, but he is writing a book at > > the moment as well, so if somebody can explain to me how to go about > > integrating it, or would like to have a go, go ahead. > > Applied. I am embarrassed to say I had a copy from June still in my > mailbox. Don't be: they've been ready for a while, but I had to recheck them. When BIT and BIT VARYING are properly integrated, do I need to do something about regression tests? Adriaan
Adriaan Joubert <a.joubert@albourne.com> writes: > When BIT and BIT VARYING are properly integrated, do I need to do > something about regression tests? Please do contribute a regression test for them. We always need more regression tests ... regards, tom lane
> > Applied. I am embarrassed to say I had a copy from June still in my > > mailbox. > > Don't be: they've been ready for a while, but I had to recheck them. > When BIT and BIT VARYING are properly integrated, do I need to do > something about regression tests? Yes, we will need them to be added to the regression tests. We can use your test/ directory as a source for that. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026