Thread: Efficient Boolean Storage
Hello,
I need to store many (e.g. 30) booleans and am wondering what is the most efficient way to store them. I can think of four options.
Option 1...Use ‘bool’ data type
No good since each value will require 1 byte rather than 1 bit.
Option 2...Use ‘bit string’ data type
Good since bit status is visible e.g. 0001000110
Questionable...How much storage used? Storage = 1 byte per bit? Or Storage = # of bits rounded to nearest byte?
Option 3...Use sets of ‘char’
Good since compact. Storage is known...8 booleans = 8 bits = 1 byte
Bad since less obvious to casual database viewer.
Option 4...Use individual ‘bit’ data types for each Boolean
Good since obvious to casual database viewer.
Questionable...Probably uses 1 byte per bit.
Thanks in advance for your help. I’m new to PostgreSQL. Yet, I’m architecting a database where storage space is at a premium.
- Chris
On Wednesday 04 Dec 2002 1:06 am, Chris White wrote: > Hello, > > I need to store many (e.g. 30) booleans and am wondering what is the > most efficient way to store them. I can think of four options. > > Option 1...Use 'bool' data type > No good since each value will require 1 byte rather than 1 bit. Depends what you're going to use them for. Are these 30 flags that should be grouped together? Will the users/app want all together or one at a time? Will they feature in WHERE clauses? -- Richard Huxton
Hmmm... You can store them in an int4 type (will take up to 31) To store your numbers: $num = (2^$num1) + (2^$num2) ... where $num1 and $num2 are your bit positions that you want to set to "1", To retrieve them... SELECT * FROM table WHERE ((mycol & 2^1) != 0 OR (mycol & 2^3) != 0) Here you are checking for the 1st bit position and the 3rd bit position. - Ericson Smith eric@did-it.com On Wed, 2002-12-04 at 10:11, Richard Huxton wrote: > On Wednesday 04 Dec 2002 1:06 am, Chris White wrote: > > Hello, > > > > I need to store many (e.g. 30) booleans and am wondering what is the > > most efficient way to store them. I can think of four options. > > > > Option 1...Use 'bool' data type > > No good since each value will require 1 byte rather than 1 bit. > > Depends what you're going to use them for. Are these 30 flags that should be > grouped together? Will the users/app want all together or one at a time? Will > they feature in WHERE clauses? > > -- > Richard Huxton > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
Why worry if a boolean isn't a bit in database storage? When they are loaded to your CPU they will fill a register anyway (32 bits) On Wed, 2002-12-04 at 14:01, Ericson Smith wrote: > Hmmm... > > You can store them in an int4 type (will take up to 31) > > To store your numbers: > $num = (2^$num1) + (2^$num2) ... > where $num1 and $num2 are your bit positions that you want to set to > "1", > > To retrieve them... > SELECT * FROM table WHERE ((mycol & 2^1) != 0 OR (mycol & 2^3) != 0) > Here you are checking for the 1st bit position and the 3rd bit position. > > - Ericson Smith > eric@did-it.com > > On Wed, 2002-12-04 at 10:11, Richard Huxton wrote: > > On Wednesday 04 Dec 2002 1:06 am, Chris White wrote: > > > Hello, > > > > > > I need to store many (e.g. 30) booleans and am wondering what is the > > > most efficient way to store them. I can think of four options. > > > > > > Option 1...Use 'bool' data type > > > No good since each value will require 1 byte rather than 1 bit. > > > > Depends what you're going to use them for. Are these 30 flags that should be > > grouped together? Will the users/app want all together or one at a time? Will > > they feature in WHERE clauses? > > > > -- > > Richard Huxton > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Felipe Schnack Analista de Sistemas felipes@ritterdosreis.br Cel.: (51)91287530 Linux Counter #281893 Faculdade Ritter dos Reis www.ritterdosreis.br felipes@ritterdosreis.br Fone/Fax.: (51)32303328
Well, I've had the same dilemma in what data type to use for efficient boolean data storage. My application has to do the following: - store the state of max. 128 objects (this is more than the size of any of the numeric types of postgres for which bitwise operators are defined); - be able to use the state of any of these bits in a where clause. This could be achieved by applying an AND mask and compare to 0. - be able to have a variing nr. of bits. Using 128 separate bit fields is not just insane but unpractical, because these are not the only columns, there are plenty of others... also this would mean to fix the nr. of bits, which is not good, as most of the time the nr. of tracked objects will be below of the maximum of 128 possible. Using bit varying type could be a solution, but here the problem is that you can't apply bitwise operators to 2 operands with different bit length. This would cause problems in queries, as I would need to apply the same bitmask to rows with possibly different sizes of the bit field... in my case a missing bit should mean false. Furthermore, the bit variing type is decently documented, but the operators you can apply to them are not. Another problem is constructing a bit varying, which can be done only via strings - this is at least unpleasant, to be forced to transform a number to it's binary string representation to initialize the bit varying field with it... if there is a way to do it directly with a number, it's undocumented. Finally the solution was for me to convince the users they don't need more than 64 bits :D , and use bigint. But if in the future it turns out they need more, I will need better bit level facilities than Postgres currently has... BTW, is or is not using Postgres a string to store the bit data types ? I have seen a few people asking this on the list, but no straight answer as of yet... the docs really don't mention this aspect. This makes a big difference for bit fields with lots of bits. 128 bits is 16 bytes, a 128 character string is at least 128 bytes... on 1 million records this means about 100 MB difference in size, and also means slower data transfer. If the string is toasted, I presume it means less difference in storage, but still eats some more processing power and network bandwidth on data transfer (unless the client-backend communications are compressed... are they ?) I hope I convinced everybody that the bit data type representation DOES make a difference in those special cases where you have lots of bits... Thanks for your attention, Csaba. -----Ursprüngliche Nachricht----- Von: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org]Im Auftrag von Felipe Schnack Gesendet: Mittwoch, 4. Dezember 2002 17:01 An: pgsql-general Betreff: Re: [GENERAL] Efficient Boolean Storage Why worry if a boolean isn't a bit in database storage? When they are loaded to your CPU they will fill a register anyway (32 bits) On Wed, 2002-12-04 at 14:01, Ericson Smith wrote: > Hmmm... > > You can store them in an int4 type (will take up to 31) > > To store your numbers: > $num = (2^$num1) + (2^$num2) ... > where $num1 and $num2 are your bit positions that you want to set to > "1", > > To retrieve them... > SELECT * FROM table WHERE ((mycol & 2^1) != 0 OR (mycol & 2^3) != 0) > Here you are checking for the 1st bit position and the 3rd bit position. > > - Ericson Smith > eric@did-it.com > > On Wed, 2002-12-04 at 10:11, Richard Huxton wrote: > > On Wednesday 04 Dec 2002 1:06 am, Chris White wrote: > > > Hello, > > > > > > I need to store many (e.g. 30) booleans and am wondering what is the > > > most efficient way to store them. I can think of four options. > > > > > > Option 1...Use 'bool' data type > > > No good since each value will require 1 byte rather than 1 bit. > > > > Depends what you're going to use them for. Are these 30 flags that should be > > grouped together? Will the users/app want all together or one at a time? Will > > they feature in WHERE clauses? > > > > -- > > Richard Huxton > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Felipe Schnack Analista de Sistemas felipes@ritterdosreis.br Cel.: (51)91287530 Linux Counter #281893 Faculdade Ritter dos Reis www.ritterdosreis.br felipes@ritterdosreis.br Fone/Fax.: (51)32303328 ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Csaba Nagy wrote: > BTW, is or is not using Postgres a string to store the bit data types ? I > have seen a few people asking this on the list, but no straight answer as of > yet... the docs really don't mention this aspect. Use the source ;-) From varbit.c /*---------- * attypmod -- contains the length of the bit string in bits, or for * varying bits the maximum length. * * The data structure contains the following elements: * header -- length of the whole data structure (incl header) * in bytes. (as with all varying length datatypes) * data section -- private data section for the bits data structures * bitlength -- length of the bit string in bits * bitdata -- bit string, most significant byte first * * The length of the bitdata vector should always be exactly as many * bytes as are needed for the given bitlength. If the bitlength is * not a multiple of 8, the extra low-order padding bits of the last * byte must be zeroes. *---------- */ Joe
It is unintuitive to me that the low byte be padded. Shouldn't it be the high byte? JLL Joe Conway wrote: > > Csaba Nagy wrote: > > BTW, is or is not using Postgres a string to store the bit data types ? I > > have seen a few people asking this on the list, but no straight answer as of > > yet... the docs really don't mention this aspect. > > Use the source ;-) > > From varbit.c > > /*---------- > * attypmod -- contains the length of the bit string in bits, or for > * varying bits the maximum length. > * > * The data structure contains the following elements: > * header -- length of the whole data structure (incl header) > * in bytes. (as with all varying length datatypes) > * data section -- private data section for the bits data structures > * bitlength -- length of the bit string in bits > * bitdata -- bit string, most significant byte first > * > * The length of the bitdata vector should always be exactly as many > * bytes as are needed for the given bitlength. If the bitlength is > * not a multiple of 8, the extra low-order padding bits of the last > * byte must be zeroes. > *---------- > */ > > Joe > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
I think it is because of the MSB order. Would it be implemented as LSB, I'm sure the high byte would be the one to be padded :) G. -- while (!asleep()) sheep++; ---------------------------- cut here ------------------------------ ----- Original Message ----- From: "Jean-Luc Lachance" <jllachan@nsd.ca> Sent: Wednesday, December 04, 2002 7:39 PM > It is unintuitive to me that the low byte be padded. Shouldn't it be > the high byte? > > JLL > > Joe Conway wrote: > > * bitdata -- bit string, most significant byte first