Thread: Efficient Boolean Storage

Efficient Boolean Storage

From
"Chris White"
Date:

Hello,

 

I need to store many (e.g. 30) booleans and am wondering what is the most efficient way to store them.  I can think of four options.

 

Option 1...Use ‘bool’ data type

No good since each value will require 1 byte rather than 1 bit.

 

Option 2...Use ‘bit string’ data type

Good since bit status is visible e.g. 0001000110

Questionable...How much storage used?  Storage = 1 byte per bit?  Or Storage = # of bits rounded to nearest byte?

 

Option 3...Use sets of ‘char’

Good since compact.  Storage is known...8 booleans = 8 bits = 1 byte

Bad since less obvious to casual database viewer.

 

Option 4...Use individual ‘bit’ data types for each Boolean

Good since obvious to casual database viewer.

Questionable...Probably uses 1 byte per bit.

 

Thanks in advance for your help.  I’m new to PostgreSQL.  Yet, I’m architecting a database where storage space is at a premium.

 

- Chris

 

Re: Efficient Boolean Storage

From
Richard Huxton
Date:
On Wednesday 04 Dec 2002 1:06 am, Chris White wrote:
> Hello,
>
> I need to store many (e.g. 30) booleans and am wondering what is the
> most efficient way to store them.  I can think of four options.
>
> Option 1...Use 'bool' data type
> No good since each value will require 1 byte rather than 1 bit.

Depends what you're going to use them for. Are these 30 flags that should be
grouped together? Will the users/app want all together or one at a time? Will
they feature in WHERE clauses?

--
  Richard Huxton

Re: Efficient Boolean Storage

From
Ericson Smith
Date:
Hmmm...

You can store them in an int4 type (will take up to 31)

To store your numbers:
$num = (2^$num1) + (2^$num2) ...
where $num1 and $num2 are your bit positions that you want to set to
"1",

To retrieve them...
SELECT * FROM table WHERE ((mycol & 2^1) != 0 OR (mycol & 2^3) != 0)
Here you are checking for the 1st bit position and the 3rd bit position.

- Ericson Smith
eric@did-it.com

On Wed, 2002-12-04 at 10:11, Richard Huxton wrote:
> On Wednesday 04 Dec 2002 1:06 am, Chris White wrote:
> > Hello,
> >
> > I need to store many (e.g. 30) booleans and am wondering what is the
> > most efficient way to store them.  I can think of four options.
> >
> > Option 1...Use 'bool' data type
> > No good since each value will require 1 byte rather than 1 bit.
>
> Depends what you're going to use them for. Are these 30 flags that should be
> grouped together? Will the users/app want all together or one at a time? Will
> they feature in WHERE clauses?
>
> --
>   Richard Huxton
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly



Re: Efficient Boolean Storage

From
Felipe Schnack
Date:
  Why worry if a boolean isn't a bit in database storage?
  When they are loaded to your CPU they will fill a register anyway (32
bits)

On Wed, 2002-12-04 at 14:01, Ericson Smith wrote:
> Hmmm...
>
> You can store them in an int4 type (will take up to 31)
>
> To store your numbers:
> $num = (2^$num1) + (2^$num2) ...
> where $num1 and $num2 are your bit positions that you want to set to
> "1",
>
> To retrieve them...
> SELECT * FROM table WHERE ((mycol & 2^1) != 0 OR (mycol & 2^3) != 0)
> Here you are checking for the 1st bit position and the 3rd bit position.
>
> - Ericson Smith
> eric@did-it.com
>
> On Wed, 2002-12-04 at 10:11, Richard Huxton wrote:
> > On Wednesday 04 Dec 2002 1:06 am, Chris White wrote:
> > > Hello,
> > >
> > > I need to store many (e.g. 30) booleans and am wondering what is the
> > > most efficient way to store them.  I can think of four options.
> > >
> > > Option 1...Use 'bool' data type
> > > No good since each value will require 1 byte rather than 1 bit.
> >
> > Depends what you're going to use them for. Are these 30 flags that should be
> > grouped together? Will the users/app want all together or one at a time? Will
> > they feature in WHERE clauses?
> >
> > --
> >   Richard Huxton
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo@postgresql.org so that your
> > message can get through to the mailing list cleanly
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
--

Felipe Schnack
Analista de Sistemas
felipes@ritterdosreis.br
Cel.: (51)91287530
Linux Counter #281893

Faculdade Ritter dos Reis
www.ritterdosreis.br
felipes@ritterdosreis.br
Fone/Fax.: (51)32303328


Re: Efficient Boolean Storage

From
Csaba Nagy
Date:
Well, I've had the same dilemma in what data type to use for efficient
boolean data storage.
My application has to do the following:
 - store the state of max. 128 objects (this is more than the size of any of
the numeric types of postgres for which bitwise operators are defined);
 - be able to use the state of any of these bits in a where clause. This
could be achieved by applying an AND mask and compare to 0.
 - be able to have a variing nr. of bits.
Using 128 separate bit fields is not just insane but unpractical, because
these are not the only columns, there are plenty of others... also this
would mean to fix the nr. of bits, which is not good, as most of the time
the nr. of tracked objects will be below of the maximum of 128 possible.
Using bit varying type could be a solution, but here the problem is that you
can't apply bitwise operators to 2 operands with different bit length. This
would cause problems in queries, as I would need to apply the same bitmask
to rows with possibly different sizes of the bit field... in my case a
missing bit should mean false.
Furthermore, the bit variing type is decently documented, but the operators
you can apply to them are not.
Another problem is constructing a bit varying, which can be done only via
strings - this is at least unpleasant, to be forced to transform a number to
it's binary string representation to initialize the bit varying field with
it... if there is a way to do it directly with a number, it's undocumented.
Finally the solution was for me to convince the users they don't need more
than 64 bits :D , and use bigint.
But if in the future it turns out they need more, I will need better bit
level facilities than Postgres currently has...

BTW, is or is not using Postgres a string to store the bit data types ? I
have seen a few people asking this on the list, but no straight answer as of
yet... the docs really don't mention this aspect.
This makes a big difference for bit fields with lots of bits. 128 bits is 16
bytes, a 128 character string is at least 128 bytes... on 1 million records
this means about 100 MB difference in size, and also means slower data
transfer. If the string is toasted, I presume it means less difference in
storage, but still eats some more processing power and network bandwidth on
data transfer (unless the client-backend communications are compressed...
are they ?)
I hope I convinced everybody that the bit data type representation DOES make
a difference in those special cases where you have lots of bits...

Thanks for your attention,
Csaba.

-----Ursprüngliche Nachricht-----
Von: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]Im Auftrag von Felipe Schnack
Gesendet: Mittwoch, 4. Dezember 2002 17:01
An: pgsql-general
Betreff: Re: [GENERAL] Efficient Boolean Storage


  Why worry if a boolean isn't a bit in database storage?
  When they are loaded to your CPU they will fill a register anyway (32
bits)

On Wed, 2002-12-04 at 14:01, Ericson Smith wrote:
> Hmmm...
>
> You can store them in an int4 type (will take up to 31)
>
> To store your numbers:
> $num = (2^$num1) + (2^$num2) ...
> where $num1 and $num2 are your bit positions that you want to set to
> "1",
>
> To retrieve them...
> SELECT * FROM table WHERE ((mycol & 2^1) != 0 OR (mycol & 2^3) != 0)
> Here you are checking for the 1st bit position and the 3rd bit position.
>
> - Ericson Smith
> eric@did-it.com
>
> On Wed, 2002-12-04 at 10:11, Richard Huxton wrote:
> > On Wednesday 04 Dec 2002 1:06 am, Chris White wrote:
> > > Hello,
> > >
> > > I need to store many (e.g. 30) booleans and am wondering what is the
> > > most efficient way to store them.  I can think of four options.
> > >
> > > Option 1...Use 'bool' data type
> > > No good since each value will require 1 byte rather than 1 bit.
> >
> > Depends what you're going to use them for. Are these 30 flags that
should be
> > grouped together? Will the users/app want all together or one at a time?
Will
> > they feature in WHERE clauses?
> >
> > --
> >   Richard Huxton
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo@postgresql.org so that your
> > message can get through to the mailing list cleanly
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
--

Felipe Schnack
Analista de Sistemas
felipes@ritterdosreis.br
Cel.: (51)91287530
Linux Counter #281893

Faculdade Ritter dos Reis
www.ritterdosreis.br
felipes@ritterdosreis.br
Fone/Fax.: (51)32303328


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: Efficient Boolean Storage

From
Joe Conway
Date:
Csaba Nagy wrote:
> BTW, is or is not using Postgres a string to store the bit data types ? I
> have seen a few people asking this on the list, but no straight answer as of
> yet... the docs really don't mention this aspect.

Use the source ;-)

 From varbit.c

  /*----------
   *      attypmod -- contains the length of the bit string in bits, or for
   *                         varying bits the maximum length.
   *
   *      The data structure contains the following elements:
   *        header  -- length of the whole data structure (incl header)
   *                   in bytes. (as with all varying length datatypes)
   *        data section -- private data section for the bits data structures
   *              bitlength -- length of the bit string in bits
   *              bitdata   -- bit string, most significant byte first
   *
   *      The length of the bitdata vector should always be exactly as many
   *      bytes as are needed for the given bitlength.  If the bitlength is
   *      not a multiple of 8, the extra low-order padding bits of the last
   *      byte must be zeroes.
   *----------
   */

Joe


Re: Efficient Boolean Storage

From
Jean-Luc Lachance
Date:
It is unintuitive to me that the low byte be padded.  Shouldn't it be
the high byte?

JLL

Joe Conway wrote:
>
> Csaba Nagy wrote:
> > BTW, is or is not using Postgres a string to store the bit data types ? I
> > have seen a few people asking this on the list, but no straight answer as of
> > yet... the docs really don't mention this aspect.
>
> Use the source ;-)
>
>  From varbit.c
>
>   /*----------
>    *      attypmod -- contains the length of the bit string in bits, or for
>    *                         varying bits the maximum length.
>    *
>    *      The data structure contains the following elements:
>    *        header  -- length of the whole data structure (incl header)
>    *                   in bytes. (as with all varying length datatypes)
>    *        data section -- private data section for the bits data structures
>    *              bitlength -- length of the bit string in bits
>    *              bitdata   -- bit string, most significant byte first
>    *
>    *      The length of the bitdata vector should always be exactly as many
>    *      bytes as are needed for the given bitlength.  If the bitlength is
>    *      not a multiple of 8, the extra low-order padding bits of the last
>    *      byte must be zeroes.
>    *----------
>    */
>
> Joe
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Efficient Boolean Storage

From
SZUCS Gábor
Date:
I think it is because of the MSB order. Would it be implemented as LSB, I'm
sure the high byte would be the one to be padded :)

G.
--
while (!asleep()) sheep++;

---------------------------- cut here ------------------------------
----- Original Message -----
From: "Jean-Luc Lachance" <jllachan@nsd.ca>
Sent: Wednesday, December 04, 2002 7:39 PM


> It is unintuitive to me that the low byte be padded.  Shouldn't it be
> the high byte?
>
> JLL
>
> Joe Conway wrote:
> >    *              bitdata   -- bit string, most significant byte first