Thread: Boolean storage takes up 1 byte?

Boolean storage takes up 1 byte?

From
Thom Brown
Date:
Hi,

I've read the PostgreSQL documentation page on the boolean datatype (http://www.postgresql.org/docs/8.4/static/datatype-boolean.html) to find out what PostgreSQL's definition of a boolean is, as I believe it is distinctive from a bit(1) datatype (as you can't max() a boolean.. not sure what an efficient alternative to that is).  However, I see that a boolean takes up 1 byte of storage, which is 8 bits.  Is this due to the fact that the value can be null?  I'm not clear as to how a null field is stored, or is that the point... nothing references is so it is defined as null?  If that is the case, can't this be stored as 1 bit?  And does its storage as a byte affect indexing or query planning?

Thom

Re: Boolean storage takes up 1 byte?

From
Sam Mason
Date:
On Thu, Oct 01, 2009 at 11:37:40AM +0100, Thom Brown wrote:
> I've read the PostgreSQL documentation page on the boolean datatype (
> http://www.postgresql.org/docs/8.4/static/datatype-boolean.html) to find out
> what PostgreSQL's definition of a boolean is, as I believe it is distinctive
> from a bit(1) datatype

Yup, they're really for different things.  AND, OR and NOT are defined
for the BOOLEAN datatype and not for bit strings.

> (as you can't max() a boolean.. not sure what an
> efficient alternative to that is).

bool_or and bool_and are aggregates that work over boolean data types.

> However, I see that a boolean takes up 1
> byte of storage, which is 8 bits.  Is this due to the fact that the value
> can be null?

I believe it's more to do with the fact that if you add a boolean column
and then subsequently an int column then you're going to struggle to
"pack" them efficiently.  PG always puts columns on the "end" so that you
can add a column in constant time (i.e. no need to rewrite the table
in some common situations).  Once you start doing this then packing is
awkward and a single byte becomes much easier.  Whether the value is
NULL is stored elsewhere in the row.

Yes, this could be made more efficient; whether it's worth it is a
difficult question!

> And does its storage as a byte affect indexing or query planning?

Not sure which aspects you're referring to here, sorry.

--
  Sam  http://samason.me.uk/

Re: Boolean storage takes up 1 byte?

From
Thom Brown
Date:

2009/10/1 Sam Mason <sam@samason.me.uk>

bool_or and bool_and are aggregates that work over boolean data types.


Ah yes, that makes total sense!  I knew max wouldn't be logical in such as case, but couldn't think of the alternative.  Thanks!
 

I believe it's more to do with the fact that if you add a boolean column
and then subsequently an int column then you're going to struggle to
"pack" them efficiently.  PG always puts columns on the "end" so that you
can add a column in constant time (i.e. no need to rewrite the table
in some common situations).  Once you start doing this then packing is
awkward and a single byte becomes much easier.  Whether the value is
NULL is stored elsewhere in the row.


That's clear now.
 

> And does its storage as a byte affect indexing or query planning?

Not sure which aspects you're referring to here, sorry.

Giving my question more thought, I believe it's pointless.

You've answered my question.  Thanks Sam.

Re: Boolean storage takes up 1 byte?

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 12:03 +0100, Sam Mason wrote:

> > However, I see that a boolean takes up 1
> > byte of storage, which is 8 bits.  Is this due to the fact that the value
> > can be null?
>
> I believe it's more to do with the fact that if you add a boolean column
> and then subsequently an int column then you're going to struggle to
> "pack" them efficiently.  PG always puts columns on the "end" so that you
> can add a column in constant time (i.e. no need to rewrite the table
> in some common situations).  Once you start doing this then packing is
> awkward and a single byte becomes much easier.  Whether the value is
> NULL is stored elsewhere in the row.

It might be possible to make BOOLEAN NOT NULL use the null bit to
represent the actual data value and then have the column use no
additional bytes, except when we don't store the null bitmap at all.
Just needs people to make it happen cleanly, if that's possible.

Don't like booleans myself. They tend to end up as 3+ values eventually.

--
 Simon Riggs           www.2ndQuadrant.com


Re: Boolean storage takes up 1 byte?

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> It might be possible to make BOOLEAN NOT NULL use the null bit to
> represent the actual data value and then have the column use no
> additional bytes, except when we don't store the null bitmap at all.
> Just needs people to make it happen cleanly, if that's possible.

I really doubt that any scheme to pack booleans tighter would be a
net win.  It'd make the core tuple-assembly and -disassembly loops more
complicated, hence slower and harder to maintain.  Everybody would pay
that price whether or not they ever saved a byte from it.

It's worth noting also that you don't save anything from packing a bool
unless the *next* field in the row has a weak enough alignment
requirement that it can be moved over.  In a majority of cases this
would mean that you'd need at least five adjacent bool columns before
you have any shot at winning anything --- with four or less, a following
column with int alignment will stay right where it is.

If you do have lots and lots of bool columns, it might possibly be worth
the trouble to represent them as a combined BIT(n) column ...

            regards, tom lane