Thread: Boolean storage takes up 1 byte?
Hi,
I've read the PostgreSQL documentation page on the boolean datatype (http://www.postgresql.org/docs/8.4/static/datatype-boolean.html) to find out what PostgreSQL's definition of a boolean is, as I believe it is distinctive from a bit(1) datatype (as you can't max() a boolean.. not sure what an efficient alternative to that is). However, I see that a boolean takes up 1 byte of storage, which is 8 bits. Is this due to the fact that the value can be null? I'm not clear as to how a null field is stored, or is that the point... nothing references is so it is defined as null? If that is the case, can't this be stored as 1 bit? And does its storage as a byte affect indexing or query planning?
Thom
I've read the PostgreSQL documentation page on the boolean datatype (http://www.postgresql.org/docs/8.4/static/datatype-boolean.html) to find out what PostgreSQL's definition of a boolean is, as I believe it is distinctive from a bit(1) datatype (as you can't max() a boolean.. not sure what an efficient alternative to that is). However, I see that a boolean takes up 1 byte of storage, which is 8 bits. Is this due to the fact that the value can be null? I'm not clear as to how a null field is stored, or is that the point... nothing references is so it is defined as null? If that is the case, can't this be stored as 1 bit? And does its storage as a byte affect indexing or query planning?
Thom
On Thu, Oct 01, 2009 at 11:37:40AM +0100, Thom Brown wrote: > I've read the PostgreSQL documentation page on the boolean datatype ( > http://www.postgresql.org/docs/8.4/static/datatype-boolean.html) to find out > what PostgreSQL's definition of a boolean is, as I believe it is distinctive > from a bit(1) datatype Yup, they're really for different things. AND, OR and NOT are defined for the BOOLEAN datatype and not for bit strings. > (as you can't max() a boolean.. not sure what an > efficient alternative to that is). bool_or and bool_and are aggregates that work over boolean data types. > However, I see that a boolean takes up 1 > byte of storage, which is 8 bits. Is this due to the fact that the value > can be null? I believe it's more to do with the fact that if you add a boolean column and then subsequently an int column then you're going to struggle to "pack" them efficiently. PG always puts columns on the "end" so that you can add a column in constant time (i.e. no need to rewrite the table in some common situations). Once you start doing this then packing is awkward and a single byte becomes much easier. Whether the value is NULL is stored elsewhere in the row. Yes, this could be made more efficient; whether it's worth it is a difficult question! > And does its storage as a byte affect indexing or query planning? Not sure which aspects you're referring to here, sorry. -- Sam http://samason.me.uk/
2009/10/1 Sam Mason <sam@samason.me.uk>
Ah yes, that makes total sense! I knew max wouldn't be logical in such as case, but couldn't think of the alternative. Thanks!
That's clear now.
Giving my question more thought, I believe it's pointless.
bool_or and bool_and are aggregates that work over boolean data types.
Ah yes, that makes total sense! I knew max wouldn't be logical in such as case, but couldn't think of the alternative. Thanks!
I believe it's more to do with the fact that if you add a boolean column
and then subsequently an int column then you're going to struggle to
"pack" them efficiently. PG always puts columns on the "end" so that you
can add a column in constant time (i.e. no need to rewrite the table
in some common situations). Once you start doing this then packing is
awkward and a single byte becomes much easier. Whether the value is
NULL is stored elsewhere in the row.
That's clear now.
Not sure which aspects you're referring to here, sorry.
> And does its storage as a byte affect indexing or query planning?
You've answered my question. Thanks Sam.
On Thu, 2009-10-01 at 12:03 +0100, Sam Mason wrote: > > However, I see that a boolean takes up 1 > > byte of storage, which is 8 bits. Is this due to the fact that the value > > can be null? > > I believe it's more to do with the fact that if you add a boolean column > and then subsequently an int column then you're going to struggle to > "pack" them efficiently. PG always puts columns on the "end" so that you > can add a column in constant time (i.e. no need to rewrite the table > in some common situations). Once you start doing this then packing is > awkward and a single byte becomes much easier. Whether the value is > NULL is stored elsewhere in the row. It might be possible to make BOOLEAN NOT NULL use the null bit to represent the actual data value and then have the column use no additional bytes, except when we don't store the null bitmap at all. Just needs people to make it happen cleanly, if that's possible. Don't like booleans myself. They tend to end up as 3+ values eventually. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes: > It might be possible to make BOOLEAN NOT NULL use the null bit to > represent the actual data value and then have the column use no > additional bytes, except when we don't store the null bitmap at all. > Just needs people to make it happen cleanly, if that's possible. I really doubt that any scheme to pack booleans tighter would be a net win. It'd make the core tuple-assembly and -disassembly loops more complicated, hence slower and harder to maintain. Everybody would pay that price whether or not they ever saved a byte from it. It's worth noting also that you don't save anything from packing a bool unless the *next* field in the row has a weak enough alignment requirement that it can be moved over. In a majority of cases this would mean that you'd need at least five adjacent bool columns before you have any shot at winning anything --- with four or less, a following column with int alignment will stay right where it is. If you do have lots and lots of bool columns, it might possibly be worth the trouble to represent them as a combined BIT(n) column ... regards, tom lane