Thread: Storing and querying boolean fields

Storing and querying boolean fields

From
"Brown, Richard"
Date:

Hi All,

First, some background:

- We are using PostgreSQL 7.3.4, and am locked into this version. I would upgrade if I could, but the decision is not mine.

- The table referred to below is 120+ million rows, and has a width of 27 columns (15 smallints, 5 integers, 4 dates, 1 integer[], 1 single char text field, and 1 double precision). This table already has 3 multi-field indexes, as well as a primary key index.

I need to add many boolean flags to my database that need to be retrieved by primary key, as well as be searched upon with decent speed. By "many", I mean about 30 booleans total. The added bonus is that some of these values need to be searched in an OR manner, while others need to be searched as an AND. For example, I will need to do a search that looks something like:   select * from table where A && B && C && (D || E) && F

Our first thought was to "pack" these booleans into integer fields to save space and help with searching (so we thought). The design calls for a smallint field and an integer field to be added to the table. UPDATE/INSERT would be handled by calculating the values of the two fields outside of the database. Searching would be accomplished by taking the bits that were requested, creating a bitmask corresponding to the 2 fields, and then doing something like:  select * from table where (field & BITMASK) = BITMASK.  We felt that putting this information in this table rather than a separate one would help performance by eliminating a JOIN, and it was assumed (ugh) that if we built an index on those two fields, that we would be able to use those indexes for searching in the aforementioned manner.

Unfortunately we have come up with the following problems with this approach:
1. Postgres will only use 1 index per table, so putting these fields in the same table as several other searchable fields actually hurts us more than helps.

2. We haven't been able to get any index scans in preliminary testing (all seq. scans).

After all of that background, my actual question is:  what is the best way to effectively store and query this type of data? It seems more and more that our initial design is just flat-out wrong, but we are at a loss as to what the other options are.

Thanks very much in advance, and I apologize for the length of the message, but I felt background info would be important.

Regards,
Richard

Re: Storing and querying boolean fields

From
Andrew Sullivan
Date:
On Wed, Jan 09, 2008 at 06:19:40PM -0500, Brown, Richard wrote:
>
> - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> upgrade if I could, but the decision is not mine.

I mean this sincerely and not snidely: get another job.  7.3.20 was the last
release in the 7.3 series.  It's binary compatible with the 7.3.4 version
you have.  There is _no_ reason not to have the latest patch level.  If some
bonehead manager thinks you have to stick with the broken old software,
you're in deep trouble.  In addition, the project has stopped maintaining
7.3.  If you're still in testing, _now_ is the time to upgrade.

Also,

> I need to add many boolean flags to my database that need to be retrieved by
> primary key, as well as be searched upon with decent speed. By "many", I
> mean about 30 booleans total.

I have pretty grave doubts that what you have here is a good application for
a relational database.

> 1. Postgres will only use 1 index per table, so putting these fields in the
> same table as several other searchable fields actually hurts us more than
> helps.

Well, ancient versions of Postgres will only use 1 index per table, yes.
This is actually fixed in later releases.  So if you come asking for help,
you're going to get the answer, "Upgrade."  That's the right answer.

> 2. We haven't been able to get any index scans in preliminary testing (all
> seq. scans).

I'd bet the selectivity is going to be lousy anyway, given what you
described.

> After all of that background, my actual question is:  what is the best way
> to effectively store and query this type of data? It seems more and more
> that our initial design is just flat-out wrong, but we are at a loss as to
> what the other options are.

I suspect the design is indeed wrong.  I think you need to tell us what
you're trying to do rather than tell us what kind of data you want to store.
My guess is that your data _representation_ is wrong in the first place, and
if you re-think that, you'll find that there's a natural way to store it.
It's possible, though, that what you have is a kind of data that doesn't
lend itself to a relational model.  The RDBMS sometimes turns into the
hammer people have, so they start banging away; when what they need is a
different data system.

A


Re: Storing and querying boolean fields

From
Andrew Sullivan
Date:
On Thu, Jan 10, 2008 at 11:08:16AM -0500, Andrew Sullivan wrote:
> I mean this sincerely and not snidely: get another job.  7.3.20 was the last

Err, 7.3.21, I meant, of course.  Sorry.

A