Thread: Storing and querying boolean fields
Hi All,
First, some background:
- We are using PostgreSQL 7.3.4, and am locked into this version. I would upgrade if I could, but the decision is not mine.
- The table referred to below is 120+ million rows, and has a width of 27 columns (15 smallints, 5 integers, 4 dates, 1 integer[], 1 single char text field, and 1 double precision). This table already has 3 multi-field indexes, as well as a primary key index.
I need to add many boolean flags to my database that need to be retrieved by primary key, as well as be searched upon with decent speed. By "many", I mean about 30 booleans total. The added bonus is that some of these values need to be searched in an OR manner, while others need to be searched as an AND. For example, I will need to do a search that looks something like: select * from table where A && B && C && (D || E) && F
Our first thought was to "pack" these booleans into integer fields to save space and help with searching (so we thought). The design calls for a smallint field and an integer field to be added to the table. UPDATE/INSERT would be handled by calculating the values of the two fields outside of the database. Searching would be accomplished by taking the bits that were requested, creating a bitmask corresponding to the 2 fields, and then doing something like: select * from table where (field & BITMASK) = BITMASK. We felt that putting this information in this table rather than a separate one would help performance by eliminating a JOIN, and it was assumed (ugh) that if we built an index on those two fields, that we would be able to use those indexes for searching in the aforementioned manner.
Unfortunately we have come up with the following problems with this approach:
1. Postgres will only use 1 index per table, so putting these fields in the same table as several other searchable fields actually hurts us more than helps.
2. We haven't been able to get any index scans in preliminary testing (all seq. scans).
After all of that background, my actual question is: what is the best way to effectively store and query this type of data? It seems more and more that our initial design is just flat-out wrong, but we are at a loss as to what the other options are.
Thanks very much in advance, and I apologize for the length of the message, but I felt background info would be important.
Regards,
Richard
"Richard Brown" <rbrown@openratings.com> writes: > - We are using PostgreSQL 7.3.4, and am locked into this version. I would > upgrade if I could, but the decision is not mine. They won't even let you update to 7.3.something-reasonably-current ? Resign. Go find a job with a boss whose IQ is above room temperature. You want to be out of there before the finger-pointing starts, which it certainly will when one of the many fixed-since-7.3.4 bugs eats your database. > I need to add many boolean flags to my database that need to be retrieved by > primary key, as well as be searched upon with decent speed. By "many", I > mean about 30 booleans total. The added bonus is that some of these values > need to be searched in an OR manner, while others need to be searched as an > AND. For example, I will need to do a search that looks something like: > select * from table where A && B && C && (D || E) && F As for the problem at hand, the contrib/intarray module would probably fill the bill nicely, though I dunno how far I'd trust either it or the GIST index code back in 7.3. You'd represent the flags as an array of the numbers of the flags that are ON, and search it with the @@ operator: int[] @@ query_int - returns TRUE if array satisfies query (like '1&(2|3)') This on-disk representation might be a bit larger than you were hoping for, but it'd not break if you suddenly found you needed 33 flags not 32... regards, tom lane
On Wed, 9 Jan 2008, Richard Brown wrote: > - We are using PostgreSQL 7.3.4, and am locked into this version. I would > upgrade if I could, but the decision is not mine. Just make sure you CYA so when said server eats itself the decision maker can't point the finger at you. Give them a copy of a dated written statement from you with text like "PostgreSQL 7.3.4 has several major bugs that are known to cause data corruption" and quote some of the choicer items in the trail of release notes at http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are in 7.3.7, 7.3.12, 7.3.19. The bonus to this approach, where you disclaim the whole platform as known to be unreliable, is that if you ever make a huge mistake you can shrug off the blame by saying the old database that you told them was buggy, but weren't allowed to upgrade, ate the data and it was totally not your fault. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Wednesday 09 January 2008 20:09, Greg Smith wrote: > On Wed, 9 Jan 2008, Richard Brown wrote: > > - We are using PostgreSQL 7.3.4, and am locked into this version. I would > > upgrade if I could, but the decision is not mine. > > Just make sure you CYA so when said server eats itself the decision maker > can't point the finger at you. Give them a copy of a dated written > statement from you with text like "PostgreSQL 7.3.4 has several major bugs > that are known to cause data corruption" and quote some of the choicer > items in the trail of release notes at > http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are > in 7.3.7, 7.3.12, 7.3.19. > > The bonus to this approach, where you disclaim the whole platform as known > to be unreliable, is that if you ever make a huge mistake you can shrug > off the blame by saying the old database that you told them was buggy, but > weren't allowed to upgrade, ate the data and it was totally not your > fault. > It would probably also be nice to reference the last security announcement (http://www.postgresql.org/about/news.905) which includes this relevant message: "Minor release 7.3.21 for PostgreSQL version 7.3 will be the last update to the 7.3 branch. As version 7.3 is now over five years old, the community will no longer release patches for it after today's release. Users of version 7.3 are encouraged to upgrade to a more current version as soon as possible, or to seek support from a commercial support vendor who is willing to continue backpatching for them." -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL