Thread: making a pg store of 'multiple checkboxes' efficient
I have a large table (5M items current) that is projected to grow at the rate of 2.5M a month looking at the current usage trends. the table represents some core standardized user account attributes , while text heavy / unstandardized info lies in other tables. my issue is this: i'm adding in a 'multiple checkboxes' style field, and trying to weigh the options for db representation against one another. my main concern is speed - this is read heavy , but I am worried to some degree about disk space (not sure where disk space fits in with pg, when I used to use mysql the simplest schema change could drastically effect the disk size though ). that said , these are my current choices: option a bitwise operations and/or operations to condense checkboxes into searchable field pro: super small fits in 1 table con: could not find any docs on the speed of bitwise searches in pg option b secondary table with bools create table extends( account_id , option_1_bool , option_2_bool ) pro: 1 join , fast search on bools con: PITA to maintain/extend option c mapping table create table mapping ( account_id , option_id ) pro: extensible con: slow speed - needs multiple joins , records all over I'd personally lean towards option a or b . anyone have suggestions ? thanks. // Jonathan Vanasco | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | SyndiClick.com | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | FindMeOn.com - The cure for Multiple Web Personality Disorder | Web Identity Management and 3D Social Networking | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | RoadSound.com - Tools For Bands, Stuff For Fans | Collaborative Online Management And Syndication Tools | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Jonathan Vanasco wrote: > I have a large table (5M items current) that is projected to grow at the > rate of 2.5M a month looking at the current usage trends. > > the table represents some core standardized user account attributes , > while text heavy / unstandardized info lies in other tables. > > my issue is this: i'm adding in a 'multiple checkboxes' style field, and > trying to weigh the options for db representation against one another. > > my main concern is speed - this is read heavy , but I am worried to some > degree about disk space (not sure where disk space fits in with pg, when > I used to use mysql the simplest schema change could drastically effect > the disk size though ). > > that said , these are my current choices: > > option a > bitwise operations > and/or operations to condense checkboxes into searchable field > pro: > super small > fits in 1 table > con: > could not find any docs on the speed of bitwise searches > in pg I got some good results using bitwise operations on an integer column. An index on such a column helps a great deal of course. What type of integer you need depends on how many booleans you (expect to) have. My operations were like "WHERE (value & 80) = 80" to match against the 7th and 5th bits, with value an int4 column. Query times are in the tens of milliseconds range. Admittedly I only got about that many records through joins with other tables I needed (meaning that my result set may at some point in time have been around as big as your data set), so the planner may have reduced the number of bitwise operations significantly for me. The actual number of bitwise values to compare was around 40,000 integers * 25 mask values, but that got joined with over 1M records from another result set. A bitwise operator on a single column should (theoretically) have less overhead than integer/boolean operators on multiple columns. Computers are good at bitwise operations, after all. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
On Apr 18, 2007, at 4:28 AM, Alban Hertroys wrote: > I got some good results using bitwise operations on an integer column. > An index on such a column helps a great deal of course. What type of > integer you need depends on how many booleans you (expect to) have. > > My operations were like "WHERE (value & 80) = 80" to match against the > 7th and 5th bits, with value an int4 column. Query times are in the > tens > of milliseconds range. > > Admittedly I only got about that many records through joins with other > tables I needed (meaning that my result set may at some point in time > have been around as big as your data set), so the planner may have > reduced the number of bitwise operations significantly for me. > > The actual number of bitwise values to compare was around 40,000 > integers * 25 mask values, but that got joined with over 1M records > from > another result set. > > A bitwise operator on a single column should (theoretically) have less > overhead than integer/boolean operators on multiple columns. Computers > are good at bitwise operations, after all. Computers are good at bitwise operations, but software often has scary implementations :) thanks for the input. I'll definitely go this route. It was my first thought, but there is almost no documentation out there for this type of storage. // Jonathan Vanasco | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | SyndiClick.com | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | FindMeOn.com - The cure for Multiple Web Personality Disorder | Web Identity Management and 3D Social Networking | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - | RoadSound.com - Tools For Bands, Stuff For Fans | Collaborative Online Management And Syndication Tools | - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -