Thread: making a pg store of 'multiple checkboxes' efficient

making a pg store of 'multiple checkboxes' efficient

From
Jonathan Vanasco
Date:
I have a large table (5M items current) that is projected to grow at
the rate of 2.5M a month looking at the current usage trends.

the table represents some core standardized user account attributes ,
while text heavy / unstandardized info lies in other tables.

my issue is this: i'm adding in a 'multiple checkboxes' style field,
and trying to weigh the options for db representation against one
another.

my main concern is speed - this is read heavy , but I am worried to
some degree about disk space (not sure where disk space fits in with
pg, when I used to use mysql the simplest schema change could
drastically effect the disk size though ).

that said , these are my current choices:

    option a
          bitwise operations
            and/or operations to condense checkboxes into searchable field
            pro:
                super small
                fits in 1 table
            con:
                could not find any docs on the speed of bitwise searches in pg

    option b
        secondary table with bools
            create table extends( account_id , option_1_bool , option_2_bool )
            pro:
                1 join , fast search on bools
            con:
                PITA to maintain/extend

    option c
        mapping table
            create table mapping ( account_id , option_id )
            pro:
                extensible
            con:
                slow speed - needs multiple joins , records all over


I'd personally lean towards option a or b .  anyone have suggestions ?

thanks.


// Jonathan Vanasco

| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| SyndiClick.com
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
|      FindMeOn.com - The cure for Multiple Web Personality Disorder
|      Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
|      RoadSound.com - Tools For Bands, Stuff For Fans
|      Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -



Re: making a pg store of 'multiple checkboxes' efficient

From
Alban Hertroys
Date:
Jonathan Vanasco wrote:
> I have a large table (5M items current) that is projected to grow at the
> rate of 2.5M a month looking at the current usage trends.
>
> the table represents some core standardized user account attributes ,
> while text heavy / unstandardized info lies in other tables.
>
> my issue is this: i'm adding in a 'multiple checkboxes' style field, and
> trying to weigh the options for db representation against one another.
>
> my main concern is speed - this is read heavy , but I am worried to some
> degree about disk space (not sure where disk space fits in with pg, when
> I used to use mysql the simplest schema change could drastically effect
> the disk size though ).
>
> that said , these are my current choices:
>
>     option a
>          bitwise operations
>             and/or operations to condense checkboxes into searchable field
>             pro:
>                 super small
>                 fits in 1 table
>             con:
>                 could not find any docs on the speed of bitwise searches
> in pg

I got some good results using bitwise operations on an integer column.
An index on such a column helps a great deal of course. What type of
integer you need depends on how many booleans you (expect to) have.

My operations were like "WHERE (value & 80) = 80" to match against the
7th and 5th bits, with value an int4 column. Query times are in the tens
of milliseconds range.

Admittedly I only got about that many records through joins with other
tables I needed (meaning that my result set may at some point in time
have been around as big as your data set), so the planner may have
reduced the number of bitwise operations significantly for me.

The actual number of bitwise values to compare was around 40,000
integers * 25 mask values, but that got joined with over 1M records from
another result set.

A bitwise operator on a single column should (theoretically) have less
overhead than integer/boolean operators on multiple columns. Computers
are good at bitwise operations, after all.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: making a pg store of 'multiple checkboxes' efficient

From
Jonathan Vanasco
Date:
On Apr 18, 2007, at 4:28 AM, Alban Hertroys wrote:

> I got some good results using bitwise operations on an integer column.
> An index on such a column helps a great deal of course. What type of
> integer you need depends on how many booleans you (expect to) have.
>
> My operations were like "WHERE (value & 80) = 80" to match against the
> 7th and 5th bits, with value an int4 column. Query times are in the
> tens
> of milliseconds range.
>
> Admittedly I only got about that many records through joins with other
> tables I needed (meaning that my result set may at some point in time
> have been around as big as your data set), so the planner may have
> reduced the number of bitwise operations significantly for me.
>
> The actual number of bitwise values to compare was around 40,000
> integers * 25 mask values, but that got joined with over 1M records
> from
> another result set.
>
> A bitwise operator on a single column should (theoretically) have less
> overhead than integer/boolean operators on multiple columns. Computers
> are good at bitwise operations, after all.


Computers are good at bitwise operations, but software often has
scary implementations :)

thanks for the input.  I'll definitely go this route.  It was my
first thought, but there is almost no documentation out there for
this type of storage.



// Jonathan Vanasco

| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
| SyndiClick.com
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
|      FindMeOn.com - The cure for Multiple Web Personality Disorder
|      Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -
|      RoadSound.com - Tools For Bands, Stuff For Fans
|      Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - - - - - - - - - -