I've been debating with a collegue who argues that indexing a
boolean column is a BAD idea and that is will actually slow
down queries.
My plan is to have a table with many rows sharing 'versions'
(version/archive/history) of data where the most current row
is the one where 'is_active' contains a true value.
If the table begins to look like this:
data_id(pk) | data_lookup_key | data_is_active | ...
------------+-----------------+----------------+--------
1 | banana | false | ...
2 | banana | false | ...
3 | banana | false | ...
4 | banana | false | ...
5 | banana | false | ...
6 | banana | false | ...
7 | banana | false | ...
8 | banana | false | ...
9 | banana | true | ...
10 | apple | true | ...
11 | pear | false | ...
12 | pear | false | ...
13 | pear | false | ...
14 | pear | false | ...
15 | pear | false | ...
...
1000000 | pear | true | ...
Will an index on the 'data_is_active' column be used or work
as I expect? I'm assuming that I may have a million entries
sharing the same 'data_lookup_key' and I'll be using that to
search for the active version of the row.
SELECT *
FROM table
WHERE data_lookup_key = 'pear'
AND data_is_active IS TRUE;
Does it make sense to have an index on data_is_active?
Now, I've read that in some databases the index on a column that
has relatively even distribution of values over a small set of values
will not be efficient.
I bet this is in a FAQ somewhere. Can you point me in the right
direction?
Dante