Indexing a Boolean or Null column? - Mailing list pgsql-performance

From D. Dante Lorenso
Subject Indexing a Boolean or Null column?
Date
Msg-id 3FF769EA.7090508@lorenso.com
Whole thread Raw
Responses Re: Indexing a Boolean or Null column?
Re: Indexing a Boolean or Null column?
List pgsql-performance
I've been debating with a collegue who argues that indexing a
boolean column is a BAD idea and that is will actually slow
down queries.

My plan is to have a table with many rows sharing 'versions'
(version/archive/history) of data where the most current row
is the one where 'is_active' contains a true value.

If the table begins to look like this:

data_id(pk) | data_lookup_key | data_is_active | ...
------------+-----------------+----------------+--------
1           | banana          | false          | ...
2           | banana          | false          | ...
3           | banana          | false          | ...
4           | banana          | false          | ...
5           | banana          | false          | ...
6           | banana          | false          | ...
7           | banana          | false          | ...
8           | banana          | false          | ...
9           | banana          | true           | ...
10          | apple           | true           | ...
11          | pear            | false          | ...
12          | pear            | false          | ...
13          | pear            | false          | ...
14          | pear            | false          | ...
15          | pear            | false          | ...
...
1000000     | pear            | true           | ...

Will an index on the 'data_is_active' column be used or work
as I expect?  I'm assuming that I may have a million entries
sharing the same 'data_lookup_key' and I'll be using that to
search for the active version of the row.

    SELECT *
    FROM table
    WHERE data_lookup_key = 'pear'
    AND data_is_active IS TRUE;

Does it make sense to have an index on data_is_active?

Now, I've read that in some databases the index on a column that
has relatively even distribution of values over a small set of values
will not be efficient.

I bet this is in a FAQ somewhere.  Can you point me in the right
direction?

Dante





pgsql-performance by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: deferred foreign keys
Next
From: Tom Lane
Date:
Subject: Re: Indexing a Boolean or Null column?