Home > mailing lists

Re: Indexing a Boolean or Null column? - Mailing list pgsql-performance

From	D. Dante Lorenso
Subject	Re: Indexing a Boolean or Null column?
Date	January 4, 2004 01:25:35
Msg-id	3FF7A395.4050106@lorenso.com Whole thread Raw
In response to	Re: Indexing a Boolean or Null column? (Christopher Kings-Lynne <chriskl@familyhealth.com.au>)
Responses	Re: Indexing a Boolean or Null column? Re: Indexing a Boolean or Null column?
List	pgsql-performance

Tree view

Christopher Kings-Lynne wrote:

 > > Will an index on the 'data_is_active' column be used or work
 > > as I expect?  I'm assuming that I may have a million entries
 > > sharing the same 'data_lookup_key' and I'll be using that to
 > > search for the active version of the row.

 > An index just on a boolean column won't be 'selective enough'.
 > eg. The index will only be able to choose 50% of the table -
 > since it's faster to do a full table scan in that case, the
 > index won't get used.

Ok, so ...evenly distributed data on small set of values forces
sequential scan since that's faster.  I expected that based on
what I've read so far.

 > A multi keyed index, however will work a bit better, eg an index
 > over (data_lookup_key, data_is_active).
 >
 > That way, the index will first be able to find the correct
 > key    (which is nicely selective) and then will be able to
 > halve the resulting ? search space to get the active ones.

I'm not using the 50% TRUE / 50% FALSE model.  My model will be
more like only ONE value IS TRUE for 'is_active' for each
'data_lookup_key' in my table.  All the rest are FALSE.  In
this case for 100 rows all having the same 'data_lookup_key'
we are looking at a 99% FALSE / 1% TRUE model ... and what I'll
be searching for is the ONE TRUE.

In this case, it WILL pay off to have the index on a boolean
column, yes?  Will I win my debate with my collegue? ;-)

I think Tom Lanes suggestion of partial indexes is what I need to
look into.

 > BTW, you shouldn't use 'banana', 'pear', etc as the data_lookup_key,
 > you should make another table like this: ... And then replace the
 > data_lookup_key col with a column of integers that is a foreign
 > key to the names table - waaaaay faster to process.

Gotcha, yeah, I'm targeting as close to 3NF as I get get.  Was just
trying to be generic for my example ... bad example, oops.

Dante

pgsql-performance by date:

From: Christopher Kings-Lynne
Date: 04 January 2004, 00:33:35
Subject: Re: Indexing a Boolean or Null column?

From: Christopher Kings-Lynne
Date: 04 January 2004, 03:21:56
Subject: Re: Indexing a Boolean or Null column?

Re: Indexing a Boolean or Null column? - Mailing list pgsql-performance

Previous

Next