Thread: Indexes on low cardinality columns

Indexes on low cardinality columns

From
Vikul Khosla
Date:
Folks,

We have just migrated from Oracle to PG.

We have a database that has approx 3 mil rows and one of the columns has a cardinality
of only 0.1% (3000 unique values).

We have to issue several queries that use this low cardinality column in a WHERE clause
as well as see this column participating in JOINS (ouch!).

A regular B-Tree index has been created on these columns.

In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw performance go
through the roof. I know Postgres does not have Bitmap indexes,
but is there a reasonable alternative to boost performance in situations where low cardinality
columns are involved ?

I dont have the option of changing schemas - so please dont go there :)

TIA,
VK

Re: Indexes on low cardinality columns

From
Greg Stark
Date:
On Fri, Oct 16, 2009 at 4:36 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?

Do you need to query on all of the 3,000 values?

If it's just particular values which are common i would suggest using
partial indexes on some other column with a where clause restricting
them to only one value in the low-cardinality column. But I wouldn't
want to have 3,000 indexes.

Alternately you could try partitioning the table, though 3,000
partitions is a lot too. If you often update this value then
partitioning wouldn't work well anyways (but then bitmap indexes
wouldn't have worked well in oracle either)

--
greg

Re: Indexes on low cardinality columns

From
Vikul Khosla
Date:

Thanks Greg!.

Yes, we do need to query on all 3000 values ... potentially. Considering
that when we changed the B-Tree indexes to Bitmap indexes in Oracle
we saw a huge performance boost ... doesn't that suggest that absence of this
feature in PG is a constraint ?

Are there any other clever workarounds to boosting performance involving
low queries on low cardinality columns ? i.e avoiding a full table scan ?

VK

From: Greg Stark <gsstark@mit.edu>
To: Vikul Khosla <vkhosla@gridsolv.com>
Cc: pgsql-performance@postgresql.org
Sent: Fri, October 16, 2009 8:27:15 PM
Subject: Re: [PERFORM] Indexes on low cardinality columns

On Fri, Oct 16, 2009 at 4:36 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?

Do you need to query on all of the 3,000 values?

If it's just particular values which are common i would suggest using
partial indexes on some other column with a where clause restricting
them to only one value in the low-cardinality column. But I wouldn't
want to have 3,000 indexes.

Alternately you could try partitioning the table, though 3,000
partitions is a lot too. If you often update this value then
partitioning wouldn't work well anyways (but then bitmap indexes
wouldn't have worked well in oracle either)

--
greg

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Indexes on low cardinality columns

From
Robert Haas
Date:
On Sat, Oct 17, 2009 at 1:02 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
>
> Thanks Greg!.
>
> Yes, we do need to query on all 3000 values ... potentially. Considering
> that when we changed the B-Tree indexes to Bitmap indexes in Oracle
> we saw a huge performance boost ... doesn't that suggest that absence of
> this
> feature in PG is a constraint ?

Maybe, but it's hard to speculate since you haven't provided any data.  :-)

Are you running PG on the same hardware you used for Oracle?  Have you
tuned postgresql.conf?  What is the actual runtime of your query under
Oracle with a btree index, Oracle with a bitmap index, and PostgreSQL
with a btree index?

It's not immediately obvious to me why a bitmap index would be better
for a case with so many distinct values.  Seems like the bitmap would
tend to be sparse.  But I'm just hand-waving here, since we have no
actual performance data to look at.  Keep in mind that PostgreSQL will
construct an in-memory bitmap from a B-tree index in some situations,
which can be quite fast.  That begs the question of what the planner
is deciding to do now - it would be really helpful if you could post
some EXPLAIN ANALYZE results.

> Are there any other clever workarounds to boosting performance involving
> low queries on low cardinality columns ? i.e avoiding a full table scan ?

Here again, if you post the EXPLAIN ANALYZE results from your queries,
it might be possible for folks on this list to offer some more
specific suggestions.

If you query mostly on this column, you could try clustering the table
on that column (and re-analyzing).

...Robert

Re: Indexes on low cardinality columns

From
Jeff Janes
Date:
On Sat, Oct 17, 2009 at 10:02 AM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
>
> Thanks Greg!.
>
> Yes, we do need to query on all 3000 values ... potentially. Considering
> that when we changed the B-Tree indexes to Bitmap indexes in Oracle
> we saw a huge performance boost ... doesn't that suggest that absence of
> this
> feature in PG is a constraint ?

Was the  bitmap index in Oracle used all by itself, or was it used in
concert with other bitmaps (either native bitmap indexes or a bitmap
conversion of a non-bitmap index) to produce the speed up?

> Are there any other clever workarounds to boosting performance involving
> low queries on low cardinality columns ? i.e avoiding a full table scan ?

Have you tired setting enable_seqscan=off to see what plan that
produces and whether it is faster or slower?  If it is better, then
lowering random_page_cost or increasing cpu_tuple_cost might help
motivate it to make that decision without having to resort to
enable_seqscan.  Of course tuning those setting just to focus on one
query could backfire rather badly.


Jeff

Re: Indexes on low cardinality columns

From
Vikul Khosla
Date:
Thanks Greg!.

Yes, we do need to query on all 3000 values ... potentially. Considering
that when we changed the B-Tree indexes to Bitmap indexes in Oracle
we saw a huge performance boost ... doesn't that suggest that absence of this
feature in PG is a constraint ?

Are there any other clever workarounds to boosting performance involving
low queries on low cardinality columns ? i.e avoiding a full table scan ?

VK


From: Greg Stark <gsstark@mit.edu>
To: Vikul Khosla <vkhosla@gridsolv.com>
Cc: pgsql-performance@postgresql.org
Sent: Fri, October 16, 2009 8:27:15 PM
Subject: Re: [PERFORM] Indexes on low cardinality columns

On Fri, Oct 16, 2009 at 4:36 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?

Do you need to query on all of the 3,000 values?

If it's just particular values which are common i would suggest using
partial indexes on some other column with a where clause restricting
them to only one value in the low-cardinality column. But I wouldn't
want to have 3,000 indexes.

Alternately you could try partitioning the table, though 3,000
partitions is a lot too. If you often update this value then
partitioning wouldn't work well anyways (but then bitmap indexes
wouldn't have worked well in oracle either)

--
greg

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Indexes on low cardinality columns

From
Ron Mayer
Date:
If the table can be clustered on that column, I suspect
it'd be a nice case for the grouped index tuples patch
http://community.enterprisedb.com/git/

Actually, simply clustering on that column might give
major speedups anyway.

Vikul Khosla wrote:
> Folks,
>
> We have just migrated from Oracle to PG.
>
> We have a database that has approx 3 mil rows and one of the columns has
> a cardinality
> of only 0.1% (3000 unique values).
>
> We have to issue several queries that use this low cardinality column in
> a WHERE clause
> as well as see this column participating in JOINS (ouch!).
>
> A regular B-Tree index has been created on these columns.
>
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?
>
> I dont have the option of changing schemas - so please dont go there :)
>
> TIA,
> VK