Thread: Indexes on low cardinality columns
Folks,
We have just migrated from Oracle to PG.
We have a database that has approx 3 mil rows and one of the columns has a cardinality
of only 0.1% (3000 unique values).
We have to issue several queries that use this low cardinality column in a WHERE clause
as well as see this column participating in JOINS (ouch!).
A regular B-Tree index has been created on these columns.
In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw performance go
through the roof. I know Postgres does not have Bitmap indexes,
but is there a reasonable alternative to boost performance in situations where low cardinality
columns are involved ?
I dont have the option of changing schemas - so please dont go there :)
TIA,
VK
We have just migrated from Oracle to PG.
We have a database that has approx 3 mil rows and one of the columns has a cardinality
of only 0.1% (3000 unique values).
We have to issue several queries that use this low cardinality column in a WHERE clause
as well as see this column participating in JOINS (ouch!).
A regular B-Tree index has been created on these columns.
In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw performance go
through the roof. I know Postgres does not have Bitmap indexes,
but is there a reasonable alternative to boost performance in situations where low cardinality
columns are involved ?
I dont have the option of changing schemas - so please dont go there :)
TIA,
VK
On Fri, Oct 16, 2009 at 4:36 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote: > In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw > performance go > through the roof. I know Postgres does not have Bitmap indexes, > but is there a reasonable alternative to boost performance in situations > where low cardinality > columns are involved ? Do you need to query on all of the 3,000 values? If it's just particular values which are common i would suggest using partial indexes on some other column with a where clause restricting them to only one value in the low-cardinality column. But I wouldn't want to have 3,000 indexes. Alternately you could try partitioning the table, though 3,000 partitions is a lot too. If you often update this value then partitioning wouldn't work well anyways (but then bitmap indexes wouldn't have worked well in oracle either) -- greg
Thanks Greg!.
Yes, we do need to query on all 3000 values ... potentially. Considering
that when we changed the B-Tree indexes to Bitmap indexes in Oracle
we saw a huge performance boost ... doesn't that suggest that absence of this
feature in PG is a constraint ?
Are there any other clever workarounds to boosting performance involving
low queries on low cardinality columns ? i.e avoiding a full table scan ?
VK
From: Greg Stark <gsstark@mit.edu>
To: Vikul Khosla <vkhosla@gridsolv.com>
Cc: pgsql-performance@postgresql.org
Sent: Fri, October 16, 2009 8:27:15 PM
Subject: Re: [PERFORM] Indexes on low cardinality columns
On Fri, Oct 16, 2009 at 4:36 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?
Do you need to query on all of the 3,000 values?
If it's just particular values which are common i would suggest using
partial indexes on some other column with a where clause restricting
them to only one value in the low-cardinality column. But I wouldn't
want to have 3,000 indexes.
Alternately you could try partitioning the table, though 3,000
partitions is a lot too. If you often update this value then
partitioning wouldn't work well anyways (but then bitmap indexes
wouldn't have worked well in oracle either)
--
greg
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Yes, we do need to query on all 3000 values ... potentially. Considering
that when we changed the B-Tree indexes to Bitmap indexes in Oracle
we saw a huge performance boost ... doesn't that suggest that absence of this
feature in PG is a constraint ?
Are there any other clever workarounds to boosting performance involving
low queries on low cardinality columns ? i.e avoiding a full table scan ?
VK
From: Greg Stark <gsstark@mit.edu>
To: Vikul Khosla <vkhosla@gridsolv.com>
Cc: pgsql-performance@postgresql.org
Sent: Fri, October 16, 2009 8:27:15 PM
Subject: Re: [PERFORM] Indexes on low cardinality columns
On Fri, Oct 16, 2009 at 4:36 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?
Do you need to query on all of the 3,000 values?
If it's just particular values which are common i would suggest using
partial indexes on some other column with a where clause restricting
them to only one value in the low-cardinality column. But I wouldn't
want to have 3,000 indexes.
Alternately you could try partitioning the table, though 3,000
partitions is a lot too. If you often update this value then
partitioning wouldn't work well anyways (but then bitmap indexes
wouldn't have worked well in oracle either)
--
greg
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On Sat, Oct 17, 2009 at 1:02 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote: > > Thanks Greg!. > > Yes, we do need to query on all 3000 values ... potentially. Considering > that when we changed the B-Tree indexes to Bitmap indexes in Oracle > we saw a huge performance boost ... doesn't that suggest that absence of > this > feature in PG is a constraint ? Maybe, but it's hard to speculate since you haven't provided any data. :-) Are you running PG on the same hardware you used for Oracle? Have you tuned postgresql.conf? What is the actual runtime of your query under Oracle with a btree index, Oracle with a bitmap index, and PostgreSQL with a btree index? It's not immediately obvious to me why a bitmap index would be better for a case with so many distinct values. Seems like the bitmap would tend to be sparse. But I'm just hand-waving here, since we have no actual performance data to look at. Keep in mind that PostgreSQL will construct an in-memory bitmap from a B-tree index in some situations, which can be quite fast. That begs the question of what the planner is deciding to do now - it would be really helpful if you could post some EXPLAIN ANALYZE results. > Are there any other clever workarounds to boosting performance involving > low queries on low cardinality columns ? i.e avoiding a full table scan ? Here again, if you post the EXPLAIN ANALYZE results from your queries, it might be possible for folks on this list to offer some more specific suggestions. If you query mostly on this column, you could try clustering the table on that column (and re-analyzing). ...Robert
On Sat, Oct 17, 2009 at 10:02 AM, Vikul Khosla <vkhosla@gridsolv.com> wrote: > > Thanks Greg!. > > Yes, we do need to query on all 3000 values ... potentially. Considering > that when we changed the B-Tree indexes to Bitmap indexes in Oracle > we saw a huge performance boost ... doesn't that suggest that absence of > this > feature in PG is a constraint ? Was the bitmap index in Oracle used all by itself, or was it used in concert with other bitmaps (either native bitmap indexes or a bitmap conversion of a non-bitmap index) to produce the speed up? > Are there any other clever workarounds to boosting performance involving > low queries on low cardinality columns ? i.e avoiding a full table scan ? Have you tired setting enable_seqscan=off to see what plan that produces and whether it is faster or slower? If it is better, then lowering random_page_cost or increasing cpu_tuple_cost might help motivate it to make that decision without having to resort to enable_seqscan. Of course tuning those setting just to focus on one query could backfire rather badly. Jeff
Thanks Greg!.
Yes, we do need to query on all 3000 values ... potentially. Considering
that when we changed the B-Tree indexes to Bitmap indexes in Oracle
we saw a huge performance boost ... doesn't that suggest that absence of this
feature in PG is a constraint ?
Are there any other clever workarounds to boosting performance involving
low queries on low cardinality columns ? i.e avoiding a full table scan ?
VK
Yes, we do need to query on all 3000 values ... potentially. Considering
that when we changed the B-Tree indexes to Bitmap indexes in Oracle
we saw a huge performance boost ... doesn't that suggest that absence of this
feature in PG is a constraint ?
Are there any other clever workarounds to boosting performance involving
low queries on low cardinality columns ? i.e avoiding a full table scan ?
VK
From: Greg Stark <gsstark@mit.edu>
To: Vikul Khosla <vkhosla@gridsolv.com>
Cc: pgsql-performance@postgresql.org
Sent: Fri, October 16, 2009 8:27:15 PM
Subject: Re: [PERFORM] Indexes on low cardinality columns
On Fri, Oct 16, 2009 at 4:36 PM, Vikul Khosla <vkhosla@gridsolv.com> wrote:
> In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw
> performance go
> through the roof. I know Postgres does not have Bitmap indexes,
> but is there a reasonable alternative to boost performance in situations
> where low cardinality
> columns are involved ?
Do you need to query on all of the 3,000 values?
If it's just particular values which are common i would suggest using
partial indexes on some other column with a where clause restricting
them to only one value in the low-cardinality column. But I wouldn't
want to have 3,000 indexes.
Alternately you could try partitioning the table, though 3,000
partitions is a lot too. If you often update this value then
partitioning wouldn't work well anyways (but then bitmap indexes
wouldn't have worked well in oracle either)
--
greg
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
If the table can be clustered on that column, I suspect it'd be a nice case for the grouped index tuples patch http://community.enterprisedb.com/git/ Actually, simply clustering on that column might give major speedups anyway. Vikul Khosla wrote: > Folks, > > We have just migrated from Oracle to PG. > > We have a database that has approx 3 mil rows and one of the columns has > a cardinality > of only 0.1% (3000 unique values). > > We have to issue several queries that use this low cardinality column in > a WHERE clause > as well as see this column participating in JOINS (ouch!). > > A regular B-Tree index has been created on these columns. > > In Oracle, we replaced the B-Tree Indexes with Bitmap indexes and saw > performance go > through the roof. I know Postgres does not have Bitmap indexes, > but is there a reasonable alternative to boost performance in situations > where low cardinality > columns are involved ? > > I dont have the option of changing schemas - so please dont go there :) > > TIA, > VK