Thread: Indexing columns with low cardinality: persistent bitmap indexes?

Indexing columns with low cardinality: persistent bitmap indexes?

From
Bruno Lavoie
Date:
Hello,

I have a column with a small number of distinct values, indexing this
one with a standard BTree is useless. How do I  can index this column
efficiently? I searched and it seems that pg doesn't support the
creation of persistent bitmap indexes... Is that feature planned in next
releases of pg?

Thanks
Bruno Lavoie

Re: Indexing columns with low cardinality: persistent bitmap indexes?

From
Alvaro Herrera
Date:
Bruno Lavoie escribió:
> Hello,
>
> I have a column with a small number of distinct values, indexing this
> one with a standard BTree is useless. How do I  can index this column
> efficiently? I searched and it seems that pg doesn't support the
> creation of persistent bitmap indexes...

It doesn't.

> Is that feature planned in next  releases of pg?

There are some efforts to get it done, but don't hold your breath (it
won't be in 8.4 either, as it has major problems currently.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Indexing columns with low cardinality: persistent bitmap indexes?

From
Reg Me Please
Date:
On Monday 12 January 2009 21:38:02 Bruno Lavoie wrote:
> Hello,
>
> I have a column with a small number of distinct values, indexing this
> one with a standard BTree is useless. How do I  can index this column
> efficiently? I searched and it seems that pg doesn't support the
> creation of persistent bitmap indexes... Is that feature planned in next
> releases of pg?
>
> Thanks
> Bruno Lavoie

I would try partial indexes, as many as the distinct values.
I'm not sure this would help, though.

--
Fahrbahn ist ein graues Band
weisse Streifen, grüner Rand

Re: Indexing columns with low cardinality: persistent bitmap indexes?

From
"Jaime Casanova"
Date:
On Mon, Jan 12, 2009 at 4:16 PM, Reg Me Please <regmeplease@gmail.com> wrote:
> On Monday 12 January 2009 21:38:02 Bruno Lavoie wrote:
>> Hello,
>>
>> I have a column with a small number of distinct values, indexing this
>> one with a standard BTree is useless. How do I  can index this column
>> efficiently? I searched and it seems that pg doesn't support the
>> creation of persistent bitmap indexes... Is that feature planned in next
>> releases of pg?
>>
>> Thanks
>> Bruno Lavoie
>
> I would try partial indexes, as many as the distinct values.
> I'm not sure this would help, though.
>

you should create partial indexes only on those values that are a
lower fraction on the table
ie: if you have

value     "fraction of the table that has this value"
1                5%
2                3%
3               20%
4               25%
5               47%

then only partial indexes on values 1 and 2 are of some value

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Re: Indexing columns with low cardinality: persistent bitmap indexes?

From
"Scott Marlowe"
Date:
On Mon, Jan 12, 2009 at 1:38 PM, Bruno Lavoie <bruno.lavoie@gmail.com> wrote:
> Hello,
>
> I have a column with a small number of distinct values, indexing this one
> with a standard BTree is useless. How do I  can index this column
> efficiently? I searched and it seems that pg doesn't support the creation of
> persistent bitmap indexes... Is that feature planned in next releases of pg?

You have a few options based on your access patterns.  If you tend to
access just one of these an get them all at once, then either clusting
on this value, or partitioning your table will help.

If you access your data using these values and other column values at
the same time, then partial or multi-column indexes might help.

Re: Indexing columns with low cardinality: persistentbitmap indexes?

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Alvaro Herrera
> Sent: Monday, January 12, 2009 12:41 PM
> To: Bruno Lavoie
> Cc: PostgreSQL
> Subject: Re: [GENERAL] Indexing columns with low cardinality:
> persistentbitmap indexes?
>
> Bruno Lavoie escribió:
> > Hello,
> >
> > I have a column with a small number of distinct values, indexing this
> > one with a standard BTree is useless. How do I  can index this column
> > efficiently? I searched and it seems that pg doesn't support the
> > creation of persistent bitmap indexes...
>
> It doesn't.
>
> > Is that feature planned in next  releases of pg?
>
> There are some efforts to get it done, but don't hold your breath (it
> won't be in 8.4 either, as it has major problems currently.)

Here is an interesting experiment:

Application of Bitmap Index to Information Retrieval. K. Fujioka, Y. Uematsu, and M. Onizuka. WWW 2008
Source: [ACM]
Synopsis: This paper proposes a hierarchical structure called HS-bitmap index to represent document-term matrix. The
authorsimplemented their data structure on PostgreSQL and observed it to perform better than an inverted index. A
short-comingmight be that HS-bitmap index takes more space than the inverted index even after compression.  
Note this work makes use of PostgreSQL but is unrelated to the on-going work of implementing bitmap index in
PostgreSQL.

http://portal.acm.org/citation.cfm?doid=1367497.1367680

Here is the research page where I found the above:
http://www-users.cs.umn.edu/~kewu/annotated.html



Re: Indexing columns with low cardinality: persistent bitmap indexes?

From
Martin Gainty
Date:
Jaime-

Porque no utiliza Bitmap?

*Saludos Cordiales desde EEUU*
Martin
______________________________________________
Disclaimer and confidentiality note
Everything in this e-mail and any attachments relates to the official business of Sender. This transmission is of a confidential nature and Sender does not endorse distribution to any party other than intended recipient. Sender does not necessarily endorse content contained within this transmission.




> Date: Mon, 12 Jan 2009 16:20:40 -0500
> From: jcasanov@systemguards.com.ec
> To: regmeplease@gmail.com
> Subject: Re: [GENERAL] Indexing columns with low cardinality: persistent bitmap indexes?
> CC: pgsql-general@postgresql.org; bruno.lavoie@gmail.com
>
> On Mon, Jan 12, 2009 at 4:16 PM, Reg Me Please <regmeplease@gmail.com> wrote:
> > On Monday 12 January 2009 21:38:02 Bruno Lavoie wrote:
> >> Hello,
> >>
> >> I have a column with a small number of distinct values, indexing this
> >> one with a standard BTree is useless. How do I can index this column
> >> efficiently? I searched and it seems that pg doesn't support the
> >> creation of persistent bitmap indexes... Is that feature planned in next
> >> releases of pg?
> >>
> >> Thanks
> >> Bruno Lavoie
> >
> > I would try partial indexes, as many as the distinct values.
> > I'm not sure this would help, though.
> >
>
> you should create partial indexes only on those values that are a
> lower fraction on the table
> ie: if you have
>
> value "fraction of the table that has this value"
> 1 5%
> 2 3%
> 3 20%
> 4 25%
> 5 47%
>
> then only partial indexes on values 1 and 2 are of some value
>
> --
> Atentamente,
> Jaime Casanova
> Soporte y capacitación de PostgreSQL
> Asesoría y desarrollo de sistemas
> Guayaquil - Ecuador
> Cel. +59387171157
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Windows Live™: Keep your life in sync. Check it out.

Re: Indexing columns with low cardinality: persistent bitmap indexes?

From
"Adam Rich"
Date:
> > Hello,
> >
> > I have a column with a small number of distinct values, indexing this
one
> > with a standard BTree is useless. How do I  can index this column
> > efficiently? I searched and it seems that pg doesn't support the
creation of
> > persistent bitmap indexes... Is that feature planned in next releases of
pg?

> You have a few options based on your access patterns.  If you tend to
> access just one of these an get them all at once, then either clusting
> on this value, or partitioning your table will help.

How will clustering benefit this pattern?  Won't a full table scan be
required
regardless of the table being clustered?  And I thought the point of
clustering
was the organize the table by some indexed key, requiring fewer seeks and
increasing the likelihood of the pages being in the cache ....  if the index
is never used in this case ( low cardinality ) would it still help?



> If you access your data using these values and other column values at
> the same time, then partial or multi-column indexes might help.