Re: State of the on-disk bitmap index - Mailing list pgsql-hackers
From | Daniel Bausch |
---|---|
Subject | Re: State of the on-disk bitmap index |
Date | |
Msg-id | 5032070A.302@dvs.tu-darmstadt.de Whole thread Raw |
In response to | Re: State of the on-disk bitmap index ("Albe Laurenz" <laurenz.albe@wien.gv.at>) |
Responses |
Re: State of the on-disk bitmap index
|
List | pgsql-hackers |
Am 20.08.2012 09:40, schrieb Albe Laurenz: > Daniel Bausch wrote: >> Hello Jonah, Simon, and the hackers, >> >> I am going to implement a simple kind of "encoded bitmap indexes" > (EBI). >> That is an index type where the bitmap columns may not only contain >> only a single '1' in the set of bits belonging to a tuple. Instead, > an >> additional mapping table translates the distinct values of the table >> column into a unique encoding. To select for a given value all bitmap >> columns must be compared instead of only one. Queries that match >> multiple different values (like IN lists or range queries) simplify to >> less than the full set of bitmaps that needs to be compared because of >> boolean logic. The total number of bitmaps required to represent > unique >> encodings for all different values is ceil(ld(n)), where n is the > number >> of distinct values. Compared to normal bitmap indexes this solves the >> problem of high-cardinality columns. It is targetet at data > warehousing >> scenarios with insert only data. >> >> The respective scientific paper can be found at >> http://www.dvs.tu-darmstadt.de/publications/pdf/ebi_a4.pdf > > I cannot answer your questions, but I read the paper and have some > questions myself. > > 1) As you mention, a WHERE clause that checks for only one value > will be more expensive with an encoded bitmap index than with > a regular bitmap index. If you want to implement encoded bitmap > indexes, wouldn't it be good to also implement regular bitmap > indexes so that the user has a choice? Sorry if that one was not clear: The first thing, I am going to do, is to work on the normal bitmap indexes (the one based on the Bizgres patch). I want to port it to master HEAD and give it back to the community. After that I want to base my EBI implementation on that. Eventually, I will publish that implementation, too. (After doing tuning, experiments, and make sure it works well.) > 2) The paper mentions that finding a good encoding and simplifying > bitmap access for a certain query are nontrivial problems. > Moreover, an encoding is good or bad only with respect to > certain queries, which the system does not know at index > creation time. Actually, I was not involved in writing that paper. I want to use that idea to show something different. I know of a follow up work by Golam Rabilul Alam et al. that uses the query history and data mining on that to optimize for the most common cases. There may be others. A more detailed discussion of EBI can also be found in: http://www-old.dvs.informatik.tu-darmstadt.de/staff/wu/query.TR.ps.gz > Do you have any ideas how to approach that? > If not, the paper suggests that, with enough values to check for, > even a non-optimized encoded bitmap index should perform > much better than a normal bitmap index, so maybe that's the way > to go (maybe only encode the NULL value as all zeros). Actually "all zeros" is reserved for "non-existent" (a.k.a. "deleted" or "invisible"). The thing with the "enough values" is a bit problematic, indeed, because even a DBA cannot influence how the queries of the user or the user application look like. You will not use encoded bitmap indexes or normal bitmap indexes for a column that is usually point accessed like the ID column. For that you will stick to hash or tree indexes. Kind regards, Daniel -- Daniel Bausch Wissenschaftlicher Mitarbeiter Technische Universität Darmstadt Fachbereich Informatik Fachgebiet Datenbanken und Verteilte Systeme Hochschulstraße 10 64289 Darmstadt Germany Tel.: +49 6151 16 6706 Fax: +49 6151 16 6229
pgsql-hackers by date: