Thread: An inverted index using roaring bitmaps

An inverted index using roaring bitmaps

From
Chinmay Kanchi
Date:
I've been doing some preliminary prep work to see how an inverted index using roaring bitmaps (https://roaringbitmap.org/) would perform. I'm presenting some early work using SQL code with the roaring bitmap Postgres extension (https://github.com/ChenHuajun/pg_roaringbitmap) to simulate a hypothetical index using this approach. 

I'd like to solicit feedback from the community to see if this is something worth pursuing or if there are potential issues that I'm not aware of (or if this approach has been considered in the past and discarded for whatever reason). For context, my experience with Postgres is primarily as a user and I'm not at all familiar with the code base, so please be gentle :).

That said, here's a quick and dirty demo:

I have a table "cities"

              Table "public.cities"
 Column  | Type | Collation | Nullable | Default
---------+------+-----------+----------+---------
 city    | text |           |          |
 country | text |           |          |
Indexes:
    "cities_country_idx" btree (country)

select count(*) from cities;
 count  
--------
 139739
(1 row)

And just some sample rows:

select * from cities order by random() limit 10;
          city          |           country            
------------------------+------------------------------
 Alcalá de la Selva     | Spain
 Bekirhan               | Turkey
 Ceggia                 | Italy
 Châtillon-en-Vendelais | France
 Hohenfelde             | Germany
 Boedo                  | Argentina
 Saint-Vith             | Belgium
 Gaggenau               | Germany
 Lake Ozark             | United States
 Igunga                 | Tanzania, United Republic of
(10 rows)

Since a bitmap requires you to convert your inputs into integers, I created a function as a hack to convert our TIDs to integers. It's ugly as hell, but it serves. 2048 is 2^11, which according to the GIN index source code is a safe assumption for the highest possible MaxHeapTuplesPerPage. 

create function ctid_to_int(ctid tid) returns integer as $$
select (ctid::text::point)[0] * 2048 + (ctid::text::point)[1];
$$
language sql returns null on null input;

And the reverse:
create function int_to_ctid(i integer) returns tid as $$
select point(i/2048, i%2048)::text::tid;
$$
language sql returns null on null input;

In addition, I created a table "cities_rb" to roughly represent an "index" on the country column:

create table cities_rb as (select country, roaringbitmap(array_agg(ctid_to_int(ctid))::text) idx from cities group by country);

                 Table "public.cities_rb"
 Column  |     Type      | Collation | Nullable | Default
---------+---------------+-----------+----------+---------
 country | text          |           |          |
 idx     | roaringbitmap |           |          |


Now for the fun stuff - to simulate the "index" I will be running some queries against the cities_rb table using bitmap aggregations and comparing them to functionally the same queries using the BTree index on cities.

explain analyze select ctid from cities where country = 'Japan';
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on cities  (cost=18.77..971.83 rows=1351 width=6) (actual time=0.041..0.187 rows=1322 loops=1)
   Recheck Cond: (country = 'Japan'::text)
   Heap Blocks: exact=65
   ->  Bitmap Index Scan on cities_country_idx  (cost=0.00..18.43 rows=1351 width=0) (actual time=0.031..0.031 rows=1322 loops=1)
         Index Cond: (country = 'Japan'::text)
 Planning Time: 0.055 ms
 Execution Time: 0.233 ms
(7 rows)

explain analyze select rb_to_array(idx) from cities_rb where country = 'Japan';
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on cities_rb  (cost=0.00..14.88 rows=1 width=32) (actual time=0.050..0.067 rows=1 loops=1)
   Filter: (country = 'Japan'::text)
   Rows Removed by Filter: 229
 Planning Time: 0.033 ms
 Execution Time: 0.076 ms
(5 rows)

explain analyze select count(*) from cities where country = 'Japan';
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=35.31..35.32 rows=1 width=8) (actual time=0.151..0.151 rows=1 loops=1)
   ->  Index Only Scan using cities_country_idx on cities  (cost=0.29..31.94 rows=1351 width=0) (actual time=0.026..0.103 rows=1322 loops=1)
         Index Cond: (country = 'Japan'::text)
         Heap Fetches: 0
 Planning Time: 0.056 ms
 Execution Time: 0.180 ms
(6 rows)

explain analyze select rb_cardinality(idx) from cities_rb where country = 'Japan';
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 Seq Scan on cities_rb  (cost=0.00..14.88 rows=1 width=8) (actual time=0.037..0.053 rows=1 loops=1)
   Filter: (country = 'Japan'::text)
   Rows Removed by Filter: 229
 Planning Time: 0.037 ms
 Execution Time: 0.063 ms
(5 rows)


explain analyze select country, count(*) from cities group by country;
                                                    QUERY PLAN                                                    
-------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2990.09..2992.22 rows=214 width=17) (actual time=34.054..34.076 rows=230 loops=1)
   Group Key: country
   Batches: 1  Memory Usage: 77kB
   ->  Seq Scan on cities  (cost=0.00..2291.39 rows=139739 width=9) (actual time=0.005..8.552 rows=139739 loops=1)
 Planning Time: 0.051 ms
 Execution Time: 34.103 ms
(6 rows)

explain analyze select country, rb_cardinality(idx) from cities_rb;
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Seq Scan on cities_rb  (cost=0.00..14.88 rows=230 width=19) (actual time=0.008..0.184 rows=230 loops=1)
 Planning Time: 0.030 ms
 Execution Time: 0.200 ms
(3 rows)

The simulated index in this case is outrageously fast, up to ~150x on the GROUP BY. 


Cheers,
Chinmay

Re: An inverted index using roaring bitmaps

From
Peter Geoghegan
Date:
On Mon, Jun 6, 2022 at 10:42 PM Chinmay Kanchi <cgkanchi@gmail.com> wrote:
> The simulated index in this case is outrageously fast, up to ~150x on the GROUP BY.

Couldn't you make a similar argument in favor of adding a B-Tree index
on "country"? This probably won't be effective in practice, but the
reasons for this have little to do with how a B-Tree index represents
TIDs. A GIN index can compress TIDs much more effectively, but the
same issues apply there.

The main reason why it won't work with a B-Tree is that indexes in
Postgres are not transactionally consistent structures, in general.
Whereas your cities_rb table is transactionally consistent (or perhaps
just simulates a transactionally consistent index). Maybe it could
work in cases where an index-only scan could be used, which is roughly
comparable to having a transactionally consistent index. But that
depends on having the visibility map set most or all heap pages
all-visible.

GIN indexes don't support index-only scans, and I don't see that
changing. So it's possible that just adding TID compression to B-Tree
indexes would significantly speedup this kind of query, just by making
low cardinality indexes much smaller. Though that's a hard project,
for many subtle reasons. This really amounts to building a bitmap
index, of the kind that are typically used for data warehousing, which
is something that has been discussed plenty of times on this list. GIN
indexes were really built for things like full-text search, not for
data warehousing.

B-Tree deduplication makes B-Tree indexes a lot smaller, but it tends
to "saturate" at about 3.5x smaller (relative to the same index with
deduplication disabled) once there are about 10 or so distinct keys
per row (the exception is indexes that happen to have huge keys, which
aren't very interesting). There are many B-Tree indexes (with typical
sized keys) that are similar in size to an "equivalent" GIN index --
the ability to compress TIDs isn't very valuable when you don't have
that many TIDs per key anyway. It's different when you have many TIDs
per key, of course. GIN indexes alone don't "saturate" at the same
point -- there is often a big size difference between low cardinality
and ultra low cardinality data. There are bound to be cases where not
having that level of space efficiency matters, especially with B-Tree
index-only scans that scan a significant fraction of the entire index,
or even the entire index.

-- 
Peter Geoghegan



Re: An inverted index using roaring bitmaps

From
Chinmay Kanchi
Date:
I personally don't think this is a great replacement for a BTree index - for one thing, it isn't really possible to use this approach beyond equality comparisons (for scalars) or "contains"-type operations for arrays (or tsvectors, jsonb, etc). I see this more as "competing" with GIN, though I think GIN solves a different use-case. The primary thought here is that we could build lightning fast inverted indexes for the cases where these really help. 

I played with using roaring bitmaps in production to build rollup tables, for instance - where a single bitmap per key could satisfy count() queries and count(*) ... GROUP BY with multiple WHERE conditions way faster than even an index-only scan could, and without the overhead of multi-column indexes. In our particular case, there were about 2 dozen columns with around 30-40 million rows, and we were able to run these queries in single-digit milliseconds. We ultimately abandoned that project because of the difficulty of keeping the bitmaps in sync with changing data, which would no longer be an issue, if this was built as an index.

I think your point about data warehouse-style bitmap indexes hits the nail on the head here. This would be pretty much just that, a very efficient way to accelerate such queries.

Cheers,
Chinmay


On Tue, Jun 7, 2022 at 11:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Jun 6, 2022 at 10:42 PM Chinmay Kanchi <cgkanchi@gmail.com> wrote:
> The simulated index in this case is outrageously fast, up to ~150x on the GROUP BY.

Couldn't you make a similar argument in favor of adding a B-Tree index
on "country"? This probably won't be effective in practice, but the
reasons for this have little to do with how a B-Tree index represents
TIDs. A GIN index can compress TIDs much more effectively, but the
same issues apply there.

The main reason why it won't work with a B-Tree is that indexes in
Postgres are not transactionally consistent structures, in general.
Whereas your cities_rb table is transactionally consistent (or perhaps
just simulates a transactionally consistent index). Maybe it could
work in cases where an index-only scan could be used, which is roughly
comparable to having a transactionally consistent index. But that
depends on having the visibility map set most or all heap pages
all-visible.

GIN indexes don't support index-only scans, and I don't see that
changing. So it's possible that just adding TID compression to B-Tree
indexes would significantly speedup this kind of query, just by making
low cardinality indexes much smaller. Though that's a hard project,
for many subtle reasons. This really amounts to building a bitmap
index, of the kind that are typically used for data warehousing, which
is something that has been discussed plenty of times on this list. GIN
indexes were really built for things like full-text search, not for
data warehousing.

B-Tree deduplication makes B-Tree indexes a lot smaller, but it tends
to "saturate" at about 3.5x smaller (relative to the same index with
deduplication disabled) once there are about 10 or so distinct keys
per row (the exception is indexes that happen to have huge keys, which
aren't very interesting). There are many B-Tree indexes (with typical
sized keys) that are similar in size to an "equivalent" GIN index --
the ability to compress TIDs isn't very valuable when you don't have
that many TIDs per key anyway. It's different when you have many TIDs
per key, of course. GIN indexes alone don't "saturate" at the same
point -- there is often a big size difference between low cardinality
and ultra low cardinality data. There are bound to be cases where not
having that level of space efficiency matters, especially with B-Tree
index-only scans that scan a significant fraction of the entire index,
or even the entire index.

--
Peter Geoghegan

Re: An inverted index using roaring bitmaps

From
Peter Geoghegan
Date:
On Tue, Jun 7, 2022 at 5:00 PM Chinmay Kanchi <cgkanchi@gmail.com> wrote:
> I personally don't think this is a great replacement for a BTree index - for one thing, it isn't really possible to
usethis approach beyond equality comparisons (for scalars) or "contains"-type operations for arrays (or tsvectors,
jsonb,etc).
 

Why not? A bitmap is just a way of representing TIDs, that is often
very space efficient once compression is applied. In principle a
bitmap index can do anything that a B-Tree index can do, at least for
SELECTs.

Bitmap indexes are considered totally distinct to B-Tree indexes in
some DB systems because the concurrency control characteristics (the
use of heavyweight locks to protect the logical contents of the
database) are very different . I think that this is because the index
structure itself is so dense that the only practical approach that's
compatible with 2PL style concurrency control (or versions to MVCC
based on 2PL) is to lock a large number of TIDs at the same time. This
can lead to deadlocks with even light concurrent modifications --
which would never happen with an equivalent B-Tree index. But the data
structure is nevertheless more similar than different.

I probably wouldn't want to have a technique like roaring bitmap
compression of TIDs get applied by default within Postgres B-Trees,
but the reasons for that are pretty subtle. I might still advocate
*optional* TID list compression in Postgres B-Trees, which might even
be something we'd end up calling a bitmap index, that would only be
recommended for use in data warehousing scenarios. Extreme TID list
compression isn't free -- it really isn't desirable when there are
many concurrent modifications to relatively few index pages, as is
common in OLTP applications. That's one important reason why bitmap
indexes are generally only used in data warehousing environments,
where the downside doesn't really matter, but the upside pays for
itself (usually a fact table will have several bitmap indexes that are
usually combined, not just one).

> We ultimately abandoned that project because of the difficulty of keeping the bitmaps in sync with changing data,
whichwould no longer be an issue, if this was built as an index.
 

I think that it would in fact still be an issue if this was built as
an index. There is a reason why the concurrency characteristics of
bitmap indexes make them unsuitable for OLTP apps, which seems
related. That wouldn't mean that it wouldn't still be worth it, but it
would definitely be a real downside with some workloads. B-Tree
deduplication is designed to have very little overhead with mixed
reads and writes, so it's a performance all-rounder that can still be
beaten by specialized techniques that come with their own downsides.

-- 
Peter Geoghegan