Re: Index scan with bitmap filter - has this been explored - Mailing list pgsql-general

From Tomas Vondra
Subject Re: Index scan with bitmap filter - has this been explored
Date
Msg-id f816bcdd-ad77-4538-a2dd-136c1e389c2e@vondra.me
Whole thread Raw
In response to Index scan with bitmap filter - has this been explored  (Laurence Newman <lenewman05@googlemail.com>)
List pgsql-general
Hello Laurence,

I think pgsql-hackers would be a better list to discuss this.
pgsql-general is a user list, while pgsql-hakers is about development.
So I'm CCing that list, and let's continue the discussion there (you may
need to subscribe to that list, if you're not a subscriber already).

FWIW we're in the final push for PG19 features, so people are focusing o
that for the next ~month. It may take time to get feedback.

On 3/13/26 13:08, Laurence Newman wrote:
> Hello,
> 
> I've been thinking about a scan strategy that is sort of in between a
> bitmap scan and an  index scan, and I wanted to check whether this has
> been explored before much or if there's a
> reason it wouldn't work well in practice.
> 

I'm not aware of any formal implementation proposal, but that does not
mean it couldn't be a good idea (at least for some use cases).

> Example scenario where it could be useful: you need the first N rows
> ordered by a B-tree column, filtered by a condition on another column
> with a GIN index, e.g. full text search. Today I believe Postgres has
> two options that use indexes:
> 
>     1. Bitmap scan on the filter index → heap fetch all matches → sort →
> limit. Can be expensive
>  when there are many matches but you only want a few results returned.
>     2. Index scan on the B-tree → recheck the filter condition against
> the heap for each
>  row. Can be expensive when the match rate is low and many non-matching
> rows are found before finding N hits.
> 

There's also the possibility to create a an index with all the columns
(aka covering index), in which case we don't need to actually visit the
heap in most cases thanks to IOS.

But that is imperfect for various reasons - sometimes the index can't
have all the columns, and then we don't check any filters. That could be
improved without inventing a new "hybrid" scan.

But even then there are cases where doing what you describe could be
interesting. For example, it'd allow combining indexes with different
index AMs (e.g. btree + gist). Which covering indexes can't do easily.

Something similar applies to vector search, where I'm not sure covering
indexes are viable. Maybe allowing to pass a bitmap to the index scan
would be helpful.

> The idea for the new scan strategy is: build the TID bitmap from the
> filter index first (same as a bitmap scan would), then walk the B-tree
> in order, but instead of going to the heap to check the
> filter condition, check each TID against the bitmap. Matching TIDs get
> returned
> directly in order.
> 

It's hard to give clear opinion without looking at this much deeper, but
it seems interesting and possibly worth exploring.

> I built a small proof of concept as a pgrx extension against PG 18 to
> try and test this:
> https://github.com/lauri3new/ordered-bitmap-scan <https://github.com/
> lauri3new/ordered-bitmap-scan>. The benchmarks there show it using fewer
> buffer hits than either a bitmap scan or index scan for the example data
> distribution - which is quite skewed to show the benefit.
> 

I think it'd be helpful to include at least some benchmark numbers, so
that people can get an idea without having to build/run.

> I want to caveat that I am a web backend developer and not a Postgres
> expert by any means, so there's probably good reasons this isn't a thing
> (planner cost or benefit is too small/niche). I would appreciate though
> any pointers to prior discussion or feedback on whether this is worth
> looking into further or not.
> 

I'm not aware of any prior discussions or reasons why this would not
work (or be competitive with other solutions). I think there's
definitely a lot of open questions. It's not just about the execution,
but also about generating the interesting paths (with indexes used as
inputs for other indexes) and costing.


regards

-- 
Tomas Vondra




pgsql-general by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Replication to standby broke with WAL file corruption
Next
From: Igor Korot
Date:
Subject: Re: Does included columns part of the PK