Index scan with bitmap filter - has this been explored - Mailing list pgsql-general

From Laurence Newman
Subject Index scan with bitmap filter - has this been explored
Date
Msg-id CAHueFRVqNuKQ0+LQPZVwMgJ+LpuTbZ21+ztwFbgPhBf9maFGYA@mail.gmail.com
Whole thread
Responses Re: Index scan with bitmap filter - has this been explored
Re: Index scan with bitmap filter - has this been explored
List pgsql-general
Hello,

I've been thinking about a scan strategy that is sort of in between a bitmap scan and an  index scan, and I wanted to check whether this has been explored before much or if there's a
reason it wouldn't work well in practice.

Example scenario where it could be useful: you need the first N rows ordered by a B-tree column, filtered by a condition on another column with a GIN index, e.g. full text search. Today I believe Postgres has two options that use indexes:

    1. Bitmap scan on the filter index → heap fetch all matches → sort → limit. Can be expensive
 when there are many matches but you only want a few results returned.
    2. Index scan on the B-tree → recheck the filter condition against the heap for each
 row. Can be expensive when the match rate is low and many non-matching rows are found before finding N hits.

The idea for the new scan strategy is: build the TID bitmap from the filter index first (same as a bitmap scan would), then walk the B-tree in order, but instead of going to the heap to check the
filter condition, check each TID against the bitmap. Matching TIDs get returned
directly in order.

I built a small proof of concept as a pgrx extension against PG 18 to try and test this:
https://github.com/lauri3new/ordered-bitmap-scan. The benchmarks there show it using fewer buffer hits than either a bitmap scan or index scan for the example data distribution - which is quite skewed to show the benefit.

I want to caveat that I am a web backend developer and not a Postgres expert by any means, so there's probably good reasons this isn't a thing (planner cost or benefit is too small/niche). I would appreciate though any pointers to prior discussion or feedback on whether this is worth looking into further or not.

Thanks

pgsql-general by date:

Previous
From: Ishan joshi
Date:
Subject: Replication to standby broke with WAL file corruption
Next
From: Igor Korot
Date:
Subject: libpq usage from C++