The query's behavior is expected due to how PostgreSQL handles composite indexes and MVCC. The index on `(a, b)` is used efficiently for the `a` condition, but the `b IN (<ids>)` filter is more complex, leading to additional filtering rather than direct index usage. Although the index-only scan is utilized, heap fetches still occur to verify tuple visibility, a necessary step when the visibility map doesn’t confirm visibility or to apply the `b` filter accurately. This is standard in PostgreSQL’s handling of such queries, ensuring data consistency and accuracy. Performance remains good, but these heap fetches could be optimized if needed by reconsidering the index structure or query design. Thank you!
On Mon, Aug 19, 2024 at 7:26 AM Stephen Samuel (Sam) <sam@sksamuel.com> wrote:
Hi folks.
I have a table with 4.5m rows per partition (16 partitions) (I know, very small, probably didn't need to be partitioned).
The table has two columns, a bigint and b text.
There is a unique index on (a,b)
The query is:
SELECT b FROM table WHERE a = <id> AND bIN (<ids>)
The visibility map is almost exclusively true. This table gets few updates.
The planner says index only scan, but is filtering on b.
Index Only Scan using pkey on table (cost=0.46..29.09 rows=1 width=19) (actual time=0.033..0.053 rows=10 loops=1) Index Cond: (a = 662028765) " Filter: (b = ANY ('{634579987:662028765,561730945:662028765,505555183:662028765,472806302:662028765,401361055:662028765,363587258:662028765,346093772:662028765,314369897:662028765,289498328:662028765,217993946:662028765}'::text[]))" Rows Removed by Filter: 1 Heap Fetches: 11 Planning Time: 0.095 ms Execution Time: 0.070 ms
My question is, why isn't it using the index for column b? Is this expected? And why is it doing heap lookups for every row,.