Index INCLUDE vs. Bitmap Index Scan - Mailing list pgsql-hackers

From Markus Winand
Subject Index INCLUDE vs. Bitmap Index Scan
Date
Msg-id 7DF52167-4379-4A1E-A957-90D774EBDF21@winand.at
Whole thread Raw
Responses Re: Index INCLUDE vs. Bitmap Index Scan
Re: Index INCLUDE vs. Bitmap Index Scan
List pgsql-hackers
Hi!

I think Bitmap Index Scan should take advantage of B-tree INCLUDE columns, which it doesn’t at the moment (tested on
masteras of yesterday). 

Consider this (find the setup at the bottom of this mail).

CREATE INDEX idx ON tbl (a, b) INCLUDE (c);

EXPLAIN (analyze, buffers)
SELECT *
  FROM tbl
 WHERE a = 1
   AND c = 1;

The following plan could be produced:

                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=4.14..8.16 rows=1 width=7616) (actual time=0.027..0.029 rows=1 loops=1)
   Recheck Cond: (a = 1)
   Filter: (c = 1)
   Rows Removed by Filter: 7
   Heap Blocks: exact=2
   Buffers: shared hit=2 read=1
   ->  Bitmap Index Scan on idx  (cost=0.00..4.14 rows=1 width=0) (actual time=0.018..0.018 rows=8 loops=1)
         Index Cond: (a = 1)
         Buffers: shared read=1
 Planning Time: 0.615 ms
 Execution Time: 0.060 ms


The Bitmap Index Scan does not filter on column C, which is INCLUDEd in the index leaf nodes.

Instead, the Bitmap Heap Scan fetches unnecessary blocks and then applies the filter.

I would expect a similar execution as with this index:

DROP INDEX idx;
CREATE INDEX idx ON tbl (a, b, c);

                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=4.14..8.16 rows=1 width=7616) (actual time=0.021..0.021 rows=1 loops=1)
   Recheck Cond: ((a = 1) AND (c = 1))
   Heap Blocks: exact=1
   Buffers: shared hit=1 read=1
   ->  Bitmap Index Scan on idx  (cost=0.00..4.14 rows=1 width=0) (actual time=0.018..0.018 rows=1 loops=1)
         Index Cond: ((a = 1) AND (c = 1))
         Buffers: shared read=1
 Planning Time: 0.123 ms
 Execution Time: 0.037 ms

(As a side node: I also dislike it how Bitmap Index Scan mixes search conditions and filters in “Index Cond”)

Due to the not-filtered column B in the index, the use of column C is here pretty much the same as it is for the first
index,which has C in INCLUDE. 

Am I missing anything or is it just an oversight because INCLUDE was primarily done for Index Only Scan?

As a background, here is the use case I see for this scenario:
Filters that can not be used for searching the tree can be put in INCLUDE instead of the key-part of the index even if
thereis no intend to run the query as Index Only Scan (e.g. SELECT *). Columns in INCLUDE can be used for filtering
(worksfine for Index Only Scan, btw). 

The most prominent example are post-fix or in-fix LIKE ‘%term%’ filters. The benefit of putting such columns in INCLUDE
isthat it is clearly documented that the index column is neither used for searching in the tree nor for ordering. It is
eitherused for filtering, or for fetching. Knowing this makes it easier to extend existing index. 

Imagine you have this query to optimize:

SELECT *
  FROM …
 WHERE a = ?
   AND b = ?
 ORDER BY ts DESC
 LIMIT 10

And find this existing index on that table:

CREATE INDEX … ON … (a, b) INCLUDE (c);

In this case it is a rather safe option to replace the index with the following:

CREATE INDEX … ON … (a, b, ts) INCLUDE (c);

On the other hand, if the original index had all column in the key part, changing it to (A, B, TS, C) is a rather
dangerousoption. 

-markus

— Setup for the demo

-- demo table: 4 rows per block
-- the row if interest is the first one, the others spill to a second block so the problem can also seen in “Buffers"

CREATE TABLE tbl (a INTEGER, b INTEGER, c INTEGER, junk CHAR(1900));

INSERT INTO tbl VALUES (1, 1, 1, 'junk'),
                       (1, 1, 9, 'junk'),
                       (1, 1, 9, 'junk'),
                       (1, 1, 9, 'junk'),
                       (1, 1, 9, 'junk'),
                       (1, 1, 9, 'junk'),
                       (1, 1, 9, 'junk'),
                       (1, 1, 9, 'junk');

-- prevent unwanted plans for demo of Bitmap Index+Heap Scan
SET ENABLE_INDEXSCAN = OFF;
SET ENABLE_SEQSCAN = OFF;


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Remove Deprecated Exclusive Backup Mode
Next
From: Noah Misch
Date:
Subject: Re: No-rewrite timestamp<->timestamptz conversions