Improve EXPLAIN output for multicolumn B-Tree Index - Mailing list pgsql-hackers

From
Subject Improve EXPLAIN output for multicolumn B-Tree Index
Date
Msg-id TYWPR01MB1098260B694D27758FE2BA46FB1C92@TYWPR01MB10982.jpnprd01.prod.outlook.com
Whole thread Raw
Responses Re: Improve EXPLAIN output for multicolumn B-Tree Index
Re: Improve EXPLAIN output for multicolumn B-Tree Index
Re: Improve EXPLAIN output for multicolumn B-Tree Index
List pgsql-hackers

Hi,

 

Regarding the multicolumn B-Tree Index, I'm considering

if we can enhance the EXPLAIN output. There have been requests

for this from our customer.

 

As the document says, we need to use it carefully.

> The exact rule is that equality constraints on leading columns,

> plus any inequality constraints on the first column that does

> not have an equality constraint, will be used to limit the portion

> of the index that is scanned.

https://www.postgresql.org/docs/17/indexes-multicolumn.html

 

However, it's not easy to confirm whether multi-column indexes are

being used efficiently because we need to compare the index

definitions and query conditions individually.

 

For instance, just by looking at the following EXPLAIN result, we

can't determine whether the index is being used efficiently or not

at a glance. Indeed, the current index definition is not suitable

for the query, so the cost is significantly high.

 

  =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;

                                                           QUERY PLAN                                                        

  ----------------------------------------------------------------------------------------------------------------------------

   Index Scan using test_idx on public.test  (cost=0.42..12754.76 rows=1 width=18) (actual time=0.033..54.115 rows=1 loops=1)

     Output: id1, id2, id3, value

     Index Cond: ((test.id1 = 1) AND (test.id3 = 101))    -- Is it efficient or not?

   Planning Time: 0.145 ms

   Execution Time: 54.150 ms

  (6 rows)

 

So, I'd like to improve the output to be more user-friendly.

 

 

# Idea

 

I'm considering adding new information, "Index Bound Cond", which specifies

what quals will be used for the boundary condition of the B-Tree index.

(Since this is just my current idea, I'm open to changing the output.)

 

Here is an example output.

 

-- prepare for the test

CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));

CREATE INDEX test_idx ON test(id1, id2, id3);                -- multicolumn B-Tree index

INSERT INTO test (SELECT i % 2, i, i, 'hello' FROM generate_series(1,1000000) s(i));

 ANALYZE;

 

-- explain

 

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;

                                                       QUERY PLAN                                                      

 -----------------------------------------------------------------------------------------------------------------------

  Index Scan using test_idx on public.test  (cost=0.42..8.45 rows=1 width=18) (actual time=0.046..0.047 rows=1 loops=1)

    Output: id1, id2, id3, value

    Index Cond: ((test.id1 = 1) AND (test.id2 = 101))

    Index Bound Cond: ((test.id1 = 1) AND (test.id2 = 101))  -- The B-Tree index is used efficiently.

  Planning Time: 0.124 ms

  Execution Time: 0.076 ms

(6 rows)

 =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;

                                                          QUERY PLAN                                                        

 ----------------------------------------------------------------------------------------------------------------------------

  Index Scan using test_idx on public.test  (cost=0.42..12754.76 rows=1 width=18) (actual time=0.033..54.115 rows=1 loops=1)

    Output: id1, id2, id3, value

    Index Cond: ((test.id1 = 1) AND (test.id3 = 101))

    Index Bound Cond: (test.id1 = 1)                        -- The B-tree index is *not* used efficiently

                                                          -- compared to the previous execution conditions,

                                                          -- because it differs from "Index Cond".

  Planning Time: 0.145 ms

  Execution Time: 54.150 ms

(6 rows)

 

 

# PoC patch

 

The PoC patch makes the following changes:

 

* Adds a new variable related to bound conditions

  to IndexPath, IndexScan, IndexOnlyScan, and BitmapIndexScan

* Adds quals for bound conditions to IndexPath when estimating cost, since

  the B-Tree index considers the boundary condition in btcostestimate()

* Adds quals for bound conditions to the output of EXPLAIN

 

 

 

Thank you for reading my suggestion. Please feel free to comment.

 

* Is this feature useful? Is there a possibility it will be accepted?

* Are there any other ideas for determining if multicolumn indexes are

being used efficiently? Although I considered calculating the efficiency using

pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read,

 I believe improving the EXPLAIN output is better because it can be output

per query and it's more user-friendly.

* Is "Index Bound Cond" the proper term?I also considered changing

"Index Cond" to only show quals for the boundary condition and adding

a new term "Index Filter".

* Would it be better to add new interfaces to Index AM? Is there any case

  to output the EXPLAIN for each index context? At least, I think it's worth

  considering whether it's good for amcostestimate() to modify the

  IndexPath directly as the PoC patch does.

 

 

Regards,

--

Masahiro Ikeda

NTT DATA CORPORATION

 

Attachment

pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: pgsql: Add more SQL/JSON constructor functions
Next
From: Peter Smith
Date:
Subject: Re: Pgoutput not capturing the generated columns