Re: Improve EXPLAIN output for multicolumn B-Tree Index - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Improve EXPLAIN output for multicolumn B-Tree Index |
Date | |
Msg-id | CAExHW5sZXUQDoV7VkdMGyFHtWLvkZZgfg05jxyW09jXUjtawEQ@mail.gmail.com Whole thread Raw |
In response to | RE: Improve EXPLAIN output for multicolumn B-Tree Index (<Masahiro.Ikeda@nttdata.com>) |
Responses |
RE: Improve EXPLAIN output for multicolumn B-Tree Index
|
List | pgsql-hackers |
On Mon, Jun 24, 2024 at 8:08 AM <Masahiro.Ikeda@nttdata.com> wrote:
> I am unable to decide whether reporting the bound quals is just enough to decide the efficiency of index without knowing the difference in the number of index tuples selectivity and heap tuple selectivity. The difference seems to be a better indicator of index efficiency whereas the bound quals will help debug the in-efficiency, if any.
> Also, do we want to report bound quals even if they are the same as index conditions or just when they are different?
Thank you for your comment. After receiving your comment, I thought it would be better to also report information that would make the difference in selectivity understandable. One idea I had is to output the number of index tuples inefficiently extracted, like “Rows Removed by Filter”. Users can check the selectivity and efficiency by looking at the number.
Also, I thought it would be better to change the way bound quals are reported to align with the "Filter". I think it would be better to modify it so that it does not output when the bound quals are the same as the index conditions.
In my local PoC patch, I have modified the output as follows, what do you think?
=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Scan using test_idx on ikedamsh.test (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
Output: id1, id2, id3, value
Index Cond: ((test.id1 = 1) AND (test.id2 = 101)) -- If it’s efficient, the output won’t change.
Planning Time: 5.088 ms
Execution Time: 0.162 ms
(5 rows)
This looks fine. We may highlight in the documentation that lack of Index bound quals in EXPLAIN output indicate that they are same as Index Cond:. Other idea is to use Index Cond and bound quals as property name but that's too long.
=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using test_idx on ikedamsh.test (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
Output: id1, id2, id3, value
Index Cond: (test.id1 = 1) -- Change the output. Show only the bound quals.
Index Filter: (test.id3 = 101) -- New. Output quals which are not used as the bound quals
Rows Removed by Index Filter: 499999 -- New. Output when ANALYZE option is specified
Planning Time: 0.354 ms
Execution Time: 279.908 ms
(7 rows)
I don't think we want to split these clauses. Index Cond should indicate the conditions applied to the index scan. Bound quals should be listed separately even though they will have an intersection with Index Cond. I am not sure whether Index Filter is the right name, maybe Index Bound Cond: But I don't know this area enough to make a final call.
About Rows Removed by Index Filter: it's good to provide a number when ANALYZE is specified, but it will be also better to specify what was estimated. We do that for (cost snd rows etc.) but doing that somewhere in the plan output may not have a precedent. I think we should try that and see what others think.
Best Wishes,
Ashutosh Bapat
pgsql-hackers by date: