Thread: Improve EXPLAIN output for multicolumn B-Tree Index

Hi,

 

Regarding the multicolumn B-Tree index, I'm considering whether we can enhance the EXPLAIN output. There have been requests for this from our customer.

 

As the document says, we need to use it carefully.

> The exact rule is that equality constraints on leading columns, plus any
> inequality constraints on the first column that does not have an equality
> constraint, will be used to limit the portion of the index that is scanned.

https://www.postgresql.org/docs/17/indexes-multicolumn.html
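
For example, with an index on (a, b, c), only the leading part of the
following query's quals can bound the scan (this follows the illustration
on the documentation page above; the table and index names are made up):

  CREATE TABLE tbl (a int, b int, c int);
  CREATE INDEX tbl_abc_idx ON tbl (a, b, c);

  -- Bound condition: a = 5 AND b >= 42 (equality on the leading column plus
  -- the first inequality). The remaining qual, c < 77, does not limit the
  -- scanned portion of the index; it is checked against every entry in range.
  SELECT * FROM tbl WHERE a = 5 AND b >= 42 AND c < 77;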

 

However, it's not easy to confirm whether a multicolumn index is being used efficiently, because we need to compare the index definition and the query conditions individually.
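
(Today that check is manual: look up the index definition and compare its
column order against the query's quals by hand, for example with something
like the following; "test_idx" here is the index used in the examples below.)

  SELECT indexdef FROM pg_indexes WHERE indexname = 'test_idx';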

 

For instance, just by looking at the following EXPLAIN result, we can't tell at a glance whether the index is being used efficiently or not. In fact, the current index definition is not suitable for the query, so the cost is significantly high.

 

  =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
                                                           QUERY PLAN
  ----------------------------------------------------------------------------------------------------------------------------
   Index Scan using test_idx on public.test  (cost=0.42..12754.76 rows=1 width=18) (actual time=0.033..54.115 rows=1 loops=1)
     Output: id1, id2, id3, value
     Index Cond: ((test.id1 = 1) AND (test.id3 = 101))    -- Is it efficient or not?
   Planning Time: 0.145 ms
   Execution Time: 54.150 ms
  (6 rows)

 

So, I'd like to improve the output to be more user-friendly.

 

 

# Idea

 

I'm considering adding a new piece of information, "Index Bound Cond", which shows which quals are used as the boundary condition of the B-Tree index. (Since this is just my current idea, I'm open to changing the output.)

 

Here is an example output.

 

-- prepare for the test
CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));
CREATE INDEX test_idx ON test(id1, id2, id3);                -- multicolumn B-Tree index
INSERT INTO test (SELECT i % 2, i, i, 'hello' FROM generate_series(1,1000000) s(i));
ANALYZE;

 

-- explain

 

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
                                                       QUERY PLAN
 -----------------------------------------------------------------------------------------------------------------------
  Index Scan using test_idx on public.test  (cost=0.42..8.45 rows=1 width=18) (actual time=0.046..0.047 rows=1 loops=1)
    Output: id1, id2, id3, value
    Index Cond: ((test.id1 = 1) AND (test.id2 = 101))
    Index Bound Cond: ((test.id1 = 1) AND (test.id2 = 101))  -- The B-Tree index is used efficiently.
  Planning Time: 0.124 ms
  Execution Time: 0.076 ms
(6 rows)

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
                                                          QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------------
  Index Scan using test_idx on public.test  (cost=0.42..12754.76 rows=1 width=18) (actual time=0.033..54.115 rows=1 loops=1)
    Output: id1, id2, id3, value
    Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
    Index Bound Cond: (test.id1 = 1)                        -- The B-Tree index is *not* used efficiently,
                                                            -- unlike in the previous plan, because it
                                                            -- differs from "Index Cond".
  Planning Time: 0.145 ms
  Execution Time: 54.150 ms
(6 rows)

 

 

# PoC patch

 

The PoC patch makes the following changes:

 

* Adds a new variable related to bound conditions
  to IndexPath, IndexScan, IndexOnlyScan, and BitmapIndexScan
* Adds quals for bound conditions to IndexPath when estimating cost, since
  the B-Tree index considers the boundary condition in btcostestimate()
* Adds quals for bound conditions to the output of EXPLAIN

 

 

 

Thank you for reading my suggestion. Please feel free to comment.

 

* Is this feature useful? Is there a possibility it will be accepted?
* Are there any other ideas for determining if multicolumn indexes are
  being used efficiently? Although I considered calculating the efficiency using
  pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read
  (a rough sketch of that approach is shown below), I believe improving the
  EXPLAIN output is better because it can be output per query and it's more
  user-friendly.
* Is "Index Bound Cond" the proper term? I also considered changing
  "Index Cond" to only show quals for the boundary condition and adding
  a new term "Index Filter".
* Would it be better to add new interfaces to Index AM? Is there any case
  to output the EXPLAIN for each index context? At least, I think it's worth
  considering whether it's good for amcostestimate() to modify the
  IndexPath directly as the PoC patch does.
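
(For reference, the statistics-based approach mentioned above would look
roughly like the following; this is only a sketch of the idea, not part of
the patch, and 'test_idx' is the index from the example above.)

  SELECT s.relname, s.indexrelname,
         s.idx_tup_read,        -- index entries returned by scans of this index
         io.idx_blks_read,      -- index blocks read from disk
         io.idx_blks_hit        -- index blocks found in shared buffers
  FROM pg_stat_all_indexes s
  JOIN pg_statio_all_indexes io USING (indexrelid)
  WHERE s.indexrelname = 'test_idx';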

 

 

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

 

Attachment

Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Ashutosh Bapat
Date:


On Fri, Jun 21, 2024 at 12:42 PM <Masahiro.Ikeda@nttdata.com> wrote:

[...]

I am unable to decide whether reporting the bound quals is just enough to decide the efficiency of an index without knowing the difference between the index tuple selectivity and the heap tuple selectivity. The difference seems to be a better indicator of index efficiency, whereas the bound quals will help debug the inefficiency, if any.

Also, do we want to report bound quals even if they are the same as the index conditions, or just when they are different?
--
Best Wishes,
Ashutosh Bapat

Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Yugo NAGATA
Date:
On Fri, 21 Jun 2024 07:12:25 +0000
<Masahiro.Ikeda@nttdata.com> wrote:

> * Is this feature useful? Is there a possibility it will be accepted?

I think adding such information to EXPLAIN outputs is useful because it
will help users confirm the effect of a multicolumn index on a certain query
and decide whether to leave, drop, or recreate the index, and so on.
 
> * Are there any other ideas for determining if multicolumn indexes are
> being used efficiently? Although I considered calculating the efficiency using
> pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read,
> I believe improving the EXPLAIN output is better because it can be output
> per query and it's more user-friendly.

It seems to me that improving EXPLAIN is a natural way to show information
on query optimization, such as index scans.
 
> * Is "Index Bound Cond" the proper term?I also considered changing
> 
> "Index Cond" to only show quals for the boundary condition and adding
> 
> a new term "Index Filter".

"Index Bound Cond" seems not intuitive for me because I could not find
description explaining what this means from the documentation. I like
"Index Filter" that implies the index has to be scanned.
 
> * Would it be better to add new interfaces to Index AM? Is there any case
>   to output the EXPLAIN for each index context? At least, I think it's worth
>   considering whether it's good for amcostestimate() to modify the
>   IndexPath directly as the PoC patch does.

I am not sure it is the best way to modify IndexPath in amcostestimate(), but
I don't have better ideas for now.

Regards,
Yugo Nagata

> 
> 
> 
> 
> Regards,
> 
> --
> 
> Masahiro Ikeda
> 
> NTT DATA CORPORATION
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>



> I am unable to decide whether reporting the bound quals is just enough to
> decide the efficiency of an index without knowing the difference between
> the index tuple selectivity and the heap tuple selectivity. The difference
> seems to be a better indicator of index efficiency, whereas the bound quals
> will help debug the inefficiency, if any.

> Also, do we want to report bound quals even if they are the same as the
> index conditions, or just when they are different?

Thank you for your comment. After receiving it, I thought it would be better to also report information that makes the difference in selectivity understandable. One idea I had is to output the number of index tuples extracted inefficiently, like "Rows Removed by Filter". Users can check the selectivity and efficiency by looking at that number.

Also, I thought it would be better to change the way the bound quals are reported to align with "Filter". I think it would be better not to output them when the bound quals are the same as the index conditions.
 

In my local PoC patch, I have modified the output as follows, what do you think?

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on ikedamsh.test  (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
   Output: id1, id2, id3, value
   Index Cond: ((test.id1 = 1) AND (test.id2 = 101))  -- If it’s efficient, the output won’t change.
 Planning Time: 5.088 ms
 Execution Time: 0.162 ms
(5 rows)

=# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on ikedamsh.test  (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
   Output: id1, id2, id3, value
   Index Cond: (test.id1 = 1)                 -- Change the output. Show only the bound quals.
   Index Filter: (test.id3 = 101)              -- New. Output quals which are not used as the bound quals
   Rows Removed by Index Filter: 499999    -- New. Output when ANALYZE option is specified
 Planning Time: 0.354 ms
 Execution Time: 279.908 ms
(7 rows)

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

> > * Is this feature useful? Is there a possibility it will be accepted?
>
> I think adding such information to EXPLAIN outputs is useful because it will help users
> confirm the effect of a multicolumn index on a certain query and decide whether to
> leave, drop, or recreate the index, and so on.

Thank you for your comments and for recognizing the utility of the approach.

> > * Are there any other ideas for determining if multicolumn indexes are
> > being used efficiently? Although I considered calculating the efficiency using
> > pg_statio_all_indexes.idx_blks_read and pg_stat_all_indexes.idx_tup_read,
> > I believe improving the EXPLAIN output is better because it can be output
> > per query and it's more user-friendly.
>
> It seems to me that improving EXPLAIN is a natural way to show information on query
> optimization, such as index scans.

OK, I'll proceed that way.

> > * Is "Index Bound Cond" the proper term?I also considered changing
> >
> > "Index Cond" to only show quals for the boundary condition and adding
> >
> > a new term "Index Filter".
>
> "Index Bound Cond" seems not intuitive for me because I could not find description
> explaining what this means from the documentation. I like "Index Filter" that implies the
> index has to be scanned.

OK, I think you are right. Even at this point, there are things like ‘Filter’ and
‘Rows Removed by Filter’, so it seems natural to align with them. I described a
new output example in the previous email; what do you think of it?

> > * Would it be better to add new interfaces to Index AM? Is there any case
> >   to output the EXPLAIN for each index context? At least, I think it's worth
> >   considering whether it's good for amcostestimate() to modify the
> >   IndexPath directly as the PoC patch does.
>
> I am not sure it is the best way to modify IndexPath in amcostestimate(), but I don't
> have better ideas for now.

OK, I’ll consider what the best way to change it is. In addition, if we add
"Rows Removed by Index Filter", we might need to consider a method to receive the
number of filtered tuples from the index AM at execution time.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Matthias van de Meent
Date:
On Mon, 24 Jun 2024 at 04:38, <Masahiro.Ikeda@nttdata.com> wrote:
>
> In my local PoC patch, I have modified the output as follows, what do you think?
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
>                                                        QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_idx on ikedamsh.test  (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
>    Output: id1, id2, id3, value
>    Index Cond: ((test.id1 = 1) AND (test.id2 = 101))  -- If it’s efficient, the output won’t change.
>  Planning Time: 5.088 ms
>  Execution Time: 0.162 ms
> (5 rows)
>
> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
>                                                           QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_idx on ikedamsh.test  (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
>    Output: id1, id2, id3, value
>    Index Cond: (test.id1 = 1)                 -- Change the output. Show only the bound quals.
>    Index Filter: (test.id3 = 101)              -- New. Output quals which are not used as the bound quals

I think this is too easy to confuse with the pre-existing 'Filter'
condition, which you'll find on indexes with INCLUDE-d columns or
filters on non-index columns.
Furthermore, I think this is probably not helpful (maybe even harmful)
for index types like GIN and BRIN, where index searchkey order is
mostly irrelevant to the index shape and performance.
Finally, does this change the index AM API? Does this add another
scankey argument to ->amrescan?

>    Rows Removed by Index Filter: 499999    -- New. Output when ANALYZE option is specified

Separate from the changes to Index Cond/Index Filter output changes I
think this can be useful output, though I'd probably let the AM
specify what kind of filter data to display.
E.g. BRIN may well want to display how many ranges matched the
predicate, vs how many ranges were unsummarized and thus returned; two
conditions which aren't as easy to differentiate but can be important
when debugging query performance.

>  Planning Time: 0.354 ms
>  Execution Time: 279.908 ms
> (7 rows)

Was this a test against the same dataset as the one you'd posted your
measurements of your first patchset with? The execution time seems to
have slowed down quite significantly, so if the testset is the same
then this doesn't bode well for your patchset.


Kind regards,

Matthias van de Meent



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Jelte Fennema-Nio
Date:
+1 for the idea.

On Mon, 24 Jun 2024 at 11:11, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> I think this is too easy to confuse with the pre-existing 'Filter'
> condition, which you'll find on indexes with INCLUDE-d columns or
> filters on non-index columns.

Why not combine them? And both call them Filter? In a sense this
filtering acts very similar to INCLUDE based filtering (for btrees at
least). Although I might be wrong about that, because when I try to
confirm the same perf using the following script I do get quite
different timings (maybe you have an idea what's going on here). But
even if it does mean something slightly different perf wise, I think
using Filter for both is unlikely to confuse anyone. Since, while
allowed, it seems extremely unlikely in practice that someone will use
the same column as part of the indexed columns and as part of the
INCLUDE-d columns (why would you store the same info twice).

CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));
INSERT INTO test (SELECT i % 10, i % 1000, i, 'hello' FROM
generate_series(1,1000000) s(i));
vacuum freeze test;
CREATE INDEX test_idx_include ON test(id1, id2) INCLUDE (id3);
ANALYZE test;
EXPLAIN (VERBOSE, ANALYZE, BUFFERS) SELECT id1, id3 FROM test WHERE
id1 = 1 AND id3 = 101;
CREATE INDEX test_idx ON test(id1, id2, id3);
ANALYZE test;
EXPLAIN (VERBOSE, ANALYZE, BUFFERS) SELECT id1, id3 FROM test WHERE
id1 = 1 AND id3 = 101;

                                                              QUERY PLAN
───────────────────────────────────────
 Index Only Scan using test_idx_include on public.test  (cost=0.42..3557.09 rows=1 width=8) (actual time=0.708..6.639 rows=1 loops=1)
   Output: id1, id3
   Index Cond: (test.id1 = 1)
   Filter: (test.id3 = 101)
   Rows Removed by Filter: 99999
   Heap Fetches: 0
   Buffers: shared hit=1 read=386
 Query Identifier: 471139784017641093
 Planning:
   Buffers: shared hit=8 read=1
 Planning Time: 0.091 ms
 Execution Time: 6.656 ms
(12 rows)

Time: 7.139 ms
                                                          QUERY PLAN
─────────────────────────────────────
 Index Only Scan using test_idx on public.test  (cost=0.42..2591.77 rows=1 width=8) (actual time=0.238..2.110 rows=1 loops=1)
   Output: id1, id3
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Heap Fetches: 0
   Buffers: shared hit=1 read=386
 Query Identifier: 471139784017641093
 Planning:
   Buffers: shared hit=10 read=1
 Planning Time: 0.129 ms
 Execution Time: 2.128 ms
(10 rows)

Time: 2.645 ms



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Matthias van de Meent
Date:
On Mon, 24 Jun 2024 at 11:58, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> +1 for the idea.
>
> On Mon, 24 Jun 2024 at 11:11, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I think this is too easy to confuse with the pre-existing 'Filter'
> > condition, which you'll find on indexes with INCLUDE-d columns or
> > filters on non-index columns.
>
> Why not combine them? And both call them Filter? In a sense this
> filtering acts very similar to INCLUDE based filtering (for btrees at
> least).

It does not really behave similar: index scan keys (such as the
id3=101 scankey) don't require visibility checks in the btree code,
while the Filter condition _does_ require a visibility check, and
delegates the check to the table AM if the scan isn't Index-Only, or
if the VM didn't show all-visible during the check.

Furthermore, the index could use the scankey to improve the number of
keys to scan using "skip scans"; by realising during a forward scan
that if you've reached tuple (1, 2, 3) and looking for (1, _, 1) you
can skip forward to (1, 3, _), rather than having to go through tuples
(1, 2, 4), (1, 2, 5), ... (1, 2, n). This is not possible for
INCLUDE-d columns, because their datatypes and structure are opaque to
the index AM; the AM cannot assume anything about or do anything with
those values.

> Although I might be wrong about that, because when I try to
> confirm the same perf using the following script I do get quite
> different timings (maybe you have an idea what's going on here). But
> even if it does mean something slightly different perf wise, I think
> using Filter for both is unlikely to confuse anyone.

I don't want A to be the plan, while showing B' to the user, as the
performance picture for the two may be completely different. And, as I
mentioned upthread, the differences between AMs in the (lack of)
meaning in index column order also makes it quite wrong to generally
separate prefixes equalities from the rest of the keys.

> Since, while
> allowed, it seems extremely unlikely in practice that someone will use
> the same column as part of the indexed columns and as part of the
> INCLUDE-d columns (why would you store the same info twice).

Yeah, people don't generally include the same index column more than
once in the same index.

> CREATE INDEX test_idx_include ON test(id1, id2) INCLUDE (id3);
> CREATE INDEX test_idx ON test(id1, id2, id3);
>
>                                                               QUERY PLAN
> ───────────────────────────────────────
>  Index Only Scan using test_idx_include on public.test
[...]
> Time: 7.139 ms
>                                                           QUERY PLAN
> ─────────────────────────────────────
>  Index Only Scan using test_idx on public.test  (cost=0.42..2591.77
[...]
> Time: 2.645 ms

As you can see, there's a huge difference in performance. Putting both
non-bound and "normal" filter clauses in the same Filter clause will
make it more difficult to explain performance issues based on only the
explain output.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Jelte Fennema-Nio
Date:
On Mon, 24 Jun 2024 at 13:02, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> It does not really behave similar: index scan keys (such as the
> id3=101 scankey) don't require visibility checks in the btree code,
> while the Filter condition _does_ require a visibility check, and
> delegates the check to the table AM if the scan isn't Index-Only, or
> if the VM didn't show all-visible during the check.

Any chance you could point me in the right direction for the
code/docs/comment about this? I'd like to learn a bit more about why
that is the case, because I didn't realize visibility checks worked
differently for index scan keys and Filter keys.

> Furthermore, the index could use the scankey to improve the number of
> keys to scan using "skip scans"; by realising during a forward scan
> that if you've reached tuple (1, 2, 3) and looking for (1, _, 1) you
> can skip forward to (1, 3, _), rather than having to go through tuples
> (1, 2, 4), (1, 2, 5), ... (1, 2, n). This is not possible for
> INCLUDE-d columns, because their datatypes and structure are opaque to
> the index AM; the AM cannot assume anything about or do anything with
> those values.

Does Postgres actually support this currently? I thought skip scans
were not available (yet).

> I don't want A to to be the plan, while showing B' to the user, as the
> performance picture for the two may be completely different. And, as I
> mentioned upthread, the differences between AMs in the (lack of)
> meaning in index column order also makes it quite wrong to generally
> separate prefixes equalities from the rest of the keys.

Yeah, that makes sense. These specific explain lines probably
only/mostly make sense for btree. So yeah we'd want the index AM to be
able to add some stuff to the explain plan.

> As you can see, there's a huge difference in performance. Putting both
> non-bound and "normal" filter clauses in the same Filter clause will
> make it more difficult to explain performance issues based on only the
> explain output.

Fair enough, that's of course the main point of this patch in the
first place: being able to better interpret the explain plan when you
don't have access to the schema. Still I think Filter is the correct
keyword for both, so how about we make it less confusing by making the
current "Filter" more specific by calling it something like "Non-key
Filter" or "INCLUDE Filter" and then call the other something like
"Index Filter" or "Secondary Bound Filter".



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Ashutosh Bapat
Date:


On Mon, Jun 24, 2024 at 8:08 AM <Masahiro.Ikeda@nttdata.com> wrote:
>> I am unable to decide whether reporting the bound quals is just enough to decide the efficiency of an index without knowing the difference between the index tuple selectivity and the heap tuple selectivity. The difference seems to be a better indicator of index efficiency, whereas the bound quals will help debug the inefficiency, if any.
>> Also, do we want to report bound quals even if they are the same as the index conditions, or just when they are different?

> Thank you for your comment. After receiving it, I thought it would be better to also report information that makes the difference in selectivity understandable. One idea I had is to output the number of index tuples extracted inefficiently, like "Rows Removed by Filter". Users can check the selectivity and efficiency by looking at that number.

> Also, I thought it would be better to change the way the bound quals are reported to align with "Filter". I think it would be better not to output them when the bound quals are the same as the index conditions.

> In my local PoC patch, I have modified the output as follows, what do you think?

> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
>                                                        QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_idx on ikedamsh.test  (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
>    Output: id1, id2, id3, value
>    Index Cond: ((test.id1 = 1) AND (test.id2 = 101))  -- If it’s efficient, the output won’t change.
>  Planning Time: 5.088 ms
>  Execution Time: 0.162 ms
> (5 rows)

This looks fine. We may highlight in the documentation that the lack of index bound quals in the EXPLAIN output indicates that they are the same as Index Cond. Another idea is to use both Index Cond and the bound quals in the property name, but that's too long.
 

> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
>                                                           QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_idx on ikedamsh.test  (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
>    Output: id1, id2, id3, value
>    Index Cond: (test.id1 = 1)                 -- Change the output. Show only the bound quals.
>    Index Filter: (test.id3 = 101)              -- New. Output quals which are not used as the bound quals
>    Rows Removed by Index Filter: 499999    -- New. Output when ANALYZE option is specified
>  Planning Time: 0.354 ms
>  Execution Time: 279.908 ms
> (7 rows)

I don't think we want to split these clauses. Index Cond should indicate the conditions applied to the index scan. Bound quals should be listed separately even though they will have an intersection with Index Cond. I am not sure whether Index Filter is the right name; maybe "Index Bound Cond"? But I don't know this area well enough to make a final call.

About Rows Removed by Index Filter: it's good to provide a number when ANALYZE is specified, but it would also be better to show what was estimated. We do that for cost and rows, etc., but doing that elsewhere in the plan output may not have a precedent. I think we should try it and see what others think.

--
Best Wishes,
Ashutosh Bapat

Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Matthias van de Meent
Date:
On Mon, 24 Jun 2024 at 14:42, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Mon, 24 Jun 2024 at 13:02, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > It does not really behave similar: index scan keys (such as the
> > id3=101 scankey) don't require visibility checks in the btree code,
> > while the Filter condition _does_ require a visibility check, and
> > delegates the check to the table AM if the scan isn't Index-Only, or
> > if the VM didn't show all-visible during the check.
>
> Any chance you could point me in the right direction for the
> code/docs/comment about this? I'd like to learn a bit more about why
> that is the case, because I didn't realize visibility checks worked
> differently for index scan keys and Filter keys.

This can be derived by combining how Filter works (it only filters the
returned live tuples) and how Index-Only scans work (return the index
tuple, unless !ALL_VISIBLE, in which case the heap tuple is
projected). There have been several threads more or less recently that
also touch this topic and closely related topics, e.g. [0][1].

> > Furthermore, the index could use the scankey to improve the number of
> > keys to scan using "skip scans"; by realising during a forward scan
> > that if you've reached tuple (1, 2, 3) and looking for (1, _, 1) you
> > can skip forward to (1, 3, _), rather than having to go through tuples
> > (1, 2, 4), (1, 2, 5), ... (1, 2, n). This is not possible for
> > INCLUDE-d columns, because their datatypes and structure are opaque to
> > the index AM; the AM cannot assume anything about or do anything with
> > those values.
>
> Does Postgres actually support this currently? I thought skip scans
> were not available (yet).

Peter Geoghegan has been working on it as a project after PG17's
IN()-list improvements were committed, and I hear he has the basics
working, but the further details need fleshing out.

> > As you can see, there's a huge difference in performance. Putting both
> > non-bound and "normal" filter clauses in the same Filter clause will
> > make it more difficult to explain performance issues based on only the
> > explain output.
>
> Fair enough, that's of course the main point of this patch in the
> first place: being able to better interpret the explain plan when you
> don't have access to the schema. Still I think Filter is the correct
> keyword for both, so how about we make it less confusing by making the
> current "Filter" more specific by calling it something like "Non-key
> Filter" or "INCLUDE Filter" and then call the other something like
> "Index Filter" or "Secondary Bound Filter".

I'm not sure how debuggable explain plans are without access to the
schema, especially when VERBOSE isn't configured, so I would be
hesitant to accept that as an argument here.


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0]
https://www.postgresql.org/message-id/flat/N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU%3D%40yamlcoder.me
[1] https://www.postgresql.org/message-id/flat/cf85f46f-b02f-05b2-5248-5000b894ebab%40enterprisedb.com



> On Mon, 24 Jun 2024 at 04:38, <Masahiro.Ikeda@nttdata.com> wrote:
> >
> > In my local PoC patch, I have modified the output as follows, what do you think?
> >
> > =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id2 = 101;
> >                                                        QUERY PLAN
> > -------------------------------------------------------------------------------------------------------------------------
> >  Index Scan using test_idx on ikedamsh.test  (cost=0.42..8.45 rows=1 width=18) (actual time=0.082..0.086 rows=1 loops=1)
> >    Output: id1, id2, id3, value
> >    Index Cond: ((test.id1 = 1) AND (test.id2 = 101))  -- If it’s efficient, the output won’t change.
> >  Planning Time: 5.088 ms
> >  Execution Time: 0.162 ms
> > (5 rows)
> >
> > =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
> >                                                           QUERY PLAN
> > -------------------------------------------------------------------------------------------------------------------------------
> >  Index Scan using test_idx on ikedamsh.test  (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
> >    Output: id1, id2, id3, value
> >    Index Cond: (test.id1 = 1)                 -- Change the output. Show only the bound quals.
> >    Index Filter: (test.id3 = 101)              -- New. Output quals which are not used as the bound quals
> 
> I think this is too easy to confuse with the pre-existing 'Filter'
> condition, which you'll find on indexes with INCLUDE-d columns or filters on non-index
> columns.

Thanks for your comment. I had forgotten about that case.

> Furthermore, I think this is probably not helpful (maybe even harmful) for index types
> like GIN and BRIN, where index searchkey order is mostly irrelevant to the index shape
> and performance.

Yes, I expected that only the B-Tree index would support this feature.

> Finally, does this change the index AM API? Does this add another scankey argument to
> ->amrescan?

Yes, I think so. But since I'd like users to be able to see how the index scan will happen even without
ANALYZE, I planned to change amcostestimate() for "Index Filter" and amrescan() for
"Rows Removed by Index Filter".

> >    Rows Removed by Index Filter: 499999    -- New. Output when ANALYZE option
> is specified
> 
> Separate from the changes to Index Cond/Index Filter output changes I think this can
> be useful output, though I'd probably let the AM specify what kind of filter data to
> display.
> E.g. BRIN may well want to display how many ranges matched the predicate, vs how
> many ranges were unsummarized and thus returned; two conditions which aren't as
> easy to differentiate but can be important when debugging query performance.

OK, thanks. I understand that it would be nice if the output could be customized to show
information specific to other index types, such as BRIN.

> >  Planning Time: 0.354 ms
> >  Execution Time: 279.908 ms
> > (7 rows)
> 
> Was this a test against the same dataset as the one you'd posted your measurements of
> your first patchset with? The execution time seems to have slowed down quite
> significantly, so if the testset is the same then this doesn't bode well for your patchset.

Yes, the reason is that the cache hit ratio was very low, since I ran the test after restarting the
machine. I should have added the BUFFERS option.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

> +1 for the idea.

Thanks! I was interested in the test result that you shared.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

>> =# EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test WHERE id1 = 1 AND id3 = 101;
>>                                                           QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------------------
>>  Index Scan using test_idx on ikedamsh.test  (cost=0.42..12630.10 rows=1 width=18) (actual time=0.175..279.819 rows=1 loops=1)
>>    Output: id1, id2, id3, value
>>    Index Cond: (test.id1 = 1)                 -- Change the output. Show only the bound quals.
>>    Index Filter: (test.id3 = 101)              -- New. Output quals which are not used as the bound quals
>>    Rows Removed by Index Filter: 499999    -- New. Output when ANALYZE option is specified
>>  Planning Time: 0.354 ms
>>  Execution Time: 279.908 ms
>> (7 rows)
>
> I don't think we want to split these clauses. Index Cond should indicate the conditions applied
> to the index scan. Bound quals should be listed separately even though they will have an
> intersection with Index Cond. I am not sure whether Index Filter is the right name;
> maybe "Index Bound Cond"? But I don't know this area well enough to make a final call.

OK, I understand that it's better to only add new items. I think "Index Filter" fits better than "Index
Bound Cond" if we introduce "Rows Removed by Index Filter".

> About Rows Removed by Index Filter: it's good to provide a number when ANALYZE is
> specified, but it would also be better to show what was estimated. We do that for cost and rows,
> etc., but doing that elsewhere in the plan output may not have a precedent. I think we should try it
> and see what others think.

It's interesting! It’s an idea that can be applied not only to multi-column indexes, right?
I will consider the implementation and discuss it in a new thread. However, I would like to
focus first on the feature that outputs information about multi-column indexes.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

> On Mon, 24 Jun 2024 at 14:42, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> >
> > On Mon, 24 Jun 2024 at 13:02, Matthias van de Meent
> > <boekewurm+postgres@gmail.com> wrote:
> > > It does not really behave similar: index scan keys (such as the
> > > id3=101 scankey) don't require visibility checks in the btree code,
> > > while the Filter condition _does_ require a visibility check, and
> > > delegates the check to the table AM if the scan isn't Index-Only, or
> > > if the VM didn't show all-visible during the check.
> >
> > Any chance you could point me in the right direction for the
> > code/docs/comment about this? I'd like to learn a bit more about why
> > that is the case, because I didn't realize visibility checks worked
> > differently for index scan keys and Filter keys.
> 
> This can be derived by combining how Filter works (it only filters the returned live tuples)
> and how Index-Only scans work (return the index tuple, unless !ALL_VISIBLE, in which
> case the heap tuple is projected). There have been several threads more or less
> recently that also touch this topic and closely related topics, e.g. [0][1].

Thanks! I could understand what is difference between INCLUDE based filter and index filter.

> > > As you can see, there's a huge difference in performance. Putting
> > > both non-bound and "normal" filter clauses in the same Filter clause
> > > will make it more difficult to explain performance issues based on
> > > only the explain output.
> >
> > Fair enough, that's of course the main point of this patch in the
> > first place: being able to better interpret the explain plan when you
> > don't have access to the schema. Still I think Filter is the correct
> > keyword for both, so how about we make it less confusing by making the
> > current "Filter" more specific by calling it something like "Non-key
> > Filter" or "INCLUDE Filter" and then call the other something like
> > "Index Filter" or "Secondary Bound Filter".
> 
> I'm not sure how debuggable explain plans are without access to the schema, especially
> when VERBOSE isn't configured, so I would be hesitant to accept that as an argument
> here.

IMHO, it's nice to be able to understand the differences between each
FILTER even without the VERBOSE option. (+1 for Jelte Fennema-Nio's idea)

Even without access to the schema, it would be possible to quickly know if
the plan is not as expected, and I believe there are virtually no disadvantages
to having multiple "XXX FILTER" outputs.

If it's better to output such information only with the VERBOSE option,
what do you think about the following idea?
* When the VERBOSE option is not specified, output as "Filter" in all cases
* When the VERBOSE option is specified, output as "Non-key Filter", "INCLUDE Filter" 
  and "Index Filter".

In addition, I think it would be good to mention the differences between each filter in 
the documentation.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Peter Geoghegan
Date:
On Fri, Jun 21, 2024 at 3:12 AM <Masahiro.Ikeda@nttdata.com> wrote:
> Regarding the multicolumn B-Tree Index, I'm considering
> if we can enhance the EXPLAIN output. There have been requests
> for this from our customer.

I agree that this is a real problem -- I'm not surprised to hear that
your customer asked about it.

In the past, we've heard complaints about this problem from Markus Winand, too:

https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

As it happens I have been thinking about this problem a lot recently.
Specifically the user-facing aspects, what we show in EXPLAIN. It is
relevant to my ongoing work on skip scan:

https://commitfest.postgresql.org/48/5081/
https://www.postgresql.org/message-id/flat/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com

Unfortunately, my patch will make the situation more complicated for
your patch. I would like to resolve the tension between the two
patches, but I'm not sure how to do that.

If you look at the example query that I included in my introductory
email on the skip scan thread (the query against the sales_mdam_paper
table), you'll see that my patch makes it go much faster. My patch
will effectively "convert" nbtree scan keys that would traditionally
have to use non-index-bound conditions/filter predicates, into
index-bound conditions/access predicates. This all happens at runtime,
during nbtree preprocessing (not during planning).

This may mean that your patch's approach of determining which
columns/scan keys are in which category (bound vs. non-bound) cannot
rely on its current approach of placing each type of clause into one
of two categories inside btcostestimate() -- the view that we see from
btcostestimate() will be made less authoritative by skip scan. What
actually matters is what happens during nbtree preprocessing, inside
_bt_preprocess_keys().

Unfortunately, this is even more complicated than it sounds. It would
be a good idea if we moved _bt_preprocess_keys() to plan time, so that
btcostestimate() operated off of authoritative information, rather
than independently figuring out the same details for the purposes of
costing. We've talked about this before, even [1]. That way your patch
could just work off of this authoritative information. But even that
doesn't necessarily fix the problem.

Note that the skip scan patch makes _bt_preprocess_keys()
*indiscriminately* "convert" *all* scan keys to index bound conditions
-- at least where that's possible at all. There are minor
implementation restrictions that mean that we can't always do that.
But overall, the patch more or less eliminates non-bound index
conditions. That is, it'll be rare to non-existent for nbtree to fail
to mark *any* scan key as SK_BT_REQFWD/SK_BT_REQBKWD. Technically
speaking, non-bound conditions mostly won't exist anymore.

Of course, this doesn't mean that the problem that your patch is
solving will actually go away. I fully expect that the skip scan patch
will merely make some scan keys "required-by-scan/index bound
condition scan keys in name only". Technically they won't be the
problematic kind of index condition, but that won't actually be true
in any practical sense. Because users (like your customer) will still
get full index scans, and be surprised, just like today.

As I explain in my email on the skip scan thread, I believe that the
patch's aggressive approach to "converting" scan keys is an advantage.
The amount of skipping that actually takes place should be decided
dynamically, at runtime. It is a decision that should be made at the
level of individual leaf pages (or small groups of leaf pages), not at
the level of the whole scan. The distinction between index bound
conditions and non-bound conditions becomes much more "squishy", which
is mostly (though not entirely) a good thing.

I really don't know what to do about this. As I said, I agree with the
general idea of this patch -- this is definitely a real problem. And,
I don't pretend that my skip scan patch will actually define the
problem out of existence (except perhaps in those cases that it
actually makes it much faster). Maybe we could make a guess (based on
statistics) whether or not any skip attributes will leave the
lower-order clauses as useful index bound conditions at runtime. But I
don't know...that condition is actually a "continuous" condition now
-- it is not a strict dichotomy (it is not either/or, but rather a
question of degree, perhaps on a scale of 0.0 - 1.0).

It's also possible that we should just do something simple, like your
patch, even though technically it won't really be accurate in cases
where skip scan is used to good effect. Maybe showing the "default
working assumption" about how the scan keys/clauses will behave at
runtime is actually the right thing to do. Maybe I am just
overthinking it.

[1] https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us
-- the final full paragraph mentions moving _bt_preprocess_keys() into
the planner
--
Peter Geoghegan



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Jelte Fennema-Nio
Date:
On Thu, 27 Jun 2024 at 22:02, Peter Geoghegan <pg@bowt.ie> wrote:
> It's also possible that we should just do something simple, like your
> patch, even though technically it won't really be accurate in cases
> where skip scan is used to good effect. Maybe showing the "default
> working assumption" about how the scan keys/clauses will behave at
> runtime is actually the right thing to do. Maybe I am just
> overthinking it.

IIUC, you're saying that your skip scan will improve the situation
Masahiro describes dramatically in some/most cases. But it still won't
be as good as a pure index "prefix" scan.

If that's the case then I do think you're overthinking this a bit.
Because then you'd still want to see this difference between the
prefix-scan keys and the skip-scan keys. I think the main thing that
the introduction of the skip scan changes is the name that we should
show, e.g. instead of "Non-key Filter" we might want to call it "Skip
Scan Cond"

I do think though that in addition to a "Skip Scan Filtered" count for
ANALYZE, it would be very nice to also get a "Skip Scan Skipped" count
(if that's possible to measure/estimate somehow). This would allow
users to determine how effective the skip scan was, i.e. were they
able to skip over large swaths of the index? Or did they skip over
nothing because the second column of the index (on which there was no
filter) was unique within the table?



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From
Peter Geoghegan
Date:
On Thu, Jun 27, 2024 at 4:46 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> On Thu, 27 Jun 2024 at 22:02, Peter Geoghegan <pg@bowt.ie> wrote:
> > It's also possible that we should just do something simple, like your
> > patch, even though technically it won't really be accurate in cases
> > where skip scan is used to good effect. Maybe showing the "default
> > working assumption" about how the scan keys/clauses will behave at
> > runtime is actually the right thing to do. Maybe I am just
> > overthinking it.
>
> IIUC, you're saying that your skip scan will improve the situation
> Masahiro describes dramatically in some/most cases.

"Most cases" seems likely to be overstating it. Overall, I doubt that
it makes sense to try to generalize like that.

The breakdown of the cases that we see in the field right now
(whatever it really is) is bound to be strongly influenced by the
current capabilities of Postgres. If something is intolerably slow,
then it just isn't tolerated. If something works adequately, then
users don't usually care why it is so.

> But it still won't
> be as good as a pure index "prefix" scan.

Typically, no, it won't be. But there's really no telling for sure.
The access patterns for a composite index on '(a, b)' with a qual
"WHERE b = 5" are identical to a qual explicitly written "WHERE a =
any(<every possible value in 'a'>) AND b = 5".
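
As a concrete (made-up) illustration of that equivalence, if the leading
column only ever holds the values 1, 2, and 3, these two queries drive the
same index access pattern:

  -- index on (a, b); the leading column is not constrained
  SELECT * FROM t WHERE b = 5;

  -- the logically equivalent, spelled-out form
  SELECT * FROM t WHERE a = ANY (ARRAY[1, 2, 3]) AND b = 5;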

> If that's the case then I do think you're overthinking this a bit.
> Because then you'd still want to see this difference between the
> prefix-scan keys and the skip-scan keys. I think the main thing that
> the introduction of the skip scan changes is the name that we should
> show, e.g. instead of "Non-key Filter" we might want to call it "Skip
> Scan Cond"

What about cases where we legitimately have to vary our strategy
during the same index scan? We might very well be able to skip over
many leaf pages when scanning through a low cardinality subset of the
index (low cardinality in respect of a leading column 'a'). Then we
might find that there are long runs on leaf pages where no skipping is
possible.

I don't expect this to be uncommon. I do expect it to happen when the
optimizer wasn't particularly expecting it. Like when a full index
scan was the fastest plan anyway. Or when a skip scan wasn't quite as
good as expected, but nevertheless turned out to be the fastest plan.

> I do think though that in addition to a "Skip Scan Filtered" count for
> ANALYZE, it would be very nice to also get a "Skip Scan Skipped" count
> (if that's possible to measure/estimate somehow). This would allow
> users to determine how effective the skip scan was, i.e. were they
> able to skip over large swaths of the index? Or did they skip over
> nothing because the second column of the index (on which there was no
> filter) was unique within the table

Yeah, EXPLAIN ANALYZE should probably be showing something about
skipping. That provides us with a way of telling the user what really
happened, which could help when EXPLAIN output alone turns out to be
quite misleading.

In fact, that'd make sense even today, without skip scan (just with
the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
scans, the number of primitive scans is hard to predict, and quite
indicative of what's really going on with the scan.

--
Peter Geoghegan



On Thu, 27 Jun 2024 at 22:02, Peter Geoghegan <pg@bowt.ie> wrote:
> Unfortunately, my patch will make the situation more complicated
> for your patch. I would like to resolve the tension between the
> two patches, but I'm not sure how to do that.

OK. I would like to understand more about your proposed patch. I
have also registered as a reviewer in the commitfest entry.

On 2024-06-28 07:40, Peter Geoghegan wrote:
> On Thu, Jun 27, 2024 at 4:46 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>> I do think though that in addition to a "Skip Scan Filtered" count for
>> ANALYZE, it would be very nice to also get a "Skip Scan Skipped" count
>> (if that's possible to measure/estimate somehow). This would allow
>> users to determine how effective the skip scan was, i.e. were they
>> able to skip over large swaths of the index? Or did they skip over
>> nothing because the second column of the index (on which there was no
>> filter) was unique within the table
>
> Yeah, EXPLAIN ANALYZE should probably be showing something about
> skipping. That provides us with a way of telling the user what really
> happened, which could help when EXPLAIN output alone turns out to be
> quite misleading.
>
> In fact, that'd make sense even today, without skip scan (just with
> the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
> scans, the number of primitive scans is hard to predict, and quite
> indicative of what's really going on with the scan.

I agree as well.

Although I haven't looked at your patch yet, if it's difficult to know
during the planning phase how much it can optimize, it's enough for me to just
show "Skip Scan Cond" (or "Non-Key Filter"). This is because users can
understand that inefficient index scans *may* occur.

If users want more detail, they can execute "EXPLAIN ANALYZE". This will
allow them to understand the execution effectively and determine whether there
is any room to optimize the plan by looking at the "Skip Scan Filtered"
(or "Skip Scan Skipped") counter.

In terms of the concept of EXPLAIN output, I thought that runtime partition
pruning is similar. "EXPLAIN without ANALYZE" only shows the possibilities and
"EXPLAIN ANALYZE" shows the actual results.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Re: Improve EXPLAIN output for multicolumn B-Tree Index

From: Jelte Fennema-Nio
On Fri, 28 Jun 2024 at 00:41, Peter Geoghegan <pg@bowt.ie> wrote:
> Typically, no, it won't be. But there's really no telling for sure.
> The access patterns for a composite index on '(a, b)' with a qual
> "WHERE b = 5" are identical to a qual explicitly written "WHERE a =
> any(<every possible value in 'a'>) AND b = 5".

Hmm, that's true. But in that case the explain plan gives a clear hint
that something like that might be going on, because you'll see:

Index Cond: a = any(<every possible value in 'a'>) AND b = 5

That does make me think of another way, and maybe a more "correct" way,
of representing Masahiro's situation than adding a new "Skip Scan Cond"
row to the EXPLAIN output. We could explicitly include a comparison to
all prefix columns in the Index Cond:

Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101))

Or if you want to go with syntactically correct SQL we could do the following:

Index Cond: ((test.id1 = 1) AND ((test.id2 IS NULL) OR (test.id2 IS
NOT NULL)) AND (test.id3 = 101))

An additional benefit this provides is that you now know which
additional column you should use a more specific filter on to speed up
the query -- in this case, test.id2.

OTOH, maybe it's not a great way, because actually running that puts
the IS NULL + IS NOT NULL qual in the Filter clause (which honestly
surprises me, because I had expected this "always true" expression to
have been optimized away by the planner).

> EXPLAIN (VERBOSE, ANALYZE) SELECT id1, id2, id3 FROM test WHERE id1 = 1 AND (id2 IS NULL OR id2 IS NOT NULL) AND id3 = 101;
                                                           QUERY PLAN
─────────────────────────────────────────────────────
 Index Only Scan using test_idx on public.test  (cost=0.42..12809.10
rows=1 width=12) (actual time=0.027..11.234 rows=1 loops=1)
   Output: id1, id2, id3
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Filter: ((test.id2 IS NULL) OR (test.id2 IS NOT NULL))

> What about cases where we legitimately have to vary our strategy
> during the same index scan?

Would my new suggestion above address this?

> In fact, that'd make sense even today, without skip scan (just with
> the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
> scans, the number of primitive scans is hard to predict, and quite
> indicative of what's really going on with the scan.

*googles nbtree SAOP scans and finds the very helpful[1]*

Yes, I feel like this should definitely be part of the ANALYZE output.
Seeing how Lukas has to look at pg_stat_user_tables to get this
information seems quite annoying[2] and only possible on systems that
have no concurrent queries.

So it sounds like we'd want a "Primitive Index Scans" counter in
ANALYZE too, in addition to the number of rows filtered out, which (if
we go with my proposal above) should probably be called "Rows Removed
by Index Cond".
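
To make that concrete, here is a purely illustrative mock-up (the field
names are the ones proposed above; every number is invented) of how this
could look for the earlier "id1 = 1 AND id3 = 101" query:

   Index Scan using test_idx on public.test  (cost=... rows=1 width=18) (actual ...)
     Output: id1, id2, id3, value
     Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101))
     Rows Removed by Index Cond: 499999
     Primitive Index Scans: 1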

[1]: https://www.youtube.com/watch?v=jg2KeSB5DR8
[2]: https://youtu.be/jg2KeSB5DR8?t=188



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From: Matthias van de Meent
On Fri, 28 Jun 2024 at 10:59, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Fri, 28 Jun 2024 at 00:41, Peter Geoghegan <pg@bowt.ie> wrote:
> > Typically, no, it won't be. But there's really no telling for sure.
> > The access patterns for a composite index on '(a, b)' with a qual
> > "WHERE b = 5" are identical to a qual explicitly written "WHERE a =
> > any(<every possible value in 'a'>) AND b = 5".
>
> Hmm, that's true. But in that case the explain plan gives a clear hint
> that something like that might be going on, because you'll see:
>
> Index Cond: a = any(<every possible value in 'a'>) AND b = 5
>
> That does make me think of another way, and maybe a more "correct" way,
> of representing Masahiro's situation than adding a new "Skip Scan Cond"
> row to the EXPLAIN output. We could explicitly include a comparison to
> all prefix columns in the Index Cond:
>
> Index Cond: ((test.id1 = 1) AND (test.id2 = ANY) AND (test.id3 = 101))
>
> Or if you want to go with syntactically correct SQL we could do the following:
>
> Index Cond: ((test.id1 = 1) AND ((test.id2 IS NULL) OR (test.id2 IS
> NOT NULL)) AND (test.id3 = 101))
>
> An additional benefit this provides is that you now know which
> additional column you should use a more specific filter on to speed up
> the query. In this case test.id2
>
> OTOH, maybe it's not a great way because actually running that puts
> the IS NULL+ IS NOT NULL query in the Filter clause (which honestly
> surprises me because I had expected this "always true expression"
> would have been optimized away by the planner).
>
> > EXPLAIN (VERBOSE, ANALYZE) SELECT id1, id2, id3 FROM test WHERE id1 = 1 AND (id2 IS NULL OR id2 IS NOT NULL) AND id3 = 101;
>                                                            QUERY PLAN
> ─────────────────────────────────────────────────────
>  Index Only Scan using test_idx on public.test  (cost=0.42..12809.10
> rows=1 width=12) (actual time=0.027..11.234 rows=1 loops=1)
>    Output: id1, id2, id3
>    Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
>    Filter: ((test.id2 IS NULL) OR (test.id2 IS NOT NULL))
>
> > What about cases where we legitimately have to vary our strategy
> > during the same index scan?
>
> Would my new suggestion above address this?
>
> > In fact, that'd make sense even today, without skip scan (just with
> > the 17 work on nbtree SAOP scans). Even with regular SAOP nbtree index
> > scans, the number of primitive scans is hard to predict, and quite
> > indicative of what's really going on with the scan.
>
> *googles nbtree SAOP scans and finds the very helpful[1]*
>
> Yes, I feel like this should definitely be part of the ANALYZE output.
> Seeing how Lukas has to look at pg_stat_user_tables to get this
> information seems quite annoying[2] and only possible on systems that
> have no concurrent queries.
>
> So it sounds like we'd want a "Primitive Index Scans" counter in
> ANALYZE too, in addition to the number of rows filtered out, which (if
> we go with my proposal above) should probably be called "Rows Removed
> by Index Cond".

This all just made me more confident that there is a need to let index
AMs provide their own output for EXPLAIN: knowledge of how index
scankeys are actually used is exclusive to the index AM, because the
strategies are often unique to the index AM (or even to the chosen
operator classes), and sometimes can only be applied at runtime. While
the index scankeys' sizes, attribute numbers and operators are known in
advance (even if not all arguments are filled in;
`FROM a JOIN b ON b.id = ANY (a.ref_array)`), the AM can at least show
what strategy it is likely going to choose, and how (in)efficient that
strategy could be.

Right now, Masahiro-san's patch tries to do that with an additional
field in IndexPath, populated (by proxy) exclusively in btcostestimate.
I think that design is wrong, because it wires explain- and
btree-specific data through the planner, adding overhead everywhere for
something that is only used by btree and btree-compatible indexes.

I think the better choice would be to add an IndexAmRoutine->amexplain
support function, which would get called in e.g. explain.c's
ExplainIndexScanDetails to populate a new "Index Scan Details" (name to
be bikeshedded) subsection of explain plans. This would certainly be
possible, as the essentials for outputting things to EXPLAIN are readily
available in the explain.h header.
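
To sketch what I mean (purely illustrative: the "amexplain" member, the
callback name, and its signature are all hypothetical and would need
proper design), the AM-side half could look roughly like this:

  #include "postgres.h"
  #include "access/amapi.h"
  #include "access/relscan.h"
  #include "commands/explain.h"

  /* hypothetical member added to IndexAmRoutine */
  typedef void (*amexplain_function) (IndexScanDesc scan, ExplainState *es);

  /* hypothetical nbtree implementation, registered in bthandler() */
  static void
  btexplain(IndexScanDesc scan, ExplainState *es)
  {
      /*
       * Only the AM knows which scan keys actually bound the descent of
       * the index and which are merely checked against each index tuple,
       * so it can report that distinction itself.  The string values here
       * stand in for properly deparsed quals; ExplainPropertyText() is
       * the existing output helper from commands/explain.h.
       */
      ExplainPropertyText("Index Bound Cond", "(id1 = 1)", es);
      ExplainPropertyText("Non-Key Filter", "(id3 = 101)", es);
  }

explain.c would then only have to call the callback (when set) for the
index scan's AM and group its output under the new subsection.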


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Improve EXPLAIN output for multicolumn B-Tree Index

From: Peter Geoghegan
On Thu, Jun 27, 2024 at 11:06 PM <Masahiro.Ikeda@nttdata.com> wrote:
> OK. I would like to understand more about your proposed patch. I
> have also registered as a reviewer in the commitfests entry.

Great!

> Although I haven't looked at your patch yet, if it's difficult to know
> during the planning phase how the scan can be optimized, it's enough for me
> to just show "Skip Scan Cond" (or "Non-Key Filter"). This is because users
> can understand that inefficient index scans *may* occur.

That makes sense.

The goal of your patch is to highlight when an index scan is using an
index that is suboptimal for a particular query (a query that the user
runs through EXPLAIN or EXPLAIN ANALYZE). The underlying rules that
determine "access predicate vs. filter predicate" are not very
complicated -- they're intuitive, even. But even an expert can easily
make a mistake on a bad day.

It seems to me that all your patch really needs to do is to give the
user a friendly nudge in that direction, when it makes sense to. You
want to subtly suggest to the user "hey, are you sure that the index
the plan uses is exactly what you expected?". Fortunately, even when
skip scan works well that should still be a useful nudge. If we assume
that the query that the user is looking at is much more important than
other queries, then the user really shouldn't be using skip scan in
the first place. Even a good skip scan is a little suspicious (it's
okay if it "stands out" a bit).

> Conceptually, I think the EXPLAIN output for runtime partition pruning is
> similar: EXPLAIN without ANALYZE only shows the possibilities, while
> EXPLAIN ANALYZE shows the actual results.

That seems logical.

--
Peter Geoghegan