Postgres Query Plan using wrong index

From: Manikandan Swaminathan
Hello,

I'm running the Docker postgres:17.0 image and trying to test the behavior of adding a new index to a table. Specifically, I wanted to verify that my new index is actually used by looking at the output of "EXPLAIN ANALYZE". However, I found that my index is often not being used, and I wanted to understand the query planner's rationale when choosing an index.

Reproduction steps
postgres=# select version();
                                                          version                                                          
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 17.0 (Debian 17.0-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

1. Create database

CREATE DATABASE test LOCALE_PROVIDER icu ICU_LOCALE "en-US-x-icu" LOCALE "en_US.utf8" TEMPLATE template0;

2. Create table and indices
CREATE TABLE test_table (
    col_a int,
    col_b INT NOT NULL    
);
CREATE INDEX IF NOT EXISTS idx_col_a_btree ON test_table(col_b);
CREATE INDEX IF NOT EXISTS idx_col_a_brin ON test_table USING brin (col_b);
CREATE INDEX IF NOT EXISTS idx_col_b_a ON test_table(col_a, col_b);

3. Load 10 million rows into table

DO $$
DECLARE
    batch_count INT := 0;
    b_var INT := 0;
    a_var INT := 1;
    prev_a INT := 1;
    a_null BOOLEAN := FALSE;
    batch_size INT := 1000;
BEGIN
    FOR i IN 1..10000000 LOOP
        IF batch_count = batch_size THEN
            b_var := b_var + 1;
            a_null := NOT a_null;
            IF NOT a_null THEN
                a_var := prev_a + 1;
            ELSE
                prev_a := a_var;
                a_var := NULL;
            END IF;
            batch_count := 0;
        END IF;
        INSERT INTO test_table (col_a, col_b) VALUES (a_var, b_var);
        batch_count := batch_count + 1;
    END LOOP;
END $$;
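
For reference, an equivalent set-based load (just a sketch, but it should produce the same distribution: col_b advances by 1 every 1000 rows, and col_a is NULL for odd-numbered batches and otherwise increases in step with col_b):

INSERT INTO test_table (col_a, col_b)
SELECT CASE WHEN b % 2 = 0 THEN b / 2 + 1 END AS col_a,  -- NULL for odd batches
       b                                       AS col_b
FROM generate_series(0, 9999) AS b,                      -- 10,000 batches of
     generate_series(1, 1000)                            -- 1,000 rows each
ORDER BY b;                                              -- keep the loop's physical ordering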

4. When running the following query, I would expect the index "idx_col_b_a" to be used: select min(col_b) from test_table  where col_a > 4996.
I have a range-based filter on col_a, and am aggregating the result with min(col_b). Both columns are covered by "idx_col_b_a". However, explain analyze indicates otherwise:

postgres=# explain analyze select min(col_b) from test_table  where col_a > 4996;
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=63.86..63.87 rows=1 width=4) (actual time=587.550..587.550 rows=1 loops=1)
   InitPlan 1
     ->  Limit  (cost=0.43..63.86 rows=1 width=4) (actual time=587.542..587.543 rows=1 loops=1)
           ->  Index Scan using idx_col_a_btree on test_table  (cost=0.43..259400.27 rows=4090 width=4) (actual time=587.541..587.541 rows=1 loops=1)
                 Filter: (col_a > 4996)
                 Rows Removed by Filter: 9992000
 Planning Time: 0.305 ms
 Execution Time: 587.579 ms
(8 rows)

Instead of using idx_col_b_a, it does an index scan on idx_col_a_btree. This is a problem because of the way the data is structured in my table: higher col_a values are associated with higher col_b values. As a result, the index scan ends up reading through most of the index before finding the first record that matches the criterion "col_a > 4996".

When I DROP the idx_col_a_btree index, the resulting query plan looks much better because it uses the (col_a, col_b) index, idx_col_b_a:
postgres=# explain analyze select min(col_b) from test_table  where col_a > 4996;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=102.23..102.24 rows=1 width=4) (actual time=0.591..0.592 rows=1 loops=1)
   ->  Index Only Scan using idx_col_b_a on test_table  (cost=0.43..92.01 rows=4090 width=4) (actual time=0.021..0.341 rows=4000 loops=1)
         Index Cond: (col_a > 4996)
         Heap Fetches: 0
 Planning Time: 0.283 ms
 Execution Time: 0.613 ms
(6 rows)

I tried fiddling with the table statistics and with random_page_cost, but neither seemed to make a difference (the sort of thing I tried is sketched below). Is there some nuance here that I'm missing? Why is the query planner using an index that drastically worsens the performance of the query?
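
Roughly the sort of fiddling I mean (a sketch; the exact values aren't the point):

ALTER TABLE test_table ALTER COLUMN col_a SET STATISTICS 1000;  -- raise the statistics target for col_a
ANALYZE test_table;
SET random_page_cost = 1.1;  -- tell the planner random I/O is nearly as cheap as sequential I/O
EXPLAIN ANALYZE SELECT min(col_b) FROM test_table WHERE col_a > 4996;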


Re: Postgres Query Plan using wrong index

From: Tom Lane
Manikandan Swaminathan <maniswami23@gmail.com> writes:
> 4. When running the following query, I would expect the index "idx_col_b_a"
> to be used: select min(col_b) from test_table  where col_a > 4996.
> I have a range-based filter on col_a, and am aggregating the result with
> min(col_b). Both columns are covered by "idx_col_b_a".

They may be covered, but sort order matters, and that index has the
wrong sort order to help with this query.  Try

create index on test_table(col_b, col_a);

            regards, tom lane



Re: Postgres Query Plan using wrong index

From: Manikandan Swaminathan
Thanks so much for your help, Tom.

Sorry, I didn’t quite understand the answer and have a few follow-up questions. I'm new to Postgres, so I'm a bit
ignorant here and would appreciate any tips on the query planner you could give.

1) Why is the query currently picking the poorly performing index? I already have an index on (col_a, col_b) that
performs well. When I remove the separate index on (col_b), it correctly uses the (col_a, col_b) index and the query
runs efficiently. But when both indexes are present, it chooses the slower (col_b) index instead.

2) Why would the index you suggested, (col_b, col_a), perform better than (col_a, col_b)? I would’ve expected the
filter on col_a to come first, followed by the aggregate on col_b. In my mind, it needs to find rows matching the col_a
condition before calculating the MIN(col_b), and I assumed it would traverse the B-tree accordingly. I'm more used to
MySQL, where I think this is called a "loose index scan". I must have a gap in my understanding of how Postgres
approaches this. Thanks for your help!

3) Why does the planner choose the better-performing (col_a, col_b) index when the filter is col_a > 5000, but switch
to the slower (col_b) index when the filter is not at the edge of the range, like col_a > 4996? For reference, here’s
the query plan when filtering for col_a > 5000. It uses the correct index on (col_a, col_b).

postgres=# explain analyze select min(col_b) from test_table  where col_a > 5000;

 Aggregate  (cost=4.46..4.46 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
   ->  Index Only Scan using idx_col_b_a on test_table  (cost=0.43..4.45 rows=1 width=4) (actual time=0.004..0.005 rows=0 loops=1)
         Index Cond: (col_a > 5000)
         Heap Fetches: 0
 Planning Time: 2.279 ms
 Execution Time: 0.028 ms
(6 rows)


>
> On Apr 1, 2025, at 5:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Manikandan Swaminathan <maniswami23@gmail.com> writes:
>> 4. When running the following query, I would expect the index "idx_col_b_a"
>> to be used: select min(col_b) from test_table  where col_a > 4996.
>> I have a range-based filter on col_a, and am aggregating the result with
>> min(col_b). Both columns are covered by "idx_col_b_a".
>
> They may be covered, but sort order matters, and that index has the
> wrong sort order to help with this query.  Try
>
> create index on test_table(col_b, col_a);
>
>          regards, tom lane



Re: Postgres Query Plan using wrong index

From: Tom Lane
Manikandan Swaminathan <maniswami23@gmail.com> writes:
> 1) Why is the query currently picking the poorly performing index?

Because the planner thinks that one will be cheaper, as you can see by
comparing the cost estimates in EXPLAIN.  It's wrong, but this is a
hard problem to estimate well.  Especially when the behavior depends
on a correlation between columns that the planner knows nothing about.
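
One quick way to see the two estimates side by side (just a sketch) is
to hide the extra index inside a transaction and compare plain EXPLAIN
output:

BEGIN;
DROP INDEX idx_col_a_btree;   -- only takes effect inside this transaction
EXPLAIN SELECT min(col_b) FROM test_table WHERE col_a > 4996;
ROLLBACK;                     -- the index is back afterwards

The cost printed there is what lost out to the idx_col_a_btree plan's
estimate when both indexes were present.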

> 2) Why would the index you suggested, (col_b, col_a), perform better than (col_a, col_b)? I would’ve expected the
> filter on col_a to come first, followed by the aggregate on col_b. In my mind, it needs to find rows matching the
> col_a condition before calculating the MIN(col_b), and I assumed it would traverse the B-tree accordingly.

The idea the planner is using is "scan the index in order (that is,
in col_b order) until you find the first row satisfying the other
constraints (that is, the col_a condition).  Then that row's col_b
value is the correct MIN(), and you can stop."  Since it knows nothing
of the cross-column correlation, its estimate of how many rows it'll
have to scan through is overly optimistic.  But it knows that the
other way involves scanning a whole lot of the index --- there's no
chance of stopping early --- so that's estimated as higher-cost.

The index I suggested on (col_b, col_a) is amenable to this same
plan shape, since col_b is still the major sort column.  The
reason it wins is that the col_a condition can be checked in the
index without having to visit the heap, thus eliminating a lot of
random access to the heap.
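
As an untested sketch (index name made up), you could check that with

create index idx_col_b_a_cover on test_table (col_b, col_a);
vacuum analyze test_table;  -- keep the visibility map fresh for the index-only scan
explain (analyze, buffers) select min(col_b) from test_table where col_a > 4996;

and, if the planner picks it, you should see an index-only scan in
col_b order that stops at the first row satisfying col_a > 4996.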

> 3) Why does the planner choose the better-performing (col_a, col_b) index when the filter is col_a > 5000, but
> switch to the slower (col_b) index when the filter is not at the edge of the range, like col_a > 4996?

At some point, as less and less of the col_a-major index would need to
be scanned, there's a crossover in the cost estimates for the two ways
of doing this.  I would not have cared to predict where the crossover
is, but you evidently found it empirically.

> For reference, here’s the query plan when filtering for col_a > 5000. It uses the correct index on (col_a, col_b).

You would do a lot better to approach this without rigid notions of
which is the "correct" index.  All of the ones we've discussed are
potentially usable for this query, and they all have different cost
curves depending on how selective the col_a condition is.  Even the
index on col_b alone could potentially be the best, because it'll be
smaller than the two-column indexes.  So if the col_a condition is
very unselective then it's (at least in theory) possible that that
would be the best choice.

            regards, tom lane



Re: Postgres Query Plan using wrong index

From: Manikandan Swaminathan
Thanks Tom.

Since you mentioned the planner not knowing about the correlation between the columns, I’m curious, why doesn’t
making a multivariate statistic make a difference?

CREATE STATISTICS col_a_col_b_stats (dependencies) ON col_a, col_b FROM test_table;
ANALYZE test_table;
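
For what it's worth, what the statistics object stores can be inspected with something like:

SELECT statistics_name, dependencies
FROM pg_stats_ext
WHERE statistics_name = 'col_a_col_b_stats';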

And the resulting query plan which uses just the index on col_b:

postgres=# explain analyze select min(col_b) from test_table  where col_a > 4996;

 Result  (cost=62.13..62.14 rows=1 width=4) (actual time=536.648..536.649 rows=1 loops=1)
   InitPlan 1
     ->  Limit  (cost=0.43..62.13 rows=1 width=4) (actual time=536.641..536.641 rows=1 loops=1)
           ->  Index Scan using idx_col_a_btree on test_table  (cost=0.43..254987.43 rows=4133 width=4) (actual time=536.640..536.640 rows=1 loops=1)
                 Filter: (col_a > 4996)
                 Rows Removed by Filter: 9992000
 Planning Time: 0.285 ms
 Execution Time: 536.681 ms

>
> On Apr 2, 2025, at 5:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Manikandan Swaminathan <maniswami23@gmail.com> writes:
>> 1) Why is the query currently picking the poorly performing index?
>
> Because the planner thinks that one will be cheaper, as you can see by
> comparing the cost estimates in EXPLAIN.  It's wrong, but this is a
> hard problem to estimate well.  Especially when the behavior depends
> on a correlation between columns that the planner knows nothing about.
>
>> 2) Why would the index you suggested, (col_b, col_a), perform better than (col_a, col_b)? I would’ve expected the
>> filter on col_a to come first, followed by the aggregate on col_b. In my mind, it needs to find rows matching the
>> col_a condition before calculating the MIN(col_b), and I assumed it would traverse the B-tree accordingly.
>
> The idea the planner is using is "scan the index in order (that is,
> in col_b order) until you find the first row satisfying the other
> constraints (that is, the col_a condition).  Then that row's col_b
> value is the correct MIN(), and you can stop."  Since it knows nothing
> of the cross-column correlation, its estimate of how many rows it'll
> have to scan through is overly optimistic.  But it knows that the
> other way involves scanning a whole lot of the index --- there's no
> chance of stopping early --- so that's estimated as higher-cost.
>
> The index I suggested on (col_b, col_a) is amenable to this same
> plan shape, since col_b is still the major sort column.  The
> reason it wins is that the col_a condition can be checked in the
> index without having to visit the heap, thus eliminating a lot of
> random access to the heap.
>
>> 3) Why does the planner choose the better-performing (col_a, col_b) index when the filter is col_a > 5000, but
>> switch to the slower (col_b) index when the filter is not at the edge of the range, like col_a > 4996?
>
> At some point, as less and less of the col_a-major index would need to
> be scanned, there's a crossover in the cost estimates for the two ways
> of doing this.  I would not have cared to predict where the crossover
> is, but you evidently found it empirically.
>
>> For reference, here’s the query plan when filtering for col_a > 5000. It uses the correct index on (col_a, col_b).
>
> You would do a lot better to approach this without rigid notions of
> which is the "correct" index.  All of the ones we've discussed are
> potentially usable for this query, and they all have different cost
> curves depending on how selective the col_a condition is.  Even the
> index on col_b alone could potentially be the best, because it'll be
> smaller than the two-column indexes.  So if the col_a condition is
> very unselective then it's (at least in theory) possible that that
> would be the best choice.
>
>            regards, tom lane



Re: Postgres Query Plan using wrong index

From: David Rowley
On Thu, 3 Apr 2025 at 16:24, Manikandan Swaminathan
<maniswami23@gmail.com> wrote:
> Since you mentioned the planner not knowing about the correlation between the columns, I’m curious, why doesn’t
> making a multivariate statistic make a difference?
>
>
> CREATE STATISTICS col_a_col_b_stats (dependencies) ON col_a, col_b FROM test_table;
> ANALYZE test_table;

Extended statistics won't help you here. "dependencies" just estimates
functional dependencies between the columns mentioned in the ON
clause.  What we'd need to store to do better in your example query is
positional information of where certain values are within indexes
according to an ordered scan of the index. I don't quite know how we'd
represent that exactly, but if we knew that a row matching col_a >
4996 didn't appear until somewhere near the end of the idx_col_a_btree index,
then we'd likely not want to use that index for this query.

David



Re: Postgres Query Plan using wrong index

From: Tom Lane
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 3 Apr 2025 at 16:24, Manikandan Swaminathan
> <maniswami23@gmail.com> wrote:
>> why doesn’t making a multivariate statistic make a difference?

> Extended statistics won't help you here. "dependencies" just estimates
> functional dependencies between the columns mentioned in the ON
> clause.  What we'd need to store to do better in your example query is
> positional information of where certain values are within indexes
> according to an ordered scan of the index. I don't quite know how we'd
> represent that exactly, but if we knew that a row matching col_a >
> 4996 didn't appear until somewhere near the end of the idx_col_a_btree index,
> then we'd likely not want to use that index for this query.

A simple-minded approach could be to just be pessimistic, and
increase our estimate of how many rows would need to be scanned as a
consequence of noticing that the columns have significant correlation.
The shape of that penalty function would be mostly guesswork though,
I fear.  (Even with a clear idea of what to do, making this happen
seems a little complex --- just a SMOP, but I'm not very sure how to
wire it up.)

            regards, tom lane



Re: Postgres Query Plan using wrong index

From: David Rowley
On Thu, 3 Apr 2025 at 18:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A simple-minded approach could be to just be pessimistic, and
> increase our estimate of how many rows would need to be scanned as a
> consequence of noticing that the columns have significant correlation.
> The shape of that penalty function would be mostly guesswork though,
> I fear.  (Even with a clear idea of what to do, making this happen
> seems a little complex --- just a SMOP, but I'm not very sure how to
> wire it up.)

The problem with being pessimistic is that if the distribution were
even, or the rows matching col_a > 4996 were right at the start of an
ordered scan of the idx_col_a_btree index, it would have been a good
plan. There might be
just as many people getting good plans as there are bad and adding
pessimism here might just make one set of people happy and the others
sad.

I don't have a clear idea, but from a few minutes thinking about it:
maybe when we build the statistics on a table, once we're about done
with what happens today, we could take the sample row set, sort it
according to the sort order of each amcanorder index, then go through
the MCV lists and histograms for each column and record a min and max
value for the percentage of the way through the sorted sample rows at
which we find each MCV value and each value within each histogram
bucket, and store those in some new statistics kind against the index
(maybe indexes could opt into this).  This might sound like it'd need
lots of storage, but even 1 byte each for the min and max would be
better than today: 0 could mean near the start, 127 the middle and
255 the end.

I imagine this could be done fairly efficiently by sorting the MCV
lists by value and binary-searching that array as we walk through the
index-ordered sample rows. The histograms should be sorted already.

When planning the query in question, to figure out the weighting of
how many rows we expect to have to read, we look at the stats against
the index to find the minimum position for values in the col_a
histogram where col_a > 4996. Instead of assuming an even
distribution, you use that minimum value to tell you what percentage
of the index must be read before a match is found.  The stored maximum
position value would do the same job for backward index scans.

David