query plan using partial index expects a much larger number of rows than is possible - Mailing list pgsql-performance

From Olivier Poquet
Subject query plan using partial index expects a much larger number of rows than is possible
Date
Msg-id 59eb4cf7-4c68-448d-a2d8-541464803353@www.fastmail.com
Whole thread Raw
Responses Re: query plan using partial index expects a much larger number of rows than is possible  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hello,
I have a large table (about 13 million rows) full of customer order information. Since most of that information is for
ordersthat have already been fulfilled, I have a partial index to help quickly zero in on rows that have not been
fulfilled.This works well, but I noticed today when joining with another large table using its primary key that even
thoughthe planner was using my partial index, it decided to do a merge join to the second large table instead of the
nestedloop I would have expected.
 

Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do a
simplequery that does an index scan on my partial index:
 

=> \d orderitems_committed_unfulfilled 
Index "public.orderitems_committed_unfulfilled"
 Column |  Type  | Key? | Definition 
--------+--------+------+------------
 id     | bigint | yes  | id
btree, for table "public.orderitems", predicate (LEAST(committed, quantity) > fulfilled)

=> explain (analyze, buffers) select oi.id from orderitems oi where LEAST(oi.committed, oi.quantity) > oi.fulfilled;
                                                                            QUERY PLAN
                                         
 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using orderitems_committed_unfulfilled on orderitems oi  (cost=0.41..31688.23 rows=2861527 width=8)
(actualtime=0.039..2.092 rows=2274 loops=1)
 
   Heap Fetches: 2493
   Buffers: shared hit=1883
 Planning Time: 0.110 ms
 Execution Time: 2.255 ms
(5 rows)

So nice and quick, but the planner thought it would get back 2861527 rows instead of the 2274 I actually have.  That
explainswhy it thought it would make sense to do a merge join with my other large table instead of the nested loop over
the2k rows.  I would have expected the planner to know that there's no way it'll get back 2 million rows though, given
that:

=> select relname, relpages, reltuples from pg_class where relname = 'orderitems_committed_unfulfilled';
             relname              | relpages | reltuples 
----------------------------------+----------+-----------
 orderitems_committed_unfulfilled |     3051 |      2112
(1 row)

It knows there's only 2k-ish of them from the index. The 2 mil number is the same as what the planner expects if I
disableusing indexes and it does a seq scan, so I'm assuming it's just the guess from the column statistics and the
planneris not using the size of the partial index.
 

I'm running:
=> select version();
                                          version                                          
-------------------------------------------------------------------------------------------
 PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc, a 97d579287 p 0be2109a97, 64-bit
(1 row)

I'm wondering if the behavior that I'm seeing is expected in 12.4, and if so if it changes in a later version or if I
shouldfile an enhancement request? Or if it's not expected is there's something I'm doing wrong, or should file a bug?
 

Thanks for your time.

-- 
  Olivier Poquet
  opoquet@plumdev.com



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join
Next
From: Tom Lane
Date:
Subject: Re: query plan using partial index expects a much larger number of rows than is possible