Hello,
I have a large table (about 13 million rows) full of customer order information. Since most of that information is for
ordersthat have already been fulfilled, I have a partial index to help quickly zero in on rows that have not been
fulfilled.This works well, but I noticed today when joining with another large table using its primary key that even
thoughthe planner was using my partial index, it decided to do a merge join to the second large table instead of the
nestedloop I would have expected.
Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do a
simplequery that does an index scan on my partial index:
=> \d orderitems_committed_unfulfilled
Index "public.orderitems_committed_unfulfilled"
Column | Type | Key? | Definition
--------+--------+------+------------
id | bigint | yes | id
btree, for table "public.orderitems", predicate (LEAST(committed, quantity) > fulfilled)
=> explain (analyze, buffers) select oi.id from orderitems oi where LEAST(oi.committed, oi.quantity) > oi.fulfilled;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using orderitems_committed_unfulfilled on orderitems oi (cost=0.41..31688.23 rows=2861527 width=8)
(actualtime=0.039..2.092 rows=2274 loops=1)
Heap Fetches: 2493
Buffers: shared hit=1883
Planning Time: 0.110 ms
Execution Time: 2.255 ms
(5 rows)
So nice and quick, but the planner thought it would get back 2861527 rows instead of the 2274 I actually have. That
explainswhy it thought it would make sense to do a merge join with my other large table instead of the nested loop over
the2k rows. I would have expected the planner to know that there's no way it'll get back 2 million rows though, given
that:
=> select relname, relpages, reltuples from pg_class where relname = 'orderitems_committed_unfulfilled';
relname | relpages | reltuples
----------------------------------+----------+-----------
orderitems_committed_unfulfilled | 3051 | 2112
(1 row)
It knows there's only 2k-ish of them from the index. The 2 mil number is the same as what the planner expects if I
disableusing indexes and it does a seq scan, so I'm assuming it's just the guess from the column statistics and the
planneris not using the size of the partial index.
I'm running:
=> select version();
version
-------------------------------------------------------------------------------------------
PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc, a 97d579287 p 0be2109a97, 64-bit
(1 row)
I'm wondering if the behavior that I'm seeing is expected in 12.4, and if so if it changes in a later version or if I
shouldfile an enhancement request? Or if it's not expected is there's something I'm doing wrong, or should file a bug?
Thanks for your time.
--
Olivier Poquet
opoquet@plumdev.com