Thread: Trying to understand why a query is filtering when there is a composite index
Trying to understand why a query is filtering when there is a composite index
From
"Stephen Samuel (Sam)"
Date:
Hi folks.
I have a table with 4.5m rows per partition (16 partitions) (I know, very small, probably didn't need to be partitioned).
The table has two columns, a bigint and b text.
There is a unique index on (a,b)
The query is:
SELECT b
FROM table
WHERE a = <id>
AND b IN (<ids>)
The visibility map is almost exclusively true.
This table gets few updates.
The planner says index only scan, but is filtering on b.
Index Only Scan using pkey on table (cost=0.46..29.09 rows=1 width=19) (actual time=0.033..0.053 rows=10 loops=1)
Index Cond: (a = 662028765)
" Filter: (b = ANY ('{634579987:662028765,561730945:662028765,505555183:662028765,472806302:662028765,401361055:662028765,363587258:662028765,346093772:662028765,314369897:662028765,289498328:662028765,217993946:662028765}'::text[]))"
Rows Removed by Filter: 1
Heap Fetches: 11
Planning Time: 0.095 ms
Execution Time: 0.070 ms
My question is, why isn't it using the index for column b? Is this expected? And why is it doing heap lookups for every row,.
Performance is still good, but I am curious.
Thanks in advance!
Re: Trying to understand why a query is filtering when there is a composite index
From
Peter Geoghegan
Date:
On Sun, Aug 18, 2024 at 9:56 PM Stephen Samuel (Sam) <sam@sksamuel.com> wrote: > My question is, why isn't it using the index for column b? Is this expected? And why is it doing heap lookups for everyrow,. This has been fixed for Postgres 17: https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans -- Peter Geoghegan
Re: Trying to understand why a query is filtering when there is a composite index
From
"Stephen Samuel (Sam)"
Date:
Oh as simple as upgrading!
Ok great, appreciate the quick reply. Will have to wait for AWS to support 17 :)
On Sun, 18 Aug 2024 at 20:59, Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, Aug 18, 2024 at 9:56 PM Stephen Samuel (Sam) <sam@sksamuel.com> wrote:
> My question is, why isn't it using the index for column b? Is this expected? And why is it doing heap lookups for every row,.
This has been fixed for Postgres 17:
https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans
--
Peter Geoghegan
Re: Trying to understand why a query is filtering when there is a composite index
From
Peter Geoghegan
Date:
On Sun, Aug 18, 2024 at 10:01 PM Stephen Samuel (Sam) <sam@sksamuel.com> wrote: > Oh as simple as upgrading! > Ok great, appreciate the quick reply. Will have to wait for AWS to support 17 :) It is possible to use index quals for both a and b on earlier versions, with certain restrictions. You might try setting random_page_cost to a much lower value, to see if that allows the planner to use such a plan with your real query. In my experience it's very unlikely that the planner will do that, though, even when coaxed. At least when there are this many IN() constants. So you probably really will need to upgrade to 17. -- Peter Geoghegan
Re: Trying to understand why a query is filtering when there is a composite index
From
"Stephen Samuel (Sam)"
Date:
Performance is pretty good anyway, and I'm only running 5 r7.large readers on this service, I was just looking at the query planner and it surprised me.
On Sun, 18 Aug 2024 at 21:08, Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, Aug 18, 2024 at 10:01 PM Stephen Samuel (Sam) <sam@sksamuel.com> wrote:
> Oh as simple as upgrading!
> Ok great, appreciate the quick reply. Will have to wait for AWS to support 17 :)
It is possible to use index quals for both a and b on earlier
versions, with certain restrictions. You might try setting
random_page_cost to a much lower value, to see if that allows the
planner to use such a plan with your real query.
In my experience it's very unlikely that the planner will do that,
though, even when coaxed. At least when there are this many IN()
constants. So you probably really will need to upgrade to 17.
--
Peter Geoghegan
Re: Trying to understand why a query is filtering when there is a composite index
From
Tom Lane
Date:
"Stephen Samuel (Sam)" <sam@sksamuel.com> writes: > There is a unique index on (a,b) > The query is: > SELECT b > FROM table > WHERE a = <id> > AND b IN (<ids>) > The planner says index only scan, but is filtering on b. > Index Only Scan using pkey on table (cost=0.46..29.09 rows=1 > width=19) (actual time=0.033..0.053 rows=10 loops=1) > Index Cond: (a = 662028765) > " Filter: (b = ANY > ('{634579987:662028765,561730945:662028765,505555183:662028765,472806302:662028765,401361055:662028765,363587258:662028765,346093772:662028765,314369897:662028765,289498328:662028765,217993946:662028765}'::text[]))" > Rows Removed by Filter: 1 > Heap Fetches: 11 > Planning Time: 0.095 ms > Execution Time: 0.070 ms I think it's a good bet that this query would be *slower* if it were done the other way. The filter condition is eliminating only one of the 11 rows matching "a = 662028765". If we did what you think you want, we'd initiate ten separate index descents to find the other ten rows. Whether the planner is costing this out accurately enough to realize that, or whether it's just accidentally falling into the right plan, I'm not sure; you've not provided nearly enough details for anyone to guess what the other cost estimate was. > And why is it doing heap lookups for every row,. Yeah, that part is a weakness I've wanted to fix for a long time: it could do the filter condition by fetching b from the index, but it doesn't notice that and has to go to the heap to get b. (If the other plan does win, it'd likely be because of that problem and not because the index scanning strategy per se is better.) regards, tom lane
Re: Trying to understand why a query is filtering when there is a composite index
From
Peter Geoghegan
Date:
On Sun, Aug 18, 2024 at 10:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think it's a good bet that this query would be *slower* if > it were done the other way. The filter condition is eliminating > only one of the 11 rows matching "a = 662028765". If we did what > you think you want, we'd initiate ten separate index descents > to find the other ten rows. True - on versions prior to Postgres 17. On 17 the number of index descents will be minimal. If there are less than a few hundred index tuples with the value a = <whatever>, then there'll only be one descent. > Yeah, that part is a weakness I've wanted to fix for a long > time: it could do the filter condition by fetching b from the > index, but it doesn't notice that and has to go to the heap > to get b. It was fixed? At least on 17. -- Peter Geoghegan
Re: Trying to understand why a query is filtering when there is a composite index
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes: > On Sun, Aug 18, 2024 at 10:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, that part is a weakness I've wanted to fix for a long >> time: it could do the filter condition by fetching b from the >> index, but it doesn't notice that and has to go to the heap >> to get b. > It was fixed? At least on 17. Oh, sorry, I was thinking of a related problem that doesn't apply here: matching indexes on expressions to fragments of a filter condition. However, the fact that the OP's EXPLAIN shows heap fetches from a supposedly all-visible table suggests that his IN isn't getting optimized that way. I wonder why --- it seems to work for me, even in fairly old versions. Taking a parallel example from the regression database, even v12 can do regression=# explain analyze select tenthous from tenk1 where thousand=99 and tenthous in (1,4,7,9,11,55,66,88,99,77,8876,9876); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Index Only Scan using tenk1_thous_tenthous on tenk1 (cost=0.29..4.61 rows=1 width=4) (actual time=0.016..0.018 rows=1 loops=1) Index Cond: (thousand = 99) Filter: (tenthous = ANY ('{1,4,7,9,11,55,66,88,99,77,8876,9876}'::integer[])) Rows Removed by Filter: 9 Heap Fetches: 0 Planning Time: 0.298 ms Execution Time: 0.036 ms (7 rows) No heap fetches, so it must have done the filter from the index. Why not in the original case? regards, tom lane
Re: Trying to understand why a query is filtering when there is a composite index
From
"Stephen Samuel (Sam)"
Date:
select all_visible, count(*)
from pg_visibility('table')
group by all_visible
from pg_visibility('table')
group by all_visible
false,1614
true,30575
true,30575
The table is partitioned if that matters (but same results if I run the queries directly on the partition).
On Sun, 18 Aug 2024 at 23:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sun, Aug 18, 2024 at 10:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, that part is a weakness I've wanted to fix for a long
>> time: it could do the filter condition by fetching b from the
>> index, but it doesn't notice that and has to go to the heap
>> to get b.
> It was fixed? At least on 17.
Oh, sorry, I was thinking of a related problem that doesn't apply
here: matching indexes on expressions to fragments of a filter
condition. However, the fact that the OP's EXPLAIN shows heap
fetches from a supposedly all-visible table suggests that his
IN isn't getting optimized that way. I wonder why --- it seems
to work for me, even in fairly old versions. Taking a parallel
example from the regression database, even v12 can do
regression=# explain analyze select tenthous from tenk1 where thousand=99 and tenthous in (1,4,7,9,11,55,66,88,99,77,8876,9876);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using tenk1_thous_tenthous on tenk1 (cost=0.29..4.61 rows=1 width=4) (actual time=0.016..0.018 rows=1 loops=1)
Index Cond: (thousand = 99)
Filter: (tenthous = ANY ('{1,4,7,9,11,55,66,88,99,77,8876,9876}'::integer[]))
Rows Removed by Filter: 9
Heap Fetches: 0
Planning Time: 0.298 ms
Execution Time: 0.036 ms
(7 rows)
No heap fetches, so it must have done the filter from the index.
Why not in the original case?
regards, tom lane
Re: Trying to understand why a query is filtering when there is a composite index
From
Shiv Iyer
Date:
Hello,
The query's behavior is expected due to how PostgreSQL handles composite indexes and MVCC. The index on `(a, b)` is used efficiently for the `a` condition, but the `b IN (<ids>)` filter is more complex, leading to additional filtering rather than direct index usage. Although the index-only scan is utilized, heap fetches still occur to verify tuple visibility, a necessary step when the visibility map doesn’t confirm visibility or to apply the `b` filter accurately. This is standard in PostgreSQL’s handling of such queries, ensuring data consistency and accuracy. Performance remains good, but these heap fetches could be optimized if needed by reconsidering the index structure or query design. Thank you!
On Mon, Aug 19, 2024 at 7:26 AM Stephen Samuel (Sam) <sam@sksamuel.com> wrote:
Hi folks.I have a table with 4.5m rows per partition (16 partitions) (I know, very small, probably didn't need to be partitioned).The table has two columns, a bigint and b text.There is a unique index on (a,b)The query is:SELECT b
FROM table
WHERE a = <id>
AND b IN (<ids>)
The visibility map is almost exclusively true.
This table gets few updates.
The planner says index only scan, but is filtering on b.
Index Only Scan using pkey on table (cost=0.46..29.09 rows=1 width=19) (actual time=0.033..0.053 rows=10 loops=1)
Index Cond: (a = 662028765)
" Filter: (b = ANY ('{634579987:662028765,561730945:662028765,505555183:662028765,472806302:662028765,401361055:662028765,363587258:662028765,346093772:662028765,314369897:662028765,289498328:662028765,217993946:662028765}'::text[]))"
Rows Removed by Filter: 1
Heap Fetches: 11
Planning Time: 0.095 ms
Execution Time: 0.070 ms
My question is, why isn't it using the index for column b? Is this expected? And why is it doing heap lookups for every row,.Performance is still good, but I am curious.Thanks in advance!
Best Regards
Shiv Iyer
Re: Trying to understand why a query is filtering when there is a composite index
From
Peter Geoghegan
Date:
On Mon, Aug 19, 2024 at 12:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > It was fixed? At least on 17. > > Oh, sorry, I was thinking of a related problem that doesn't apply > here: matching indexes on expressions to fragments of a filter > condition. However, the fact that the OP's EXPLAIN shows heap > fetches from a supposedly all-visible table suggests that his > IN isn't getting optimized that way. As you pointed out, the number of tuples filtered out by the filter qual is only a small proportion of the total in this particular example (wasn't really paying attention to that aspect myself). I guess that that factor makes the Postgres 17 nbtree SAOP work almost irrelevant to the exact scenario shown, since even if true index quals could be used they'd only save at most one heap page access. I would still expect the 17 work to make the query slightly faster, since my testing showed that avoiding expression evaluation is slightly faster. Plus it would *definitely* make similar queries faster by avoiding heap access entirely -- cases where the use of true index quals can eliminate most heap page accesses. > No heap fetches, so it must have done the filter from the index. > Why not in the original case? My guess is that that's due to some kind of naturally occuring correlation. The few unset-in-VM pages are disproportionately likely to become heap fetches. The difficulty at predicting this kind of variation argues for an approach that makes as many decisions as possible at runtime. This is particularly true of how we skip within the index scan. I wouldn't expect skipping to be useful in the exact scenario shown, but why not be open to the possibility? If the planner only has one choice then there are no wrong choices. -- Peter Geoghegan
Re: Trying to understand why a query is filtering when there is a composite index
From
"Stephen Samuel (Sam)"
Date:
Just for my own knowledge:
This index covers both columns needed in the predicate/projection, and the visibility bit is almost always set, why does it need to go to the heap at all and doesn't just get what it needs from the index?
Or does scanning the _vm table count as a heap access in the planner ?
On Mon, 19 Aug 2024 at 10:21, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Aug 19, 2024 at 12:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It was fixed? At least on 17.
>
> Oh, sorry, I was thinking of a related problem that doesn't apply
> here: matching indexes on expressions to fragments of a filter
> condition. However, the fact that the OP's EXPLAIN shows heap
> fetches from a supposedly all-visible table suggests that his
> IN isn't getting optimized that way.
As you pointed out, the number of tuples filtered out by the filter
qual is only a small proportion of the total in this particular
example (wasn't really paying attention to that aspect myself). I
guess that that factor makes the Postgres 17 nbtree SAOP work almost
irrelevant to the exact scenario shown, since even if true index quals
could be used they'd only save at most one heap page access.
I would still expect the 17 work to make the query slightly faster,
since my testing showed that avoiding expression evaluation is
slightly faster. Plus it would *definitely* make similar queries
faster by avoiding heap access entirely -- cases where the use of true
index quals can eliminate most heap page accesses.
> No heap fetches, so it must have done the filter from the index.
> Why not in the original case?
My guess is that that's due to some kind of naturally occuring
correlation. The few unset-in-VM pages are disproportionately likely
to become heap fetches.
The difficulty at predicting this kind of variation argues for an
approach that makes as many decisions as possible at runtime. This is
particularly true of how we skip within the index scan. I wouldn't
expect skipping to be useful in the exact scenario shown, but why not
be open to the possibility? If the planner only has one choice then
there are no wrong choices.
--
Peter Geoghegan
Re: Trying to understand why a query is filtering when there is a composite index
From
Tom Lane
Date:
"Stephen Samuel (Sam)" <sam@sksamuel.com> writes: > This index covers both columns needed in the predicate/projection, and the > visibility bit is almost always set, why does it need to go to the heap at > all and doesn't just get what it needs from the index? Peter's theory was that the particular tuples you were fetching were in not-all-visible pages. That seems plausible to me. regards, tom lane
Re: Trying to understand why a query is filtering when there is a composite index
From
"Stephen Samuel (Sam)"
Date:
Ah, his theory was that I got unlucky in my sample queries.
If I pick data that's much older in the table, then it would seem to confirm his theory.
Index Only Scan using xx (cost=0.52..25.07 rows=1 width=19) (actual time=0.032..0.039 rows=34 loops=1)
Index Cond: (a = 1654)
" Filter: (b = ANY ('{1654:150843999,1654:178559906,1654:196691125,1654:213859809,1654:215290364,1654:232833953,1654:234187139,1654:235553821,1654:2514914,1654:27042020,1654:28414362,1654:290939423,1654:294364845,1654:302084789,1654:308624761,1654:321909343,1654:325450448,1654:333349583,1654:333780122,1654:352705002,1654:357720420,1654:360894242,1654:37357227,1654:38419057,1654:397848555,1654:398104037,1654:414568491,1654:415804877,1654:425839729,1654:428927290,1654:430795031,1654:432428733,1654:485645738,1654:490213252}'::text[]))"
Rows Removed by Filter: 8
Heap Fetches: 1
Planning Time: 0.348 ms
Execution Time: 0.058 ms
Index Cond: (a = 1654)
" Filter: (b = ANY ('{1654:150843999,1654:178559906,1654:196691125,1654:213859809,1654:215290364,1654:232833953,1654:234187139,1654:235553821,1654:2514914,1654:27042020,1654:28414362,1654:290939423,1654:294364845,1654:302084789,1654:308624761,1654:321909343,1654:325450448,1654:333349583,1654:333780122,1654:352705002,1654:357720420,1654:360894242,1654:37357227,1654:38419057,1654:397848555,1654:398104037,1654:414568491,1654:415804877,1654:425839729,1654:428927290,1654:430795031,1654:432428733,1654:485645738,1654:490213252}'::text[]))"
Rows Removed by Filter: 8
Heap Fetches: 1
Planning Time: 0.348 ms
Execution Time: 0.058 ms
On Mon, 19 Aug 2024 at 18:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Stephen Samuel (Sam)" <sam@sksamuel.com> writes:
> This index covers both columns needed in the predicate/projection, and the
> visibility bit is almost always set, why does it need to go to the heap at
> all and doesn't just get what it needs from the index?
Peter's theory was that the particular tuples you were fetching were
in not-all-visible pages. That seems plausible to me.
regards, tom lane