Thread: BUG #17975: Nested Loop Index Scan returning wrong result
The following bug has been logged on the website: Bug reference: 17975 Logged by: Tor Erik Linnerud Email address: tel@jklm.no PostgreSQL version: 15.3 Operating system: MacOS 13.4, Linux 5.16 Description: Hi, first let me say thanks for all the hard work that goes into Postgres. I ran into a very specific query + index + data combination that appears to return the wrong result. After much trial and error I’ve been able to construct a dump to reproduce the problem, when running ANALYZE after the import. 1. Grab the DB dump (13 MB) curl -L "https://www.dropbox.com/s/k1ai0765gc2k98f/bug5.sql?dl=1" -o bug5.sql 2. Create an empty database, import the dump and analyze: createdb bug5 && psql -d bug5 -f bug5.sql && psql -d bug5 -c "ANALYZE" 3. Run queries: psql -d bug5 -c "set enable_indexscan = 'off'; SELECT a.id FROM a JOIN b ON b.a_id = a.id JOIN c ON c.id = b.c_id WHERE c.tag = '13880'" 1 row expected, get 1 psql -d bug5 -c "set enable_indexscan = 'on'; SELECT a.id FROM a JOIN b ON b.a_id = a.id JOIN c ON c.id = b.c_id WHERE c.tag = '13880'" 1 row expected, get 0 Thanks, Tor Erik
Hi, On 2023-06-14 15:17:31 +0000, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 17975 > Logged by: Tor Erik Linnerud > Email address: tel@jklm.no > PostgreSQL version: 15.3 > Operating system: MacOS 13.4, Linux 5.16 > Description: > > Hi, first let me say thanks for all the hard work that goes into Postgres. > > > I ran into a very specific query + index + data combination that appears to > return the wrong result. After much trial and error I’ve been able to > construct a dump to reproduce the problem, when running ANALYZE after the > import. > > 1. Grab the DB dump (13 MB) > > curl -L "https://www.dropbox.com/s/k1ai0765gc2k98f/bug5.sql?dl=1" -o > bug5.sql > > 2. Create an empty database, import the dump and analyze: > > createdb bug5 && psql -d bug5 -f bug5.sql && psql -d bug5 -c "ANALYZE" > > 3. Run queries: > > psql -d bug5 -c "set enable_indexscan = 'off'; SELECT a.id FROM a JOIN b ON > b.a_id = a.id JOIN c ON c.id = b.c_id WHERE c.tag = '13880'" > > 1 row expected, get 1 > > psql -d bug5 -c "set enable_indexscan = 'on'; SELECT a.id FROM a JOIN b ON > b.a_id = a.id JOIN c ON c.id = b.c_id WHERE c.tag = '13880'" > > 1 row expected, get 0 Uh, huh. I can reproduce that. And it's not just 15, reproduces in old versions too. The plan doesn't look obviously wrong: ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Nested Loop (cost=1.26..17.46 rows=1 width=8) (actual time=0.063..0.065 rows=0 loops=1) │ │ Output: a.id │ │ Inner Unique: true │ │ -> Nested Loop (cost=0.84..16.88 rows=1 width=8) (actual time=0.056..0.059 rows=1 loops=1) │ │ Output: b.a_id │ │ Inner Unique: true │ │ -> Index Scan using c_tag_idx on public.c (cost=0.42..8.44 rows=1 width=8) (actual time=0.034..0.036 rows=1 loops=1) │ │ Output: c.id, c.tag │ │ Index Cond: ((c.tag)::text = '13880'::text) │ │ -> Index Scan using index_b_c_id on public.b (cost=0.42..8.44 rows=1 width=16) (actual time=0.015..0.016 rows=1loops=1) │ │ Output: b.id, b.c_id, b.a_id │ │ Index Cond: (b.c_id = c.id) │ │ -> Index Only Scan using a_pkey on public.a (cost=0.42..0.59 rows=1 width=8) (actual time=0.002..0.003 rows=0 loops=1) │ │ Output: a.id │ │ Index Cond: (a.id = b.a_id) │ │ Heap Fetches: 0 │ │ Planning Time: 0.998 ms │ │ Execution Time: 0.148 ms │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (18 rows) It's not an IOS issue, it happens without IOS as well. Something seems to be off with the relevant param - it's NULL. Haven't dug deeper. Greetings, Andres Freund
On Wed, Jun 14, 2023 at 1:37 PM Andres Freund <andres@anarazel.de> wrote: > Something seems to be off with the relevant param - it's NULL. Haven't dug > deeper. I see that the bad plan has _bt_first() return false early during a btgettuple() call for one of the a_pkey index scans. That is, _bt_first() returns false early in the "Quit now if _bt_preprocess_keys() discovered that the scan keys can never be satisfied (eg, x == 1 AND x > 2)" path. It's possible that this is a red herring -- this is not the only difference between the "bad" index scan heavy plan and the "good" plan that uses bitmap scans. But right now my suspicion falls on _bt_preprocess_keys(). -- Peter Geoghegan
Hi, On 2023-06-14 13:36:51 -0700, Andres Freund wrote: > On 2023-06-14 15:17:31 +0000, PG Bug reporting form wrote: > > The following bug has been logged on the website: > > > > Bug reference: 17975 > > Logged by: Tor Erik Linnerud > > Email address: tel@jklm.no > > PostgreSQL version: 15.3 > > Operating system: MacOS 13.4, Linux 5.16 > > Description: > > > > Hi, first let me say thanks for all the hard work that goes into Postgres. > > > > > > I ran into a very specific query + index + data combination that appears to > > return the wrong result. After much trial and error I’ve been able to > > construct a dump to reproduce the problem, when running ANALYZE after the > > import. > > > > 1. Grab the DB dump (13 MB) > > > > curl -L "https://www.dropbox.com/s/k1ai0765gc2k98f/bug5.sql?dl=1" -o > > bug5.sql > > > > 2. Create an empty database, import the dump and analyze: > > > > createdb bug5 && psql -d bug5 -f bug5.sql && psql -d bug5 -c "ANALYZE" > > > > 3. Run queries: > > > > psql -d bug5 -c "set enable_indexscan = 'off'; SELECT a.id FROM a JOIN b ON > > b.a_id = a.id JOIN c ON c.id = b.c_id WHERE c.tag = '13880'" > > > > 1 row expected, get 1 > > > > psql -d bug5 -c "set enable_indexscan = 'on'; SELECT a.id FROM a JOIN b ON > > b.a_id = a.id JOIN c ON c.id = b.c_id WHERE c.tag = '13880'" > > > > 1 row expected, get 0 > > Uh, huh. I can reproduce that. And it's not just 15, reproduces in old > versions too. > > The plan doesn't look obviously wrong: > ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > │ QUERY PLAN │ > ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ > │ Nested Loop (cost=1.26..17.46 rows=1 width=8) (actual time=0.063..0.065 rows=0 loops=1) │ > │ Output: a.id │ > │ Inner Unique: true │ > │ -> Nested Loop (cost=0.84..16.88 rows=1 width=8) (actual time=0.056..0.059 rows=1 loops=1) │ > │ Output: b.a_id │ > │ Inner Unique: true │ > │ -> Index Scan using c_tag_idx on public.c (cost=0.42..8.44 rows=1 width=8) (actual time=0.034..0.036 rows=1loops=1) │ > │ Output: c.id, c.tag │ > │ Index Cond: ((c.tag)::text = '13880'::text) │ > │ -> Index Scan using index_b_c_id on public.b (cost=0.42..8.44 rows=1 width=16) (actual time=0.015..0.016 rows=1loops=1) │ > │ Output: b.id, b.c_id, b.a_id │ > │ Index Cond: (b.c_id = c.id) │ > │ -> Index Only Scan using a_pkey on public.a (cost=0.42..0.59 rows=1 width=8) (actual time=0.002..0.003 rows=0 loops=1) │ > │ Output: a.id │ > │ Index Cond: (a.id = b.a_id) │ > │ Heap Fetches: 0 │ > │ Planning Time: 0.998 ms │ > │ Execution Time: 0.148 ms │ > └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > (18 rows) > > It's not an IOS issue, it happens without IOS as well. > > > Something seems to be off with the relevant param - it's NULL. Haven't dug > deeper. I think it's a problem with the uniqueness determination / missing a qual / index selection. There are two rows in b with b.c_id = 13880, except that one of them has a NULL a_id: => SELECT * FROM b WHERE c_id = 13880; ┌────────┬───────┬────────┐ │ id │ c_id │ a_id │ ├────────┼───────┼────────┤ │ 326048 │ 13880 │ (null) │ │ 572151 │ 13880 │ 955968 │ └────────┴───────┴────────┘ (2 rows) . The uniqueness information comes from: "index_a_cannot_share_c" UNIQUE, btree (c_id) WHERE a_id IS NOT NULL But note that we aren't using that index, we use "index_b_c_id" btree (c_id) which of course also contains the a_id = NULL row. We either need to force the index that we got the uniqueness information to be used when it is partial, or add the quals from the partial unique index to all other index scans. Greetings, Andres Freund
Hi, On 2023-06-14 14:10:02 -0700, Peter Geoghegan wrote: > On Wed, Jun 14, 2023 at 1:37 PM Andres Freund <andres@anarazel.de> wrote: > > Something seems to be off with the relevant param - it's NULL. Haven't dug > > deeper. > > I see that the bad plan has _bt_first() return false early during a > btgettuple() call for one of the a_pkey index scans. That is, > _bt_first() returns false early in the "Quit now if > _bt_preprocess_keys() discovered that the scan keys can never be > satisfied (eg, x == 1 AND x > 2)" path. I think that's a consequence of the planner bug (see subsequent email) - the bad plan wrongly uses an index that first returns a null a_id, which then quite legitimately makes _bt_preprocess_keys() exit early. That bit of smartness did confuse me for a moment, I was expecting the index scan to actually scan the index at first :). I haven't figured out where we're dropping the implied index qual on the floor yet... Greetings, Andres Freund
On Wed, Jun 14, 2023 at 2:21 PM Andres Freund <andres@anarazel.de> wrote: > I think that's a consequence of the planner bug (see subsequent email) - the > bad plan wrongly uses an index that first returns a null a_id, which then > quite legitimately makes _bt_preprocess_keys() exit early. That bit of > smartness did confuse me for a moment, I was expecting the index scan to > actually scan the index at first :). It certainly looks that way now. On further examination the only substantive difference is that the indexes themselves differ in one leaf of the plan. The good plan and the bad plan match everywhere else (except that the good plan uses bitmap index scans rather than index scans everywhere). -- Peter Geoghegan
Hi, On 2023-06-14 14:12:31 -0700, Andres Freund wrote: > I think it's a problem with the uniqueness determination / missing a > qual / index selection. > > There are two rows in b with b.c_id = 13880, except that one of them has a > NULL a_id: > > => SELECT * FROM b WHERE c_id = 13880; > ┌────────┬───────┬────────┐ > │ id │ c_id │ a_id │ > ├────────┼───────┼────────┤ > │ 326048 │ 13880 │ (null) │ > │ 572151 │ 13880 │ 955968 │ > └────────┴───────┴────────┘ > (2 rows) > > . The uniqueness information comes from: > "index_a_cannot_share_c" UNIQUE, btree (c_id) WHERE a_id IS NOT NULL > > But note that we aren't using that index, we use > "index_b_c_id" btree (c_id) > > which of course also contains the a_id = NULL row. > > > We either need to force the index that we got the uniqueness information to be > used when it is partial, or add the quals from the partial unique index to all > other index scans. I suspect this is an issue going back to 9c7f5229. Indeed, 9.6 doesn't reproduce the issue (9c7f5229 was in 10). I haven't bisected it down to that, but it seems pretty likely - and if it's not that commit, it's a closely related one. It's not immediately obvious to me how to nicely fix this in a backpatchable way. An easy fix would be to not allow predicate indexes at all anymore in relation_has_unique_index_for(), but that's a pretty big cannon - fixes the issue though. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2023-06-14 14:12:31 -0700, Andres Freund wrote: >> We either need to force the index that we got the uniqueness information to be >> used when it is partial, or add the quals from the partial unique index to all >> other index scans. I did not study this example yet, but generally we ignore predicate indexes unless their predicates can be proven from base restrictions of their table (that's what predOK means). If the base restrictions are enforced at scan level, which they should be, then uniqueness should hold at any join level regardless of whether we actually scanned with that index or some other way. Maybe we broke that chain of reasoning somehow? > It's not immediately obvious to me how to nicely fix this in a backpatchable > way. An easy fix would be to not allow predicate indexes at all anymore in > relation_has_unique_index_for(), but that's a pretty big cannon - fixes the > issue though. Yeah, that would be the easy way out if we don't find a better answer. But right at the moment I don't understand why this is failing. regards, tom lane
Hi, On 2023-06-14 19:02:48 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2023-06-14 14:12:31 -0700, Andres Freund wrote: > >> We either need to force the index that we got the uniqueness information to be > >> used when it is partial, or add the quals from the partial unique index to all > >> other index scans. > > I did not study this example yet, but generally we ignore predicate > indexes unless their predicates can be proven from base restrictions > of their table (that's what predOK means). If the base restrictions > are enforced at scan level, which they should be, then uniqueness > should hold at any join level regardless of whether we actually > scanned with that index or some other way. Maybe we broke that > chain of reasoning somehow? It doesn't really hold at lower join levels with partial unique indexes, at least as far as inner_unique goes. In this case we have one partial unique index on b(c_id) WHERE a_id IS NOT NULL, and we have a plain index on b(c_id). inner_unique is set to true based on the partial index - but then we decide use the non-partial index for the index scan. That ends up returning a row which with a_is = NULL, which won't find a match in the upper join levels. But because the join is inner_unique, it'll not try to find another row on the inner side. > > It's not immediately obvious to me how to nicely fix this in a backpatchable > > way. An easy fix would be to not allow predicate indexes at all anymore in > > relation_has_unique_index_for(), but that's a pretty big cannon - fixes the > > issue though. > > Yeah, that would be the easy way out if we don't find a better answer. > But right at the moment I don't understand why this is failing. Hope the above made it a bit clearer? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2023-06-14 19:02:48 -0400, Tom Lane wrote: >> I did not study this example yet, but generally we ignore predicate >> indexes unless their predicates can be proven from base restrictions >> of their table (that's what predOK means). > It doesn't really hold at lower join levels with partial unique indexes, at > least as far as inner_unique goes. In this case we have one partial unique > index on b(c_id) WHERE a_id IS NOT NULL, and we have a plain index on b(c_id). > inner_unique is set to true based on the partial index - but then we decide > use the non-partial index for the index scan. That ends up returning a row > which with a_is = NULL, which won't find a match in the upper join > levels. But how did it decide that the partial index is predOK, if there's not a qual forcing a_id to not be null? regards, tom lane
Hi, On 2023-06-14 19:59:26 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2023-06-14 19:02:48 -0400, Tom Lane wrote: > >> I did not study this example yet, but generally we ignore predicate > >> indexes unless their predicates can be proven from base restrictions > >> of their table (that's what predOK means). > > > It doesn't really hold at lower join levels with partial unique indexes, at > > least as far as inner_unique goes. In this case we have one partial unique > > index on b(c_id) WHERE a_id IS NOT NULL, and we have a plain index on b(c_id). > > inner_unique is set to true based on the partial index - but then we decide > > use the non-partial index for the index scan. That ends up returning a row > > which with a_is = NULL, which won't find a match in the upper join > > levels. > > But how did it decide that the partial index is predOK, if there's not > a qual forcing a_id to not be null? There is - but it's at a higher join level. That would prevent us from returning a wrongly matching row, but in the inner_unique case we don't even get to that point. We obviously could make it correct by injecting the relevant check into the index scan on the inner side, but it doesn't look trivial to do so. Greetings, Andres Freund
On Thu, 15 Jun 2023 at 11:59, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > > It doesn't really hold at lower join levels with partial unique indexes, at > > least as far as inner_unique goes. In this case we have one partial unique > > index on b(c_id) WHERE a_id IS NOT NULL, and we have a plain index on b(c_id). > > inner_unique is set to true based on the partial index - but then we decide > > use the non-partial index for the index scan. That ends up returning a row > > which with a_is = NULL, which won't find a match in the upper join > > levels. > > But how did it decide that the partial index is predOK, if there's not > a qual forcing a_id to not be null? In check_index_predicates(), we seem to use the results of generate_join_implied_equalities() as index predicate proofs. That seems fishy because surely those are only valid to use in cases after the join for the particular predicate is evaluated. In this case a.id = b.a_id AND c.id = b.c_id are used as proofs. I didn't debug all the way, but I assume we deduce that the NOT NULL index is ok due to the strict join quals. David
On Thu, 15 Jun 2023 at 12:16, David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 15 Jun 2023 at 11:59, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > But how did it decide that the partial index is predOK, if there's not > > a qual forcing a_id to not be null? > > In check_index_predicates(), we seem to use the results of > generate_join_implied_equalities() as index predicate proofs. That > seems fishy because surely those are only valid to use in cases after > the join for the particular predicate is evaluated. > > In this case a.id = b.a_id AND c.id = b.c_id are used as proofs. I > didn't debug all the way, but I assume we deduce that the NOT NULL > index is ok due to the strict join quals. Isn't it just a wrong assumption to use predOK for unique proofs for all cases here. predOK seems ok to set to true for when it comes to using that index for the query as by the time we finish evaluating all the joins we'd have filtered out all rows that the partial index won't contain... The problem just lies with the assumption that it's ok to use the unique partial index as proofs before evaluation of all quals that were used to determine predOK is true. Perhaps we need another predOK field that just uses baserestrictinfos... David
David Rowley <dgrowleyml@gmail.com> writes: > On Thu, 15 Jun 2023 at 11:59, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But how did it decide that the partial index is predOK, if there's not >> a qual forcing a_id to not be null? > In this case a.id = b.a_id AND c.id = b.c_id are used as proofs. I > didn't debug all the way, but I assume we deduce that the NOT NULL > index is ok due to the strict join quals. Ah. So the index is okay to use as far as the final result is concerned, but it's not okay for the purpose to which it's being put. I concur that we'd better just not use partial indexes in relation_has_unique_index_for. regards, tom lane
On Thu, 15 Jun 2023 at 12:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I concur that we'd better just not use partial indexes in > relation_has_unique_index_for. I wonder if that's ok for a backpatch. This affects both left join removals and unique joins. Seems like suddenly making a left join removal not work might cause someone some pain. Is it worth trying to jam in a new boolean field into IndexOptInfo into some spare padding to that we run predicate_implied_by() just using baserestrictinfo and use those in relation_has_unique_index_for()? One problem might be that I don't see anywhere we could jam another boolean into IndexOptInfo on 32-bit builds. Maybe we could put it at the end given that we should only be making these structs in core? David
David Rowley <dgrowleyml@gmail.com> writes: > On Thu, 15 Jun 2023 at 12:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I concur that we'd better just not use partial indexes in >> relation_has_unique_index_for. > I wonder if that's ok for a backpatch. This affects both left join > removals and unique joins. Seems like suddenly making a left join > removal not work might cause someone some pain. I kind of doubt that this will affect any large number of users. If it did, we'd have had reports sooner. > Is it worth trying to jam in a new boolean field into IndexOptInfo > into some spare padding to that we run predicate_implied_by() just > using baserestrictinfo and use those in > relation_has_unique_index_for()? How will that work with the caching in innerrel_is_unique? I also seriously doubt that we can make such a thing work without adding parameters to any externally-visible functions. regards, tom lane
On Thu, 15 Jun 2023 at 12:50, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > On Thu, 15 Jun 2023 at 12:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I concur that we'd better just not use partial indexes in > >> relation_has_unique_index_for. > > > I wonder if that's ok for a backpatch. This affects both left join > > removals and unique joins. Seems like suddenly making a left join > > removal not work might cause someone some pain. > > I kind of doubt that this will affect any large number of users. > If it did, we'd have had reports sooner. OK, I've attached a patch for just not using partial indexes as uniqueness proofs. > > Is it worth trying to jam in a new boolean field into IndexOptInfo > > into some spare padding to that we run predicate_implied_by() just > > using baserestrictinfo and use those in > > relation_has_unique_index_for()? > > How will that work with the caching in innerrel_is_unique? > I also seriously doubt that we can make such a thing work > without adding parameters to any externally-visible functions. relation_has_unique_index_for() would just check this new field that gets set only from baserestrictinfo quals. We'd just completely ignore join quals for the new boolean field that'll be used in relation_has_unique_index_for(). You're probably thinking that we could use join quals up to the level that we've already joined to for unique joins. That's probably true, but seems more complex than it might be worth to do that. I imagined the majority of predOK are set to true based on baserestrictinfo quals. I've no data to back that up, however. I was just trying to think of ways to lessen the cases we regress from fixing this bug. David
Attachment
On Thu, 15 Jun 2023 at 16:33, David Rowley <dgrowleyml@gmail.com> wrote: > OK, I've attached a patch for just not using partial indexes as > uniqueness proofs. I still have concerns about not doing anything for partial unique indexes that are predOK as far as the baserestrictinfos are concerned. I did push the patch to disable the unique join optimisation for all partial unique indexes for now as we do need to at least fix the bug. I've attached another patch which adds predOKBase and uses just baserestrictinfo quals to set that field. If we were to backpatch something like this, we'd need to figure out how to represent this new field in IndexOptInfo. Changing the predOK bool to a uint8 and having 3 values might be an ok method. David