Index-only scans vs. partially-retrievable indexes - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Index-only scans vs. partially-retrievable indexes |
Date | |
Msg-id | 3179992.1641150853@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Index-only scans vs. partially-retrievable indexes
Re: Index-only scans vs. partially-retrievable indexes |
List | pgsql-hackers |
Yesterday I pushed what I thought was a quick fix for bug #17350 [1]. In short, if we have an index that stores both "x" and "f(x)", where the "x" column can be retrieved in index-only scans but "f(x)" cannot, it's possible for the planner to generate an IOS plan that nonetheless tries to read the f(x) index column. The bug report concerns the case where f(x) is needed in the IOS plan node's targetlist, and I did fix that --- but I now realize that we still have a problem with respect to rechecks of the plan node's indexquals. Here's an example: regression=# create extension pg_trgm; CREATE EXTENSION regression=# create table t(a text); CREATE TABLE regression=# create index on t using gist(lower(a) gist_trgm_ops) include (a); CREATE INDEX regression=# insert into t values('zed'); INSERT 0 1 regression=# insert into t values('z'); INSERT 0 1 regression=# select * from t where lower(a) like 'z'; a --- z (1 row) That's the correct answer, but we're using a bitmap scan to get it. If we force an IOS plan: regression=# set enable_bitmapscan = 0; SET regression=# explain select * from t where lower(a) like 'z'; QUERY PLAN ------------------------------------------------------------------------------ Index Only Scan using t_lower_a_idx on t (cost=0.14..28.27 rows=7 width=32) Index Cond: ((lower(a)) ~~ 'z'::text) (2 rows) regression=# select * from t where lower(a) like 'z'; a --- (0 rows) That's from a build a few days old. As of HEAD it's even worse; not only do we fail to return the rows we should, but EXPLAIN says regression=# explain select * from t where lower(a) like 'z'; QUERY PLAN ------------------------------------------------------------------------------ Index Only Scan using t_lower_a_idx on t (cost=0.14..28.27 rows=7 width=32) Index Cond: ((NULL::text) ~~ 'z'::text) (2 rows) At least this is showing us what's happening: the index recheck condition sees a NULL for the value of lower(a). That's because it's trying to get the value of lower(a) out of the index, instead of recomputing it from the value of a. AFAICS this has been broken since 9.5 allowed indexes to contain both retrievable and non-retrievable columns, so it's a bit surprising that it hasn't been reported before. I suppose that the case was harder to hit before we introduced INCLUDE columns. The relevant code actually claims that it's impossible: /* * If the index was lossy, we have to recheck the index quals. * (Currently, this can never happen, but we should support the case * for possible future use, eg with GiST indexes.) */ if (scandesc->xs_recheck) { econtext->ecxt_scantuple = slot; if (!ExecQualAndReset(node->indexqual, econtext)) { /* Fails recheck, so drop it and loop back for another */ InstrCountFiltered2(node, 1); continue; } } That comment may have been true when written (it dates to 9.2) but it's demonstrably not true now; the test case I just gave traverses this code, and gets the wrong answer. I don't think there is any way to fix this that doesn't involve adding another field to structs IndexOnlyScan and IndexOnlyScanState. We need a version of the indexqual that references the retrievable index column x and computes f(x) from that, but the indexqual that's passed to the index AM still has to reference the f(x) index column. That's annoying from an API stability standpoint. In the back branches, we can add the new fields at the end to minimize ABI breakage, but we will still be breaking any extension code that thinks it knows how to generate an IndexOnlyScan node directly. (But maybe there isn't any. The Path representation doesn't need to change, so typical planner extensions should be OK.) Unless somebody's got a better idea, I'll push forward with making this happen. regards, tom lane [1] https://www.postgresql.org/message-id/flat/17350-b5bdcf476e5badbb%40postgresql.org
pgsql-hackers by date: