Thread: Index-only scan for btree_gist turns bpchar to char
Hello hackers, While testing the index-only scan fix, I've discovered that replacing the index-only scan with the index scan changes contrib/btree_gist output because index-only scan for btree_gist returns a string without padding. A simple demonstration (based on btree_gist/sql/char.sql): CREATE EXTENSION btree_gist; CREATE TABLE chartmp (a char(32)); INSERT INTO chartmp VALUES('31b0'); CREATE INDEX charidx ON chartmp USING gist ( a ); SET enable_seqscan=off; EXPLAIN VERBOSE SELECT *, octet_length(a) FROM chartmp WHERE a BETWEEN '31a' AND '31c'; SELECT *, octet_length(a) FROM chartmp WHERE a BETWEEN '31a' AND '31c'; SET enable_indexonlyscan=off; EXPLAIN VERBOSE SELECT *, octet_length(a) FROM chartmp WHERE a BETWEEN '31a' AND '31c'; SELECT *, octet_length(a) FROM chartmp WHERE a BETWEEN '31a' AND '31c'; QUERY PLAN ------------------------------------------------------------------------------ Index Only Scan using charidx on chartmp (cost=0.12..8.15 rows=1 width=136) Index Cond: ((a >= '31a'::bpchar) AND (a <= '31c'::bpchar)) (2 rows) a | octet_length ------+-------------- 31b0 | 4 (1 row) QUERY PLAN ------------------------------------------------------------------------- Index Scan using charidx on chartmp (cost=0.12..8.15 rows=1 width=136) Index Cond: ((a >= '31a'::bpchar) AND (a <= '31c'::bpchar)) (2 rows) a | octet_length ----------------------------------+-------------- 31b0 | 32 (1 row) It seems that loosing blank padding is incorrect (btree and btree_gin preserve padding with index-only scan) but it's recorded in contrib/btree_gist/expected/char.out. Best regards, Alexander
Alexander Lakhin <exclusion@gmail.com> writes: > While testing the index-only scan fix, I've discovered that replacing > the index-only scan with the index scan changes contrib/btree_gist > output because index-only scan for btree_gist returns a string without > padding. Ugh, yeah. This seems to be because gbt_bpchar_compress() strips trailing spaces (using rtrim1) before storing the value. The idea evidently is to simplify gbt_bpchar_consistent, but it's not acceptable if the opclass is supposed to support index-only scan. I see two ways to fix this: * Disallow index-only scan, by removing the fetch function for this opclass. This'd require a module version bump, so people wouldn't get that fix automatically. * Change gbt_bpchar_compress to not trim spaces (it becomes just like gbt_text_compress), and adapt gbt_bpchar_consistent to cope. This does nothing for the problem immediately, unless you REINDEX affected indexes --- but over time an index's entries would get replaced with untrimmed versions. I also wondered if we could make the fetch function reconstruct the padding, but it doesn't seem to have access to the necessary info. regards, tom lane
04.01.2022 22:19, Tom Lane wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >> While testing the index-only scan fix, I've discovered that replacing >> the index-only scan with the index scan changes contrib/btree_gist >> output because index-only scan for btree_gist returns a string without >> padding. > Ugh, yeah. This seems to be because gbt_bpchar_compress() strips > trailing spaces (using rtrim1) before storing the value. The > idea evidently is to simplify gbt_bpchar_consistent, but it's not > acceptable if the opclass is supposed to support index-only scan. > > I see two ways to fix this: > > * Disallow index-only scan, by removing the fetch function for this > opclass. This'd require a module version bump, so people wouldn't > get that fix automatically. > > * Change gbt_bpchar_compress to not trim spaces (it becomes just > like gbt_text_compress), and adapt gbt_bpchar_consistent to cope. > This does nothing for the problem immediately, unless you REINDEX > affected indexes --- but over time an index's entries would get > replaced with untrimmed versions. I think that the second way is preferable in the long run. It doesn't need an explanation after years, why index-only scan is not supported for that type. One-time mentioning the change and the need for REINDEX in release notes seems more future-oriented to me. Best regards, Alexander
On Wed, 05 Jan 2022 at 03:19, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >> While testing the index-only scan fix, I've discovered that replacing >> the index-only scan with the index scan changes contrib/btree_gist >> output because index-only scan for btree_gist returns a string without >> padding. > > Ugh, yeah. This seems to be because gbt_bpchar_compress() strips > trailing spaces (using rtrim1) before storing the value. The > idea evidently is to simplify gbt_bpchar_consistent, but it's not > acceptable if the opclass is supposed to support index-only scan. > > I see two ways to fix this: > > * Disallow index-only scan, by removing the fetch function for this > opclass. This'd require a module version bump, so people wouldn't > get that fix automatically. > > * Change gbt_bpchar_compress to not trim spaces (it becomes just > like gbt_text_compress), and adapt gbt_bpchar_consistent to cope. > This does nothing for the problem immediately, unless you REINDEX > affected indexes --- but over time an index's entries would get > replaced with untrimmed versions. > > I also wondered if we could make the fetch function reconstruct the > padding, but it doesn't seem to have access to the necessary info. > If we fix this In the second way, the range query has the same results in both seq scan and index only scan. However, it will incur other problems. For the following query: SELECT *, octet_length(a) FROM chartmp WHERE a = '31b0'; Currently, we can get a | octet_length ------+-------------- 31b0 | 4 After fixed, we cannot get any result. For the equal condition, we must put the extra spaces to make it work. Here is a patch for POC testing. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. diff --git a/contrib/btree_gist/btree_text.c b/contrib/btree_gist/btree_text.c index 8019d11281..5fd425047f 100644 --- a/contrib/btree_gist/btree_text.c +++ b/contrib/btree_gist/btree_text.c @@ -121,16 +121,7 @@ gbt_bpchar_compress(PG_FUNCTION_ARGS) } if (entry->leafkey) - { - - Datum d = DirectFunctionCall1(rtrim1, entry->key); - GISTENTRY trim; - - gistentryinit(trim, d, - entry->rel, entry->page, - entry->offset, true); - retval = gbt_var_compress(&trim, &tinfo); - } + retval = gbt_var_compress(entry, &tinfo); else retval = entry; @@ -179,7 +170,6 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS) bool retval; GBT_VARKEY *key = (GBT_VARKEY *) DatumGetPointer(entry->key); GBT_VARKEY_R r = gbt_var_key_readable(key); - void *trim = (void *) DatumGetPointer(DirectFunctionCall1(rtrim1, PointerGetDatum(query))); /* All cases served by this function are exact */ *recheck = false; @@ -189,7 +179,7 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS) tinfo.eml = pg_database_encoding_max_length(); } - retval = gbt_var_consistent(&r, trim, strategy, PG_GET_COLLATION(), + retval = gbt_var_consistent(&r, query, strategy, PG_GET_COLLATION(), GIST_LEAF(entry), &tinfo, fcinfo->flinfo); PG_RETURN_BOOL(retval); }
Japin Li <japinli@hotmail.com> writes: > Here is a patch for POC testing. This is certainly not right. You've made gbt_bpchar_consistent work identically to gbt_text_consistent, but it needs to implement a test equivalent to bpchareq, ie ignore trailing spaces in both inputs. The minimum-effort fix would be to apply rtrim1 to both strings in gbt_bpchar_consistent, but I wonder if we can improve on that by pushing the ignore-trailing-spaces behavior further down. I didn't look yet at whether gbt_var_consistent can support any type-specific behavior. regards, tom lane
On Thu, 06 Jan 2022 at 00:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Japin Li <japinli@hotmail.com> writes: >> Here is a patch for POC testing. > > This is certainly not right. You've made gbt_bpchar_consistent > work identically to gbt_text_consistent, but it needs to implement > a test equivalent to bpchareq, ie ignore trailing spaces in both > inputs. > Thanks for your explaintion! The bpchareq already ignore trailing spaces in both inputs. The question is that the bpchar in btree_gist do not call bpchareq, it always call texteq. I tried the patch[1] and it works as expected, howerver, I don't think it's good way to fix this problem. > The minimum-effort fix would be to apply rtrim1 to both strings > in gbt_bpchar_consistent, but I wonder if we can improve on that > by pushing the ignore-trailing-spaces behavior further down. > I didn't look yet at whether gbt_var_consistent can support > any type-specific behavior. > Adding type-specific for gbt_var_consistent looks like more generally. For example, for bpchar type, we should call bpchareq rather than texteq. Am I understand right? -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. [1] diff --git a/contrib/btree_gist/btree_text.c b/contrib/btree_gist/btree_text.c index 8019d11281..7f45ee6e3b 100644 --- a/contrib/btree_gist/btree_text.c +++ b/contrib/btree_gist/btree_text.c @@ -121,16 +121,7 @@ gbt_bpchar_compress(PG_FUNCTION_ARGS) } if (entry->leafkey) - { - - Datum d = DirectFunctionCall1(rtrim1, entry->key); - GISTENTRY trim; - - gistentryinit(trim, d, - entry->rel, entry->page, - entry->offset, true); - retval = gbt_var_compress(&trim, &tinfo); - } + retval = gbt_var_compress(entry, &tinfo); else retval = entry; @@ -189,6 +180,11 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS) tinfo.eml = pg_database_encoding_max_length(); } + r.lower = (bytea *) DatumGetPointer(DirectFunctionCall1(rtrim1, + PointerGetDatum(r.lower))); + r.upper = (bytea *) DatumGetPointer(DirectFunctionCall1(rtrim1, + PointerGetDatum(r.upper))); + retval = gbt_var_consistent(&r, trim, strategy, PG_GET_COLLATION(), GIST_LEAF(entry), &tinfo, fcinfo->flinfo); PG_RETURN_BOOL(retval);
Japin Li <japinli@hotmail.com> writes: > On Thu, 06 Jan 2022 at 00:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The minimum-effort fix would be to apply rtrim1 to both strings >> in gbt_bpchar_consistent, but I wonder if we can improve on that >> by pushing the ignore-trailing-spaces behavior further down. >> I didn't look yet at whether gbt_var_consistent can support >> any type-specific behavior. > Adding type-specific for gbt_var_consistent looks like more generally. > For example, for bpchar type, we should call bpchareq rather than texteq. I looked at this and it does seem like it might work, as per attached patch. The one thing that is troubling me is that the opclass is set up to apply gbt_text_same, which is formally the Wrong Thing for bpchar, because the equality semantics shouldn't be quite the same. But we could not fix that without a module version bump, which is annoying. I think that it might not be necessary to change it, because (1) There's no such thing as unique GIST indexes, so it should not matter if the "same" function is a bit stricter than the datatype's nominal notion of equality. It's certainly okay for that to vary from the semantics applied by the consistent function --- GIST has no idea that the consistent function is allegedly testing equality. (2) If all the input values for a column have been coerced to the same typmod, then it doesn't matter because two values that are equal after space-stripping would be equal without space-stripping, too. However, (2) doesn't hold for an existing index that the user has failed to REINDEX, because then the index would contain some space-stripped values that same() will not say are equal to incoming new values. Again, I think this doesn't matter much, but maybe I'm missing something. I've not really dug into what GIST uses the same() function for. In any case, if we do need same() to implement the identical behavior to bpchareq(), then the other solution isn't sufficient either. So in short, it seems like we ought to do some compatibility testing and see if this code misbehaves at all with an index created by the old code. I don't particularly want to do that ... any volunteers? regards, tom lane diff --git a/contrib/btree_gist/btree_text.c b/contrib/btree_gist/btree_text.c index 8019d11281..be0eac7975 100644 --- a/contrib/btree_gist/btree_text.c +++ b/contrib/btree_gist/btree_text.c @@ -90,6 +90,76 @@ static gbtree_vinfo tinfo = NULL }; +/* bpchar needs its own comparison rules */ + +static bool +gbt_bpchargt(const void *a, const void *b, Oid collation, FmgrInfo *flinfo) +{ + return DatumGetBool(DirectFunctionCall2Coll(bpchargt, + collation, + PointerGetDatum(a), + PointerGetDatum(b))); +} + +static bool +gbt_bpcharge(const void *a, const void *b, Oid collation, FmgrInfo *flinfo) +{ + return DatumGetBool(DirectFunctionCall2Coll(bpcharge, + collation, + PointerGetDatum(a), + PointerGetDatum(b))); +} + +static bool +gbt_bpchareq(const void *a, const void *b, Oid collation, FmgrInfo *flinfo) +{ + return DatumGetBool(DirectFunctionCall2Coll(bpchareq, + collation, + PointerGetDatum(a), + PointerGetDatum(b))); +} + +static bool +gbt_bpcharle(const void *a, const void *b, Oid collation, FmgrInfo *flinfo) +{ + return DatumGetBool(DirectFunctionCall2Coll(bpcharle, + collation, + PointerGetDatum(a), + PointerGetDatum(b))); +} + +static bool +gbt_bpcharlt(const void *a, const void *b, Oid collation, FmgrInfo *flinfo) +{ + return DatumGetBool(DirectFunctionCall2Coll(bpcharlt, + collation, + PointerGetDatum(a), + PointerGetDatum(b))); +} + +static int32 +gbt_bpcharcmp(const void *a, const void *b, Oid collation, FmgrInfo *flinfo) +{ + return DatumGetInt32(DirectFunctionCall2Coll(bpcharcmp, + collation, + PointerGetDatum(a), + PointerGetDatum(b))); +} + +static gbtree_vinfo bptinfo = +{ + gbt_t_bpchar, + 0, + false, + gbt_bpchargt, + gbt_bpcharge, + gbt_bpchareq, + gbt_bpcharle, + gbt_bpcharlt, + gbt_bpcharcmp, + NULL +}; + /************************************************** * Text ops @@ -112,29 +182,8 @@ gbt_text_compress(PG_FUNCTION_ARGS) Datum gbt_bpchar_compress(PG_FUNCTION_ARGS) { - GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); - GISTENTRY *retval; - - if (tinfo.eml == 0) - { - tinfo.eml = pg_database_encoding_max_length(); - } - - if (entry->leafkey) - { - - Datum d = DirectFunctionCall1(rtrim1, entry->key); - GISTENTRY trim; - - gistentryinit(trim, d, - entry->rel, entry->page, - entry->offset, true); - retval = gbt_var_compress(&trim, &tinfo); - } - else - retval = entry; - - PG_RETURN_POINTER(retval); + /* This should never have been distinct from gbt_text_compress */ + return gbt_text_compress(fcinfo); } @@ -179,18 +228,17 @@ gbt_bpchar_consistent(PG_FUNCTION_ARGS) bool retval; GBT_VARKEY *key = (GBT_VARKEY *) DatumGetPointer(entry->key); GBT_VARKEY_R r = gbt_var_key_readable(key); - void *trim = (void *) DatumGetPointer(DirectFunctionCall1(rtrim1, PointerGetDatum(query))); /* All cases served by this function are exact */ *recheck = false; - if (tinfo.eml == 0) + if (bptinfo.eml == 0) { - tinfo.eml = pg_database_encoding_max_length(); + bptinfo.eml = pg_database_encoding_max_length(); } - retval = gbt_var_consistent(&r, trim, strategy, PG_GET_COLLATION(), - GIST_LEAF(entry), &tinfo, fcinfo->flinfo); + retval = gbt_var_consistent(&r, query, strategy, PG_GET_COLLATION(), + GIST_LEAF(entry), &bptinfo, fcinfo->flinfo); PG_RETURN_BOOL(retval); } diff --git a/contrib/btree_gist/expected/char.out b/contrib/btree_gist/expected/char.out index d715c045cc..45cf9f38d6 100644 --- a/contrib/btree_gist/expected/char.out +++ b/contrib/btree_gist/expected/char.out @@ -75,8 +75,8 @@ SELECT * FROM chartmp WHERE a BETWEEN '31a' AND '31c'; (2 rows) SELECT * FROM chartmp WHERE a BETWEEN '31a' AND '31c'; - a ------- - 31b0 + a +---------------------------------- + 31b0 (1 row) diff --git a/contrib/btree_gist/expected/char_1.out b/contrib/btree_gist/expected/char_1.out index 867318002b..7b7824b717 100644 --- a/contrib/btree_gist/expected/char_1.out +++ b/contrib/btree_gist/expected/char_1.out @@ -75,8 +75,8 @@ SELECT * FROM chartmp WHERE a BETWEEN '31a' AND '31c'; (2 rows) SELECT * FROM chartmp WHERE a BETWEEN '31a' AND '31c'; - a ------- - 31b0 + a +---------------------------------- + 31b0 (1 row)
On Fri, 07 Jan 2022 at 03:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked at this and it does seem like it might work, as per attached > patch. The one thing that is troubling me is that the opclass is set > up to apply gbt_text_same, which is formally the Wrong Thing for bpchar, > because the equality semantics shouldn't be quite the same. But we > could not fix that without a module version bump, which is annoying. > I think that it might not be necessary to change it, because > > (1) There's no such thing as unique GIST indexes, so it should not > matter if the "same" function is a bit stricter than the datatype's > nominal notion of equality. It's certainly okay for that to vary > from the semantics applied by the consistent function --- GIST has > no idea that the consistent function is allegedly testing equality. > > (2) If all the input values for a column have been coerced to the same > typmod, then it doesn't matter because two values that are equal after > space-stripping would be equal without space-stripping, too. > > However, (2) doesn't hold for an existing index that the user has failed > to REINDEX, because then the index would contain some space-stripped > values that same() will not say are equal to incoming new values. > Again, I think this doesn't matter much, but maybe I'm missing > something. I've not really dug into what GIST uses the same() > function for. > > In any case, if we do need same() to implement the identical > behavior to bpchareq(), then the other solution isn't sufficient > either. > > So in short, it seems like we ought to do some compatibility testing > and see if this code misbehaves at all with an index created by the > old code. I don't particularly want to do that ... any volunteers? > Thanks for your patch, it looks good to me. I'm not sure how to test this. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Hello, 07.01.2022 09:26, Japin Li wrote: > On Fri, 07 Jan 2022 at 03:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > In any case, if we do need same() to implement the identical > behavior to bpchareq(), then the other solution isn't sufficient > either. > > So in short, it seems like we ought to do some compatibility testing > and see if this code misbehaves at all with an index created by the > old code. I don't particularly want to do that ... any volunteers? > > Thanks for your patch, it looks good to me. I'm not sure how to test this. I will test it tomorrow. Best regards, Alexander
07.01.2022 12:00, Alexander Lakhin wrote: > Hello, > 07.01.2022 09:26, Japin Li wrote: >> On Fri, 07 Jan 2022 at 03:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> In any case, if we do need same() to implement the identical >> behavior to bpchareq(), then the other solution isn't sufficient >> either. >> >> So in short, it seems like we ought to do some compatibility testing >> and see if this code misbehaves at all with an index created by the >> old code. I don't particularly want to do that ... any volunteers? >> >> Thanks for your patch, it looks good to me. I'm not sure how to test this. > I will test it tomorrow. I've made a simple test based on the regression test (see attachment) and can confirm that REINDEX after upgrade fixes the index contents. Differences after upgrade but before REINDEX: --- /tmp/pgtest/char.out 2022-01-08 21:27:43.912274805 +0300 +++ /tmp/pgtest/char.expected 2022-01-08 21:27:43.896274765 +0300 @@ -40,8 +40,8 @@ (2 rows) SELECT * FROM chartmp WHERE a BETWEEN '31a' AND '31c'; - a ------- - 31b0 + a +---------------------------------- + 31b0 (1 row) REINDEX INDEX charidx Differences after upgrade and REINDEX: Files /tmp/pgtest/char.out and /tmp/pgtest/char.expected are identical (Unfortunately for me) I found no anomalies related to gbt_text_same() with an index created with the previous implementation. I've added diagnostic logging that shows when gbt_text_same() returns 0 for keys that are the equal but have different padding. So I've observed that gbt_text_same() returns incorrect result, but all the btree_gist tests still pass. Moreover, unconditional "*result = 0;" in gbt_text_same() doesn't affect the tests at all. I've found that gbt_text_same() is called by gistKeyIsEQ() from backend/access/gist/gistutil.c, and made gistKeyIsEQ() return false any time. And even with such change all check-world tests still pass (except for isolation/predicate-gist that failed due to locking of pages split differently). So for now, I still don't know how to get incorrect query results due to incorrect gistKeyIsEQ() behavior/excessive page splitting. Best regards, Alexander
Attachment
Alexander Lakhin <exclusion@gmail.com> writes: > (Unfortunately for me) I found no anomalies related to gbt_text_same() > with an index created with the previous implementation. I've added > diagnostic logging that shows when gbt_text_same() returns 0 for keys > that are the equal but have different padding. So I've observed that > gbt_text_same() returns incorrect result, but all the btree_gist tests > still pass. Moreover, unconditional "*result = 0;" in gbt_text_same() > doesn't affect the tests at all. > I've found that gbt_text_same() is called by gistKeyIsEQ() from > backend/access/gist/gistutil.c, and made gistKeyIsEQ() return false any > time. And even with such change all check-world tests still pass (except > for isolation/predicate-gist that failed due to locking of pages split > differently). So for now, I still don't know how to get incorrect query > results due to incorrect gistKeyIsEQ() behavior/excessive page splitting. Yeah, if that's the only use-case then it's pretty easy to see that an overly strict equality test isn't going to hurt us much. At worst it'll cause the index to be a little bit inefficiently stored due to unnecessary node splits. Even then, that won't happen much in normal use, since the discrepancy could only arise once in the lifespan of an index node (when it first sees a new-style entry that could have been considered equal to the old-style value). So I think this solution will work, and I'll go ahead and push it. Thanks for testing! (and for the original report ...) regards, tom lane