Thread: [PATCH] Add support function for containment operators
I had noticed that performance wasn't great when using the @> or <@ operators when examining if an element is contained in a range. Based on the discussion in [1] I would like to suggest the following changes: This patch attempts to improve the row estimation, as well as opening the possibility of using a btree index scan when using the containment operators. This is done via a new support function handling the following 2 requests: * SupportRequestIndexCondition find_index_quals will build an operator clause, given at least one finite RangeBound. * SupportRequestSimplify find_simplified_clause will rewrite the containment operator into a clause using inequality operators from the btree family (if available for the element type). A boolean constant is returned if the range is either empty or has no bounds. Performing the rewrite here lets the clausesel machinery provide the same estimates as for normal scalar inequalities. In both cases build_bound_expr is used to build the operator clauses from RangeBounds. Thanks to Laurenz Albe for giving the patch a look before submission. [1] https://www.postgresql.org/message-id/222c75fd-43b8-db3e-74a6-bb4fe22f76db@kimmet.dk Regards, Kim Johan Andersson
Attachment
Any thoughts on this change?
On Sat, 2023-04-29 at 17:07 +0200, Kim Johan Andersson wrote: > I had noticed that performance wasn't great when using the @> or <@ > operators when examining if an element is contained in a range. > Based on the discussion in [1] I would like to suggest the following > changes: > > This patch attempts to improve the row estimation, as well as opening > the possibility of using a btree index scan when using the containment > operators. > > This is done via a new support function handling the following 2 requests: > > * SupportRequestIndexCondition > find_index_quals will build an operator clause, given at least one > finite RangeBound. > > * SupportRequestSimplify > find_simplified_clause will rewrite the containment operator into a > clause using inequality operators from the btree family (if available > for the element type). > > A boolean constant is returned if the range is either empty or has no > bounds. > > Performing the rewrite here lets the clausesel machinery provide the > same estimates as for normal scalar inequalities. > > In both cases build_bound_expr is used to build the operator clauses > from RangeBounds. I think that this is a small, but useful improvement. The patch applies and builds without warning and passes "make installcheck-world" with the (ample) new regression tests. Some comments: - About the regression tests: You are using EXPLAIN (ANALYZE, SUMMARY OFF, TIMING OFF, COSTS OFF). While that returns stable results, I don't see the added value. I think you should use EXPLAIN (COSTS OFF). You don't need to test the actual number of result rows; we can trust that the executor processes >= and < correctly. Plain EXPLAIN would speed up the regression tests, which is a good thing. - About the implementation: You implement both "SupportRequestIndexCondition" and "SupportRequestSimplify", but when I experimented, the former was never called. That does not surprise me, since any expression of the shape "expr <@ range constant" can be simplified. Is the "SupportRequestIndexCondition" branch dead code? If not, do you have an example that triggers it? - About the code: +static Node * +find_index_quals(Const *rangeConst, Expr *otherExpr, Oid opfamily) +{ [...] + + if (!(lower.infinite && upper.infinite)) + { [...] + } + + return NULL; To avoid deep indentation and to make the code more readable, I think it would be better to write if (!(lower.infinite && upper.infinite)) return NULL; and unindent the rest of the code +static Node * +match_support_request(Node *rawreq) +{ [...] + switch (req->funcid) + { + case F_ELEM_CONTAINED_BY_RANGE: [...] + case F_RANGE_CONTAINS_ELEM: [...] + default: + return NULL; + } (This code appears twice.) The default clause should not be reachable, right? I think that there should be an Assert() to verify that. Perhaps something like Assert(req->funcid == F_ELEM_CONTAINED_BY_RANGE || req->funcid == F_RANGE_CONTAINS_ELEM); if (req->funcid == F_ELEM_CONTAINED_BY_RANGE) { [...] } else if (req->funcid == F_RANGE_CONTAINS_ELEM) { [...] } Yours, Laurenz Albe
> On Sat, 2023-04-29 at 17:07 +0200, Kim Johan Andersson wrote: > > I had noticed that performance wasn't great when using the @> or <@ > > operators when examining if an element is contained in a range. > > Based on the discussion in [1] I would like to suggest the following > > changes: > > > > This patch attempts to improve the row estimation, as well as opening > > the possibility of using a btree index scan when using the containment > > operators. I managed to break the patch: CREATE DATABASE czech ICU_LOCALE "cs-CZ" LOCALE "cs_CZ.utf8" TEMPLATE template0; \c czech CREATE TYPE textrange AS RANGE (SUBTYPE = text, SUBTYPE_OPCLASS = text_pattern_ops); CREATE TABLE tx (t text); INSERT INTO tx VALUES ('a'), ('c'), ('d'), ('ch'); EXPLAIN SELECT * FROM tx WHERE t <@ textrange('a', 'd'); QUERY PLAN ════════════════════════════════════════════════════ Seq Scan on tx (cost=0.00..30.40 rows=7 width=32) Filter: ((t >= 'a'::text) AND (t < 'd'::text)) (2 rows) SELECT * FROM tx WHERE t <@ textrange('a', 'd'); ERROR: could not determine which collation to use for string comparison HINT: Use the COLLATE clause to set the collation explicitly. The replacement operators are wrong; it should be ~>=~ and ~<~ . Also, there should be no error message. The result should be 'a', 'c' and 'ch'. Yours, Laurenz Albe
I wrote: > You implement both "SupportRequestIndexCondition" and "SupportRequestSimplify", > but when I experimented, the former was never called. That does not > surprise me, since any expression of the shape "expr <@ range constant" > can be simplified. Is the "SupportRequestIndexCondition" branch dead code? > If not, do you have an example that triggers it? I had an idea about this: So far, you only consider constant ranges. But if we have a STABLE range expression, you could use an index scan for "expr <@ range", for example Index Cond (expr >= lower(range) AND expr < upper(range)). Yours, Laurenz Albe
On 07-07-2023 13:20, Laurenz Albe wrote: > I wrote: >> You implement both "SupportRequestIndexCondition" and "SupportRequestSimplify", >> but when I experimented, the former was never called. That does not >> surprise me, since any expression of the shape "expr <@ range constant" >> can be simplified. Is the "SupportRequestIndexCondition" branch dead code? >> If not, do you have an example that triggers it? I would think it is dead code, I came to the same conclusion. So we can drop SupportRequestIndexCondition, since the simplification happens to take care of everything. > I had an idea about this: > So far, you only consider constant ranges. But if we have a STABLE range > expression, you could use an index scan for "expr <@ range", for example > Index Cond (expr >= lower(range) AND expr < upper(range)). > I will try to look into this. Originally that was what I was hoping for, but didn't see way of going about it. Thanks for your comments, I will also look at the locale-related breakage you spotted. Regards, Kimjand
On Sat, 2023-07-08 at 08:11 +0200, Kim Johan Andersson wrote: > On 07-07-2023 13:20, Laurenz Albe wrote: > > I wrote: > > > You implement both "SupportRequestIndexCondition" and "SupportRequestSimplify", > > > but when I experimented, the former was never called. That does not > > > surprise me, since any expression of the shape "expr <@ range constant" > > > can be simplified. Is the "SupportRequestIndexCondition" branch dead code? > > > If not, do you have an example that triggers it? > > I would think it is dead code, I came to the same conclusion. So we can > drop SupportRequestIndexCondition, since the simplification happens to > take care of everything. > > > > I had an idea about this: > > So far, you only consider constant ranges. But if we have a STABLE range > > expression, you could use an index scan for "expr <@ range", for example > > Index Cond (expr >= lower(range) AND expr < upper(range)). > > > > I will try to look into this. Originally that was what I was hoping for, > but didn't see way of going about it. > > Thanks for your comments, I will also look at the locale-related > breakage you spotted. I have marked the patch as "returned with feedback". I encourage you to submit an improved version in a future commitfest! Yours, Laurenz Albe
On Tue, Aug 1, 2023 at 10:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > > > > > > I had an idea about this: > > > So far, you only consider constant ranges. But if we have a STABLE range > > > expression, you could use an index scan for "expr <@ range", for example > > > Index Cond (expr >= lower(range) AND expr < upper(range)). > > > The above part, not sure how to implement it, not sure it is doable. Refactor: drop SupportRequestIndexCondition and related code, since mentioned in upthread, it's dead code. refactor the regression test. (since data types with infinity cover more cases than int4range, so I deleted some tests). now there are 3 helper functions (build_bound_expr, find_simplified_clause, match_support_request), 2 entry functions (elem_contained_by_range_support, range_contains_elem_support) Collation problem seems solved. Putting the following test on the src/test/regress/sql/rangetypes.sql will not work. Maybe because of the order of the regression test, in SQL-ASCII encoding, I cannot create collation="cs-CZ-x-icu". drop type if EXISTS textrange1, textrange2; drop table if EXISTS collate_test1, collate_test2; CREATE TYPE textrange1 AS RANGE (SUBTYPE = text, collation="C"); create type textrange2 as range(subtype=text, collation="cs-CZ-x-icu"); CREATE TABLE collate_test1 (b text COLLATE "en-x-icu" NOT NULL); INSERT INTO collate_test1(b) VALUES ('a'), ('c'), ('d'), ('ch'); CREATE TABLE collate_test2 (b text NOT NULL); INSERT INTO collate_test2(b) VALUES ('a'), ('c'), ('d'), ('ch'); --should include 'ch' SELECT * FROM collate_test1 WHERE b <@ textrange1('a', 'd'); --should not include 'ch' SELECT * FROM collate_test1 WHERE b <@ textrange2('a', 'd'); --should include 'ch' SELECT * FROM collate_test2 WHERE b <@ textrange1('a', 'd'); --should not include 'ch' SELECT * FROM collate_test2 WHERE b <@ textrange2('a', 'd'); -----------------
Attachment
On Fri, 2023-10-13 at 14:26 +0800, jian he wrote: > Collation problem seems solved. I didn't review your patch in detail, there is still a problem with my example: CREATE TYPE textrange AS RANGE ( SUBTYPE = text, SUBTYPE_OPCLASS = text_pattern_ops ); CREATE TABLE tx (t text COLLATE "cs-CZ-x-icu"); INSERT INTO tx VALUES ('a'), ('c'), ('d'), ('ch'); SELECT * FROM tx WHERE t <@ textrange('a', 'd'); t ════ a c ch (3 rows) That was correct. EXPLAIN SELECT * FROM tx WHERE t <@ textrange('a', 'd'); QUERY PLAN ════════════════════════════════════════════════════ Seq Scan on tx (cost=0.00..30.40 rows=7 width=32) Filter: ((t >= 'a'::text) AND (t < 'd'::text)) (2 rows) But that was weird. The operators seem wrong. Look at that query: SELECT * FROM tx WHERE t >= 'a' AND t < 'd'; t ═══ a c (2 rows) But the execution plan is identical... I am not sure what is the problem here, but in my opinion the operators shown in the execution plan should be like this: SELECT * FROM tx WHERE t ~>=~ 'a' AND t ~<~ 'd'; t ════ a c ch (3 rows) Yours, Laurenz Albe
On Fri, Oct 20, 2023 at 12:01 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Fri, 2023-10-13 at 14:26 +0800, jian he wrote: > > Collation problem seems solved. > > I didn't review your patch in detail, there is still a problem > with my example: > > CREATE TYPE textrange AS RANGE ( > SUBTYPE = text, > SUBTYPE_OPCLASS = text_pattern_ops > ); > > CREATE TABLE tx (t text COLLATE "cs-CZ-x-icu"); > > INSERT INTO tx VALUES ('a'), ('c'), ('d'), ('ch'); > > SELECT * FROM tx WHERE t <@ textrange('a', 'd'); > > t > ════ > a > c > ch > (3 rows) > > That was correct. > > EXPLAIN SELECT * FROM tx WHERE t <@ textrange('a', 'd'); > > QUERY PLAN > ════════════════════════════════════════════════════ > Seq Scan on tx (cost=0.00..30.40 rows=7 width=32) > Filter: ((t >= 'a'::text) AND (t < 'd'::text)) > (2 rows) > > But that was weird. The operators seem wrong. Look at that Thanks for pointing this out! The problem is that TypeCacheEntry->rngelemtype typcaheentry don't have the range's SUBTYPE_OPCLASS info. So in find_simplified_clause, we need to get the range's SUBTYPE_OPCLASS from the pg_catalog table. Also in pg_range, column rngsubopc is not null. so this should be fine.
Attachment
On Fri, 2023-10-20 at 16:24 +0800, jian he wrote: > [new patch] Thanks, that patch works as expected and passes regression tests. Some comments about the code: > --- a/src/backend/utils/adt/rangetypes.c > +++ b/src/backend/utils/adt/rangetypes.c > @@ -558,7 +570,6 @@ elem_contained_by_range(PG_FUNCTION_ARGS) > PG_RETURN_BOOL(range_contains_elem_internal(typcache, r, val)); > } > > - > /* range, range -> bool functions */ > > /* equality (internal version) */ Please don't change unrelated whitespace. > +static Node * > +find_simplified_clause(Const *rangeConst, Expr *otherExpr) > +{ > + Form_pg_range pg_range; > + HeapTuple tup; > + Oid opclassOid; > + RangeBound lower; > + RangeBound upper; > + bool empty; > + Oid rng_collation; > + TypeCacheEntry *elemTypcache; > + Oid opfamily = InvalidOid; > + > + RangeType *range = DatumGetRangeTypeP(rangeConst->constvalue); > + TypeCacheEntry *rangetypcache = lookup_type_cache(RangeTypeGetOid(range), TYPECACHE_RANGE_INFO); > + { This brace is unnecessary. Perhaps a leftover from a removed conditional statement. > + /* this part is get the range's SUBTYPE_OPCLASS from pg_range catalog. > + * Refer load_rangetype_info function last line. > + * TypeCacheEntry->rngelemtype typcaheenetry either don't have opclass entry or with default opclass. > + * Range's subtype opclass only in catalog table. > + */ The comments in the patch need some more love. Apart from the language, you should have a look at the style guide: - single-line comments should start with lower case and have no period: /* example of a single-line comment */ - Multi-line comments should start with /* on its own line and end with */ on its own line. They should use whole sentences: /* * In case a comment spans several lines, it should look like * this. Try not to exceed 80 characters. */ > + tup = SearchSysCache1(RANGETYPE, ObjectIdGetDatum(RangeTypeGetOid(range))); > + > + /* should not fail, since we already checked typtype ... */ > + if (!HeapTupleIsValid(tup)) > + elog(ERROR, "cache lookup failed for range type %u", RangeTypeGetOid(range)); If this is a "can't happen" case, it should be an Assert. > + > + pg_range = (Form_pg_range) GETSTRUCT(tup); > + > + opclassOid = pg_range->rngsubopc; > + > + ReleaseSysCache(tup); > + > + /* get opclass properties and look up the comparison function */ > + opfamily = get_opclass_family(opclassOid); > + } > + > + range_deserialize(rangetypcache, range, &lower, &upper, &empty); > + rng_collation = rangetypcache->rng_collation; > + > + if (empty) > + { > + /* If the range is empty, then there can be no matches. */ > + return makeBoolConst(false, false); > + } > + else if (lower.infinite && upper.infinite) > + { > + /* The range has no bounds, so matches everything. */ > + return makeBoolConst(true, false); > + } > + else > + { Many of the variables declared at the beginning of the function are only used in this branch. You should declare them here. > +static Node * > +match_support_request(Node *rawreq) > +{ > + if (IsA(rawreq, SupportRequestSimplify)) > + { To keep the indentation shallow, the preferred style is: if (/* case we don't handle */) return NULL; /* proceed without indentation */ > + SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq; > + FuncExpr *fexpr = req->fcall; > + Node *leftop; > + Node *rightop; > + Const *rangeConst; > + Expr *otherExpr; > + > + Assert(list_length(fexpr->args) == 2); > + > + leftop = linitial(fexpr->args); > + rightop = lsecond(fexpr->args); > + > + switch (fexpr->funcid) > + { > + case F_ELEM_CONTAINED_BY_RANGE: > + if (!IsA(rightop, Const) || ((Const *) rightop)->constisnull) > + return NULL; > + > + rangeConst = (Const *) rightop; > + otherExpr = (Expr *) leftop; > + break; > + > + case F_RANGE_CONTAINS_ELEM: > + if (!IsA(leftop, Const) || ((Const *) leftop)->constisnull) > + return NULL; > + > + rangeConst = (Const *) leftop; > + otherExpr = (Expr *) rightop; > + break; > + > + default: > + return NULL; > + } > + > + return find_simplified_clause(rangeConst, otherExpr); > + } > + return NULL; > +} > \ No newline at end of file You are calling this funtion from both "elem_contained_by_range_support" and "range_contains_elem_support", only to branch based on the function type. I think the code would be simpler if you did away with "match_support_request" at all. I adjusted your patch according to my comments; what do you think? I also went over the regression tests. I did away with the comparison function, instead I used examples that don't return too many rows. I cut down on the test cases a little bit. I added a test that uses the "text_pattern_ops" operator class. Yours, Laurenz Albe
Attachment
On Sun, 2023-11-12 at 18:15 +0100, Laurenz Albe wrote: > I adjusted your patch according to my comments; what do you think? I have added the patch to the January commitfest, with Jian and Kim as authors. I hope that is OK with you. Yours, Laurenz Albe
On 12-11-2023 20:20, Laurenz Albe wrote: > On Sun, 2023-11-12 at 18:15 +0100, Laurenz Albe wrote: >> I adjusted your patch according to my comments; what do you think? > > I have added the patch to the January commitfest, with Jian and Kim as authors. > I hope that is OK with you. Sounds great to me. Thanks to Jian for picking this up. Regards, Kim Johan Andersson
fix a typo and also did a minor change.
from
+ if (lowerExpr != NULL && upperExpr != NULL)
+ return (Node *) makeBoolExpr(AND_EXPR, list_make2(lowerExpr, upperExpr), -1);
+ else if (lowerExpr != NULL)
+ return (Node *) lowerExpr;
+ else if (upperExpr != NULL)
+ return (Node *) upperExpr;
to
+ if (lowerExpr != NULL && upperExpr != NULL)
+ return (Node *) makeBoolExpr(AND_EXPR, list_make2(lowerExpr, upperExpr), -1);
+ else if (lowerExpr != NULL)
+ return (Node *) lowerExpr;
+ else if (upperExpr != NULL)
+ return (Node *) upperExpr;
+ else
+ {
+ Assert(false);
+ return NULL;
+ }
because cfbot says:
15:04:38.116] make -s -j${BUILD_JOBS} clean
[15:04:38.510] time make -s -j${BUILD_JOBS} world-bin
[15:04:43.272] rangetypes.c:2908:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
[15:04:43.272] }
[15:04:43.272] ^
[15:04:43.272] 1 error generated.
also add some commit messages, I hope it will be useful.
Attachment
jian he <jian.universality@gmail.com> writes: > [ v5-0001-Simplify-containment-in-range-constants-with-supp.patch ] I spent some time reviewing and cleaning up this code. The major problem I noted was that it doesn't spend any effort thinking about cases where it'd be unsafe or undesirable to apply the transformation. In particular, it's entirely uncool to produce a double-sided comparison if the elemExpr is volatile. These two expressions do not have the same result: select random() <@ float8range(0.1, 0.2); select random() >= 0.1 and random() < 0.2; (Yes, I'm aware that BETWEEN is broken in this respect. All the more reason why we mustn't break <@.) Another problem is that even if the elemExpr isn't volatile, it might be expensive enough that evaluating it twice is bad. I am not sure where we ought to put the cutoff. There are some existing places where we set a 10X cpu_operator_cost limit on comparable transformations, so I stole that logic in the attached. But perhaps someone has an argument for a different rule? Anyway, pending discussion of that point, I think the code is good to go. I don't like the test cases much though: they expend many more cycles than necessary. You could prove the same points just by looking at the expansion of expressions, eg. regression=# explain (verbose, costs off) select current_date <@ daterange(null,null); QUERY PLAN ---------------- Result Output: true (2 rows) regression=# explain (verbose, costs off) select current_date <@ daterange('-Infinity', '1997-04-10'::date, '[)'); QUERY PLAN ----------------------------------------------------------------------------------------- Result Output: ((CURRENT_DATE >= '-infinity'::date) AND (CURRENT_DATE < '1997-04-10'::date)) (2 rows) I'd suggest losing the temp table and just coding tests like these. regards, tom lane diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c index d99b00b590..2d94a6b877 100644 --- a/src/backend/utils/adt/rangetypes.c +++ b/src/backend/utils/adt/rangetypes.c @@ -30,19 +30,20 @@ */ #include "postgres.h" -#include "access/tupmacs.h" #include "common/hashfn.h" -#include "lib/stringinfo.h" #include "libpq/pqformat.h" #include "miscadmin.h" +#include "nodes/makefuncs.h" #include "nodes/miscnodes.h" -#include "port/pg_bitutils.h" +#include "nodes/supportnodes.h" +#include "optimizer/clauses.h" +#include "optimizer/cost.h" +#include "optimizer/optimizer.h" #include "utils/builtins.h" #include "utils/date.h" #include "utils/lsyscache.h" #include "utils/rangetypes.h" #include "utils/timestamp.h" -#include "varatt.h" /* fn_extra cache entry for one of the range I/O functions */ @@ -69,6 +70,12 @@ static Size datum_compute_size(Size data_length, Datum val, bool typbyval, char typalign, int16 typlen, char typstorage); static Pointer datum_write(Pointer ptr, Datum datum, bool typbyval, char typalign, int16 typlen, char typstorage); +static Node *find_simplified_clause(PlannerInfo *root, + Expr *rangeExpr, Expr *elemExpr); +static Expr *build_bound_expr(Expr *elemExpr, Datum val, + bool isLowerBound, bool isInclusive, + TypeCacheEntry *typeCache, + Oid opfamily, Oid rng_collation); /* @@ -2173,6 +2180,58 @@ make_empty_range(TypeCacheEntry *typcache) return make_range(typcache, &lower, &upper, true, NULL); } +/* + * Planner support function for elem_contained_by_range (<@ operator). + */ +Datum +elem_contained_by_range_support(PG_FUNCTION_ARGS) +{ + Node *rawreq = (Node *) PG_GETARG_POINTER(0); + Node *ret = NULL; + + if (IsA(rawreq, SupportRequestSimplify)) + { + SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq; + FuncExpr *fexpr = req->fcall; + Expr *leftop, + *rightop; + + Assert(list_length(fexpr->args) == 2); + leftop = linitial(fexpr->args); + rightop = lsecond(fexpr->args); + + ret = find_simplified_clause(req->root, rightop, leftop); + } + + PG_RETURN_POINTER(ret); +} + +/* + * Planner support function for range_contains_elem (@> operator). + */ +Datum +range_contains_elem_support(PG_FUNCTION_ARGS) +{ + Node *rawreq = (Node *) PG_GETARG_POINTER(0); + Node *ret = NULL; + + if (IsA(rawreq, SupportRequestSimplify)) + { + SupportRequestSimplify *req = (SupportRequestSimplify *) rawreq; + FuncExpr *fexpr = req->fcall; + Expr *leftop, + *rightop; + + Assert(list_length(fexpr->args) == 2); + leftop = linitial(fexpr->args); + rightop = lsecond(fexpr->args); + + ret = find_simplified_clause(req->root, leftop, rightop); + } + + PG_RETURN_POINTER(ret); +} + /* *---------------------------------------------------------- @@ -2715,3 +2774,180 @@ datum_write(Pointer ptr, Datum datum, bool typbyval, char typalign, return ptr; } + +/* + * Common code for the elem_contained_by_range and range_contains_elem + * support functions. The caller has extracted the function argument + * expressions, and swapped them if necessary to pass the range first. + * + * Returns a simplified replacement expression, or NULL if we can't simplify. + */ +static Node * +find_simplified_clause(PlannerInfo *root, Expr *rangeExpr, Expr *elemExpr) +{ + RangeType *range; + TypeCacheEntry *rangetypcache; + RangeBound lower; + RangeBound upper; + bool empty; + + /* can't do anything unless the range is a non-null constant */ + if (!IsA(rangeExpr, Const) || ((Const *) rangeExpr)->constisnull) + return NULL; + range = DatumGetRangeTypeP(((Const *) rangeExpr)->constvalue); + + rangetypcache = lookup_type_cache(RangeTypeGetOid(range), + TYPECACHE_RANGE_INFO); + if (rangetypcache->rngelemtype == NULL) + elog(ERROR, "type %u is not a range type", RangeTypeGetOid(range)); + + range_deserialize(rangetypcache, range, &lower, &upper, &empty); + + if (empty) + { + /* if the range is empty, then there can be no matches */ + return makeBoolConst(false, false); + } + else if (lower.infinite && upper.infinite) + { + /* the range has infinite bounds, so it matches everything */ + return makeBoolConst(true, false); + } + else + { + /* at least one bound is available, we have something to work with */ + TypeCacheEntry *elemTypcache = rangetypcache->rngelemtype; + Oid opfamily = rangetypcache->rng_opfamily; + Oid rng_collation = rangetypcache->rng_collation; + Expr *lowerExpr = NULL; + Expr *upperExpr = NULL; + + if (!lower.infinite && !upper.infinite) + { + /* + * When both bounds are present, we have a problem: the + * "simplified" clause would need to evaluate the elemExpr twice. + * That's definitely not okay if the elemExpr is volatile, and + * it's also unattractive if the elemExpr is expensive. + */ + QualCost eval_cost; + + if (contain_volatile_functions((Node *) elemExpr)) + return NULL; + + /* + * We define "expensive" as "contains any subplan or more than 10 + * operators". Note that the subplan search has to be done + * explicitly, since cost_qual_eval() will barf on unplanned + * subselects. + */ + if (contain_subplans((Node *) elemExpr)) + return NULL; + cost_qual_eval_node(&eval_cost, (Node *) elemExpr, root); + if (eval_cost.startup + eval_cost.per_tuple > + 10 * cpu_operator_cost) + return NULL; + } + + /* Okay, try to build boundary comparison expressions */ + if (!lower.infinite) + { + lowerExpr = build_bound_expr(elemExpr, + lower.val, + true, + lower.inclusive, + elemTypcache, + opfamily, + rng_collation); + if (lowerExpr == NULL) + return NULL; + } + + if (!upper.infinite) + { + /* Copy the elemExpr if we need two copies */ + if (!lower.infinite) + elemExpr = copyObject(elemExpr); + upperExpr = build_bound_expr(elemExpr, + upper.val, + false, + upper.inclusive, + elemTypcache, + opfamily, + rng_collation); + if (upperExpr == NULL) + return NULL; + } + + if (lowerExpr != NULL && upperExpr != NULL) + return (Node *) make_andclause(list_make2(lowerExpr, upperExpr)); + else if (lowerExpr != NULL) + return (Node *) lowerExpr; + else if (upperExpr != NULL) + return (Node *) upperExpr; + else + { + Assert(false); + return NULL; + } + } +} + +/* + * Helper function for find_simplified_clause(). + * + * Build the expression (elemExpr Operator val), where the operator is + * the appropriate member of the given opfamily depending on + * isLowerBound and isInclusive. typeCache is the typcache entry for + * the "val" value (presently, this will be the same type as elemExpr). + * rng_collation is the collation to use in the comparison. + * + * Return NULL on failure (if, for some reason, we can't find the operator). + */ +static Expr * +build_bound_expr(Expr *elemExpr, Datum val, + bool isLowerBound, bool isInclusive, + TypeCacheEntry *typeCache, + Oid opfamily, Oid rng_collation) +{ + Oid elemType = typeCache->type_id; + int16 elemTypeLen = typeCache->typlen; + bool elemByValue = typeCache->typbyval; + Oid elemCollation = typeCache->typcollation; + int16 strategy; + Oid oproid; + Expr *constExpr; + + /* Identify the comparison operator to use */ + if (isLowerBound) + strategy = isInclusive ? BTGreaterEqualStrategyNumber : BTGreaterStrategyNumber; + else + strategy = isInclusive ? BTLessEqualStrategyNumber : BTLessStrategyNumber; + + /* + * We could use exprType(elemExpr) here, if it ever becomes possible that + * elemExpr is not the exact same type as the range elements. + */ + oproid = get_opfamily_member(opfamily, elemType, elemType, strategy); + + /* We don't really expect failure here, but just in case ... */ + if (!OidIsValid(oproid)) + return NULL; + + /* OK, convert "val" to a full-fledged Const node, and make the OpExpr */ + constExpr = (Expr *) makeConst(elemType, + -1, + elemCollation, + elemTypeLen, + val, + false, + elemByValue); + + return make_opclause(oproid, + BOOLOID, + false, + elemExpr, + constExpr, + InvalidOid, + rng_collation); +} diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c index c260012bd0..3431c3cd98 100644 --- a/src/backend/utils/adt/rangetypes_selfuncs.c +++ b/src/backend/utils/adt/rangetypes_selfuncs.c @@ -196,9 +196,10 @@ rangesel(PG_FUNCTION_ARGS) else if (operator == OID_RANGE_ELEM_CONTAINED_OP) { /* - * Here, the Var is the elem, not the range. For now we just punt and - * return the default estimate. In future we could disassemble the - * range constant and apply scalarineqsel ... + * Here, the Var is the elem, not the range. In typical cases + * elem_contained_by_range_support will have simplified this case, so + * that we won't get here. If we do get here we'll fall back on a + * default estimate. */ } else if (((Const *) other)->consttype == vardata.vartype) diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index 4625be4d5c..84fc83cb11 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -940,6 +940,7 @@ load_rangetype_info(TypeCacheEntry *typentry) /* get opclass properties and look up the comparison function */ opfamilyOid = get_opclass_family(opclassOid); opcintype = get_opclass_input_type(opclassOid); + typentry->rng_opfamily = opfamilyOid; cmpFnOid = get_opfamily_proc(opfamilyOid, opcintype, opcintype, BTORDER_PROC); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 58811a6530..5eddeb0840 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10503,13 +10503,15 @@ proname => 'range_overlaps', prorettype => 'bool', proargtypes => 'anyrange anyrange', prosrc => 'range_overlaps' }, { oid => '3858', - proname => 'range_contains_elem', prorettype => 'bool', - proargtypes => 'anyrange anyelement', prosrc => 'range_contains_elem' }, + proname => 'range_contains_elem', prosupport => 'range_contains_elem_support', + prorettype => 'bool', proargtypes => 'anyrange anyelement', + prosrc => 'range_contains_elem' }, { oid => '3859', proname => 'range_contains', prorettype => 'bool', proargtypes => 'anyrange anyrange', prosrc => 'range_contains' }, { oid => '3860', - proname => 'elem_contained_by_range', prorettype => 'bool', + proname => 'elem_contained_by_range', + prosupport => 'elem_contained_by_range_support', prorettype => 'bool', proargtypes => 'anyelement anyrange', prosrc => 'elem_contained_by_range' }, { oid => '3861', proname => 'range_contained_by', prorettype => 'bool', @@ -10532,6 +10534,12 @@ { oid => '3867', proname => 'range_union', prorettype => 'anyrange', proargtypes => 'anyrange anyrange', prosrc => 'range_union' }, +{ oid => '9998', descr => 'planner support for range_contains_elem', + proname => 'range_contains_elem_support', prorettype => 'internal', + proargtypes => 'internal', prosrc => 'range_contains_elem_support' }, +{ oid => '9999', descr => 'planner support for elem_contained_by_range', + proname => 'elem_contained_by_range_support', prorettype => 'internal', + proargtypes => 'internal', prosrc => 'elem_contained_by_range_support' }, { oid => '4057', descr => 'the smallest range which includes both of the given ranges', proname => 'range_merge', prorettype => 'anyrange', diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h index b72daa7fb4..f506cc4aa3 100644 --- a/src/include/utils/typcache.h +++ b/src/include/utils/typcache.h @@ -96,6 +96,7 @@ typedef struct TypeCacheEntry * btree comparison function. */ struct TypeCacheEntry *rngelemtype; /* range's element type */ + Oid rng_opfamily; /* opfamily to use for range comparisons */ Oid rng_collation; /* collation for comparisons, if any */ FmgrInfo rng_cmp_proc_finfo; /* comparison function */ FmgrInfo rng_canonical_finfo; /* canonicalization function, if any */ diff --git a/src/test/regress/expected/rangetypes.out b/src/test/regress/expected/rangetypes.out index 07d5621ef8..4bd23eb582 100644 --- a/src/test/regress/expected/rangetypes.out +++ b/src/test/regress/expected/rangetypes.out @@ -1834,3 +1834,196 @@ create function table_fail(i anyelement) returns table(i anyelement, r anyrange) as $$ select $1, '[1,10]' $$ language sql; ERROR: cannot determine result data type DETAIL: A result of type anyrange requires at least one input of type anyrange or anymultirange. +-- +-- Test support functions +-- +CREATE TEMP TABLE date_support_test AS + SELECT '2000-01-01'::DATE + g AS some_date + FROM generate_series(-1000, 1000) sub(g); +CREATE UNIQUE INDEX date_support_test_idx ON date_support_test (some_date); +INSERT INTO date_support_test values ('-infinity'), ('infinity'); +ANALYZE date_support_test; +-- empty ranges +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange 'empty'; + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +-- only lower bound present +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2000-01-01', NULL, '[)'); + QUERY PLAN +--------------------------------------------- + Seq Scan on date_support_test + Filter: (some_date >= '01-01-2000'::date) +(2 rows) + +-- only upper bound present +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange(NULL, '2000-01-01', '(]'); + QUERY PLAN +-------------------------------------------- + Seq Scan on date_support_test + Filter: (some_date < '01-02-2000'::date) +(2 rows) + +-- unbounded range +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange(NULL, NULL); + QUERY PLAN +------------------------------- + Seq Scan on date_support_test +(1 row) + +-- lower range "-Infinity" excluded +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '()'); + QUERY PLAN +-------------------------------------------------------------------------------------- + Index Only Scan using date_support_test_idx on date_support_test + Index Cond: ((some_date > '-infinity'::date) AND (some_date < '04-10-1997'::date)) +(2 rows) + +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '()'); + some_date +------------ + 04-06-1997 + 04-07-1997 + 04-08-1997 + 04-09-1997 +(4 rows) + +-- lower range "-Infinity" included +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '[)'); + QUERY PLAN +--------------------------------------------------------------------------------------- + Index Only Scan using date_support_test_idx on date_support_test + Index Cond: ((some_date >= '-infinity'::date) AND (some_date < '04-10-1997'::date)) +(2 rows) + +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '[)'); + some_date +------------ + -infinity + 04-06-1997 + 04-07-1997 + 04-08-1997 + 04-09-1997 +(5 rows) + +-- upper range "Infinity" excluded +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[)'); + QUERY PLAN +-------------------------------------------------------------------------------------- + Index Only Scan using date_support_test_idx on date_support_test + Index Cond: ((some_date >= '09-25-2002'::date) AND (some_date < 'infinity'::date)) +(2 rows) + +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[)'); + some_date +------------ + 09-25-2002 + 09-26-2002 + 09-27-2002 +(3 rows) + +-- upper range "Infinity" included +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[]'); + QUERY PLAN +--------------------------------------------------------------------------------------- + Index Only Scan using date_support_test_idx on date_support_test + Index Cond: ((some_date >= '09-25-2002'::date) AND (some_date <= 'infinity'::date)) +(2 rows) + +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[]'); + some_date +------------ + 09-25-2002 + 09-26-2002 + 09-27-2002 + infinity +(4 rows) + +-- should also work if we use "@>" +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE daterange('-Infinity', '1997-04-10'::DATE, '()') @> some_date; + QUERY PLAN +-------------------------------------------------------------------------------------- + Index Only Scan using date_support_test_idx on date_support_test + Index Cond: ((some_date > '-infinity'::date) AND (some_date < '04-10-1997'::date)) +(2 rows) + +SELECT some_date FROM date_support_test +WHERE daterange('-Infinity', '1997-04-10'::DATE, '()') @> some_date; + some_date +------------ + 04-06-1997 + 04-07-1997 + 04-08-1997 + 04-09-1997 +(4 rows) + +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE daterange('2002-09-25'::DATE, 'Infinity', '[]') @> some_date; + QUERY PLAN +--------------------------------------------------------------------------------------- + Index Only Scan using date_support_test_idx on date_support_test + Index Cond: ((some_date >= '09-25-2002'::date) AND (some_date <= 'infinity'::date)) +(2 rows) + +SELECT some_date FROM date_support_test +WHERE daterange('2002-09-25'::DATE, 'Infinity', '[]') @> some_date; + some_date +------------ + 09-25-2002 + 09-26-2002 + 09-27-2002 + infinity +(4 rows) + +DROP TABLE date_support_test; +-- test a custom range type with a non-default operator class +CREATE TYPE textrange_supp AS RANGE ( + SUBTYPE = text, + SUBTYPE_OPCLASS = text_pattern_ops +); +CREATE TEMP TABLE text_support_test (t text COLLATE "C"); +INSERT INTO text_support_test VALUES ('a'), ('c'), ('d'), ('ch'); +EXPLAIN (COSTS OFF) +SELECT * FROM text_support_test WHERE t <@ textrange_supp('a', 'd'); + QUERY PLAN +------------------------------------------------------ + Seq Scan on text_support_test + Filter: ((t ~>=~ 'a'::text) AND (t ~<~ 'd'::text)) +(2 rows) + +SELECT * FROM text_support_test WHERE t <@ textrange_supp('a', 'd'); + t +---- + a + c + ch +(3 rows) + +DROP TABLE text_support_test; +DROP TYPE textrange_supp; diff --git a/src/test/regress/sql/rangetypes.sql b/src/test/regress/sql/rangetypes.sql index c5dbe0c04f..399f7cf74b 100644 --- a/src/test/regress/sql/rangetypes.sql +++ b/src/test/regress/sql/rangetypes.sql @@ -629,3 +629,95 @@ create function inoutparam_fail(inout i anyelement, out r anyrange) --should fail create function table_fail(i anyelement) returns table(i anyelement, r anyrange) as $$ select $1, '[1,10]' $$ language sql; + +-- +-- Test support functions +-- + +CREATE TEMP TABLE date_support_test AS + SELECT '2000-01-01'::DATE + g AS some_date + FROM generate_series(-1000, 1000) sub(g); +CREATE UNIQUE INDEX date_support_test_idx ON date_support_test (some_date); +INSERT INTO date_support_test values ('-infinity'), ('infinity'); +ANALYZE date_support_test; + +-- empty ranges +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange 'empty'; + +-- only lower bound present +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2000-01-01', NULL, '[)'); + +-- only upper bound present +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange(NULL, '2000-01-01', '(]'); + +-- unbounded range +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange(NULL, NULL); + +-- lower range "-Infinity" excluded +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '()'); +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '()'); + +-- lower range "-Infinity" included +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '[)'); +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('-Infinity', '1997-04-10'::DATE, '[)'); + +-- upper range "Infinity" excluded +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[)'); +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[)'); + +-- upper range "Infinity" included +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[]'); +SELECT some_date FROM date_support_test +WHERE some_date <@ daterange('2002-09-25'::DATE, 'Infinity', '[]'); + +-- should also work if we use "@>" +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE daterange('-Infinity', '1997-04-10'::DATE, '()') @> some_date; +SELECT some_date FROM date_support_test +WHERE daterange('-Infinity', '1997-04-10'::DATE, '()') @> some_date; + +EXPLAIN (COSTS OFF) +SELECT some_date FROM date_support_test +WHERE daterange('2002-09-25'::DATE, 'Infinity', '[]') @> some_date; +SELECT some_date FROM date_support_test +WHERE daterange('2002-09-25'::DATE, 'Infinity', '[]') @> some_date; + +DROP TABLE date_support_test; + +-- test a custom range type with a non-default operator class +CREATE TYPE textrange_supp AS RANGE ( + SUBTYPE = text, + SUBTYPE_OPCLASS = text_pattern_ops +); + +CREATE TEMP TABLE text_support_test (t text COLLATE "C"); + +INSERT INTO text_support_test VALUES ('a'), ('c'), ('d'), ('ch'); + +EXPLAIN (COSTS OFF) +SELECT * FROM text_support_test WHERE t <@ textrange_supp('a', 'd'); +SELECT * FROM text_support_test WHERE t <@ textrange_supp('a', 'd'); + +DROP TABLE text_support_test; + +DROP TYPE textrange_supp;
On Wed, Jan 17, 2024 at 5:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > But perhaps someone has an argument for a different rule? > > Anyway, pending discussion of that point, I think the code is good > to go. I don't like the test cases much though: they expend many more > cycles than necessary. You could prove the same points just by > looking at the expansion of expressions, eg. > your patch is far better! IMHO, worried about the support function, the transformed plan generates the wrong result, so we add the tests to make it bullet proof. Now I see your point. If the transformed plan is right, the whole added code should be fine. but keeping the textrange_supp related test should be a good idea. since we don't have SUBTYPE_OPCLASS related sql tests.
jian he <jian.universality@gmail.com> writes: > Now I see your point. If the transformed plan is right, the whole > added code should be fine. > but keeping the textrange_supp related test should be a good idea. > since we don't have SUBTYPE_OPCLASS related sql tests. Yeah, it's a little harder to make a table-less test for that case. I thought about using current_user or the like as a stable comparison value, but that introduces some doubt about what the collation would be. That test seems cheap enough as-is, since it's handling only a tiny amount of data. Committed. regards, tom lane
On Sun, 21 Jan 2024 at 00:31, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > jian he <jian.universality@gmail.com> writes: > > Now I see your point. If the transformed plan is right, the whole > > added code should be fine. > > but keeping the textrange_supp related test should be a good idea. > > since we don't have SUBTYPE_OPCLASS related sql tests. > > Yeah, it's a little harder to make a table-less test for that case. > I thought about using current_user or the like as a stable comparison > value, but that introduces some doubt about what the collation would > be. That test seems cheap enough as-is, since it's handling only a > tiny amount of data. > > Committed. I have updated the commitfest entry to Committed as the patch is committed. Regards, Vignesh