Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | CAPpHfdvjtEWqjVcPd3-JQw8yCoppMXjK8kHnvinxBXGMZt-M_g@mail.gmail.com Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (jian he <jian.universality@gmail.com>) |
List | pgsql-hackers |
Hi, Jian!
On Mon, Oct 28, 2024 at 9:19 AM jian he <jian.universality@gmail.com> wrote:
>
> * NOTE: returns NULL if clause is an OR or AND clause; it is the
> * responsibility of higher-level routines to cope with those.
> */
> static IndexClause *
> match_clause_to_indexcol(PlannerInfo *root,
> RestrictInfo *rinfo,
> int indexcol,
> IndexOptInfo *index)
>
> the above comments need a slight change.
>
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand = 1
> OR thousand = 3);
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: ((thousand = 1) OR (thousand = 3))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand in (1,3));
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> tenk2 index:
> Indexes:
> "tenk2_thous_tenthous" btree (thousand, tenthous)
>
> Looking at the above cases, I found out the "Recheck Cond" is
> different from "Index Cond".
> I wonder why there is a difference, or if they should be the same.
> then i come to:
> match_orclause_to_indexcol
>
> /*
> * Finally, build an IndexClause based on the SAOP node. Use
> * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
> * estimations because it may differ from the estimation made for an OR
> * clause. Although it is not a lossy expression, keep the old version of
> * rinfo in iclause->rinfo to detect duplicates and recheck the original
> * clause.
> */
> iclause = makeNode(IndexClause);
> iclause->rinfo = rinfo;
> iclause->indexquals = list_make1(make_simple_restrictinfo(root,
> &saopexpr->xpr));
> iclause->lossy = false;
> iclause->indexcol = indexcol;
> iclause->indexcols = NIL;
>
> looking at create_bitmap_scan_plan.
> I think "iclause->rinfo" itself won't be able to detect duplicates.
> since the upper code would mostly use "iclause->indexquals" for comparison?
>
>
> typedef struct IndexClause comments says:
> "
> * indexquals is a list of RestrictInfos for the directly-usable index
> * conditions associated with this IndexClause. In the simplest case
> * it's a one-element list whose member is iclause->rinfo. Otherwise,
> * it contains one or more directly-usable indexqual conditions extracted
> * from the given clause. The 'lossy' flag indicates whether the
> * indexquals are semantically equivalent to the original clause, or
> * represent a weaker condition.
> "
> should lossy be iclause->lossy be true at the end of match_orclause_to_indexcol?
> since it meets the comment condition: "semantically equivalent to the
> original clause"
> or is the above comment slightly wrong?
>
> in match_orclause_to_indexcol
> i changed from
> iclause->rinfo = rinfo;
> to
> iclause->rinfo = make_simple_restrictinfo(root,
> &saopexpr->xpr);
>
> as expected. now the "Recheck Cond" is same as "Index Cond"
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> I am not sure of the implication of this change.
As comment says IndexClause.rinfo must be original restriction or join clause.
typedef struct IndexClause
{
pg_node_attr(no_copy_equal, no_read, no_query_jumble)
NodeTag type;
struct RestrictInfo *rinfo; /* original restriction or join clause */
I don't see any reason why should we violate that. Note that there are already cases when "Recheck Cond" doesn't match "Index Cond". For instance:
# explain select * from t where 100000 > i;
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=1860.66..7524.75 rows=99127 width=4)
Recheck Cond: (100000 > i)
-> Bitmap Index Scan on t_i_idx (cost=0.00..1835.88 rows=99127 width=0)
Index Cond: (i < 100000)
(4 rows)
Thus, this type of mismatch seems normal to me.
On Mon, Oct 28, 2024 at 9:19 AM jian he <jian.universality@gmail.com> wrote:
>
> * NOTE: returns NULL if clause is an OR or AND clause; it is the
> * responsibility of higher-level routines to cope with those.
> */
> static IndexClause *
> match_clause_to_indexcol(PlannerInfo *root,
> RestrictInfo *rinfo,
> int indexcol,
> IndexOptInfo *index)
>
> the above comments need a slight change.
>
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand = 1
> OR thousand = 3);
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: ((thousand = 1) OR (thousand = 3))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand in (1,3));
> QUERY PLAN
> -----------------------------------------------------------
> Bitmap Heap Scan on tenk2
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> tenk2 index:
> Indexes:
> "tenk2_thous_tenthous" btree (thousand, tenthous)
>
> Looking at the above cases, I found out the "Recheck Cond" is
> different from "Index Cond".
> I wonder why there is a difference, or if they should be the same.
> then i come to:
> match_orclause_to_indexcol
>
> /*
> * Finally, build an IndexClause based on the SAOP node. Use
> * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
> * estimations because it may differ from the estimation made for an OR
> * clause. Although it is not a lossy expression, keep the old version of
> * rinfo in iclause->rinfo to detect duplicates and recheck the original
> * clause.
> */
> iclause = makeNode(IndexClause);
> iclause->rinfo = rinfo;
> iclause->indexquals = list_make1(make_simple_restrictinfo(root,
> &saopexpr->xpr));
> iclause->lossy = false;
> iclause->indexcol = indexcol;
> iclause->indexcols = NIL;
>
> looking at create_bitmap_scan_plan.
> I think "iclause->rinfo" itself won't be able to detect duplicates.
> since the upper code would mostly use "iclause->indexquals" for comparison?
>
>
> typedef struct IndexClause comments says:
> "
> * indexquals is a list of RestrictInfos for the directly-usable index
> * conditions associated with this IndexClause. In the simplest case
> * it's a one-element list whose member is iclause->rinfo. Otherwise,
> * it contains one or more directly-usable indexqual conditions extracted
> * from the given clause. The 'lossy' flag indicates whether the
> * indexquals are semantically equivalent to the original clause, or
> * represent a weaker condition.
> "
> should lossy be iclause->lossy be true at the end of match_orclause_to_indexcol?
> since it meets the comment condition: "semantically equivalent to the
> original clause"
> or is the above comment slightly wrong?
>
> in match_orclause_to_indexcol
> i changed from
> iclause->rinfo = rinfo;
> to
> iclause->rinfo = make_simple_restrictinfo(root,
> &saopexpr->xpr);
>
> as expected. now the "Recheck Cond" is same as "Index Cond"
> Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
> -> Bitmap Index Scan on tenk2_thous_tenthous
> Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> I am not sure of the implication of this change.
As comment says IndexClause.rinfo must be original restriction or join clause.
typedef struct IndexClause
{
pg_node_attr(no_copy_equal, no_read, no_query_jumble)
NodeTag type;
struct RestrictInfo *rinfo; /* original restriction or join clause */
I don't see any reason why should we violate that. Note that there are already cases when "Recheck Cond" doesn't match "Index Cond". For instance:
# explain select * from t where 100000 > i;
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=1860.66..7524.75 rows=99127 width=4)
Recheck Cond: (100000 > i)
-> Bitmap Index Scan on t_i_idx (cost=0.00..1835.88 rows=99127 width=0)
Index Cond: (i < 100000)
(4 rows)
Thus, this type of mismatch seems normal to me.
IndexClause.lossy should be false in our case (as it is). Lossy transformation happens when there are cases of false positives in the transformed clause. The comment gives an example of transformation "x LIKE 'foo%bar'" into "x >= 'foo' AND x < 'fop'". In this case, 'fooqux' would be case of false positive matching transformed clause but not matching the original clause. In our case, original and transformed clauses are equivalent. Therefore our transformation isn't lossy.
------
Regards,
Alexander Korotkov
Supabase
------
Regards,
Alexander Korotkov
Supabase
pgsql-hackers by date: