Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CAPpHfdvjtEWqjVcPd3-JQw8yCoppMXjK8kHnvinxBXGMZt-M_g@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
Hi, Jian!

On Mon, Oct 28, 2024 at 9:19 AM jian he <jian.universality@gmail.com> wrote:
>
>  * NOTE:  returns NULL if clause is an OR or AND clause; it is the
>  * responsibility of higher-level routines to cope with those.
>  */
> static IndexClause *
> match_clause_to_indexcol(PlannerInfo *root,
>                          RestrictInfo *rinfo,
>                          int indexcol,
>                          IndexOptInfo *index)
>
> the above comments need a slight change.
>
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand = 1
> OR thousand = 3);
>                         QUERY PLAN
> -----------------------------------------------------------
>  Bitmap Heap Scan on tenk2
>    Recheck Cond: ((thousand = 1) OR (thousand = 3))
>    ->  Bitmap Index Scan on tenk2_thous_tenthous
>          Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand in (1,3));
>                         QUERY PLAN
> -----------------------------------------------------------
>  Bitmap Heap Scan on tenk2
>    Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
>    ->  Bitmap Index Scan on tenk2_thous_tenthous
>          Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> tenk2 index:
> Indexes:
>     "tenk2_thous_tenthous" btree (thousand, tenthous)
>
> Looking at the above cases, I found out the "Recheck Cond" is
> different from "Index Cond".
> I wonder why there is a difference, or if they should be the same.
> then i come to:
> match_orclause_to_indexcol
>
>     /*
>      * Finally, build an IndexClause based on the SAOP node. Use
>      * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
>      * estimations because it may differ from the estimation made for an OR
>      * clause. Although it is not a lossy expression, keep the old version of
>      * rinfo in iclause->rinfo to detect duplicates and recheck the original
>      * clause.
>      */
>     iclause = makeNode(IndexClause);
>     iclause->rinfo = rinfo;
>     iclause->indexquals = list_make1(make_simple_restrictinfo(root,
>                                                               &saopexpr->xpr));
>     iclause->lossy = false;
>     iclause->indexcol = indexcol;
>     iclause->indexcols = NIL;
>
> looking at create_bitmap_scan_plan.
> I think "iclause->rinfo" itself won't be able to detect duplicates.
> since the upper code would mostly use "iclause->indexquals" for comparison?
>
>
> typedef struct IndexClause comments says:
> "
>  * indexquals is a list of RestrictInfos for the directly-usable index
>  * conditions associated with this IndexClause.  In the simplest case
>  * it's a one-element list whose member is iclause->rinfo.  Otherwise,
>  * it contains one or more directly-usable indexqual conditions extracted
>  * from the given clause.  The 'lossy' flag indicates whether the
>  * indexquals are semantically equivalent to the original clause, or
>  * represent a weaker condition.
> "
> should lossy be iclause->lossy be true at the end of match_orclause_to_indexcol?
> since it meets the comment condition: "semantically equivalent to the
> original clause"
> or is the above comment slightly wrong?
>
> in match_orclause_to_indexcol
> i changed from
> iclause->rinfo = rinfo;
> to
>  iclause->rinfo = make_simple_restrictinfo(root,
>                                                 &saopexpr->xpr);
>
> as expected. now the "Recheck Cond" is same as "Index Cond"
>    Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
>    ->  Bitmap Index Scan on tenk2_thous_tenthous
>          Index Cond: (thousand = ANY ('{1,3}'::integer[]))
>
> I am not sure of the implication of this change.

As comment says IndexClause.rinfo must be original restriction or join clause.

typedef struct IndexClause
{
    pg_node_attr(no_copy_equal, no_read, no_query_jumble)

    NodeTag     type;
    struct RestrictInfo *rinfo; /* original restriction or join clause */

I don't see any reason why should we violate that.  Note that there are already cases when "Recheck Cond" doesn't match "Index Cond".  For instance:

# explain select * from t where 100000 > i;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=1860.66..7524.75 rows=99127 width=4)
   Recheck Cond: (100000 > i)
   ->  Bitmap Index Scan on t_i_idx  (cost=0.00..1835.88 rows=99127 width=0)
         Index Cond: (i < 100000)
(4 rows)

Thus, this type of mismatch seems normal to me.

IndexClause.lossy should be false in our case (as it is).  Lossy transformation happens when there are cases of false positives in the transformed clause.  The comment gives an example of transformation  "x LIKE 'foo%bar'" into "x >= 'foo' AND x < 'fop'".  In this case, 'fooqux' would be case of false positive matching transformed clause but not matching the original clause.  In our case, original and transformed clauses are equivalent.  Therefore our transformation isn't lossy.
 
------
Regards,
Alexander Korotkov
Supabase

pgsql-hackers by date:

Previous
From: Mats Kindahl
Date:
Subject: Re: Potential ABI breakage in upcoming minor releases
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: FW: Building Postgres 17.0 with meson