Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CAPpHfduOn0htAXp6h3gcv1hMUuoQSYfM-sk6TXKcM57C8B4Umw@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Andrei Lepikhov <lepihov@gmail.com>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
Hi, Andrei!

On Mon, Jan 27, 2025 at 10:52 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 1/25/25 12:04, Alexander Korotkov wrote:
> > On Wed, Jan 15, 2025 at 10:24 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> causes SEGFAULT during index keys evaluation. I haven't dived into it
> >> yet, but it seems quite a typical misstep and is not difficult to fix.
> >
> > Segfault appears to be caused by a typo.  Patch used parent rinfo
> > instead of child rinfo.  Fixed in the attached patch.
> Great!
> >
> > It appears that your first query also changed a plan after fixing
> > this.  Could you, please, provide another example of a regression for
> > short-circuit optimization, which is related to this patch?
> Yes, it may be caused by the current lazy InitPlan evaluation strategy,
> which would only happen if it was really needed.
>
> Examples:
> ---------
>
> EXPLAIN (ANALYZE, COSTS OFF, BUFFERS OFF, TIMING OFF)
> SELECT * FROM bitmap_split_or t1
> WHERE t1.a=2 AND (t1.b=2 OR t1.b = (
>     SELECT avg(x) FROM generate_series(1,1e6) AS x)::integer);
>
> without optimisation:
>
>   Index Scan using t_a_b_idx on bitmap_split_or t1 (actual rows=1 loops=1)
>     Index Cond: (a = 2)
>     Filter: ((b = 2) OR (b = ((InitPlan 1).col1)::integer))
>     InitPlan 1
>       ->  Aggregate (never executed)
>             ->  Function Scan on generate_series x (never executed)
>   Planning Time: 0.564 ms
>   Execution Time: 0.182 ms
>
> But having it as a part of an array, we forcedly evaluate it for (not
> 100% sure) more precise selectivity estimation:
>
>   Index Scan using t_a_b_idx on bitmap_split_or t1
>     (actual rows=1 loops=1)
>     Index Cond: ((a = 2) AND
>                  (b = ANY (ARRAY[2, ((InitPlan 1).col1)::integer])))
>     InitPlan 1
>       ->  Aggregate (actual rows=1 loops=1)
>             ->  Function Scan on generate_series x
>                 (actual rows=1000000 loops=1)
>   Planning Time: 0.927 ms
>   Execution Time: 489.933 ms
>
> This also means that if, before the patch, we executed a query
> successfully, after applying the patch, we sometimes may get the error:
> 'ERROR: more than one row returned by a subquery used as an expression'
> because of early InitPlan evaluation. See the example below:
>
> EXPLAIN (ANALYZE, COSTS OFF)
> SELECT * FROM bitmap_split_or t1
> WHERE t1.a=2 AND (t1.b=2 OR t1.b = (
>     SELECT random() FROM generate_series(1,1e6) AS x)::integer);
>
>   Index Scan using t_a_b_idx on bitmap_split_or t1
>     Index Cond: ((a = 2) AND (b = ANY (ARRAY[2, ((InitPlan
> 1).col1)::integer])))
>     InitPlan 1
>       ->  Function Scan on generate_series x
>
> I think optimisation should have never happened and this is another
> issue, isn't it?

Thank you for your examples.  The reason why these example works only with the patch is that you apply the cast outside of subquery.  This is because d4378c0005 requires OR argument to be either Cost or Param, but not a cast over the param.  Consider this example on master.

# EXPLAIN (ANALYZE, COSTS OFF, BUFFERS OFF, TIMING OFF)
SELECT * FROM bitmap_split_or t1
WHERE t1.a=2 AND (t1.b=2 OR t1.b = (
    SELECT avg(x)::integer FROM generate_series(1,1e6) AS x));
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using t_a_b_idx on bitmap_split_or t1 (actual rows=1 loops=1)
   Index Cond: ((a = 2) AND (b = ANY (ARRAY[2, (InitPlan 1).col1])))
   InitPlan 1
     ->  Aggregate (actual rows=1 loops=1)
           ->  Function Scan on generate_series x (actual rows=1000000 loops=1)
 Planning Time: 0.731 ms
 Execution Time: 577.953 ms
(7 rows)

I expressed my point on this in [1].  We generally greedy about index quals and there is no logic which prevent us from using a clause and index qual because of its cost.  And there are many cases when this causes regressions before d4378c0005.  One of examples from [1].

# explain analyze select * from t where i = 0 and j = (select slowfunc());
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=25000.01..25195.01 rows=1 width=8) (actual time=0.806..0.807 rows=0 loops=1)
   Filter: ((i = 0) AND (j = (InitPlan 1).col1))
   Rows Removed by Filter: 10000
   InitPlan 1
     ->  Result  (cost=0.00..25000.01 rows=1 width=4) (never executed)
 Planning Time: 0.165 ms
 Execution Time: 0.843 ms
(7 rows)


Links.

pgsql-hackers by date:

Previous
From: "Zhou, Zhiguo"
Date:
Subject: Re: [RFC] Lock-free XLog Reservation from WAL
Next
From: Laurenz Albe
Date:
Subject: Re: Disabling vacuum truncate for autovacuum