Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id e3338e82-a28d-4631-9eec-b9c0984b32d5@postgrespro.ru
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (jian he <jian.universality@gmail.com>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
On 13/2/2024 07:00, jian he wrote:
> + newa = makeNode(ArrayExpr);
> + /* array_collid will be set by parse_collate.c */
> + newa->element_typeid = scalar_type;
> + newa->array_typeid = array_type;
> + newa->multidims = false;
> + newa->elements = aexprs;
> + newa->location = -1;
> 
> I am confused by the comments `array_collid will be set by
> parse_collate.c`, can you further explain it?
I wonder if the second paragraph of comments on commit b310b6e will be 
enough to dive into details.

> if OR expression right arm is not plain Const, but with collation
> specification, eg.
> `where a  = 'a' collate "C" or a = 'b' collate "C";`
> 
> then the rightop is not Const, it will be CollateExpr, it will not be
> used in transformation.
Yes, it is done for simplicity right now. I'm not sure about corner 
cases of merging such expressions.

> 
> set enable_or_transformation to on;
> explain(timing off, analyze, costs off)
> select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
> = 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
> 35.376 ms
> 
> The time is the last result of the 10 iterations.
The reason here - parallel workers.
If you see into the plan you will find parallel workers without 
optimization and absence of them in the case of optimization:

Gather  (cost=1000.00..28685.37 rows=87037 width=12)
    (actual rows=90363 loops=1)
    Workers Planned: 2
    Workers Launched: 2
    ->  Parallel Seq Scan on test
          Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5) 
OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))

Seq Scan on test  (cost=0.02..20440.02 rows=90600 width=12)
          (actual rows=90363 loops=1)
    Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))

Having 90600 tuples returned we estimate it into 87000 (less precisely) 
without transformation and 90363 (more precisely) with the transformation.
But if you play with parallel_tuple_cost and parallel_setup_cost, you 
will end up having these parallel workers:

  Gather  (cost=0.12..11691.03 rows=90600 width=12)
     (actual rows=90363 loops=1)
    Workers Planned: 2
    Workers Launched: 2
    ->  Parallel Seq Scan on test
          Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
          Rows Removed by Filter: 303212

And some profit about 25%, on my laptop.
I'm not sure about the origins of such behavior, but it seems to be an 
issue of parallel workers, not this specific optimization.

-- 
regards,
Andrei Lepikhov
Postgres Professional




pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: Andrei Lepikhov
Date:
Subject: Re: POC, WIP: OR-clause support for indexes