On 13/2/2024 07:00, jian he wrote:
> + newa = makeNode(ArrayExpr);
> + /* array_collid will be set by parse_collate.c */
> + newa->element_typeid = scalar_type;
> + newa->array_typeid = array_type;
> + newa->multidims = false;
> + newa->elements = aexprs;
> + newa->location = -1;
>
> I am confused by the comments `array_collid will be set by
> parse_collate.c`, can you further explain it?
I hope the second paragraph of the comment added by commit b310b6e gives
enough detail on that.
> if OR expression right arm is not plain Const, but with collation
> specification, eg.
> `where a = 'a' collate "C" or a = 'b' collate "C";`
>
> then the rightop is not Const, it will be CollateExpr, it will not be
> used in transformation.
Yes, this is done for simplicity right now; I'm not sure about the
corner cases of merging such expressions.
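To make that concrete, here is a sketch (assuming a table test with a
text column a; the comments describe the expected behaviour, they are
not copied from a real run):

```sql
-- Plain Consts on the right side: the OR chain can be merged
-- into a single ScalarArrayOpExpr, i.e. a filter like
--   a = ANY ('{a,b}'::text[])
EXPLAIN SELECT * FROM test WHERE a = 'a' OR a = 'b';

-- With explicit collations the right operands are CollateExpr
-- nodes, not Consts, so the OR chain is currently left as is:
EXPLAIN SELECT * FROM test
WHERE a = 'a' COLLATE "C" OR a = 'b' COLLATE "C";
```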
>
> set enable_or_transformation to on;
> explain(timing off, analyze, costs off)
> select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
> = 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
> 35.376 ms
>
> The time is the last result of the 10 iterations.
The reason here is parallel workers.
If you look into the plans, you will see that parallel workers are used
without the optimization but not with it:
Gather (cost=1000.00..28685.37 rows=87037 width=12)
(actual rows=90363 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test
Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5)
OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))
Seq Scan on test (cost=0.02..20440.02 rows=90600 width=12)
(actual rows=90363 loops=1)
Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
With 90363 tuples actually returned, we estimate 87037 rows (less
precisely) without the transformation and 90600 rows (more precisely)
with it.
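For reference, the relative estimation errors can be computed directly
(roughly 3.7% without the transformation vs. 0.3% with it):

```sql
SELECT round(100.0 * abs(87037 - 90363) / 90363, 2) AS err_without_pct,
       round(100.0 * abs(90600 - 90363) / 90363, 2) AS err_with_pct;
```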
But if you play with parallel_tuple_cost and parallel_setup_cost, you
can get the parallel workers back:
Gather (cost=0.12..11691.03 rows=90600 width=12)
(actual rows=90363 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test
Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
Rows Removed by Filter: 303212
That gives a speedup of about 25% on my laptop.
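For the record, by "playing with" those GUCs I mean something like the
following (the values here are illustrative, not a recommendation; the
exact thresholds depend on the machine):

```sql
-- Defaults are parallel_setup_cost = 1000 and
-- parallel_tuple_cost = 0.1; lowering them makes the
-- Gather path cheap enough to win:
SET parallel_setup_cost = 0.1;
SET parallel_tuple_cost = 0.001;
```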
I'm not sure about the origin of this behavior, but it looks like an
issue of parallel-plan costing, not of this specific optimization.
--
regards,
Andrei Lepikhov
Postgres Professional