"Atul Deopujari" <atuld@enterprisedb.com> writes:
> Yes, letting the planner make its own decision would seem best (in
> accordance with what we do for different join paths). But for large IN
> lists, a substantial part of the planner is spent in estimating the
> selectivity of the ScalarArrayExpr by calling scalararraysel. If we are
> not eliminating this step in processing the IN list then we are not
> doing any optimization. Asking the planner to do scalararraysel and also
> compute cost of any other way and choose between the two is asking
> planner to do more work.
So? Better planning usually involves more work. In any case the above
argument seems irrelevant, because making scalararraysel more
approximate and less expensive for long lists could be done
independently of anything else.
> Factors such as size of table, availability of index etc. would affect
> both the ways similarly. So, if we see a gain in the execution of the IN
> list due to an external factor then we will also see a similar gain in
> the execution of the transformed IN (VALUES(...)) clause.
Incorrect. There is more than one way to do a join, and the above
argument only applies if the VALUES case is planned as a nestloop with
inner indexscan, which indeed is isomorphic to the scalararrayop
implementation ... except that it has higher per-tuple overhead, and
therefore will consistently lose, disregarding artifacts of planning
costs such as how hard we try to estimate the result size. The case
where VALUES is actually a better plan is where the planner switches to
merge or hash join because there are too many values. In the current
implementation, the planner is incapable of generating those plan shapes
from a scalararrayop, and that's what I'd like to see fixed.
regards, tom lane