"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> Is it true that the IN command is implemented sort of as a linked list
> linear time search? Is there any plan for a super-fast implementation of
> 'IN'?
This deserves a somewhat long-winded answer.
Postgres presently supports two kinds of IN (I'm not sure whether SQL92
allows any additional kinds):
1. Scalar-list IN: foo IN ('bar', 'baz', 'quux', ...)
2. Sub-select IN: foo IN (SELECT bar FROM ...)
In the scalar-list form, a variable is compared to an explicit list of
constants or expressions. This form is exactly equivalent tofoo = 'bar' OR foo = 'baz' OR foo = 'quux' OR ...
and is converted into that form by the parser. The planner is capable
of converting a WHERE clause of this kind into multiple passes of
indexscan, when foo is an indexed column and all the IN-list elements
are constants. Whether it actually will make that conversion depends
on the usual vagaries of pg_statistic entries, etc. But if it's a
unique or fairly-selective index, and there aren't a huge number of
entries in the IN list, a multiple indexscan should be a good plan.
In the sub-select form, we pretty much suck: for each tuple in the outer
query, we run the inner query until we find a matching value or the
inner query ends. This is basically a nested-loop scenario, with the
only (minimally) redeeming social value being that the planner realizes
it should pick a fast-start plan for the inner query. I think it should
be possible to convert this form into a modified kind of join (sort of
the reverse of an outer join: rather than at least one result per
lefthand row, at most one result per lefthand row), and then we could
use join methods that are more efficient than nested-loop. But no one's
tried to make that happen yet.
regards, tom lane