"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, May 08, 2006 at 10:42:21AM -0700, Mark Lewis wrote:
>> Doing a SELECT with a large list of variables inside an IN runs slowly
>> on every database we've tested. We've tested mostly in Oracle and
>> PostgreSQL, and both get very slow very quickly (actually Oracle refuses
>> to process the query at all after it gets too many bind parameters).
>>
>> In our case, we have a (potentially quite large) set of external values
>> that we want to look up in the database. We originally thought that
>> doing a single select with a large IN clause was the way to go, but then
>> we did some performance analysis on the optimal batch size (number of
>> items to include per IN clause), and discovered that for most databases,
>> the optimal batch size was 1. For PostgreSQL I think it was 2.
>>
>> The moral of the story is that you're probably better off running a
>> bunch of small selects than in trying to optimize things with one
>> gargantuan select.
> Ever experiment with loading the parameters into a temp table and
> joining to that?
Also, it might be worth re-testing that conclusion with PG CVS tip
(or 8.2 when it comes out). The reimplementation of IN as = ANY that
I did a couple months ago might well change the results.
regards, tom lane