Re: performance question (something to do w/ - Mailing list pgsql-performance

From Tom Lane
Subject Re: performance question (something to do w/
Date
Msg-id 28288.1147111196@sss.pgh.pa.us
Whole thread Raw
In response to Re: performance question (something to do w/  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Big IN() clauses etc : feature proposal  (PFC <lists@peufeu.com>)
List pgsql-performance
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, May 08, 2006 at 10:42:21AM -0700, Mark Lewis wrote:
>> Doing a SELECT with a large list of variables inside an IN runs slowly
>> on every database we've tested.  We've tested mostly in Oracle and
>> PostgreSQL, and both get very slow very quickly (actually Oracle refuses
>> to process the query at all after it gets too many bind parameters).
>>
>> In our case, we have a (potentially quite large) set of external values
>> that we want to look up in the database.  We originally thought that
>> doing a single select with a large IN clause was the way to go, but then
>> we did some performance analysis on the optimal batch size (number of
>> items to include per IN clause), and discovered that for most databases,
>> the optimal batch size was 1.  For PostgreSQL I think it was 2.
>>
>> The moral of the story is that you're probably better off running a
>> bunch of small selects than in trying to optimize things with one
>> gargantuan select.

> Ever experiment with loading the parameters into a temp table and
> joining to that?

Also, it might be worth re-testing that conclusion with PG CVS tip
(or 8.2 when it comes out).  The reimplementation of IN as = ANY that
I did a couple months ago might well change the results.

            regards, tom lane

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: performance question (something to do w/
Next
From: "Andrus"
Date:
Subject: Re: Query runs 38 seconds for small database!