Mark Lewis wrote:
> Doing a SELECT with a large list of variables inside an IN runs slowly
> on every database we've tested. We've tested mostly in Oracle and
> PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> to process the query at all after it gets too many bind parameters).
>
> In our case, we have a (potentially quite large) set of external values
> that we want to look up in the database. We originally thought that
> doing a single select with a large IN clause was the way to go, but then
> we did some performance analysis on the optimal batch size (number of
> items to include per IN clause), and discovered that for most databases,
> the optimal batch size was 1. For PostgreSQL I think it was 2.
So that is for parameterized queries (the batch size?).
In my case, I was concerned about latency between app and database
server, so I try to minimize the number of queries I send to the
database server. (My app servers can be anywhere, they /should/ be
close to the database server, but there are no guarantees and I can't
control it).
The last time I tested for optimal batch size using non-parameterized
queries with same-host database and app, I got a batch size of
approximately 700 IN list elements (again, not variables in that test).
That was on postgres 7.X.Y.
Guess I'll have to try a test where I turn the parameterized statements
into regular statements.
I'm pretty sure it would be a bad idea for me to send one IN list
element at a time in all cases. Even if the query query prep was fast,
the network latency could kill my app.
>
> The moral of the story is that you're probably better off running a
> bunch of small selects than in trying to optimize things with one
> gargantuan select.
The algorithm currently tries to ensure that IN-lists of not more than
700 elements are sent to the database server, and breaks them into
multiple queries. If it has to break it into at least 3 queries, it
uses parameterized statements for the first 2+ and then a
non-parameterized statement for the last one (which may have a different
number of IN list elements than the prior batches).