Re: performance question (something to do w/ parameterized - Mailing list pgsql-performance

From Jeffrey Tenny
Subject Re: performance question (something to do w/ parameterized
Date
Msg-id 445FA13E.2090304@comcast.net
Whole thread Raw
In response to Re: performance question (something to do w/  (Mark Lewis <mark.lewis@mir3.com>)
List pgsql-performance
Mark Lewis wrote:
> Doing a SELECT with a large list of variables inside an IN runs slowly
> on every database we've tested.  We've tested mostly in Oracle and
> PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> to process the query at all after it gets too many bind parameters).
>
> In our case, we have a (potentially quite large) set of external values
> that we want to look up in the database.  We originally thought that
> doing a single select with a large IN clause was the way to go, but then
> we did some performance analysis on the optimal batch size (number of
> items to include per IN clause), and discovered that for most databases,
> the optimal batch size was 1.  For PostgreSQL I think it was 2.

So that is for parameterized queries (the batch size?).

In my case, I was concerned about latency between app and database
server, so I try to minimize the number of queries I send to the
database server.  (My app servers can be anywhere, they /should/ be
close to the database server, but there are no guarantees and I can't
control it).

The last time I tested for optimal batch size using non-parameterized
queries with same-host database and app, I got a batch size of
approximately 700 IN list elements (again, not variables in that test).
That was on postgres 7.X.Y.

Guess I'll have to try a test where I turn the parameterized statements
into regular statements.

I'm pretty sure it would be a bad idea for me to send one IN list
element at a time in all cases.  Even if the query query prep was fast,
the network latency could kill my app.

>
> The moral of the story is that you're probably better off running a
> bunch of small selects than in trying to optimize things with one
> gargantuan select.

The algorithm currently tries to ensure that IN-lists of not more than
700 elements are sent to the database server, and breaks them into
multiple queries.  If it has to break it into at least 3 queries, it
uses parameterized statements for the first 2+ and then a
non-parameterized statement for the last one (which may have a different
number of IN list elements than the prior batches).

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Query runs 38 seconds for small database!
Next
From: Alvaro Herrera
Date:
Subject: Re: Query runs 38 seconds for small database!