Re: performance question (something to do w/ - Mailing list pgsql-performance

From Kenneth Marshall
Subject Re: performance question (something to do w/
Date
Msg-id 20060508180136.GK1862@it.is.rice.edu
In response to Re: performance question (something to do w/  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-performance
On Mon, May 08, 2006 at 12:50:13PM -0500, Jim C. Nasby wrote:
> On Mon, May 08, 2006 at 10:42:21AM -0700, Mark Lewis wrote:
> > Doing a SELECT with a large list of variables inside an IN runs slowly
> > on every database we've tested.  We've tested mostly in Oracle and
> > PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> > to process the query at all after it gets too many bind parameters).
> >
> > In our case, we have a (potentially quite large) set of external values
> > that we want to look up in the database.  We originally thought that
> > doing a single select with a large IN clause was the way to go, but then
> > we did some performance analysis on the optimal batch size (number of
> > items to include per IN clause), and discovered that for most databases,
> > the optimal batch size was 1.  For PostgreSQL I think it was 2.
> >
> > The moral of the story is that you're probably better off running a
> > bunch of small selects than trying to optimize things with one
> > gargantuan select.
>
> Ever experiment with loading the parameters into a temp table and
> joining to that?
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>

The database used by the DSPAM software has an access pattern very similar
to your use case. Its fastest queries use PostgreSQL's generate_series()
function to unwind the array of values behind the "IN (...)", so the lookup
behaves like a series of single selects instead of one huge literal IN list.
Here is the lookup function that they use:

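-- Return every dspam_token_data row for user $1 whose token is one of
-- the elements of the bigint array $2. generate_series() yields one row
-- per array subscript, so the IN clause selects from a subquery over the
-- array's elements rather than from a long literal list.
create function lookup_tokens(integer, bigint[])
  returns setof dspam_token_data
  language plpgsql stable
  as '
declare
  v_rec record;
begin
  for v_rec in select * from dspam_token_data
                where uid = $1
                  and token in (select $2[i]
                                  -- one row per subscript of $2
                                  from generate_series(array_lower($2, 1),
                                                       array_upper($2, 1)) s(i))
  loop
    return next v_rec;  -- emit each matching row to the caller
  end loop;
  return;
end;';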
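For anyone without a PostgreSQL instance at hand, the same idea (ship the whole key list as one parameter and let the database unwind it into a set the IN clause can select from) can be sketched with Python's built-in sqlite3 module. This is only a rough analogue, not DSPAM's code: SQLite has no array type, so the sketch passes the tokens as a JSON array and unwinds it with json_each(), standing in for generate_series() over the array subscripts. It assumes a SQLite build with the JSON functions (built in by default since SQLite 3.38), and the table below is a hypothetical, cut-down version of dspam_token_data.

```python
import json
import sqlite3


def lookup_tokens(conn: sqlite3.Connection, uid: int, tokens: list[int]) -> list[tuple]:
    """Return rows for `uid` whose token is in `tokens`, binding the whole
    list as a single JSON-array parameter (analogue of the bigint[] arg)."""
    # json_each() yields one row per array element, playing the role that
    # generate_series() over the array subscripts plays in the PL/pgSQL code.
    sql = """
        SELECT uid, token, spam_hits, innocent_hits
          FROM dspam_token_data
         WHERE uid = ?
           AND token IN (SELECT value FROM json_each(?))
         ORDER BY token
    """
    return conn.execute(sql, (uid, json.dumps(tokens))).fetchall()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for the DSPAM token table.
    conn.execute(
        """CREATE TABLE dspam_token_data (
               uid INTEGER NOT NULL,
               token INTEGER NOT NULL,
               spam_hits INTEGER NOT NULL,
               innocent_hits INTEGER NOT NULL,
               PRIMARY KEY (uid, token))"""
    )
    conn.executemany(
        "INSERT INTO dspam_token_data VALUES (?, ?, ?, ?)",
        [(1, 10, 3, 0), (1, 20, 0, 5), (1, 30, 1, 1), (2, 10, 7, 7)],
    )
    # Token 99 has no row, so it simply does not appear in the result.
    return lookup_tokens(conn, 1, [30, 10, 99])


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the statement text stays the same no matter how many tokens are passed, which sidesteps the bind-parameter ceiling mentioned in the quoted message.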


You should be able to try something similar for your workload.

Ken Marshall
