Re: Bad performance of SELECT ... where id IN (...) - Mailing list pgsql-performance

From Kenneth Marshall
Subject Re: Bad performance of SELECT ... where id IN (...)
Date
Msg-id 20091009130905.GO23159@it.is.rice.edu
Whole thread Raw
In response to Re: Bad performance of SELECT ... where id IN (...)  (Xia Qingran <qingran.xia@gmail.com>)
List pgsql-performance
On Fri, Oct 09, 2009 at 08:31:54PM +0800, Xia Qingran wrote:
> On Mon, Oct 5, 2009 at 9:58 AM, Omar Kilani <omar.kilani@gmail.com> wrote:
> > Hi Xia,
> >
> > Try this patch:
> >
> > http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
> >
> > It's a hack, but it works for us. I think you're probably spending
> > most of your query time planning, and this patch helps speed things up
> > 10x over here.
>
> Thanks!
> I am trying it.
>
> Regards,
>
> Xia Qingran
>

We have a similar situation when using DSPAM with a PostgreSQL
backend. In that case we used a function like the following to
speed up the lookups. I do not know if it would be useful in
your situation, but I thought I would post it for the group:

The original query was of the form:

SELECT uid, token, spam_hits, innocent_hits FROM dspam_token_data
WHERE uid = 'xxx' AND token IN (...);

The faster version of the query in the current code is:

SELECT * FROM lookup_tokens(%d, '{...});

where lookup_tokens is defined as follows:

create function lookup_tokens(integer,bigint[])
  returns setof dspam_token_data
  language plpgsql stable
  as '
declare
  v_rec record;
begin
  for v_rec in select * from dspam_token_data
                where uid=$1
                  and token in (select $2[i]
                                  from generate_series(array_lower($2,1),
                                                       array_upper($2,1)) s(i))
  loop
    return next v_rec;
  end loop;
  return;
end;';

Anyway, you may want to try a similar approach instead of the
posted code change.

Regards,
Ken

> >
> > Regards,
> > Omar
> >
> > On Sun, Sep 27, 2009 at 5:13 PM, Xia Qingran <qingran.xia@gmail.com> wrote:
> >> On Sat, Sep 26, 2009 at 10:59 PM, Craig James
> >> <craig_james@emolecules.com> wrote:
> >>>
> >>> If your user_id is always in a narrow range like this, or even in any range
> >>> that is a small fraction of the total, then add a range condition, like
> >>> this:
> >>>
> >>> select * from event where user_id <= 500 and user_id >= 0 and user_id in
> >>> (...)
> >>>
> >>> I did this exact same thing in my application and it worked well.
> >>>
> >>> Craig
> >>>
> >>
> >> It is a good idea. But In my application, most of the queries' user_id
> >> are random and difficult to range.
> >> Thanks anyway.
> >>
> >>
> >>
> >> --
> >> ?????????
> >> Xia Qingran
> >> qingran.xia@gmail.com
> >> Sent from Beijing, 11, China
> >> Charles de Gaulle ??- "The better I get to know men, the more I find
> >> myself loving dogs." -
> >> http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> >>
> >
>
>
>
> --
> ?????????
> Xia Qingran
> qingran.xia@gmail.com
> Sent from Beijing, 11, China
> Stephen Leacock  - "I detest life-insurance agents: they always argue
> that I shall some day die, which is not so." -
> http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

pgsql-performance by date:

Previous
From: Xia Qingran
Date:
Subject: Re: Bad performance of SELECT ... where id IN (...)
Next
From: Merlin Moncure
Date:
Subject: Re: High CPU load on Postgres Server during Peak times!!!!