Thread: Optimize query: time of "single * IN(many)" > time of "many * IN(single)"

Optimize query: time of "single * IN(many)" > time of "many * IN(single)"

From

"Paul Janssen"

Date:

08 January 2004, 06:45:43

Hello!

Can anyone help me out with the following situation:
   (a) a single query with 550 id's in the IN-clause resulting into 800+
seconds;
   (b) 550 queries with a single id in the IN-clause resulting into overall
time of <60 seconds;
The table consists of 950.000 records, and the resultset consists of 205.000
records.

  >> Why is there such an extreme difference in time?
  >> And is there a way to reduce the difference in time?
More information about the situation is below.

Thank you for your help and time! =)

Postgres-version
     7.3.1

The query is like:
     SELECT a_id, b_id, score, c_id, d_id
     FROM tbl_scores
     WHERE a_id IN(...)
     UNION
     SELECT a_id, b_id, score, c_id, d_id
     FROM tbl_scores_alike
     WHERE a_id IN(...)

Definition of tables is like:
___Fields
CREATE TABLE public.tbl_scores (
  id int4 DEFAULT nextval('"tbl_scores_id_seq"'::text) NOT NULL,
  a_id int4,
  b_id int4,
  score int4,   d_id int8,
  CONSTRAINT tbl_scores_pkey UNIQUE (id),
  dc date DEFAULT now(),
  c_id int4,
INITIALLY IMMEDIATE
) WITHOUT OIDS;
___Index
      CREATE INDEX tbl_scores_idx10 ON tbl_scores USING btree (a_id, b_id,
score, c_id, d_id);

Things that I tried to reduce the time of situation (a) - single * IN(many):
   * vacuum of the database; hardly any improvement.
   * selecting a single field in the resultset (a_id) instead of all fields;
hardly any improvement.
   * only querying one table, skipping the UNION; hardly any improvement;
   * ... what would you try?

_________________________________________________________________
MSN Zoeken helpt je om de gekste dingen te vinden! http://search.msn.nl

Re: Optimize query: time of "single * IN(many)" > time

From

Dave Smith

Date:

08 January 2004, 07:29:23

Firstly you should always provide an explain from your query before
posting to this list. I think the problem is that in <7.4 PG did not use
indexes for IN queries. This has been fixed in 7.4.

On Thu, 2004-01-08 at 05:44, Paul Janssen wrote:
> Hello!
>
> Can anyone help me out with the following situation:
>    (a) a single query with 550 id's in the IN-clause resulting into 800+
> seconds;
>    (b) 550 queries with a single id in the IN-clause resulting into overall
> time of <60 seconds;
> The table consists of 950.000 records, and the resultset consists of 205.000
> records.
>
>   >> Why is there such an extreme difference in time?
>   >> And is there a way to reduce the difference in time?
> More information about the situation is below.
>
> Thank you for your help and time! =)
>
> Postgres-version
>      7.3.1
>
> The query is like:
>      SELECT a_id, b_id, score, c_id, d_id
>      FROM tbl_scores
>      WHERE a_id IN(...)
>      UNION
>      SELECT a_id, b_id, score, c_id, d_id
>      FROM tbl_scores_alike
>      WHERE a_id IN(...)
>
> Definition of tables is like:
> ___Fields
> CREATE TABLE public.tbl_scores (
>   id int4 DEFAULT nextval('"tbl_scores_id_seq"'::text) NOT NULL,
>   a_id int4,
>   b_id int4,
>   score int4,   d_id int8,
>   CONSTRAINT tbl_scores_pkey UNIQUE (id),
>   dc date DEFAULT now(),
>   c_id int4,
> INITIALLY IMMEDIATE
> ) WITHOUT OIDS;
> ___Index
>       CREATE INDEX tbl_scores_idx10 ON tbl_scores USING btree (a_id, b_id,
> score, c_id, d_id);
>
> Things that I tried to reduce the time of situation (a) - single * IN(many):
>    * vacuum of the database; hardly any improvement.
>    * selecting a single field in the resultset (a_id) instead of all fields;
> hardly any improvement.
>    * only querying one table, skipping the UNION; hardly any improvement;
>    * ... what would you try?
>
> _________________________________________________________________
> MSN Zoeken helpt je om de gekste dingen te vinden! http://search.msn.nl
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

Re: Optimize query: time of "single * IN(many)" > time

From

Alvaro Herrera

Date:

08 January 2004, 07:45:27

On Thu, Jan 08, 2004 at 06:28:14AM -0500, Dave Smith wrote:
> Firstly you should always provide an explain from your query before
> posting to this list.

You mean "while posting", because he can't possible provide the explain
before having the means to do so, can he? :-)

> I think the problem is that in <7.4 PG did not use
> indexes for IN queries. This has been fixed in 7.4.

I think what was done is to optimize queries like IN (SELECT ...) but
there has not been improvement for IN (1,2,3, ... 550) like he
appears to be doing.

Maybe something to try is putting the IDs in a (temp?) table and using
the first form.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)

Re: Optimize query: time of "single * IN(many)" > time of "many * IN(single)"

From

Tom Lane

Date:

08 January 2004, 11:02:09

"Paul Janssen" <postgresuser@hotmail.com> writes:
> Can anyone help me out with the following situation:
>    (a) a single query with 550 id's in the IN-clause resulting into 800+
> seconds;
>    (b) 550 queries with a single id in the IN-clause resulting into overall
> time of <60 seconds;
> The table consists of 950.000 records, and the resultset consists of 205.000
> records.

> Why is there such an extreme difference in time?

Most likely the planner is opting not to use an indexscan in the first
case.  Could we see EXPLAIN ANALYZE results for both cases?  Also, try
"SET enable_seqscan TO OFF" and then repeat EXPLAIN ANALYZE for case (a).

            regards, tom lane