Re: Mistake in my query or Index Scan on subquery failure? - Mailing list pgsql-general

From Nick Barr
Subject Re: Mistake in my query or Index Scan on subquery failure?
Date
Msg-id 40461DFD.40705@chuckie.co.uk
Whole thread Raw
In response to Mistake in my query or Index Scan on subquery failure? (7.4)  ("Gellert, Andre" <AGellert@ElectronicPartner.de>)
List pgsql-general
Gellert, Andre wrote:

>Hello,
>I have a problem with this simple query :
>
>explain select * from ref_artikel a where a.artnr in (  351275 , 351346 ,
>293082 ) LIMIT 20 OFFSET 0 ;
>                                                            QUERY PLAN
>
>----------------------------------------------------------------------------
>------------------------------------------------------
> Limit  (cost=0.00..9.06 rows=3 width=299)
>   ->  Index Scan using ref_artikel_pkey, ref_artikel_pkey, \
>       ref_artikel_pkey on ref_artikel a  (cost=0.00..9.06 rows=3 width=299)
>         Index Cond: ((artnr = 351275) OR (artnr = 351346) OR (artnr =
>293082))
>(3 rows)
>
>Is it okay , that the word ref_artikel_pkey will be repeated for each
>condition ?
>
>How can i optimize this, i have a number of up to 3000 "artnr" cond. to
>check for equality
>( and more tables joined over this query which i left away for better
>understanding ) ,
>is a  "(x=1) or (x=2)..." faster than "x in 1,2... " at this large number of
>checks ?
>
>Maybe this is the hash-joined bug marked for this 7.4 postgresql....
>I consider upgrading to 7.41 on another testing server....
>
>Thx for ideas,
>Andre
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
As far as I can see this is fine, especially for a small number of
values in the IN clause. If I understand you correctly the number of
values in the IN clause might extend to 3000. This would not be
particularly efficient and might end in a sequential scan.

For example,

explain select * from ref_artikel a where a.artnr in ( 351275 , 351346 ,
293082, ... <2997 more values> ) LIMIT 20 OFFSET 0 ;


A better method if you are going to use a lot of numbers would be something like the following:

-----
BEGIN;

CREATE TEMPORARY TABLE temp_num_2004_03_03 (temp_num int4);

INSERT INTO temp_num_2004_03_03 (351275);
...
INSERT INTO temp_num_2004_03_03 (293082);

CREATE INDEX temp_num_2004_03_03_temp_num_idx ON temp_num_2004_03_03 (temp_num);

SELECT * FROM ref_artikel a, temp_num_2004_03_03 b WHERE a.artnr=b.temp_num ORDER BY a.artnr LIMIT 20 OFFSET 0;

DROP INDEX temp_num_2004_03_03_temp_num_idx;
DROP TABLE temp_num_2004_03_03;

COMMIT/ABORT;
-----

This is especially true if you are going to use the set of numbers again in other queries that follow.

BTW, you will need to put an ORDER BY clause in your SELECT to guarentee the order of the rows that come back,
especiallywhen using the LIMIT/OFFSET clauses. 



Nick



pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: Moving from MySQL to PGSQL....some questions (multilevel
Next
From: Paulovič Michal
Date:
Subject: Re: Moving from MySQL to PGSQL....some questions (multilevel