Re: alternate idioms for large "IN (...)" lists - Mailing list pgsql-sql

From Christopher Kings-Lynne
Subject Re: alternate idioms for large "IN (...)" lists
Date
Msg-id 007a01c20a5f$8481b8a0$6500a8c0@internal
In response to alternate idioms for large "IN (...)" lists  (russm <russm@icorp.com.au>)
Responses Re: alternate idioms for large "IN (...)" lists
List pgsql-sql
Yep, PostgreSQL's IN implementation is very slow.  You are far better off
rewriting your query to use EXISTS or an OUTER JOIN.

See:
http://www3.us.postgresql.org/users-lounge/docs/7.2/postgres/functions-subquery.html

eg.

SELECT attribute_id, COUNT(asset_id)
  FROM attribute_map
 WHERE EXISTS (SELECT asset_id
                 FROM <potentially several thousand asset_id values>
                WHERE asset_id=...)
 GROUP BY attribute_id
HAVING COUNT(asset_id) < <cutoff value>
 ORDER BY count DESC
 LIMIT <limit>

or something like:

SELECT attribute_id, COUNT(atm.asset_id)
  FROM attribute_map atm
 RIGHT JOIN assets_map aam ON atm.asset_id=aam.asset_id
 GROUP BY....blah blah blah...

Play around with these alternative syntaxes and stay away from putting
thousands of values in IN (...), and you will be able to make your query run
orders of magnitude faster.
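To make the "put the values in a table and join" idea concrete, here is a
minimal sketch using SQLite from Python so it is self-contained and runnable.
The attribute_map schema mirrors the one in the original post, but the rows,
the asset_id list, and the staging table name wanted_assets are made up for
illustration; the same pattern applies to a PostgreSQL temporary table.

```python
import sqlite3

# In-memory database with a toy attribute_map (asset_id -> attribute_id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attribute_map (asset_id INTEGER, attribute_id INTEGER)")
conn.executemany(
    "INSERT INTO attribute_map VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 10), (3, 12), (4, 11)],
)

# Instead of interpolating thousands of ids into IN (...), load them into a
# staging table once; the planner can then hash-join against it.
asset_ids = [1, 2, 3]  # in practice these come from outside the database
conn.execute("CREATE TEMP TABLE wanted_assets (asset_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_assets VALUES (?)", [(a,) for a in asset_ids])

rows = conn.execute(
    """
    SELECT m.attribute_id, COUNT(m.asset_id) AS cnt
      FROM attribute_map m
      JOIN wanted_assets w ON w.asset_id = m.asset_id
     GROUP BY m.attribute_id
     ORDER BY cnt DESC, m.attribute_id
    """
).fetchall()
print(rows)  # attribute 10 is shared by all three wanted assets
```

Running this yields [(10, 3), (11, 1), (12, 1)]: asset 4's attributes are
excluded because it is not in the staging table, which is exactly what the
large IN (...) list was doing, only without the per-value overhead.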

Chris

----- Original Message -----
From: "russm" <russm@icorp.com.au>
To: <pgsql-sql@postgresql.org>
Sent: Saturday, June 01, 2002 11:10 PM
Subject: [SQL] alternate idioms for large "IN (...)" lists


> I'm working on an application (originally written for Oracle) that is being
> brought to its knees by one commonly-executed query - finding the most
> common attributes for a given list of assets...
>
>
> SELECT attribute_id, COUNT(asset_id)
>   FROM attribute_map
>  WHERE asset_id IN ( <potentially several thousand asset_id values> )
>  GROUP BY attribute_id
> HAVING COUNT(asset_id) < <cutoff value>
>  ORDER BY count DESC
>  LIMIT <limit>
>
>
> where attribute_map is a many-to-many map of asset_id to attribute_id -
>
> demo=# \d attribute_map
>   Table "attribute_map"
>    Column     |  Type   | Modifiers
> --------------+---------+-----------
>  asset_id     | integer |
>  attribute_id | integer |
> Indexes: am_asset_id_idx,
>          am_attribute_id_idx
> Unique keys: am_asset_id_attribute_id_un
> Triggers: RI_ConstraintTrigger_26912,
>           RI_ConstraintTrigger_26906
>
>
> From what I've read here, PostgreSQL doesn't handle IN (...) queries
> especially efficiently, and it looks to me like I'm being bitten by that. An
> EXPLAIN ANALYZE on that query with just under 40k rows in attribute_map and
> 667 asset_id values in the IN (...) list returns
>
> NOTICE:  QUERY PLAN:
>
> Limit  (cost=64361.51..64361.51 rows=200 width=8) (actual
> time=24431.51..24431.80 rows=200 loops=1)
>   ->  Sort  (cost=64361.51..64361.51 rows=1647 width=8) (actual
> time=24431.50..24431.61 rows=201 loops=1)
>         ->  Aggregate  (cost=0.00..64273.49 rows=1647 width=8) (actual
> time=55.29..24324.75 rows=1747 loops=1)
>               ->  Group  (cost=0.00..64191.13 rows=16473 width=8) (actual
> time=2.38..24198.18 rows=21308 loops=1)
>                     ->  Index Scan using am_attribute_id_idx on
> attribute_map  (cost=0.00..64149.94 rows=16473 width=8) (actual
> time=2.37..24034.97 rows=21308 loops=1)
> Total runtime: 24433.20 msec
>
>
> which I'm assuming means that the backend is doing a separate index lookup
> for each of the 667 asset_id values in the list.
>
> Are there any standard idioms in postgres for getting around the poor
> handling of IN (...) lists? Placing the list of asset_id values in a table
> and then joining against that runs in about 1/15 the time (postgres makes a
> hash of the asset_id values and then hash joins against the attribute_map)
> but since my asset_id values come from outside the database that approach
> would require creating and managing a whole lot of temporary tables, which
> I'd rather avoid.
>
> Much obliged for any suggestions
>
> Russell
>
>
>


