Re: alternate idioms for large "IN (...)" lists - Mailing list pgsql-sql
From | Christopher Kings-Lynne |
---|---|
Subject | Re: alternate idioms for large "IN (...)" lists |
Date | |
Msg-id | 007a01c20a5f$8481b8a0$6500a8c0@internal |
In response to | alternate idioms for large "IN (...)" lists (russm <russm@icorp.com.au>) |
Responses | Re: alternate idioms for large "IN (...)" lists |
List | pgsql-sql |
Yep, PostgreSQL's IN implementation is very slow. You are far better off rewriting your query to use EXISTS or an OUTER JOIN. See:

http://www3.us.postgresql.org/users-lounge/docs/7.2/postgres/functions-subquery.html

eg.

SELECT attribute_id, COUNT(asset_id)
FROM attribute_map
WHERE EXISTS (SELECT asset_id
              FROM <potentially several thousand asset_id values>
              WHERE asset_id = ...)
GROUP BY attribute_id
HAVING COUNT(asset_id) < <cutoff value>
ORDER BY count DESC
LIMIT <limit>

or something like:

SELECT attribute_id, COUNT(asset_id)
FROM attribute_map atm RIGHT JOIN assets_map aam ON atm.asset_id = aam.asset_id
GROUP BY ... blah blah blah ...

Play around with these alternative syntaxes and stay away from putting thousands of rows in IN () and you will be able to make your query run orders of magnitude faster.

Chris
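To make the suggestion concrete, here is a minimal sketch of the EXISTS rewrite, assuming the externally supplied asset_id values have already been loaded into a table (wanted_assets is a hypothetical name, and the cutoff and limit values are arbitrary stand-ins for the real parameters):

-- EXISTS form of the original query; wanted_assets is assumed to hold
-- one row per externally supplied asset_id.
SELECT am.attribute_id, COUNT(am.asset_id) AS cnt
FROM attribute_map am
WHERE EXISTS (SELECT 1
              FROM wanted_assets wa
              WHERE wa.asset_id = am.asset_id)  -- correlated probe instead of a huge IN () list
GROUP BY am.attribute_id
HAVING COUNT(am.asset_id) < 100                 -- stand-in for <cutoff value>
ORDER BY cnt DESC
LIMIT 20;                                       -- stand-in for <limit>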
----- Original Message -----
From: "russm" <russm@icorp.com.au>
To: <pgsql-sql@postgresql.org>
Sent: Saturday, June 01, 2002 11:10 PM
Subject: [SQL] alternate idioms for large "IN (...)" lists

> I'm working on an application (originally written for Oracle) that is being
> brought to its knees by one commonly-executed query - finding the most
> common attributes for a given list of assets...
>
> SELECT attribute_id, COUNT(asset_id)
> FROM attribute_map
> WHERE asset_id IN ( <potentially several thousand asset_id values> )
> GROUP BY attribute_id
> HAVING COUNT(asset_id) < <cutoff value>
> ORDER BY count DESC
> LIMIT <limit>
>
> where attribute_map is a many-to-many map of asset_id to attribute_id -
>
> demo=# \d attribute_map
>       Table "attribute_map"
>     Column    |  Type   | Modifiers
> --------------+---------+-----------
>  asset_id     | integer |
>  attribute_id | integer |
> Indexes: am_asset_id_idx,
>          am_attribute_id_idx
> Unique keys: am_asset_id_attribute_id_un
> Triggers: RI_ConstraintTrigger_26912,
>           RI_ConstraintTrigger_26906
>
> From what I've read here, postgresql doesn't handle IN (...) queries
> especially efficiently, and it looks to me like I'm being bitten by that. An
> EXPLAIN ANALYZE on that query with just under 40k rows in attribute_map and
> 667 asset_id values in the IN (...) list returns
>
> NOTICE:  QUERY PLAN:
>
> Limit  (cost=64361.51..64361.51 rows=200 width=8) (actual time=24431.51..24431.80 rows=200 loops=1)
>   ->  Sort  (cost=64361.51..64361.51 rows=1647 width=8) (actual time=24431.50..24431.61 rows=201 loops=1)
>         ->  Aggregate  (cost=0.00..64273.49 rows=1647 width=8) (actual time=55.29..24324.75 rows=1747 loops=1)
>               ->  Group  (cost=0.00..64191.13 rows=16473 width=8) (actual time=2.38..24198.18 rows=21308 loops=1)
>                     ->  Index Scan using am_attribute_id_idx on attribute_map  (cost=0.00..64149.94 rows=16473 width=8) (actual time=2.37..24034.97 rows=21308 loops=1)
> Total runtime: 24433.20 msec
>
> which I'm assuming means that the backend is doing a separate index lookup
> for each of the 667 asset_id values in the list.
>
> Are there any standard idioms in postgres for getting around the poor
> handling of IN (...) lists? Placing the list of asset_id values in a table
> and then joining against that runs in about 1/15 the time (postgres makes a
> hash of the asset_id values and then hash joins against the attribute_map)
> but since my asset_id values come from outside the database that approach
> would require creating and managing a whole lot of temporary tables, which
> I'd rather avoid.
>
> Much obliged for any suggestions
>
> Russell
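For comparison, here is a sketch of the table-join approach Russell describes measuring at roughly 1/15 the runtime, again using a hypothetical wanted_assets temporary table and stand-in cutoff and limit values. A temporary table disappears automatically at the end of the session, so the housekeeping is limited to creating it and loading the ids:

-- Load the externally supplied ids, e.g. with COPY or a batch of INSERTs.
CREATE TEMPORARY TABLE wanted_assets (asset_id integer);
INSERT INTO wanted_assets VALUES (42);   -- ... one row per asset_id

-- With the ids in a table, the planner can hash them and hash-join against
-- attribute_map rather than probing the index once per IN () element.
SELECT am.attribute_id, COUNT(am.asset_id) AS cnt
FROM attribute_map am
JOIN wanted_assets wa ON wa.asset_id = am.asset_id
GROUP BY am.attribute_id
HAVING COUNT(am.asset_id) < 100   -- stand-in for <cutoff value>
ORDER BY cnt DESC
LIMIT 20;                         -- stand-in for <limit>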