Thread: Slow Queries with OR's?
I have a large postgresql 7.3 database, a number of the tables have over 1 million records, the largest having over 8 million. When I run the following query: SELECT attrib1 FROM table1 WHERE attrib2 IN ( '1', '3', '5', '7' ); It takes several minutes to complete. Looking more closely, internally this is converted to: ... WHERE attrib2 = '1' OR attrib2 = '3' OR attrib2= '5'... If I instead run the query: SELECT attrib1 FROM table1 WHERE attrib2 = '1' UNION SELECT attrib1 FROM table1 WHERE attrib2 = '3' UNION SELECT attrib1 FROM table1 WHERE attrib2 = '5' UNION SELECT attrib1 FROM table1 WHERE attrib2 = '7'; Then it return in seconds. Why does using IN (or OR) give such a performance hit? Is there any way I can resolve this, it seem odd to be forced to UNION everything, and may not always be practicle to do so. It certianly makes the query text much longer! thanks! M. _________________________________________________________________ Surf together with new Shared Browsing http://join.msn.com/?page=features/browse&pgmarket=en-gb&XAPID=74&DI=1059
On Tuesday 29 April 2003 14:31, Mad Hatter wrote: > Why does using IN (or OR) give such a performance hit? > Is there any way I can resolve this, it seem odd to be forced to UNION > everything, and may not always be practicle to do so. It certianly makes > the query text much longer! Try exists instead on in. It is known to be faster.. HTH Shridhar
On Tue, Apr 29, 2003 at 10:01:04AM +0100, Mad Hatter wrote: > Why does using IN (or OR) give such a performance hit? Because the implementation was not perfect ;-) I gather there is a fix for this in CVS tip, so you might want to give that a whirl to see if it solves your problem. (But don't use it for real systems yet!) A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
I have installed a Postgresql 7.32 and a Postgis 0.7.3 on a Suse 8.0 and I have a problem when I try to create a GIST index CREATE INDEX indx_cname ON roads_geom USING GIST ( geom_colum GIST_GEOMETRY_OPS ); I got the following error: ERROR: DefineIndex: operator class "gist_geometry_ops" not supported by access method "gist" Do you have any idea??? I musy update the PostgreSQL with any patch?¿?¿? Thank a lot. Best Regards.
"Mad Hatter" <mad_hatter_@hotmail.com> writes: > When I run the following query: > SELECT attrib1 FROM table1 WHERE attrib2 IN ( '1', '3', '5', '7' ); Could we see the EXPLAIN ANALYZE output for that, as well as one of these? > SELECT attrib1 FROM table1 WHERE attrib2 = '1' ISTM you should be getting roughly equivalent plans, that is an indexscan for each of the constants. (The nearby comments about IN being slow apply to "foo IN (SELECT ...)" which this is not.) regards, tom lane
Apologies for loosing the thread. >"Mad Hatter" <mad_hatter_@hotmail.com> writes: >> When I run the following query: >> SELECT attrib1 FROM table1 WHERE attrib2 IN ( '1', '3', '5', '7' ); > >Could we see the EXPLAIN ANALYZE output for that, as well as one of >these? > > SELECT attrib1 FROM table1 WHERE attrib2 = '1' > >ISTM you should be getting roughly equivalent plans, that is an >indexscan for each of the constants. > >(The nearby comments about IN being slow apply to "foo IN (SELECT ...)" >which this is not.) > > regards, tom lane Tom, I acually tried to overly simplify things, I was doing IN(SELECT). Here's the output you requested. EXPLAIN ANALYZE SELECT msg_id, col_date, from_line, message FROM news_messages WHERE msg_id in ( SELECT msg_id FROM news_messages WHERE EXISTS ( SELECT msg_id FROM news_msg_id_ng_id_link WHERE EXISTS ( SELECT ng_id FROM news_ng_genre_link WHERE genre_id in( 1, 5 ) ) ) ) AND col_date >= '2003-01-01' AND col_date <= '2003-02-01' AND news_messages.subject ILIKE '%CISCO%' AND news_messages.message ILIKE '%router%' ORDER BY col_date DESC; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ --------- Index Scan Backward using news_messages_c_date_index on news_messages ( cost=0.00..171844391116.25 rows=1 width=827) (actual time=213790.91.. 425350.45 rows=20 loops=1) Index Cond: ((col_date >= '2003-01-01 00:00:00'::timestamp without timezone) AND (col_date <= '2003-02-01 00:00:00'::timestamp without time zone)) Filter: ((subject ~~* '%CISCO%'::text) AND (message ~~* '%router%':: text)AND (subplan)) SubPlan -> Materialize (cost=892397.63..892397.63 rows=3924463 width=43)( actual time=9851.52..14640.18 rows=3622109 loops=20) -> Result (cost=0.00..892397.63 rows=3924463 width=43) (actualtime=21. 60..186933.79 rows=3924463 loops=1) One-Time Filter: $1 InitPlan -> Result (cost=0.00..165027.19 rows=8054019 width=44)(actual time=13. 17..13.17 rows=1 loops=1) One-Time Filter: $0 InitPlan -> Seq Scan on news_ng_genre_link(cost=0.00..18.84 rows=493 width=8) ( actual time=6.58..6.58 rows=1 loops=1) Filter: ((genre_id = 1) OR (genre_id = 5)) -> Seq Scan on news_msg_id_ng_id_link(cost=0.00..165027.19 rows= 8054019 width=44) (actual time=6.56..6.56 rows=1 loops=1) -> Seq Scan on news_messages (cost=0.00..892397.63 rows=3924463 width= 43) (actual time=8.42..178589.46 rows=3924463 loops=1) Total runtime: 425470.07 msec (16 rows) EXPLAIN ANALYZE SELECT ng_id FROM news_ng_genre_link WHERE genre_id in ( 1, 5 ); QUERY PLAN ------------------------------------------------------------------------ ---------------------------------------- Seq Scan on news_ng_genre_link (cost=0.00..18.84 rows=493 width=8) ( actual time=0.09..12.94 rows=498 loops=1) Filter: ((genre_id = 1) OR (genre_id = 5)) Total runtime: 13.36 msec (3 rows)
Mr Mat psql-mail <psql-mail@freeuk.com> writes: > SELECT msg_id, col_date, from_line, message FROM news_messages > WHERE > msg_id in ( > SELECT msg_id FROM news_messages WHERE EXISTS ( > SELECT msg_id FROM news_msg_id_ng_id_link WHERE EXISTS ( > SELECT ng_id FROM news_ng_genre_link WHERE genre_id in( 1, 5 ) > ) > ) > ) > AND col_date >= '2003-01-01' > AND col_date <= '2003-02-01' > AND news_messages.subject ILIKE '%CISCO%' > AND news_messages.message ILIKE '%router%' ORDER BY col_date DESC; I'm not sure what you intend that "msg_id in (...)" to accomplish, but you could leave it out and get exactly the same results, so I suspect it is not doing what you intended. The EXISTS() clause is independent of the containing SELECT, therefore will be true (or not, but according to the explain analyze output it was true) at every row, therefore the result of the SELECT just below IN consists of every msg_id in news_messages, and so the IN succeeds --- slowly and painfully --- at every row. regards, tom lane
Tom - Thanks for your help - it's invaluable! I have now gone back to my original query shown below: I set EXPLAIN ANALYZE running on it over 12 hours ago and its still going! (Yesterdays EXPLAIN ANALYZE took tens of minutes) So I have included just the ouput from EXPLAIN. SELECT msg_id, from_line, col_date FROM news_messages WHERE msg_id IN( SELECT msg_id FROM news_msg_id_ng_id_link WHERE ng_id IN( SELECT ng_id FROM news_ng_genre_link WHERE genre_id in (3,6) ) ) QUERY PLAN ------------------------------------------------------------------------ -------------------------- Seq Scan on news_messages (cost=0.00..298471193171502.50 rows=1962232 width=90) Filter: (subplan) SubPlan -> Materialize (cost=76054021.22..76054021.22 rows=4027010 width=44) -> Seq Scan on news_msg_id_ng_id_link (cost=0.00..76054021.22 rows= 4027010 width=44) Filter: (subplan) SubPlan -> Seq Scan on news_ng_genre_link (cost=0.00..18.84 rows=170 width=8) Filter: ((genre_id = 3) OR (genre_id = 6)) Any suggestions? Regards, The ever thankful.
Mr Mat psql-mail <psql-mail@freeuk.com> writes: > I have now gone back to my original query shown below: > SELECT msg_id, from_line, col_date FROM news_messages WHERE msg_id IN( > SELECT msg_id FROM news_msg_id_ng_id_link WHERE ng_id IN( > SELECT ng_id FROM news_ng_genre_link WHERE genre_id in (3,6) > ) > ) I think all you can do with this is transform the outer IN into EXISTS and make sure there's an index on news_msg_id_ng_id_link.msg_id. See the FAQ item about slow IN's for details. 7.4 will do this a lot better, but for now you need a workaround... regards, tom lane
On Fri, May 02, 2003 at 09:54:19AM +0100, Mr Mat psql-mail wrote: > Tom - Thanks for your help - it's invaluable! > > I have now gone back to my original query shown below: > I set EXPLAIN ANALYZE running on it over 12 hours ago and its still > going! (Yesterdays EXPLAIN ANALYZE took tens of minutes) > So I have included just the ouput from EXPLAIN. > > > SELECT msg_id, from_line, col_date FROM news_messages WHERE msg_id IN( > SELECT msg_id FROM news_msg_id_ng_id_link WHERE ng_id IN( > SELECT ng_id FROM news_ng_genre_link WHERE genre_id in (3,6) > ) > ) Question: How is that query different from: SELECT a.msg_id, a.from_line, a.col_date FROM news_messages a, news_msg_id_ng_id_link b, news_ng_genre_link c WHERE c.genre_id in (3,6) AND b.ng_id = c.ng_id AND a.msg_id = b.msg_id; (Maybe I'm missing something with NULLs ?) -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organized violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington