Thread: Slow Queries with OR's?

Slow Queries with OR's?

From
"Mad Hatter"
Date:
I have a large postgresql 7.3 database, a number of the tables have over 1
million records, the largest having over 8 million.

When I run the following query:

SELECT attrib1 FROM table1 WHERE attrib2 IN ( '1', '3', '5', '7' );

It takes several minutes to complete.
Looking more closely, internally this is converted to:
... WHERE attrib2 = '1' OR attrib2 = '3' OR attrib2= '5'...

If I instead run the query:

SELECT attrib1 FROM table1 WHERE attrib2 = '1'
UNION
SELECT attrib1 FROM table1 WHERE attrib2 = '3'
UNION
SELECT attrib1 FROM table1 WHERE attrib2 = '5'
UNION
SELECT attrib1 FROM table1 WHERE attrib2 = '7';

Then it return in seconds.

Why does using IN (or OR) give such a performance hit?
Is there any way I can resolve this, it seem odd to be forced to UNION
everything, and may not always be practicle to do so. It certianly makes the
query text much longer!

thanks!

M.


_________________________________________________________________
Surf together with new Shared Browsing
http://join.msn.com/?page=features/browse&pgmarket=en-gb&XAPID=74&DI=1059


Re: Slow Queries with OR's?

From
Shridhar Daithankar
Date:
On Tuesday 29 April 2003 14:31, Mad Hatter wrote:
> Why does using IN (or OR) give such a performance hit?
> Is there any way I can resolve this, it seem odd to be forced to UNION
> everything, and may not always be practicle to do so. It certianly makes
> the query text much longer!

Try exists instead on in. It is known to be faster..

HTH

 Shridhar


Re: Slow Queries with OR's?

From
Andrew Sullivan
Date:
On Tue, Apr 29, 2003 at 10:01:04AM +0100, Mad Hatter wrote:

> Why does using IN (or OR) give such a performance hit?

Because the implementation was not perfect ;-)

I gather there is a fix for this in CVS tip, so you might want to
give that a whirl to see if it solves your problem.  (But don't use
it for real systems yet!)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


create gist index

From
Jose Manuel Sanchez
Date:
I have installed a Postgresql 7.32 and a Postgis 0.7.3 on a Suse 8.0 and I
have a problem when I try to create a GIST index

CREATE INDEX indx_cname
ON roads_geom
    USING GIST ( geom_colum GIST_GEOMETRY_OPS );

I got the following error:

ERROR:  DefineIndex: operator class "gist_geometry_ops" not supported by
access method "gist"

Do you have any idea??? I musy update the PostgreSQL with any patch?¿?¿?


Thank a lot.

Best Regards.


Re: Slow Queries with OR's?

From
Tom Lane
Date:
"Mad Hatter" <mad_hatter_@hotmail.com> writes:
> When I run the following query:
> SELECT attrib1 FROM table1 WHERE attrib2 IN ( '1', '3', '5', '7' );

Could we see the EXPLAIN ANALYZE output for that, as well as one of
these?

> SELECT attrib1 FROM table1 WHERE attrib2 = '1'

ISTM you should be getting roughly equivalent plans, that is an
indexscan for each of the constants.

(The nearby comments about IN being slow apply to "foo IN (SELECT ...)"
which this is not.)

            regards, tom lane


Re: Slow Queries with OR's?

From
Mr Mat psql-mail
Date:
Apologies for loosing the thread.

>"Mad Hatter" <mad_hatter_@hotmail.com> writes:
>> When I run the following query:
>> SELECT attrib1 FROM table1 WHERE attrib2 IN ( '1', '3', '5', '7' );
>
>Could we see the EXPLAIN ANALYZE output for that, as well as one of
>these?
>
> SELECT attrib1 FROM table1 WHERE attrib2 = '1'
>
>ISTM you should be getting roughly equivalent plans, that is an
>indexscan for each of the constants.
>
>(The nearby comments about IN being slow apply to "foo IN (SELECT ...)"

>which this is not.)
>
>                       regards, tom lane




Tom, I acually tried to overly simplify things, I was doing IN(SELECT).


Here's the output you requested.

EXPLAIN ANALYZE
SELECT msg_id, col_date, from_line, message FROM news_messages
WHERE
msg_id in (
SELECT msg_id FROM news_messages WHERE EXISTS (
SELECT msg_id FROM news_msg_id_ng_id_link WHERE EXISTS (
SELECT ng_id FROM news_ng_genre_link WHERE genre_id in( 1, 5 )
)
)
)
AND col_date >= '2003-01-01'
AND col_date <= '2003-02-01'
AND news_messages.subject ILIKE '%CISCO%'
AND news_messages.message ILIKE '%router%' ORDER BY col_date DESC;
 QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
---------
Index Scan Backward using news_messages_c_date_index on news_messages (
cost=0.00..171844391116.25 rows=1 width=827) (actual time=213790.91..
425350.45 rows=20 loops=1)
Index Cond: ((col_date >= '2003-01-01 00:00:00'::timestamp without
timezone) AND (col_date <= '2003-02-01 00:00:00'::timestamp without
time zone))
Filter: ((subject ~~* '%CISCO%'::text) AND (message ~~* '%router%'::
text)AND (subplan))
SubPlan
->  Materialize  (cost=892397.63..892397.63 rows=3924463 width=43)(
actual time=9851.52..14640.18 rows=3622109 loops=20)
->  Result  (cost=0.00..892397.63 rows=3924463 width=43) (actualtime=21.
60..186933.79 rows=3924463 loops=1)
One-Time Filter: $1
InitPlan
->  Result  (cost=0.00..165027.19 rows=8054019 width=44)(actual time=13.
17..13.17 rows=1 loops=1)
One-Time Filter: $0
InitPlan
->  Seq Scan on news_ng_genre_link(cost=0.00..18.84 rows=493 width=8) (
actual time=6.58..6.58 rows=1 loops=1)
Filter: ((genre_id = 1) OR (genre_id = 5))
->  Seq Scan on news_msg_id_ng_id_link(cost=0.00..165027.19 rows=
8054019 width=44) (actual time=6.56..6.56 rows=1 loops=1)
->  Seq Scan on news_messages  (cost=0.00..892397.63 rows=3924463 width=
43) (actual time=8.42..178589.46 rows=3924463 loops=1)
Total runtime: 425470.07 msec
(16 rows)

EXPLAIN ANALYZE SELECT ng_id FROM news_ng_genre_link WHERE genre_id in (
1, 5 );
QUERY PLAN
------------------------------------------------------------------------
----------------------------------------
Seq Scan on news_ng_genre_link  (cost=0.00..18.84 rows=493 width=8) (
actual
time=0.09..12.94 rows=498 loops=1)
Filter: ((genre_id = 1) OR (genre_id = 5))
Total runtime: 13.36 msec
(3 rows)


Re: Slow Queries with OR's?

From
Tom Lane
Date:
Mr Mat psql-mail <psql-mail@freeuk.com> writes:
> SELECT msg_id, col_date, from_line, message FROM news_messages
> WHERE
> msg_id in (
> SELECT msg_id FROM news_messages WHERE EXISTS (
> SELECT msg_id FROM news_msg_id_ng_id_link WHERE EXISTS (
> SELECT ng_id FROM news_ng_genre_link WHERE genre_id in( 1, 5 )
> )
> )
> )
> AND col_date >= '2003-01-01'
> AND col_date <= '2003-02-01'
> AND news_messages.subject ILIKE '%CISCO%'
> AND news_messages.message ILIKE '%router%' ORDER BY col_date DESC;

I'm not sure what you intend that "msg_id in (...)" to accomplish,
but you could leave it out and get exactly the same results, so I
suspect it is not doing what you intended.  The EXISTS() clause is
independent of the containing SELECT, therefore will be true (or not,
but according to the explain analyze output it was true) at every
row, therefore the result of the SELECT just below IN consists of
every msg_id in news_messages, and so the IN succeeds --- slowly
and painfully --- at every row.

            regards, tom lane


Re: Slow Queries with OR's?

From
Mr Mat psql-mail
Date:
Tom - Thanks for your help - it's invaluable!

I have now gone back to my original query shown below:
I set EXPLAIN ANALYZE running on it over 12 hours ago and its still
going! (Yesterdays EXPLAIN ANALYZE took tens of minutes)
So I have included just the ouput from EXPLAIN.


SELECT msg_id, from_line, col_date FROM news_messages WHERE msg_id IN(
SELECT msg_id FROM news_msg_id_ng_id_link WHERE ng_id IN(
SELECT ng_id FROM news_ng_genre_link WHERE genre_id in (3,6)
)
)

QUERY PLAN
------------------------------------------------------------------------
--------------------------
Seq Scan on news_messages  (cost=0.00..298471193171502.50 rows=1962232
width=90)
Filter: (subplan)
SubPlan
->  Materialize  (cost=76054021.22..76054021.22 rows=4027010 width=44)
->  Seq Scan on news_msg_id_ng_id_link  (cost=0.00..76054021.22 rows=
4027010 width=44)
Filter: (subplan)
SubPlan
->  Seq Scan on news_ng_genre_link  (cost=0.00..18.84 rows=170 width=8)

Filter: ((genre_id = 3) OR (genre_id = 6))

Any suggestions?
Regards,
The ever thankful.


Re: Slow Queries with OR's?

From
Tom Lane
Date:
Mr Mat psql-mail <psql-mail@freeuk.com> writes:
> I have now gone back to my original query shown below:

> SELECT msg_id, from_line, col_date FROM news_messages WHERE msg_id IN(
> SELECT msg_id FROM news_msg_id_ng_id_link WHERE ng_id IN(
> SELECT ng_id FROM news_ng_genre_link WHERE genre_id in (3,6)
> )
> )

I think all you can do with this is transform the outer IN into EXISTS
and make sure there's an index on news_msg_id_ng_id_link.msg_id.  See
the FAQ item about slow IN's for details.

7.4 will do this a lot better, but for now you need a workaround...

            regards, tom lane


Re: Slow Queries with OR's?

From
Martijn van Oosterhout
Date:
On Fri, May 02, 2003 at 09:54:19AM +0100, Mr Mat psql-mail wrote:
> Tom - Thanks for your help - it's invaluable!
>
> I have now gone back to my original query shown below:
> I set EXPLAIN ANALYZE running on it over 12 hours ago and its still
> going! (Yesterdays EXPLAIN ANALYZE took tens of minutes)
> So I have included just the ouput from EXPLAIN.
>
>
> SELECT msg_id, from_line, col_date FROM news_messages WHERE msg_id IN(
> SELECT msg_id FROM news_msg_id_ng_id_link WHERE ng_id IN(
> SELECT ng_id FROM news_ng_genre_link WHERE genre_id in (3,6)
> )
> )

Question: How is that query different from:

SELECT a.msg_id, a.from_line, a.col_date
FROM news_messages a, news_msg_id_ng_id_link b, news_ng_genre_link c
WHERE c.genre_id in (3,6)
AND b.ng_id = c.ng_id
AND a.msg_id = b.msg_id;

(Maybe I'm missing something with NULLs ?)
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Attachment